Threshold-only scaling can be misleading
CPU spikes, low-level scheduler noise, or short bursts do not always mean a service should scale. Scaling the wrong component wastes replicas and still leaves the user-facing latency problem unresolved.
ThriveScale is built to move beyond threshold-only autoscaling. It combines service-level QoS, runtime dependency awareness, and low-level kernel evidence to determine whether a microservice should scale, or whether the delay actually originates in a downstream dependency.
ThriveScale focuses on microservice autoscaling when latency is user-visible, but the root cause is not obvious from coarse infrastructure signals alone.
A service may violate its SLO because it is waiting on another service, a datastore, or an external dependency. In that case, scaling the root service alone can make the system noisier rather than healthier.
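That distinction can be made concrete with a simple attribution check over a request's wait components. The sketch below is illustrative only: the signal names mirror the metrics described on this page (service handling latency, dependency delay, external wait), but the function name and the 50% dominance threshold are assumptions, not ThriveScale's actual rule.

```python
def scaling_hint(handling_ms: float, dependency_ms: float, external_ms: float) -> str:
    """Attribute a request's latency to its dominant wait component.

    Illustrative sketch: the 0.5 dominance threshold is an assumed value,
    not a ThriveScale setting.
    """
    total = handling_ms + dependency_ms + external_ms
    if total <= 0:
        return "no latency evidence"
    if dependency_ms / total > 0.5:
        return "dependency-dominated: investigate the downstream service before scaling"
    if external_ms / total > 0.5:
        return "external-wait-dominated: scaling local replicas is unlikely to help"
    return "locally-dominated: scaling this service may be useful"
```

For example, a request that spent 20 ms in the service but 150 ms waiting on a downstream call is reported as dependency-dominated, so scaling the front service would be flagged as unhelpful.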
If a framework scales automatically, operators need to know why. ThriveScale keeps decision traces, topology context, and bottleneck hints visible so the scaling outcome can be reviewed rather than guessed.
The current implementation scope is organized around the three gaps identified in the thesis and reflected in the live system.
ThriveScale collects kernel-side runtime evidence and service-level truth without requiring a service mesh as a mandatory part of the core decision path.
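ThriveScale's exact collection path is not detailed here, but as one illustrative example of kernel-side evidence that needs no sidecar or service mesh, Linux exposes a coarse run-queue signal directly in /proc/stat. The helper below is a sketch of that idea, not the framework's implementation.

```python
def runnable_tasks() -> int:
    """Return the number of currently runnable tasks on this Linux host.

    Reads the procs_running field of /proc/stat, a kernel-maintained
    run-queue signal that any process can collect without a service mesh.
    """
    with open("/proc/stat") as f:
        for line in f:
            if line.startswith("procs_running"):
                return int(line.split()[1])
    raise RuntimeError("procs_running not found in /proc/stat")
```

Sampled over time, a persistently elevated count is the kind of run queue evidence that distinguishes real CPU contention from a transient burst.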
The framework combines QoS pressure, throughput context, dependency structure, service handling delay, dependency delay, and run queue evidence to decide whether scaling is locally useful.
Instead of requiring ML, DL, or RL model training at runtime, ThriveScale uses explicit deterministic decision logic, cooldown control, and stored decision traces that operators can inspect directly.
In the current implementation, the live dashboard is the operator surface for monitoring, explanation, control, and the support workflow.
View P90 latency, SLO target, throughput, service handling latency, dependency delay, external wait, run queue behavior, health state, and bottleneck hints for each service.
Inspect the live dependency map to understand how traffic flows and which components are likely contributing to downstream latency.
Start or stop controlled traffic, update SLO settings, scale a deployment to minimum, or set replicas manually when validation or intervention is needed.
Inspect alerts, decision traces, audit events, and support tickets so the autoscaler remains understandable, reviewable, and operationally usable.
The dashboard is not only a visual layer. It is how ThriveScale exposes the reasoning behind scaling decisions. Operators can see when latency is local, when a dependency is dominating, and when evidence is too weak for safe action.
Open the live dashboard to view metrics, dependency relationships, alerts, decision traces, manual controls, and the built-in Support Desk.
If you want to install, validate, or tune ThriveScale for your own cluster, start with the built-in support workflow and the deployment guidance already provided with the project.