Scalability

With a fixed amount of resources, a system will eventually hit the maximum QPS (queries per second) it can support as load increases.

We can also look at how a system’s output rate (responses/sec) relates to its input rate (requests/sec).

A system is scalable if adding more resources to it raises its maximum throughput.
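The saturation behaviour described above can be sketched with an idealized model in which the output rate tracks the input rate until it reaches a capacity ceiling, and adding resources raises that ceiling. The function names, the linear capacity assumption (`nodes * per_node_capacity`), and all numbers are illustrative assumptions rather than measurements; real systems often degrade past saturation instead of holding a flat plateau.

```python
def capacity_for(nodes: int, per_node_capacity: float) -> float:
    """Assumption: max throughput grows linearly with node count."""
    return nodes * per_node_capacity


def output_rate(input_rate: float, capacity: float) -> float:
    """Idealized model: responses/sec equals requests/sec until saturation."""
    return min(input_rate, capacity)


if __name__ == "__main__":
    # Hypothetical per-node capacity of 500 requests/sec.
    for nodes in (1, 2):
        capacity = capacity_for(nodes, per_node_capacity=500.0)
        rates = [output_rate(qps, capacity) for qps in (100, 500, 900, 1300)]
        print(f"nodes={nodes} capacity={capacity} output={rates}")
```

Under these assumptions, doubling the node count moves the plateau from 500 to 1000 responses/sec, which is what the definition of scalability asks for.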

We can scale up (add resources to existing machines) or scale out (add more machines); the latter is preferable.

Metrics

$\text{Throughput} = \frac{ \text{Total Successful Requests} }{ \text{Total Time} }$
$\text{Latency} = \frac{ \text{Total Response Time} }{ \text{Total Requests} }$
$\text{Utilization} = \frac{ \text{Used Resources} }{ \text{Allocated Resources} }$
$\text{Scaling Efficiency} = \frac{ \text{Performance Gain} }{ \text{Added Resources} }$
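As a rough illustration, the four ratios above can be computed from the summary of a load-test run. The `LoadTestRun` record and its field names are hypothetical, and the sketch assumes that "performance gain" means the increase in throughput between two runs; other definitions of gain are equally plausible.

```python
from dataclasses import dataclass


@dataclass
class LoadTestRun:
    """Hypothetical summary of one load-test run."""
    successful_requests: int
    total_requests: int
    total_time_s: float
    total_response_time_s: float
    used_resources: float
    allocated_resources: float


def throughput(run: LoadTestRun) -> float:
    # Successful requests per second.
    return run.successful_requests / run.total_time_s


def latency(run: LoadTestRun) -> float:
    # Mean response time per request, in seconds.
    return run.total_response_time_s / run.total_requests


def utilization(run: LoadTestRun) -> float:
    # Fraction of allocated resources actually in use.
    return run.used_resources / run.allocated_resources


def scaling_efficiency(before: LoadTestRun, after: LoadTestRun) -> float:
    # Assumption: performance gain is measured as throughput gain.
    gain = throughput(after) - throughput(before)
    added = after.allocated_resources - before.allocated_resources
    return gain / added


# Made-up numbers: 8 allocated CPUs before, 16 after.
before = LoadTestRun(9000, 10000, 10.0, 500.0, 6.0, 8.0)
after = LoadTestRun(17000, 18000, 10.0, 900.0, 12.0, 16.0)
```

With these made-up numbers, `scaling_efficiency(before, after)` is the extra throughput obtained per added CPU.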

$\text{Tenant Density} = \frac{ \text{Active Tenant Count} }{ \text{Resource Threshold} }$ . Beyond the resource threshold (say, 1000 CPUs), the system’s performance degrades. If our estimate of the resource threshold is correct, tenant density is indirectly the ratio of the active tenant count to the maximum supportable tenant count. To see why, suppose the system can support at most $X$ active tenants. At that point the system sits exactly at its threshold, so $\text{Tenant Density} = \frac{X}{ \text{Resource Threshold} } = 1$ and therefore $X = \text{Resource Threshold}$ . Substituting back, $\text{Tenant Density} = \frac{ \text{Active Tenant Count} }{X}$ .
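A minimal sketch of the tenant density ratio follows. The function names and the threshold of 1000 are illustrative assumptions, and the threshold is treated as already expressed in tenant-equivalent units so that a density of 1 corresponds to the maximum supportable tenant count.

```python
def tenant_density(active_tenants: int, resource_threshold: float) -> float:
    """Ratio of active tenants to the estimated resource threshold."""
    if resource_threshold <= 0:
        raise ValueError("resource_threshold must be positive")
    return active_tenants / resource_threshold


def is_over_threshold(active_tenants: int, resource_threshold: float) -> bool:
    """A density above 1 means performance is expected to degrade."""
    return tenant_density(active_tenants, resource_threshold) > 1.0


if __name__ == "__main__":
    # Hypothetical threshold of 1000 tenant-equivalent units.
    for active in (750, 1000, 1200):
        print(active, tenant_density(active, 1000.0),
              is_over_threshold(active, 1000.0))
```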

$\text{Per Region Latency} = \frac{ \sum_{i} \left( {\text{End To End Latency}}_i + {\text{Replication Latency}}_i + {\text{Network Throughput Impact}}_i \right) }{ \text{Region Count} }$ , where $i$ ranges over regions. We can measure $\text{Network Throughput Impact} = \frac{ \text{Latency With Data Transfer} }{ \text{Latency Without Data Transfer} } - 1$ .
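The per-region average can be sketched as below. The `RegionSample` record, its field names, and the sample values are hypothetical. The sketch follows the formula as written, which means the dimensionless impact term is added directly to the two latency terms (assumed here to be in milliseconds).

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RegionSample:
    """Hypothetical per-region measurements, latencies in milliseconds."""
    end_to_end_ms: float
    replication_ms: float
    latency_with_transfer_ms: float
    latency_without_transfer_ms: float


def throughput_impact(with_transfer: float, without_transfer: float) -> float:
    # Relative slowdown caused by data transfer (0.25 means 25% slower).
    return with_transfer / without_transfer - 1


def per_region_latency(regions: Sequence[RegionSample]) -> float:
    if not regions:
        raise ValueError("at least one region is required")
    total = sum(
        r.end_to_end_ms
        + r.replication_ms
        + throughput_impact(r.latency_with_transfer_ms,
                            r.latency_without_transfer_ms)
        for r in regions
    )
    return total / len(regions)


# Made-up samples for two regions.
samples = [
    RegionSample(120.0, 30.0, 150.0, 120.0),
    RegionSample(200.0, 50.0, 300.0, 200.0),
]
```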