Server concurrency and throughput are critical metrics for evaluating the performance of hosting or colocation solutions. Understanding how these metrics are calculated and the factors influencing them is essential for optimizing server performance. In this article, we’ll explore the nitty-gritty details of server concurrency and throughput.

Calculating Server Concurrency and Throughput

Concurrency refers to the number of simultaneous requests a server can handle at a given time. It’s calculated by counting the number of active connections or threads processing requests. On the other hand, throughput measures the number of requests processed per unit of time, typically expressed as requests per second (RPS).

Here’s a simple example of calculating concurrency and throughput using a load testing tool like Apache JMeter:


// JMeter test plan
Thread Group:
  - Number of Threads (users): 100
  - Ramp-up Period: 10 seconds
  - Loop Count: 10

// Results:
Samples: 1000
Average: 150 ms
Throughput: 100 requests/sec
Concurrency: 15

In this example, JMeter simulates 100 concurrent users sending a total of 1000 requests. The average response time is 150 milliseconds, resulting in a throughput of 100 requests per second and an average concurrency of 15.
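
These three numbers are tied together by Little’s Law: average concurrency ≈ throughput × average response time. Here is a minimal sketch of that check, using the JMeter figures above:

# Little's Law: concurrency ≈ throughput * average response time
# (figures taken from the JMeter results above)
throughput_rps = 100          # requests per second
avg_response_time_ms = 150    # average response time in milliseconds

concurrency = throughput_rps * avg_response_time_ms / 1000
print(concurrency)            # 15.0 -> matches the reported average concurrency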

Throughput, in this sense, reflects a system’s capacity to handle user requests under load. It is typically governed by two quantities: QPS (or TPS) and concurrency. Every system has a practical ceiling for each of these, and once either one reaches its maximum, overall throughput cannot increase any further.

Understanding QPS, TPS, and RT

QPS (Queries Per Second) is the number of queries a system can respond to per second, where a query means one request sent by a user that receives a successful response.

TPS (Transactions Per Second) is similar to QPS but counts transactions processed per second, where a transaction is the full round trip of a client sending a request and the server responding to it. For a single interface, TPS can be treated as equivalent to QPS.

RT (Response Time) is the interval between a system’s input and its output. In practice it is the elapsed time from the moment the client initiates a request until it has received the complete response; the average response time is the figure most commonly used.
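
To see how these figures come from raw measurements, here is a small sketch that derives QPS and average RT from a list of (start, end) timestamps of completed requests; the sample data is invented purely for illustration:

# Deriving QPS and average RT from completed-request timestamps (sample data is invented)
requests = [                  # (start_time_s, end_time_s) for each completed request
    (0.00, 0.12), (0.05, 0.20), (0.40, 0.55),
    (0.90, 1.05), (1.10, 1.30), (1.50, 1.62),
]

window_s = max(end for _, end in requests) - min(start for start, _ in requests)
qps = len(requests) / window_s
avg_rt_ms = sum(end - start for start, end in requests) / len(requests) * 1000

print(round(qps, 2), round(avg_rt_ms, 1))   # about 3.7 QPS and 148.3 ms average RT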

Here’s an example to illustrate QPS calculation:

Let’s assume you have a large-scale distributed system with 100 services, each deployed on 20 machines with a standard configuration of 4 cores and 8GB memory. This means you have a total of 100 * 20 = 2000 service instances deployed on 2000 machines.

Each service instance has a Eureka Client component that requests the Eureka Server every 30 seconds to fetch the updated registration table. Additionally, each Eureka Client sends a heartbeat request to the Eureka Server every 30 seconds.

Calculate how many requests the Eureka Server, as a microservice registration center, receives per second and per day:

  • Since both intervals are 30 seconds, each service instance fetches the registration table twice per minute and sends two heartbeats per minute.
  • Thus, a single service instance makes 4 requests per minute, and 2000 service instances make 8000 requests per minute.
  • Converting to requests per second: 8000 / 60 ≈ 133, which we can round up to a working estimate of about 150 requests per second for the Eureka Server.
  • Over a day, that is 8000 * 60 * 24 = 11.52 million requests, i.e. a daily volume on the order of ten million (see the sketch below).
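
The same arithmetic, written out as a small script (the instance counts and intervals are exactly the assumptions stated above):

# Eureka Server request-volume estimate from the assumptions above
services = 100
machines_per_service = 20
instances = services * machines_per_service          # 2000 service instances

requests_per_instance_per_minute = 2 + 2             # registry fetch + heartbeat, each every 30 s
requests_per_minute = instances * requests_per_instance_per_minute   # 8000
requests_per_second = requests_per_minute / 60       # ≈ 133
requests_per_day = requests_per_minute * 60 * 24     # 11,520,000

print(round(requests_per_second), requests_per_day)  # 133 11520000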

Based on previous tests, a single 4-core 8GB machine can easily handle a few hundred requests per second for pure in-memory operations, even with some network overhead.

PV (Page View) and Machine Requirements

PV (Page View) is a commonly used metric to measure the traffic of a website or a single web page. It represents the number of times a web page is viewed.

The principle is that 80% of the daily visits are concentrated in 20% of the time, known as peak hours.

Formula: (Total PV * 80%) / (Seconds per day * 20%) = Requests per second (QPS) during peak hours.

Machines: QPS during peak hours / QPS per machine = Number of machines required.

For example, if a single machine must handle 3 million PV per day, what peak QPS does it need to support?

(3,000,000 * 0.8) / (86,400 * 0.2) ≈ 139 (QPS)

So a peak capacity of about 139 QPS is required. (For comparison, 2 million PV per day corresponds to roughly 100 peak QPS.)
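
As a quick sanity check, here is the same formula as a small script (the daily PV figure and per-machine QPS are example inputs, not measured values):

# Peak-hour QPS and machine count from daily PV, using the 80/20 rule above
import math

daily_pv = 3_000_000
per_machine_qps = 139                            # assumed capacity of a single machine

peak_qps = (daily_pv * 0.8) / (86_400 * 0.2)     # (total PV * 80%) / (seconds per day * 20%)
machines_needed = math.ceil(peak_qps / per_machine_qps)

print(round(peak_qps), machines_needed)          # 139 1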

Factors Influencing Server Concurrency

Several key factors impact a server’s concurrency capabilities:

1. CPU Processing Power

The CPU’s core count, frequency, and parallel processing capabilities directly influence concurrency. More cores and higher frequencies allow the server to handle more concurrent requests efficiently.

2. Memory Size and Speed

Adequate memory is crucial for handling concurrent connections. Insufficient memory can lead to swapping and performance degradation. Fast memory speeds also contribute to better concurrency.

3. Network Bandwidth and Latency

High network bandwidth and low latency enable the server to receive and respond to concurrent requests quickly, while network bottlenecks can cap concurrency. Keep in mind that a link has a hard upper limit set by its rated bandwidth, and it is easy to exceed that limit if you do not have a clear picture of how large and how frequent your data transfers are.

For example, suppose your network is built on a 100 Mbps link. If there are 10,240 data transmissions per second, with an average packet size of 10 KB, will you hit a bottleneck?

The theoretical transfer rate of a 100 Mbps link is 12.5 MB/s, while this workload requires 10,240 transmissions/s * 10 KB = 100 MB/s. Clearly, the bandwidth is far from sufficient.
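
A quick way to make that check explicit in code (the packet size, rate, and bandwidth are the example figures above):

# Bandwidth sanity check for the 100 Mbps example above
bandwidth_mbps = 100
link_mb_per_s = bandwidth_mbps / 8                  # theoretical 12.5 MB/s

packets_per_second = 10_240
packet_size_kb = 10
required_mb_per_s = packets_per_second * packet_size_kb / 1024   # 100 MB/s

print(required_mb_per_s > link_mb_per_s)            # True -> the link is a bottleneck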

Regarding bandwidth limits, keep these points in mind:

  • The actual transfer rate is capped by the slowest link in the chain: the bandwidth provided by the service provider, the network card, the switch, and the router.
  • Transfer rates differ between internal and external networks; internal networks are typically faster because public bandwidth purchased from service providers is expensive and therefore more limited.
  • Nodes within the same LAN share a single external uplink, so minimizing external data transfers reduces contention on that shared, limited link.

4. Disk I/O Performance

Disk read/write speeds and I/O operations per second (IOPS) impact the server’s ability to handle concurrent requests that involve disk access.

5. Software Optimizations

Optimizations at the operating system and application level can significantly enhance concurrency:

  • Multithreading and multiprocessing designs
  • Asynchronous and non-blocking I/O models
  • Connection pooling and thread pooling techniques (see the sketch below)
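
As one concrete illustration of the thread-pooling idea, here is a minimal Python sketch; it is not tied to any particular server framework, and handle_request is just a stand-in for real work:

# Minimal thread-pool sketch: a fixed pool of workers serves many queued requests
# instead of spawning a new thread per connection.
from concurrent.futures import ThreadPoolExecutor
import time

def handle_request(request_id):
    time.sleep(0.05)                  # simulate 50 ms of I/O-bound work
    return f"response {request_id}"

with ThreadPoolExecutor(max_workers=16) as pool:      # pool size bounds concurrency
    results = list(pool.map(handle_request, range(100)))

print(len(results))                   # 100 requests served by only 16 worker threads

A bounded pool keeps resource usage predictable: instead of one thread per connection, requests queue up and are served by a fixed number of workers.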

Balancing Server Concurrency

While higher concurrency is generally desirable, it’s essential to strike a balance based on the specific requirements of your application. Extremely high concurrency can introduce challenges such as:

  • System stability risks
  • Data consistency issues
  • Increased resource consumption

It’s crucial to assess the actual concurrency needs of your application and allocate resources accordingly. Blindly pursuing higher concurrency numbers without considering the broader context can lead to suboptimal results.

Conclusion

Server concurrency and throughput are vital metrics for gauging server performance in hosting or colocation scenarios. By understanding the calculation methods and key influencing factors like CPU, memory, network, disk I/O, and software optimizations, you can make informed decisions to optimize your server’s concurrency capabilities. Remember, the goal is to find the right balance that meets your application’s specific requirements while ensuring stability and efficient resource utilization.