Understanding PTV xServer scaling behaviour and getting tips for tuning is relevant for you if one or more of the following points is true:
The response time is the duration of a PTV xServer transaction; for a simple transaction this is the time between sending the request and receiving the response.
Minimal response times are very important for a good user experience, especially for interactive uses.
Because of proxies, the server might not be able to observe and log the client's full response time, only a shorter period: the server can only observe when communication with the "nearest" proxy starts and ends. Due to latency, the first parts of the request and the last parts of the response may still be en route from or to the client.
The throughput is a metric for the maximum load capacity of your server. It is measured as the number of completed transactions over time.
Maximal throughput is very important for mass use scenarios, especially batch processing.
The system load is a measure of the work the system has to perform, usually in the form of CPU activity. The service load is a more abstract measure and corresponds to the transaction arrival rate.
A higher service load causes an increase in system load. When the system load reaches 100%, response times start to degrade while throughput peaks. Transactions have to wait and may even have to be rejected: the system is then overloaded.
Scaling your PTV xServer cluster properly and optimising it will:
First, familiarise yourself with the PTV xServer system architecture.
In order to achieve an optimally tuned system you will have to do your own measurements. PTV's reference benchmarks only give an indication of performance - numbers are valid for a given configuration and a given test set only. Your configuration and your requests will not be identical, and neither will be your performance numbers.
There is not one obvious optimisation goal, but two: minimise response times, and maximise throughput. To minimise response times it is necessary to keep hardware capacity readily available. This may mean that the system is often idle. To maximise throughput it is necessary to maximise hardware utilisation. This may mean that transactions sometimes have to wait.
Throughput and response times could only be optimal at the same time if each request arrived precisely when the previous response has been sent: no request would have to wait, and all workers could be active at all times. This is not only improbable but, because of the inherent latencies involved, also impossible.
It is, however, possible to strive for minimal response times until the system load becomes so high that the available computing resources no longer allow maintaining this optimum. From that point on, one strives for maximum throughput to avoid loss of service: accepting a performance degradation is usually better than rejecting service requests. This optimisation goal does not conflict with the mass use scenario where response times are a secondary concern: smaller response times also increase throughput, and during periods of low load neither response times nor throughput are compromised.
The choice of the operating system has a certain influence on performance. Regardless of the choice of operating system, there are a couple of mechanisms that are especially important for PTV xServer performance.
Performance will differ between operating system versions, and also between platforms. The differences between platforms stem mostly from the compilers used; the advantages shift with compiler versions and options, and with transaction types as well. Linux tends to have more robust file systems which rarely, if ever, block.
Modern operating systems do not leave "free" main memory unused but use it to cache files. Access to cached files is massively faster than disk access and only slightly slower than direct memory access. These caches are only observable with specialised tools.
Since PTV xServer access map data through memory-mapped files, file caches are fairly important and having sufficient free RAM is beneficial. The exact effects vary: because the operating system attempts to keep the most frequently and most recently used files cached, the performance gains of additional RAM usually diminish fast, as the "hot zones" of the map files are kept in RAM anyway.
PTV xServer can run just fine with one GiB of "free" RAM but depending on your request mix it can profit from more, up to roughly the total size of all map files, as long as all parts of the map are regularly accessed.
Modern operating systems assign processes to physical cores first, then use the so-called virtual cores. Thus, hyper-threading will not slow down your system when there is little load.
As a consequence, all cores can and should be used. Active virtual cores will not degrade response times as long as they are not needed, but they increase throughput when they are.
Operating systems assign CPUs to processes as these need to execute. Processes that remain idle for long periods do not impact the busy ones in any significant way as long as they can be kept in memory - otherwise I/O is required before such a process can become busy again.
As a consequence, given sufficient RAM you can start many more processes than you have CPU cores to execute them without a noticeable loss of performance. Some deployment strategies can benefit from such a scenario.
Before you plan your deployment you need to understand the requirements: which service APIs you need, and what capacity, responsiveness and availability are required.
From these you can derive the overall number of worker processes. After deciding on the type of server hardware you can then plan the per-server deployment to find out how many hardware units you need in your server cluster.
The first step is finding the required overall capacity for your system, in terms of overall throughput.
For batch processing scenarios, the required throughput is usually already given (e.g. 100 tours to plan in one hour).
For interactive uses, the number of users and the number of concurrently active users can be estimated, and the number of transactions triggered per user can be roughly estimated as well (the Poisson distribution may be a helpful model), but it is best determined from logs of field tests or actual use. Make sure you choose a suitably short time interval when measuring user transactions: requests per hour is not a suitable scale; requests per minute or second are better indicators for the capacity required to provide not only throughput but good response times as well. Also, you should plan with Winsorised results or an upper quantile (e.g. 90%) plus a buffer, depending on the quality of service you want to provide at peak use times as well as the confidence you have in your estimations.
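As a sketch of the kind of estimation described above, the following Java snippet derives a peak requests-per-second figure from a Poisson model; the user count, the per-user rate and the 90% quantile are illustrative assumptions, not measured values.

    import static java.lang.Math.exp;

    public class CapacityEstimate {
        // Smallest k such that P(X <= k) >= quantile for X ~ Poisson(mean).
        static int poissonQuantile(double mean, double quantile) {
            double p = exp(-mean);   // P(X = 0)
            double cdf = p;
            int k = 0;
            while (cdf < quantile) {
                k++;
                p = p * mean / k;    // P(X = k) derived from P(X = k - 1)
                cdf += p;
            }
            return k;
        }

        public static void main(String[] args) {
            int activeUsers = 200;                 // assumed concurrently active users
            double requestsPerUserPerMinute = 3.0; // assumed average rate per user
            double meanPerSecond = activeUsers * requestsPerUserPerMinute / 60.0;

            // Plan for the 90% quantile of the per-second arrival count, plus a buffer.
            int peak = poissonQuantile(meanPerSecond, 0.90);
            System.out.printf("mean %.1f req/s, 90%% quantile %d req/s%n", meanPerSecond, peak);
        }
    }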
The required overall throughput has to be determined per service type. The distribution over all request types is also useful information when deciding how much overlap is tolerable when mixing services (more on that later).
If several different usage scenarios are combined, you can either use dedicated clusters or set up one cluster that can handle all of the load. Dedicated clusters protect the other clusters from overload situations, e.g. long running requests blocking all the workers, leaving none for the short requests. A shared cluster, on the other hand, can better balance out peak load.
As an initial guess for the number of workers required, you can measure or extrapolate the throughput of one worker process from test run results. While the real throughput numbers depend on the final hardware as well as the final deployment, the initial guess helps to select the type of servers required.
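For example (purely illustrative numbers): if a single worker process handles roughly 5 routing transactions per second in your tests and you need to sustain 60 routing transactions per second at peak, a first estimate would be 12 workers for that service.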
How much throughput a single server has, how many servers you need, and how to ideally set it up depends on your server type. By now you have an idea about the number of processes you must run, and what type of server you are looking for.
If you have a lot of processing to do, using fewer but larger servers is usually more cost effective:
Once you have selected a cost efficient server type, you know the number of CPU cores per server.
To obtain the number of servers required you need to determine the optimum number of worker processes per server.
The pool size per server only scales the worker processes, which are responsible for the actual computation. The web server also produces CPU load, primarily during (de)serialisation. This overhead varies between requests, typically between 1% (very long running requests) and 50% (very short and small requests).
Worker processes that have to wait before they get CPU access do not improve throughput; cache thrashing effects can even bring it down. As a consequence, there is a maximum effective pool size for a given request mix. The maximum effective pool size is smaller than the number of (virtual) CPU cores - how much smaller depends on the server overhead percentage. Any larger pool size will not increase the reachable throughput.
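As an illustrative example: on a server with 16 virtual cores and a request mix causing roughly 20% web server overhead, only about 16 / 1.2 ≈ 13 cores remain for the workers, so the maximum effective pool size would be around 13.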
Similarly, there is a maximum efficient pool size which is smaller than the number of physical CPU cores. If more workers need to be active, hyper-threading becomes activated and response times will increase.
You need to determine the maximum effective pool size with your own hardware and test sets if you want to fully optimise your deployment. You can start with the number of virtual CPU cores (or a bit less, depending on the type of server), then remove one worker after another until you see a noticeable decrease in throughput. The last pool size before that decrease - one worker more than the size at which throughput dropped - is the maximum effective pool size.
When in doubt about the maximum effective pool size, round up. There are no issues when using a few more workers but there might be benefits: unexpected sequences of long running requests would profit from more workers as the maximum effective pool size would be larger than expected during that period. Also, a small amount of redundancy on a per-server basis can better bridge the time until a replacement process is started after one crashes - which does not occur often, of course.
The properties of hyper-threading and the web server threads lead to a characteristic scaling curve of PTV xServer, illustrated in the chart below. In the real world, this curve is a bit wobbly due to measurement inaccuracies, and also more rounded.
The chart shows major response time components as well as reachable throughput depending on the number of concurrent client requests. Several relevant stages are marked, under the assumption that the actual pool size is set above the number of virtual cores; this configuration is for illustration only, not the goal. For practical purposes you should set the worker pool size to the maximum effective pool size, but not exceed the number of virtual cores.
Even under full load it is hard for the scheduler to keep all workers busy all the time. There is a certain overhead involved in swapping workers: after sending the result to the server, worker processes are idle until they are fed (or can fetch) the next request.
With old PTV xServer schedulers, the maximum effective pool size was larger than the number of cores available. "Overbooking" a system with more processes was effective since the scheduling latencies could be hidden with sufficient worker processes: if one process is idle, another one can make use of the CPU.
New PTV xServer schedulers are better optimised; the scheduling overhead is so low that overbooking your CPU with additional worker processes for a given service is no longer effective.
Combining different services per server is an option if
In this case, you gain the following advantages:
If you deploy mixed services, you usually need fewer servers overall. It is highly unlikely that you need the full throughput for all services at the same time; instead, a client's transaction chain tends to hop between services: first geocode addresses, then calculate a route, then show the result on the map.
A heterogeneous cluster will only be efficient when the actual transaction mix closely matches the estimated distribution and the loads of the different services overlap as planned.
If you are willing to spend the same number of machines on a homogeneous setup as you would need for a heterogeneous setup, you would still profit. The homogeneous setup can handle ANY transaction mix that does not surpass the maximum of all services, while the heterogeneous setup can only handle its particular distribution. Also, you need fewer backup machines for the homogeneous setup since all machines can serve as backups for each other.
Now that the number of workers required per service has been determined, the server hardware has been chosen, the maximum effective pool size has been measured and the deployment of workers in the cluster has been defined, the actual number of server machines for the cluster is known.
PTV xServer cannot distribute transactions between themselves; they need an external component to do it for them, a web proxy acting as load balancer.
The default strategy for load balancing software is usually Round Robin, simply cycling through the servers. This strategy works well for short requests without much variation in response times, e.g. the delivery of HTML pages. However, if response times vary a lot, this strategy might congest servers, e.g. assigning all long running requests to one server by accident. Such congestions become less likely when servers have a larger pool size. For PTV xServer clusters with a mix of services deployed, the Round Robin strategy is not recommended.
The Connection Counting strategy attempts to level the number of active HTTP connections. This strategy works well for requests that allocate comparable amounts of computing resources (mostly CPU cores). However, if the allocated resources vary a lot, this strategy might congest servers. For PTV xServer clusters that are set up for the asynchronous protocol, active HTTP connections no longer indicate actual CPU usage and the Connection Counting strategy is not recommended.
The Load Probing strategy attempts to level CPU load. For this, the load balancer has to query servers periodically. This strategy works well for requests that primarily use CPU as resource. This strategy is suitable for all uses of PTV xServer.
Cost-free proxies that provide Load Probing are Apache httpd (with the mod_cluster add-on) or Microsoft IIS (with Server Feedback Agents). Commercial load balancers usually also support this strategy.
Middleware and network infrastructure both add to communication times.
To decrease latencies you should attempt to minimise the number of hops for your HTTP messages. Attempt to use only one extra level of web proxies.
Host your application middleware in close proximity to PTV xServer: the local host is as close as you can get, sharing the same local network is next best, and sharing the same hosting centre is the third choice.
If you generate lots of traffic, ensure your network bandwidth has not become a bottleneck.
If you are using the asynchronous protocol or wish to store your own POIs, you have to use one central database so that every server in the cluster has access to consistent data. A central database can also make better use of caches as all queries go to one system.
If you are using a central database or you plan to query POIs a lot you should consider a dedicated database server. Then, your database will not compete for CPU with the web server or worker processes, and your database can profit a lot more from fast I/O.
Your measurements and calculations so far have given you the number of required servers. There is still one concern that impacts scaling: your required service availability. You need additional backup systems for your servers.
The cumulative binomial distribution function can tell you the minimum number of backup units for your targeted availability. The availability A of a set of n exchangeable units with individual availability a, of which at least k units need to work, is:
A = Σ_{i=k}^{n} C(n, i) · a^i · (1 − a)^(n − i)
where C(n, i) is the binomial coefficient ("n choose i").
A "unit" can refer to servers, processes, databases, hard drives, network switches, and so on. For the model to work, you need to know or guess the availability of an individual unit. For instance, modern server boards come with built-in redundancy measures and achieve an overall hardware availability of about 99% ("two nines"). Standard office workstations typically reach 97 to 98% availability .
If backup systems are also operational ("hot standby") you also gain a capacity reserve as long as there are no failures. However, it is important not to rely on this extra capacity for standard operations - otherwise the redundancy systems are no longer redundant.
PTV xServer perform fairly well in virtualised environments: they do not depend on fast I/O, which tends to suffer from virtualisation overhead.
Virtual environments bring a lot of benefits:
Of course, you need to familiarise yourself with the host software and might have to pay for licences.
In cloud environments (which you can think of as virtualised environments with as many extra servers as you are willing to pay for) you may be able to scale up your cluster even after the initial setup, simply by adding further virtual machines. This process can even be automated (so-called "elasticity"), based on a schedule or on the current load.
There are some dedicated programming tasks involved in scaling and tuning PTV xServer. First, you need to measure the performance to validate your tuning efforts. Second, you need to make sure that clients use the carefully planned cluster properly.
Doing your own benchmarks is a central part of your optimisation efforts. Taking correct measurements is not complicated, but measuring the correct things can be. You should avoid the following pitfalls.
If you avoid the pitfalls above, obtaining correct results is not too hard.
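As one possible starting point, the sketch below measures serial transactions after a warm-up phase and reports throughput and a 90th-percentile response time; sendRequest is a placeholder for your actual PTV xServer call, and the simulated 50 ms delay is an assumption.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    public class MiniBenchmark {
        // Placeholder for your real HTTP call against PTV xServer.
        static void sendRequest() throws Exception {
            Thread.sleep(50); // simulate a 50 ms transaction
        }

        public static void main(String[] args) throws Exception {
            for (int i = 0; i < 20; i++) sendRequest(); // warm-up, not measured

            List<Long> latenciesMs = new ArrayList<>();
            long start = System.nanoTime();
            int transactions = 200;
            for (int i = 0; i < transactions; i++) {
                long t0 = System.nanoTime();
                sendRequest();
                latenciesMs.add((System.nanoTime() - t0) / 1_000_000);
            }
            double seconds = (System.nanoTime() - start) / 1e9;

            Collections.sort(latenciesMs);
            long p90 = latenciesMs.get((int) (latenciesMs.size() * 0.9) - 1);
            System.out.printf("throughput %.1f tx/s, 90th percentile %d ms%n",
                    transactions / seconds, p90);
        }
    }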
Your client code can also impact performance. The following sections give you some important tips.
Only request attributes that you intend to use. Optional attributes all take time to convert, compute, look up, serialise and transmit. After you have explored all features of PTV xServer, make sure you strip down your requested options to those you really need.
If you have the choice between SOAP and JSON formats, choose the one you are more familiar with. The serialisation format has little impact on performance (~2% of communication time in favour of JSON).
Whenever your PTV xServer requests leave the local network, have your client accept compressed HTTP responses. Compression costs some CPU cycles but reduces communication times. This can even be a good idea within local networks in order to save bandwidth.
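A minimal Java sketch of accepting compressed responses with the JDK HTTP client; the URL and request body are placeholders, and the explicit gzip handling is needed because this client does not decompress transparently.

    import java.io.ByteArrayInputStream;
    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.zip.GZIPInputStream;

    public class CompressedRequest {
        public static void main(String[] args) throws Exception {
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder(URI.create("http://xserver.example:50000/some/service")) // placeholder URL
                    .header("Accept-Encoding", "gzip")
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString("{ }")) // placeholder body
                    .build();

            HttpResponse<byte[]> response = client.send(request, HttpResponse.BodyHandlers.ofByteArray());
            byte[] body = response.body();
            // Decompress only if the server actually answered with gzip.
            if (response.headers().firstValue("Content-Encoding").orElse("").contains("gzip")) {
                body = new GZIPInputStream(new ByteArrayInputStream(body)).readAllBytes();
            }
            System.out.println(new String(body).length() + " characters received");
        }
    }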
If you use parallel transactions you need to send HTTP requests asynchronously, which means separate threads. The mechanisms for this depend on your programming language. In JavaScript this happens automatically: the HTTP communication is asynchronous and you receive a callback when it is done. In Java or C# you would need to spawn your own thread.
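A minimal sketch of asynchronous sending in Java with the JDK HTTP client; the URL and body are again placeholders. sendAsync returns a CompletableFuture, so the calling thread stays free while the request is in flight.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.concurrent.CompletableFuture;

    public class AsyncRequest {
        public static void main(String[] args) {
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder(URI.create("http://xserver.example:50000/some/service")) // placeholder URL
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString("{ }")) // placeholder body
                    .build();

            CompletableFuture<Void> pending = client
                    .sendAsync(request, HttpResponse.BodyHandlers.ofString())
                    .thenAccept(response -> System.out.println("status " + response.statusCode()));

            System.out.println("request sent, client thread is free to do other work");
            pending.join(); // only needed so this demo does not exit before the response arrives
        }
    }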
Even if you only need serial transactions you should not block your client code while waiting for an HTTP response. While this has no actual impact on response times, a non-blocking client has better performance as perceived by the user.
Do not confuse asynchronous requests with the asynchronous server protocol. The asynchronicity is not the same: the asynchronous request is handled by the client exclusively, while the server drives the asynchronous protocol. Of course, you can combine both: use asynchronous requests for the asynchronous protocol.
You can help most PTV xServer services by limiting the search space. For instance, you can restrict geocoding to specific countries or define routing boundaries, if applicable (e.g. city routing scenarios).
Sometimes you can configure algorithms to sacrifice quality for speed.
The sacrifices can be severe, however:
In general, you should stick with the default settings.
The fastest transaction is the one you do not need. There are some cases where (extra) transactions can be avoided.
Use the bulk request variants of operations whenever applicable. The bulk versions save a lot of communication overhead and can make much better use of processor caches. They usually provide speedups between 2 and 5 compared to a sequence of single transactions.
Users can make requests obsolete by their actions: zooming the map makes pending tile requests obsolete, pressing a key makes previous search suggestions obsolete. You should "debounce" such transactions: send requests only a short while after the last user action. If you send multiple requests, use a queue that you can reset.
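A small debouncing sketch in Java using a ScheduledExecutorService; the 300 ms delay and the sendSuggestionRequest placeholder are assumptions, and a JavaScript client would implement the same idea with a timer.

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.ScheduledFuture;
    import java.util.concurrent.TimeUnit;

    public class Debouncer {
        private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        private ScheduledFuture<?> pending;

        // Call this on every keystroke; only the last call within 300 ms triggers a request.
        synchronized void onUserInput(String query) {
            if (pending != null) {
                pending.cancel(false); // drop the now obsolete request
            }
            pending = scheduler.schedule(() -> sendSuggestionRequest(query), 300, TimeUnit.MILLISECONDS);
        }

        private void sendSuggestionRequest(String query) {
            // Placeholder for the actual PTV xServer call, e.g. a search suggestion request.
            System.out.println("sending request for: " + query);
        }
    }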
If you have batch processing to do, send requests in parallel, even exceeding the overall pool size of your cluster, but never the queue size of one server.
If you do not saturate the servers, they will have to wait for clients and have idle times. Of course, queueing times will prolong individual response times, but you will gain overall throughput.
If latency makes up around 50% of response time, send twice the number of requests concurrently; then, the server can compute one response while the next request is transferred. Adjust this factor according to your actual communication time overhead.
If you do not have a dedicated cluster, only apply this technique in a limited way, in order to leave sufficient server capacity for other users.
This technique is especially prone to producing network bandwidth issues. Monitor your bandwidth usage.
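A sketch of this saturation technique with a bounded number of parallel requests; the assumed cluster pool size, the factor of 2 from the latency rule of thumb above, and sendBatchRequest are placeholders for your own values and client code.

    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class BatchRunner {
        public static void main(String[] args) throws InterruptedException {
            int clusterPoolSize = 32;              // assumed total workers in the cluster
            int concurrency = clusterPoolSize * 2; // factor 2 to hide communication latency

            List<String> jobs = List.of("job-1", "job-2", "job-3"); // placeholder batch input
            ExecutorService executor = Executors.newFixedThreadPool(concurrency);
            for (String job : jobs) {
                executor.submit(() -> sendBatchRequest(job));
            }
            executor.shutdown();
            executor.awaitTermination(1, TimeUnit.HOURS);
        }

        static void sendBatchRequest(String job) {
            // Placeholder for the actual PTV xServer request for one batch item.
            System.out.println("processed " + job);
        }
    }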
In heavy duty mass data applications the client also has to scale. If you do not generate requests fast enough, the server cluster will remain idle.
You can generate HTTP requests faster if you send them in raw form. It is feasible to build the HTTP requests manually and generate the message bodies from an XML or JSON text template, filling out only the variable parts. This is faster than the object serialisation required by generated or bundled clients. Of course, maintaining these templates takes more effort, as the first-class objects of the clients are much more convenient to use.
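A sketch of the template approach with a hypothetical JSON body containing two variable fields; the placeholder syntax and field names are purely illustrative and do not reflect the actual PTV xServer request schema.

    public class RequestTemplate {
        private static final String TEMPLATE =
                "{ \"start\": \"${START}\", \"destination\": \"${DEST}\" }";

        // Fill the variable parts of the prepared body instead of serialising objects.
        static String buildBody(String start, String destination) {
            return TEMPLATE
                    .replace("${START}", start)
                    .replace("${DEST}", destination);
        }

        public static void main(String[] args) {
            System.out.println(buildBody("Karlsruhe", "Berlin"));
        }
    }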