
Thursday, May 12, 2016

Thread Pool, Performance of Large scale Web Applications: Bottlenecks, Database, CPU, IO

Ref: http://highscalability.com/blog/2014/5/12/4-architecture-issues-when-scaling-web-applications-bottlene.html


The performance of a web application can mean several things. The two aspects most developers are primarily concerned with are response time and scalability.

Two ways of adding more hardware are
  • Scaling Up (vertical scaling) – increasing the number of CPUs, or adding faster CPUs, on a single box.
  • Scaling Out (horizontal scaling) – increasing the number of boxes.

Scaling out is considered more important, as commodity hardware is cheaper than specially configured hardware (such as a supercomputer). But increasing the number of requests an application can handle on a single commodity box is also important. An application is said to perform well if it can handle more requests, without degrading response time, just by adding more resources.



Ref: http://venkateshcm.com/2014/05/How-To-Determine-Web-Applications-Thread-Poll-Size/


Thread Pool: In web applications, the thread pool size determines the number of concurrent requests that can be handled at any given time. If a web application gets more requests than the thread pool size, the excess requests are either queued or rejected.
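The queueing behavior above can be sketched with Python's standard thread pool (the pool size and request count here are hypothetical): only `POOL_SIZE` requests are ever in flight at once, and the rest wait in the executor's queue.

```python
# Sketch of a bounded thread pool, assuming Python's stdlib
# concurrent.futures. POOL_SIZE and the timings are hypothetical.
from concurrent.futures import ThreadPoolExecutor
import threading
import time

POOL_SIZE = 2         # hypothetical thread pool size
active = set()        # requests currently being processed
peak = 0              # highest number of in-flight requests observed
lock = threading.Lock()

def handle_request(req_id):
    """Simulated request handler; tracks how many run at once."""
    global peak
    with lock:
        active.add(req_id)
        peak = max(peak, len(active))
    time.sleep(0.05)  # simulate request processing
    with lock:
        active.discard(req_id)
    return req_id

with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
    results = list(pool.map(handle_request, range(6)))

# At most POOL_SIZE requests were in flight; the other 4 were queued.
print(peak)
```

Note that `ThreadPoolExecutor` queues excess work rather than rejecting it; a server that rejects instead (the other option mentioned above) would bound the queue and fail fast when it fills.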

Please note that concurrent is not the same as parallel. Concurrent requests are requests being processed at the same time, while only a few of them may actually be running on CPUs at any point in time. Parallel requests are requests being processed while all of them are running on CPUs at that point in time.

In other words, of concurrent requests only some are executing on CPUs at any instant, whereas parallel requests are all executing at the same time.

In non-blocking IO applications such as NodeJS, a single thread (process) can handle multiple requests concurrently. On multi-core CPU boxes, parallel requests can be handled by increasing the number of threads or processes.

In blocking IO applications such as Java SpringMVC, a single thread can handle only one request at a time. To handle more than one request concurrently, we have to increase the number of threads.
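The cost of blocking IO can be seen in a minimal sketch (Python stdlib, hypothetical timings): with one thread, requests that block on IO run back to back; with more threads, they overlap while blocked.

```python
# Sketch: a blocking handler served by 1 thread vs. 4 threads.
# Timings are hypothetical; only the stdlib is used.
from concurrent.futures import ThreadPoolExecutor
import time

def blocking_handler(_):
    time.sleep(0.05)  # simulate blocking IO (e.g. a database call)

def serve(n_threads, n_requests=4):
    """Return the wall-clock time to serve n_requests with n_threads."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(blocking_handler, range(n_requests)))
    return time.monotonic() - start

one_thread = serve(1)   # requests run one after another (~4 x 0.05 s)
four_threads = serve(4) # requests overlap while blocked (~0.05 s)
print(one_thread > four_threads)
```

This is exactly why blocking-IO servers need thread pools sized well above the CPU count, while non-blocking servers do not.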


CPU Bound Applications
In CPU bound applications the thread pool size should be equal to the number of CPUs on the box. Adding more threads would interrupt request processing due to thread context switching and also increase response time.

Non-blocking IO applications will be CPU bound, as there is no thread wait time while requests are being processed.
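For CPU-bound work, the sizing rule above reduces to a one-liner (a minimal sketch using the stdlib; the fallback to 1 is for environments where the count is unavailable):

```python
# Size a pool for CPU-bound work from the core count (stdlib only).
import os

# One worker per CPU; os.cpu_count() can return None, so fall back to 1.
pool_size = os.cpu_count() or 1
print(pool_size)
```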


Little’s law applied to web applications

The average number of threads in a system (Threads = Users) is equal to the average web request arrival rate (WebRequests per sec = TPS) multiplied by the average response time (ResponseTime): Threads = TPS × ResponseTime.

If the thread pool size is simply set as large as possible, too much context switching occurs and performance degrades.
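Little's law turns directly into a back-of-the-envelope pool-size estimate (the traffic numbers below are hypothetical):

```python
# Little's law sketch: Threads = TPS * ResponseTime.
# The example numbers (100 req/s, 0.1 s) are hypothetical.
def pool_size_from_littles_law(tps, avg_response_time_s):
    # Average number of requests in the system equals the arrival
    # rate multiplied by the average time each request spends in it.
    return tps * avg_response_time_s

# 100 requests/sec at 0.1 s average response time -> about 10
# concurrent requests, i.e. roughly 10 threads for a blocking server.
print(pool_size_from_littles_law(100, 0.1))
```

In practice this is a lower bound: it gives the average concurrency, so bursty traffic needs headroom above it.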

Execution Times vs Response Times

Reference: https://www.rapitasystems.com/blog/difference_between_execution_times_and_response_times

task: a piece of code that is to be run within a single thread of execution. A task issues a sequence of jobs to the processor, which are queued and executed.

execution time: the time spent by the job actively using processor resources. The execution time of each job instance from the same task is likely to differ, due to path data dependencies (the path taken through the code depends on input parameters) and hard-to-predict hardware features such as branch prediction, instruction pipelining and caches.

response time: the time between when a job becomes active (e.g. an external event or timer triggers an interrupt) and the time it completes.
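The gap between the two definitions can be demonstrated with a single-worker pool (a sketch with hypothetical timings): a job submitted behind another one has a response time that includes queueing delay on top of its execution time.

```python
# Sketch of response time = queueing delay + execution time.
# One worker forces the second job to wait; timings are hypothetical.
from concurrent.futures import ThreadPoolExecutor
import time

def job():
    """Return this job's execution time (time on the 'processor')."""
    start = time.monotonic()
    time.sleep(0.05)                 # simulate time spent executing
    return time.monotonic() - start

with ThreadPoolExecutor(max_workers=1) as pool:
    t0 = time.monotonic()
    first = pool.submit(job)
    second = pool.submit(job)        # queued behind the first job
    exec_time_2 = second.result()    # execution time only
    response_time_2 = time.monotonic() - t0  # includes queueing delay

# The second job's response time exceeds its execution time because
# it sat in the queue while the first job ran.
print(response_time_2 > exec_time_2)
```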


priority inversion: a lower-priority job can prevent a higher-priority job from running if it locks a shared resource before the higher-priority job does.


High level system requirements will specify maximum response times for a task, known as deadlines. This is usually in the context of worst-case execution times (WCETs) and worst-case response times (WCRTs). WCRTs are calculated using response time analysis, which takes WCETs and a scheduling policy as inputs. This may lead to execution time budgets and a scheduling policy being derived as lower level requirements.