# Architecture
## Overview
![https://bit.ly/2iJuFky](images/puma-general-arch.png)

Puma is a threaded Ruby HTTP application server that processes requests
received on a TCP and/or UNIX socket.
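
For illustration, a Puma configuration file (conventionally `config/puma.rb`) declares the sockets to listen on with the `bind` DSL method. This is a minimal sketch: the port and socket path are placeholders, and the `backlog` query parameter is assumed to be supported by your Puma version, so check the `bind` documentation for the release you run.

```ruby
# config/puma.rb (sketch; port and path are placeholders)

# TCP socket. The `backlog` query parameter sets the listen backlog
# (assumed supported by your Puma version; the OS may cap the value).
bind "tcp://0.0.0.0:9292?backlog=1024"

# UNIX socket, usable alongside (or instead of) the TCP socket.
bind "unix:///tmp/puma.sock"
```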

Each Puma process (there can be one or many) accepts connections from the
socket via a thread (in the [`Reactor`](../lib/puma/reactor.rb) class). Once
the request on a connection has been fully read and buffered, the connection
moves into the `todo` set, where an available thread from the
[`ThreadPool`](../lib/puma/thread_pool.rb) class picks it up.
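
The hand-off between the buffering side and the worker threads is a producer/consumer queue. The toy Ruby sketch below models it with Ruby's core `Queue` class. `ToyThreadPool` is a hypothetical name, plain strings stand in for buffered connections, and this is not how `lib/puma/thread_pool.rb` is actually written.

```ruby
# Toy model of the `todo` set feeding a fixed pool of worker threads.
# ToyThreadPool is a hypothetical class, not Puma's ThreadPool.
class ToyThreadPool
  def initialize(size, &handler)
    @todo = Queue.new # fully buffered work waiting for a free thread
    @workers = Array.new(size) do
      Thread.new do
        # Queue#pop blocks until work arrives; nil is used as a stop signal.
        while (work = @todo.pop)
          handler.call(work)
        end
      end
    end
  end

  # Producer side: called once a request has been completely buffered.
  def <<(work)
    @todo << work
    self
  end

  # Items still waiting for a thread, analogous to the `backlog` in Puma.stats.
  def backlog
    @todo.size
  end

  # Let queued work drain, then stop every worker thread.
  def shutdown
    @workers.size.times { @todo << nil }
    @workers.each(&:join)
  end
end

handled = Queue.new
pool = ToyThreadPool.new(3) { |request| handled << "#{request} handled" }
5.times { |i| pool << "request-#{i}" }
pool.shutdown
puts handled.size # => 5
```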

Puma works in two main modes: cluster and single. In single mode, only one Puma
process boots. In cluster mode, a `master` process is booted, which prepares
(and may boot) the application and then uses the `fork()` system call to create
one or more `child` processes. These `child` processes all listen on the same
socket. The `master` process does not accept connections or process requests;
its purpose is primarily to listen for UNIX signals and to manage the `child`
processes, booting or killing them as needed.
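
In a Puma configuration file, the mode is selected with the `workers` DSL method. A minimal sketch follows; the worker and thread counts are arbitrary examples, not recommendations.

```ruby
# config/puma.rb (sketch; counts are arbitrary examples)

# Cluster mode: the master process forks 2 child processes (workers).
# Omitting `workers` (or setting it to 0) runs Puma in single mode.
workers 2

# Each worker process runs its own ThreadPool with 1 to 5 threads.
threads 1, 5

# Boot the application in the master before forking, so every child
# inherits the already-loaded app through fork().
preload_app!
```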

We sometimes call `child` processes (or Puma processes in `single` mode)
_workers_, and we sometimes call the threads created by Puma's
[`ThreadPool`](../lib/puma/thread_pool.rb) _worker threads_.
## How Requests Work
![https://bit.ly/2zwzhEK](images/puma-connection-flow.png)

* Upon startup, Puma listens on a TCP or UNIX socket.
  * The backlog of this socket defaults to 1024, but the effective value is
    capped by the `net.core.somaxconn` sysctl value (on Linux). The backlog
    determines the size of the queue of connections that have not yet been
    accepted. If the backlog is full, the operating system does not accept new
    connections.
  * This socket backlog is distinct from the `backlog` of work reported by
    `Puma.stats` or the control server. The backlog that `Puma.stats` refers to
    is the number of connections in the process's `todo` set that are waiting
    for a thread from the [`ThreadPool`](../lib/puma/thread_pool.rb).
* By default, a single, separate thread (created by the
  [`Reactor`](../lib/puma/reactor.rb) class) reads and buffers requests from the
  socket.
  * When at least one worker thread is available for work, the reactor thread
    listens to the socket and accepts a connection (if one is waiting).
  * The reactor thread waits for the entire HTTP request to be received.
    * Puma exposes the time spent waiting for the HTTP request body to be
      received to the Rack app as `env['puma.request_body_wait']`
      (milliseconds).
  * Once the request is fully received and buffered, the connection is pushed
    into the `todo` set.
* Worker threads pop work off the `todo` set for processing.
  * The worker thread processes the request by invoking the configured Rack
    application's `call` method. The Rack application generates the HTTP
    response.
  * The worker thread writes the response to the connection. While Puma buffers
    requests via a separate thread, it does not use a separate thread for
    responses.
  * Once done, the thread becomes available to process another connection in
    the `todo` set.
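
The request flow above can be sketched as a toy Ruby program. It simplifies heavily, under these assumptions: a single connection from `UNIXSocket.pair` stands in for an accepted TCP connection; the "reactor" is one blocking thread rather than Puma's event-driven reactor (which multiplexes many connections and only accepts when a worker thread is free); a lambda stands in for the Rack app; and `read_full_request` is a hypothetical helper that only understands `Content-Length` bodies. The request is sent in two pieces to show that nothing reaches the `todo` queue until it is complete.

```ruby
require "socket"

# Stand-in for a Rack application: takes an env hash, returns [status, headers, body].
APP = ->(env) { [200, { "content-type" => "text/plain" }, ["hello from #{env['PATH_INFO']}\n"]] }

# Hypothetical helper: buffer until the headers and the Content-Length body are complete.
def read_full_request(io)
  buf = String.new # empty binary (ASCII-8BIT) buffer
  head_end = nil
  until (head_end = buf.index("\r\n\r\n"))
    buf << io.readpartial(4096)
  end
  head = buf[0...head_end]
  length = head[/^content-length:\s*(\d+)/i, 1].to_i
  body_start = head_end + 4
  buf << io.readpartial(4096) while buf.bytesize < body_start + length
  [head, buf.byteslice(body_start, length)]
end

todo = Queue.new
server_side, client_side = UNIXSocket.pair # stands in for an accepted connection

reactor = Thread.new do
  head, body = read_full_request(server_side) # waits for the whole request
  todo << [server_side, head, body]           # fully buffered: hand off to the todo set
end

worker = Thread.new do
  conn, head, _body = todo.pop # blocks until buffered work exists
  env = { "PATH_INFO" => head[/\A\S+ (\S+)/, 1] }
  status, headers, chunks = APP.call(env)
  # The worker writes the response itself; no separate thread handles output.
  conn.write("HTTP/1.1 #{status} OK\r\n") # reason phrase hard-coded for brevity
  headers.each { |name, value| conn.write("#{name}: #{value}\r\n") }
  conn.write("\r\n", *chunks)
  conn.close # no Content-Length sent; closing marks the end of the body
end

client_side.write("GET /hi HTTP/1.1\r\nContent-Length: 5\r\n")
sleep 0.05 # headers are not finished yet, so the reactor keeps buffering
client_side.write("\r\nhello")
[reactor, worker].each(&:join)
response = client_side.read
puts response
```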
### `queue_requests`
![https://bit.ly/2zxCJ1Z](images/puma-connection-flow-no-reactor.png)

The `queue_requests` option is `true` by default, enabling the separate reactor
thread used to buffer requests as described above.

If set to `false`, the reactor is not used to buffer connections while waiting
for the request to arrive.

In this mode, when a connection is accepted, it is added to the `todo` set
immediately, and a worker thread synchronously does any waiting necessary to
read the HTTP request from the socket.
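
With `queue_requests` set to `false` (in a config file, `queue_requests false`), the worker thread itself blocks on the socket. The toy Ruby sketch below illustrates that behavior; it is not Puma's code. It again uses `UNIXSocket.pair` in place of an accepted connection and skips real header parsing for brevity.

```ruby
require "socket"

todo = Queue.new
server_side, client_side = UNIXSocket.pair # stands in for an accepted connection

# No reactor: the connection enters the todo set before any request bytes arrive.
todo << server_side

worker = Thread.new do
  conn = todo.pop
  # The worker thread blocks here until the client sends the request line.
  request_line = conn.gets("\r\n")
  # Consume the remaining header lines up to the blank line that ends them.
  loop do
    line = conn.gets("\r\n")
    break if line.nil? || line == "\r\n"
  end
  conn.write("HTTP/1.1 200 OK\r\ncontent-length: 0\r\n\r\n")
  conn.close
  request_line
end

sleep 0.05 # a slow client: meanwhile this worker thread can do nothing else
client_side.write("GET / HTTP/1.1\r\n\r\n")
received_line = worker.value # joins the thread and returns the block's result
response = client_side.read
puts received_line.inspect
```

While a worker thread waits on a slow client like this, it cannot serve any other connection in the `todo` set; avoiding that cost is what the reactor is for.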