Systems and Self-Defense

Published 2025-04-16

Despite what the first Avengers movie would tell us, a system can protect itself from itself if built intentionally. To understand how, let's start with the following:

Tyler's Law:

"Any system will inevitably be used to 100% of its authorized capacity."

Tyler's Corollary:

"If your authorized capacity is equal to your available capacity, your system will fail."

"Authorized" is a term different from "available" capacity and they are not interchangeable.

Denial of Service (DoS)

Authorized capacity for a computer is not generally controllable by the end user (unless you've got root access), so a single process can consume as many resources as the computer has available (with very few limits). If one runs a command to duplicate a movie file of "Plan 9 from Outer Space" fifty thousand times, like

seq 50000 | xargs -I{} cp plan9.mov plan9copy{}.mov

the computer will dutifully use all its resources to accomplish that objective until the job completes or the disk is full.

Note that there is no interactivity once the command is executed -- it accepts one set of instructions, stops taking input, then runs with no indication of progress until the job is done.

A service, on the other hand, accepts input from another source and persists. As the service does work on a job, it may be coded to accept more inputs, and reply to the requestor with already-completed work.

This poses a problem for managing resources: how many resources are to be used for in-flight operations? Does the service have enough resources to accept new work while processing a current job? How does the service tell the requestor it's not ready yet?

If the system is using all of its available resources to do work, then there is no resource left to respond to a client/user, to process added signals (like kill/term), or even provide telemetry to an observer.

If a service or a computer "goes silent," how are we sure it is functioning correctly, if at all?

The Problem

The key point to the previous section is this:

If we are able to give a service enough work that it uses all of its available resources, then we've achieved a Denial of Service condition.

This is bad. To allow a program to "cancel" an erroneous command or be triggered to produce telemetry / feedback, the program must be able to listen for signals from the operating system and act accordingly. The only exception is SIGKILL, which cannot be blocked or handled.
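
A minimal sketch of what "listening for signals" looks like, here in Go (the message strings and structure are my own, not from any particular service):

package main

import (
  "fmt"
  "os"
  "os/signal"
  "syscall"
)

func main() {
  // Buffered channel so a signal delivered before we read isn't dropped.
  sigs := make(chan os.Signal, 1)

  // SIGINT and SIGTERM can be caught and handled gracefully.
  // SIGKILL cannot be registered here; the kernel reserves it.
  signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)

  fmt.Println("working; press Ctrl-C to interrupt")
  sig := <-sigs // block until a signal arrives
  fmt.Println("caught", sig, "- cleaning up and exiting")
}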

The goal of the operating system / kernel is to ensure that "authorized resources" never exceed "available resources"; otherwise, the system will crash. This is why SIGKILL is unblockable -- it's an action of last resort by the operating system (OS) to protect itself.

But what if we have an interactive service? We don't want to terminate the process if it gets stuck -- we want it to keep running. So how do we protect it?

Self-defense

All programs and services practice a form of self-defense known as response codes or error codes. They provide signals to an operator or requestor that range from "I'm still here" to "please try again later," "this broke something," or even "your request is broken and I won't do it."

What does this look like in practice though?

Service Example

Let's say I have a web server that accepts text, appends it to a file, and returns a line number to the client.

(Step 1) ClientA  --  "foo"  --> <Service> --> [write to disk]
(Step 2) ClientA <--   "1"   --  <Service>

Because this service has to write values sequentially, it cannot do anything else while it is performing work for Client-A. This means that if Client-A supplies enough work, the service can serve no one else.

So let's look at what happens when we introduce Client-B:

(Step 1) ClientA  --   "foo"   --> <Service> --> [write to disk]
(Step 2) ClientB  --   "bar"   --> <Service> --> [BLOCKED]
(Step 3) ClientB <-- "TIMEOUT" --  <Service>
(Step 4) ClientA <--    "1"    --  <Service> --  [write completes]

Because this service has to write values sequentially, if Client-B wants to send "bar" while the service is performing work for Client-A, we must tell Client-B to wait or come back later.

By default, a TCP client will connect to a service, send its data, and then receive a response. The operating system queues incoming connection requests on the listening socket, so if the service is busy with an active request, new connections wait in the kernel's listen backlog until the application gets around to accepting them.

Your application, however, can't see this "wait" condition -- it just goes silent until either the kernel (and eventually the service) accepts your connection, or you time out and the kernel evicts you from the queue.
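
As a small illustration of that silence from the client's side, a Go client can at least bound how long it waits (the address and timeout here are made up):

package main

import (
  "fmt"
  "net"
  "time"
)

func main() {
  // If the service's accept queue is full, this call simply hangs;
  // the only feedback we get is our own deadline firing.
  conn, err := net.DialTimeout("tcp", "127.0.0.1:8080", 2*time.Second)
  if err != nil {
    fmt.Println("gave up:", err) // timed out waiting in the kernel's queue
    return
  }
  defer conn.Close()
  fmt.Println("connected")
}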

This is not a great solution. What could we do instead?

Do Nothing?

No really, what if we do nothing?

That's a pretty great situation for the developer -- zero work needed, and the kernel/OS does the multiplexing. It is, however, a horrible experience for clients and service operators. Before specific solutions like running a dedicated HTTP server or providing an HTTP stack in-process, the answer to making a service network-available was inetd. It was (and still is) incredibly slow and does not scale beyond very low traffic rates.

So, by doing nothing, the service appears inconsistent, with periodically high latency, and the underlying problem remains unfixed (see "Denial of Service" above).

Application Layer Defense

Most network services operate over HTTP, even ones that bind to Unix sockets. Even gRPC operates over HTTP/2, so I think it's safe to use HTTP status codes as an example of how to respond to a client without requiring too much translation to other stacks.

When an HTTP server is "busy," RFC 6585 suggests responding with status code 429, which maps to "Too Many Requests."

Of note is this paragraph:

Note that this specification does not define how the origin server identifies the user, nor how it counts requests. For example, an origin server that is limiting request rates can do so based upon counts of requests on a per-resource basis, across the entire server, or even among a set of servers. Likewise, it might identify the user by its authentication credentials, or a stateful cookie.

Let's refer back to our Service example. If the service is handling a request from Client-A, then any additional requests arriving while that request is being processed are "too many requests" for the server to handle.

So what we should do is configure the server to do two things at once (or, at least, two things concurrently). As pseudocode:

var locked bool

fn writeLineToFile(s string, f file) -> (success bool, l int) {
  locked = true
  n, err = os.Write(s, f)
  locked = false

  if err != nil {
    return false, 0 // We failed
  }
  return true, n
}

fn main() {
  f = open("filename")
  requests = http.Listen(port)

  for r in requests {
    if locked {
      r.reply(http-429) // Too Many Requests
    } else {
      ok, line = writeLineToFile(r.payload, f)
      if ok {
        r.reply(line)
      } else {
        r.reply(http-500) // Error
      }
    }
  }
}

Let's take a second to look at this -- the main code opens a file to persist the data and listens for requests on the HTTP port. All seems normal until we get to the if locked section: if the lock is active, the service immediately stops what it's doing and replies with an HTTP 429. (For those using gRPC, you can swap the HTTP 429 out for the status code UNAVAILABLE (14).)

You'll notice that we haven't done any additional checking -- no computation, no querying the file; we just check the boolean value, then act.

This is, computationally speaking, very cheap to do. The positioning also means that we short-circuit the operation before we even start down the computationally expensive path of accepting the payload and doing anything with the file.

Looking further into the writeLineToFile function, you'll also see that we "lock" the file for the minimum part of the operation -- the part where we actually write to the file. We don't lock the error-checking portion of the code and we don't lock the reply to the client. This means that while the code is doing things-not-writing-to-files, we can go as fast and as concurrently as we want.
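
To make this concrete, here's a rough sketch of the pseudocode as runnable Go. The file name, port, and path are invented for illustration, and the bare boolean is swapped for an atomic compare-and-swap, since a plain check-then-set can race once requests really do arrive concurrently:

package main

import (
  "fmt"
  "io"
  "net/http"
  "os"
  "sync/atomic"
)

var busy atomic.Bool // true while a write is in flight
var line int         // count of lines written so far

func main() {
  f, err := os.OpenFile("lines.txt", os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
  if err != nil {
    panic(err)
  }

  http.HandleFunc("/append", func(w http.ResponseWriter, r *http.Request) {
    // Cheap short-circuit: atomically claim the lock or bail out
    // before touching the payload or the file.
    if !busy.CompareAndSwap(false, true) {
      w.WriteHeader(http.StatusTooManyRequests) // HTTP 429
      return
    }

    payload, _ := io.ReadAll(r.Body)
    _, werr := fmt.Fprintf(f, "%s\n", payload)
    line++
    n := line         // capture our line number before releasing
    busy.Store(false) // release as soon as the write is done

    if werr != nil {
      w.WriteHeader(http.StatusInternalServerError) // HTTP 500
      return
    }
    fmt.Fprintln(w, n) // reply with the line number
  })

  http.ListenAndServe(":8080", nil)
}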

If we increase our activity to three clients (A,B,C), our new operation graph looks something like this:

(Step 1)  ClientA  --  "foo"   --> <Service> --> [write to disk]
(Step 2)  ClientB  --  "bar"   --> <Service> --> [BLOCKED]
(Step 3)  ClientC  --  "quux"  --> <Service> --> [BLOCKED]
(Step 4)  ClientB <-- "E: 429" --  <Service>
(Step 5)  ClientC <-- "E: 429" --  <Service>
(Step 6)  ClientA <--   "1"    --  <Service> --  [write completes]
(Step 7)  ClientC  --  "quux"  --> <Service> --> [write to disk]
(Step 8)  ClientB  --  "bar"   --> <Service> --> [BLOCKED]
(Step 9)  ClientB <-- "E: 429" --  <Service>
(Step 10) ClientC <--   "2"    --  <Service> --  [write completes]

You'll notice that at Step-7 ClientC tried again, faster than ClientB, and won the race! As we reply to clients, they can decide when to try again; since ClientA's request was fulfilled in Step-6, the lock was lifted, allowing ClientC to make a write.

Regardless of who wins the race to the next write, the service can only handle one active request at a time, and it protects itself from being forced to handle additional work by telling clients to try again.
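
On the client side, "deciding when to try again" can be as simple as a retry loop with backoff. A sketch in Go (the URL and backoff numbers are invented for illustration):

package main

import (
  "fmt"
  "math/rand"
  "net/http"
  "strings"
  "time"
)

// postWithRetry keeps sending the payload until the service accepts it,
// backing off (with jitter) each time it answers 429.
func postWithRetry(url, payload string) (*http.Response, error) {
  backoff := 100 * time.Millisecond
  for {
    resp, err := http.Post(url, "text/plain", strings.NewReader(payload))
    if err != nil {
      return nil, err
    }
    if resp.StatusCode != http.StatusTooManyRequests {
      return resp, nil // accepted, or failed for a non-busy reason
    }
    resp.Body.Close()
    // Jitter keeps competing clients from retrying in lockstep.
    time.Sleep(backoff + time.Duration(rand.Intn(50))*time.Millisecond)
    backoff *= 2
  }
}

func main() {
  resp, err := postWithRetry("http://127.0.0.1:8080/append", "quux")
  if err != nil {
    panic(err)
  }
  defer resp.Body.Close()
  fmt.Println("status:", resp.Status)
}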

"Authorized Capacity" != "Available Capacity"

The service described previously has an "Authorized Capacity" of one in-flight request. If the computer running this service were a single-core tiny computer, it might not have enough power to serve additional requests (even the HTTP 429 replies), but most computing devices have either sufficient speed (allowing for concurrent processing) or additional cores to handle concurrent requests. As a result, the authorized capacity is less than the total available capacity of the server.

When running an unmodified, uncontained process on a computer, the process is "authorized" to use nearly the entire set of resources available to the OS.

In a cloud environment, a process can run inside a virtual machine (VM) whose constraints resemble those of a bare-metal computer, but (with few exceptions) the VM executes on bare-metal hardware with even more capacity. This means the VM's authorized capacity sits below the capacity actually available on the host.

If running in a container or BSD jail, the constraints are set by the container manager (sometimes with just a command-line argument!), and the kernel then grants (and constrains) those resources for the process inside.
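
For example, with Docker, a couple of flags are all it takes to set that authorized capacity (the image name here is hypothetical):

docker run --cpus=1 --memory=512m example/line-service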

If a process exceeds its authorized limits in any of these environments, the supervising layer (OS, container manager, hypervisor, etc.) will forcibly end the offending process ("kill"/"terminate") and reclaim its resources. It doesn't matter whether additional resources exist that the process could use -- the supervisory system will act.

Conclusions

All of this leads to a few heuristics when defining the runtime environment for an application:

1. Set an explicit authorized capacity (resource limits on the process, container, or VM) that is lower than the capacity actually available, leaving headroom for the system to respond, signal, and recover.
2. Build self-defense into the application layer: check a cheap "am I busy?" condition first, and reject excess work with an inexpensive "try again later" response (like HTTP 429) rather than silently accepting it.

Both of these will ensure your authorized resources (for the application, your container, and your process) never exceed your available capacity, and will keep your service functioning at maximum throughput in the face of overwhelming requests.