Bits, Math, Hashes, Streams
A systems refresher covering bit operations, modular arithmetic and group theory, hashing, and streams — written after an AI assistant generated a test for me that contained a magic number I could no longer explain.
Why this post exists
An AI coding assistant generated a streaming-upload integration test for
me, and I asked it: where does the constant 2654435761 come from, and
why does it work? The answer pulled on a long thread — bit operations,
modular arithmetic, group theory, hashing, streams — and I realized I
had let most of those foundations rust. This post is the refresher I
wrote (with the assistant's help) afterwards, in the order I needed to
recover the ideas.
Table of contents
- Bit operations & two's complement
- Modular arithmetic — the bridge to group theory
- Hashing — theory and practice
- Streams vs buffers — the memory model
- Web streams vs Node streams — runtime reality
- Where it all comes together
Bit operations & two's complement
The mental model: integers are arrays of bits
A 32-bit integer is just 32 binary digits. The number in 32-bit binary is 00000000 00000000 00000000 00101010. Every integer operation is, at the hardware level, manipulation of these bits. Bit operations expose that directly.
The four boolean operators on integers
These work bitwise — for each bit position, apply a 1-bit truth table:
| Op | Symbol | Truth table |
|---|---|---|
| AND | & | 1 & 1 = 1, everything else 0 |
| OR | | | 0 | 0 = 0, everything else 1 |
| XOR | ^ | 1 if the inputs differ, else 0 |
| NOT | ~ | flips every bit |
Example with 4-bit numbers:
1010 ( = 10 )
& 0110 ( = 6 )
------
0010 ( = 2 )The five idioms you'll see everywhere
These come up in protocols, serialization, flags, hashing, and graphics:
// Mask / extract: keep only some bits, zero the rest
x & 0xff // keep low 8 bits → byte truncation
x & 0xffff0000 // keep high 16 bits
// Set: turn specific bits ON without disturbing others
flags = flags | FLAG_READABLE
// Clear: turn specific bits OFF
flags = flags & ~FLAG_READABLE
// Toggle: flip specific bits
flags = flags ^ FLAG_READABLE
// Test: is a bit set?
if ((flags & FLAG_READABLE) !== 0) { /* ... */ }Shifts
<< (left), >> (right arithmetic), and >>> (right unsigned — a JavaScript-only operator) move bits sideways:
00001010 ( = 10 )
<< 2
00101000 ( = 40, which is 10 × 2² )Left shift by = multiply by . Right shift by = divide by (rounded toward minus infinity).
Two's complement — why negative numbers are weird
Computers don't store a "minus sign." They use two's complement: the highest bit means . In 4-bit:
0001 = +1
0010 = +2
0111 = +7 ← largest positive
1000 = −8 ← smallest negative (the high bit dominates)
1001 = −7
1111 = −1To negate: flip all bits, add 1. So in 8-bit:
+5 : 00000101
flip: 11111010
+1 : 11111011 ← this is −5This is why ~x === -x - 1 in JavaScript — ~ flips bits, and -x flips bits and adds 1.
Why JavaScript has the bizarre >>> operator
JavaScript numbers are 64-bit floats. But bitwise operators secretly coerce to 32-bit signed integers, do the operation, and convert back. So:
2 ** 31 // 2147483648 — fine as a float
(2 ** 31) | 0 // -2147483648 — interpreted as signed!>>> (unsigned right shift) is the only operator that interprets the result as unsigned. Hence the idiom:
const seed = (index * 2654435761) >>> 0;>>> 0 does nothing arithmetically (it's a shift by zero) but forces the result to be re-interpreted as a 32-bit unsigned integer. Without it, large products wrap into negative territory and the byte derivation we'll do later goes off the rails.
Gotchas
- In JavaScript, bitwise ops always truncate to 32 bits.
(2 ** 33) | 0 === 0. Reach forBigIntwhen you need wider bit ops. &&≠&. Logical vs bitwise. The lint ruleno-bitwiseexists in some codebases for exactly this confusion.- Right shift on negative numbers:
-1 >> 1 === -1(sign-preserving), but-1 >>> 1 === 2147483647(treats as unsigned).
Modular arithmetic — the bridge to group theory
What "mod" actually is
a mod n is the remainder when you divide by . So , , (in math; in many languages -1 % 5 = -1 due to truncated division — JavaScript and Python differ here).
The set of integers mod is written (or ). It's . You can add, subtract, multiply within it; the result wraps around.
Why programmers care
Bytes are . 32-bit unsigned ints are . Hash table buckets are . Cyclic buffers are . Almost every "wrap-around" you see in code is a mod operation in disguise, often written x & (n-1) when is a power of 2 (because x & (n-1) === x mod n for power-of-2 ).
Multiplicative inverses
In regular arithmetic, every nonzero number has an inverse: . In , "division" doesn't always work — has no solution.
Theorem. has a multiplicative inverse in if and only if .
For , this means: odd numbers have inverses, even numbers don't (because ). This is why every multiplicative-hash constant you'll ever see is odd.
Bringing it back: why * 2654435761 is bijective
Define . When is odd:
- has an inverse (multiply by ), so it's invertible.
- An invertible function on a finite set is a bijection — every output corresponds to exactly one input.
So index → seed doesn't lose information. Different indices ALWAYS produce different seeds. That's the property a test like the one we'll see later relies on to detect "chunk-swap" bugs in a streaming pipeline.
Mini group theory in 60 seconds
A group is a set + an operation that is:
- Closed — combining two elements stays in the set.
- Associative — .
- Has an identity element — some with .
- Every element has an inverse.
Examples:
| Set | Operation | Identity | Inverse of |
|---|---|---|---|
| Integers | |||
| Nonzero rationals | |||
| Odd elements of (called ) | multiplicative inverse mod |
The last row is exactly the group we use when we multiply by 2654435761 mod . The "group size" is (the count of odd integers below ).
The order of an element is the smallest positive with . For our constant, 0x9E3779B1 has order somewhere around or . Multiplying by repeatedly cycles through that many distinct values before repeating. As long as the number of inputs we feed it is less than the order, we never collide.
Case study: where 2654435761 comes from
We've leaned on this constant repeatedly without saying where it comes from. The answer is short:
where is the golden ratio. This is Fibonacci hashing, popularized by Knuth in The Art of Computer Programming, Vol. 3, §6.4.
Why ? It is, in a precise sense, the most irrational number — its continued-fraction expansion is the slowest-converging of any real number, . The practical consequence: multiplying integers by and looking at the low bits produces outputs that, for consecutive inputs, are maximally spread out — there is no detectable pattern modulo any power of 2.
Loose intuition: imagine placing the points around a unit circle (mod 1). For any other irrational, the sequence eventually clusters at certain angles. The golden ratio is provably the one that fills the circle the most uniformly. Multiplying by in is the discrete version of that property — and the low byte we extract with & 0xff inherits it directly.
For contrast, all four of these multipliers are odd, so all four are bijective on :
const seed_v1 = (index * 1) >>> 0; // identity — terrible
const seed_v2 = (index * 31) >>> 0; // small prime — visible patterns in low byte
const seed_v3 = (index * 65537) >>> 0; // 2^16 + 1 — patterns at every 2-byte boundary
const seed_knuth = (index * 0x9E3779B1) >>> 0; // no pattern in any byteOnly the last one produces bytes that look statistically random when projected with & 0xff. Bijection is necessary, but the uniform-looking projection is the additional property we actually care about — and what 0x9E3779B1 was chosen for.
The same constant has gone on to show up in Linux's hash_32(), the Rust compiler's FxHash, Boost's hash_combine, and Java's ThreadLocalRandom — anywhere a single fast scrambling pass over an integer is wanted. It is not the only good choice (MurmurHash3 uses 0xcc9e2d51 and 0x1b873593; xxHash uses 0x9E3779B1 as one of its primes), but it is the one you reach for when you want one multiplication's worth of avalanche from an integer and no external library.
Why this matters in code
Wherever you see "modular wrap-around," "bijection on a finite set," "permutation of a fixed-size space" — you're doing group theory. Pseudo-random number generators (LCGs), hash table probing sequences, cryptographic primitives, error-correcting codes, even how /dev/urandom was historically built — all live in this neighborhood.
Hashing — theory and practice
A hash function, formally
A function where (the input space is much larger than the output space). Because of pigeonhole, there must be collisions — different inputs producing the same output.
Goal of a "good" hash: make collisions look random, even though they're deterministic.
The three properties we want (in varying degrees)
- Uniformity — outputs are evenly distributed across the output space. Drives hash table performance.
- Avalanche — flipping one input bit changes about half the output bits. Prevents adversaries from predicting collisions; also reinforces uniformity.
- Collision resistance — given any input, finding a different input with the same hash is computationally infeasible. Cryptography only.
The three families
| Family | Examples | Speed | Properties | Use cases |
|---|---|---|---|---|
| Trivial | x mod n, x itself | Instant | Uniformity only if input is already uniform | Hash tables when keys are random-ish |
| Non-cryptographic | Fibonacci, FNV-1a, MurmurHash3, xxHash, CityHash | Nanoseconds | Uniformity + avalanche | Hash tables, bloom filters, content-addressing without security needs |
| Cryptographic | MD5 (broken), SHA-1 (broken), SHA-256, SHA-3, BLAKE2/3 | Microseconds | All three, plus pre-image resistance | Passwords, signatures, content-addressed storage where adversaries matter |
The birthday paradox (why hash sizes matter)
With a hash of bits, you'd naively expect to need random inputs before seeing a collision. Wrong. You need only .
This is the birthday paradox: in a room of 23 random people, there's a >50% chance two share a birthday (not 183 — half of 365). For hashes:
| Hash size | Collisions expected after |
|---|---|
| 32-bit | ~65,000 inputs |
| 64-bit | ~4 billion inputs |
| 128-bit (MD5) | ~ inputs |
| 256-bit (SHA-256) | ~ — astronomical |
This is why content-addressed storage (Git, IPFS, etc.) uses 160-bit or larger hashes. 128-bit just barely beats brute-force; 256-bit is comfortable forever.
A worked example: verifying a streaming upload end-to-end
Suppose we're testing a system that uploads a 50 MB blob via multipart and want to assert byte-perfect round-trip. We generate the payload deterministically, hash it as we generate, then read the uploaded object back and hash that. If both digests match, every byte survived.
import { createHash } from 'node:crypto';
const PAYLOAD_BYTES = 50 * 1024 * 1024;
const CHUNK_BYTES = 1 * 1024 * 1024;
const TOTAL_CHUNKS = PAYLOAD_BYTES / CHUNK_BYTES;
// Deterministic, O(1)-memory generator. Each chunk's starting byte is
// scrambled via Knuth's multiplicative constant so adjacent chunks look
// nothing alike — this is what lets the test detect "chunk got swapped"
// bugs.
const makeChunk = (index: number): Uint8Array => {
const chunk = Buffer.alloc(CHUNK_BYTES);
const seed = (index * 2654435761) >>> 0;
for (let i = 0; i < CHUNK_BYTES; i++) {
chunk[i] = (seed + i) & 0xff;
}
return chunk;
};
const expectedHash = (() => {
let cached: string | undefined;
return () => {
if (cached != null) return cached;
const h = createHash('sha256');
for (let i = 0; i < TOTAL_CHUNKS; i++) h.update(makeChunk(i));
cached = h.digest('hex');
return cached;
};
})();Why SHA-256 specifically and not faster MurmurHash?
- It's in
node:cryptofor free. - Speed difference doesn't matter at 50 MB once per test (~80 ms).
- It's the universal "compare these two byte sequences" idiom — anyone reading the test recognizes it instantly.
The constant 2654435761 here is doing exactly the hashing job we discussed in the modular-arithmetic section: it scrambles index ∈ {0, 1, 2, …} into something that looks random byte by byte.
Streams vs buffers — the memory model
The fundamental tension
You have a 50 MB file. You want to read it from disk and send it over the network.
The naive approach:
const bytes = await fs.readFile('big.pdf'); // 50 MB now resident in RAM
await s3.upload(bytes); // SDK copies to networkAt peak, you're holding at least 50 MB in RAM, possibly more (the SDK might internally copy). Now imagine 100 of those happening concurrently in a worker. . OOM.
The streaming approach:
const stream = fs.createReadStream('big.pdf'); // ~64 KB buffer, lazy
await s3.uploadStream(stream); // SDK pulls 64 KB at a timePeak memory: ~64 KB. The same 100 concurrent uploads now use ~6.4 MB. Three orders of magnitude less.
What is a stream, really?
It's an object that implements some version of:
interface Stream {
read(): Promise<Chunk | EOF>; // pull-style: caller asks for next chunk
// or
on('data', (chunk) => /* ... */); // push-style: stream emits chunks at you
}The chunks are typically Uint8Array or Buffer. The stream maintains an internal buffer that fills from the source (file, network, etc.) and drains as the consumer reads. When the buffer is full, the source pauses; when empty, the consumer waits.
Backpressure — the magic word
This is the protocol by which a slow consumer tells a fast producer to slow down. Without it, a 100 MB/s producer feeding a 10 MB/s consumer would queue 90 MB/s in memory, and you'd OOM in seconds.
In Node streams, backpressure is implicit: pipe() checks the destination's buffer level and pauses the source when it's full. In web streams, it's explicit: the consumer calls reader.read() only when it's ready for the next chunk.
Pull vs push streams
| Style | Who drives the pace | Example |
|---|---|---|
| Pull | Consumer calls .read() to ask for next chunk | Web ReadableStream, async iterators |
| Push | Producer fires events; consumer must keep up | Node Readable ('data' event), EventEmitter |
Push streams are easier to write but harder to backpressure. Pull streams are the modern design. Web streams are pull; Node streams are nominally push but pipe() and pipeline() simulate pull behavior.
A textbook pull-stream consumer
Here's how you would drain a ReadableStream<Uint8Array> into a hash without ever holding more than one chunk:
const hashStream = async (
stream: ReadableStream<Uint8Array>,
): Promise<{ hash: string; size: number }> => {
const hasher = createHash('sha256');
let size = 0;
const reader = stream.getReader();
try {
while (true) {
const { done, value } = await reader.read();
if (done) break;
hasher.update(value);
size += value.byteLength;
}
} finally {
reader.releaseLock();
}
return { hash: hasher.digest('hex'), size };
};The loop holds at most ONE chunk at a time (value) — typically 64 KB to 1 MB. It feeds it into the hasher and discards it. Memory footprint is constant regardless of stream size.
If we had instead written const all = await new Response(stream).arrayBuffer(); hash(all);, we'd materialize the full payload and defeat the purpose.
Web streams vs Node streams — runtime reality
Two ecosystems collided
Node.js invented its own stream API in ~2010 (stream.Readable, 'data' events, pipe()). Browsers later standardized the WHATWG Streams spec (ReadableStream, WritableStream, TransformStream) for things like fetch() bodies and Response.body.
For years, Node and browsers had incompatible stream types. Around Node 16+, Node added bidirectional adapters. Bun supports both natively. The two APIs now coexist.
Type comparison
| Concept | Node streams | Web streams |
|---|---|---|
| Readable | import { Readable } from 'node:stream' | ReadableStream<Uint8Array> (global) |
| Writable | Writable | WritableStream |
| Transform | Transform | TransformStream |
| Iteration | for await (const chunk of nodeStream) | for await ... (modern Node) or reader.read() |
| Default chunk type | Buffer | Uint8Array |
Why ReadableStream<Uint8Array> is the modern preference
- Universal. Works in browsers, Node 18+, Bun, Deno, Cloudflare Workers, Vercel Edge. Node streams are Node-only.
- Simpler API. No
readable/paused/ "flowing-mode" confusion. - First-class in
fetch.(await fetch(url)).bodyis aReadableStream<Uint8Array>. Wiring it into anotherReadableStreamconsumer is direct. Wiring it into a NodeReadablerequires a conversion. - Pull-by-default. Backpressure is straightforward.
Conversion patterns
import { Readable } from 'node:stream';
// Node Readable → Web ReadableStream
const web = Readable.toWeb(nodeReadable);
// Web ReadableStream → Node Readable
const node = Readable.fromWeb(webReadable);
// Async iterable → Node Readable
const node = Readable.from(asyncIterable);
// Buffer / Uint8Array → Node Readable
const node = Readable.from(buffer);
// Buffer / Uint8Array → Web ReadableStream
const web = new Response(buffer).body; // or hand-construct via new ReadableStreamA real-world callsite that needs this: serving a local file through a service whose API takes a web stream:
const nodeStream = fs.createReadStream(filePath);
const body = Readable.toWeb(nodeStream) as ReadableStream<Uint8Array>;
return { body, contentType, contentLength: stat.size };The as ReadableStream<Uint8Array> cast exists because Node's type for Readable.toWeb returns ReadableStream<any> (it doesn't know the chunk type), but since fs.createReadStream produces Buffer chunks (which are Uint8Arrays), the cast is sound.
Practical guidance
- Designing a new API from scratch? Take and return
ReadableStream<Uint8Array>. Future-proof. - Working with existing Node APIs (
fs,http)? Use NodeReadablelocally and convert at boundaries withReadable.toWeb/Readable.fromWeb. - Async iteration? Both stream types support
for await (const chunk of stream)in modern runtimes — that's the cleanest way to consume. - Bun specifics?
Bun.file(path).stream()returns a webReadableStream<Uint8Array>directly. No conversion needed.
Where it all comes together
Here is the end-to-end pipeline that motivated this whole post — a streaming integrity test for an S3-style multipart uploader, with each component grounded in one of the concepts above:
Every piece sits on top of one of these refreshed concepts:
- Bit ops generate test data with maximum coverage per byte (
(seed + i) & 0xff). - Modular arithmetic guarantees the generator never repeats over the indices we use.
- Streams keep memory flat at ~1 MB regardless of payload size.
- Cryptographic hashing detects corruption with essentially-zero false positives.
- Web streams are the common currency that makes upload-then-download composable in one consumer loop.
Pull on any thread and you find another. That's been the most satisfying realization of writing this — the constants, the masks, the streams, the hashes aren't five unrelated tricks. They're five faces of the same underlying machinery.
Where to go deeper
If I had an hour per topic, in priority order:
- Bit ops & two's complement — most immediately useful for reading systems code. Chapters 1–4 of Hacker's Delight are the canonical reference.
- Streams & backpressure — directly relevant to day-to-day work. The WHATWG Streams explainer is short and good.
- Hashing fundamentals — Wikipedia's Hash function plus Birthday problem is probably enough.
- Modular arithmetic & group theory — only if you're curious. A Programmer's Introduction to Mathematics by Jeremy Kun (chapters on modular arithmetic and groups) assumes nothing beyond high-school algebra and is excellent.
For modular arithmetic specifically, Concrete Mathematics by Graham/Knuth/Patashnik (chapter 4) is the canonical heavy reference.
A small reflection
An AI assistant can hand me a working test in seconds. What it can't do on my behalf is build the mental model that lets me read that test with understanding — to see the constants, the masks, the streams, and the hashes as a small composition of well-defined ideas rather than magic. That's what this writing was for. The assistant accelerated the recovery; the understanding still has to live in my head.