Pull Request Took 6 Months to Merge (And Why)

I remember the first time I saw it.

It wasn’t the size that made me uneasy—327 lines changed, a handful of files touched. I’ve seen worse. No, it was the feeling of a door half-open in a hallway no one used anymore.

PR #1842: “Refactor auth token refresh flow”
Opened: March 3
Last updated: March 3
Status: Open
Label: high-priority (the most cursed label in the building)

Six months later, it was still there, like a cold case pinned to a corkboard everyone pretended not to notice.

They called me in when the jokes stopped being funny.

1) Scene of the Crime (A Pull Request Stuck)

The repo was a familiar neighborhood: TypeScript backend, a sprinkle of Go services, and a monorepo layout that promised convenience and delivered dependency wars.

The PR description was clean—almost too clean.

“Fixes token refresh race condition causing sporadic 401s under load. Adds deduping to refresh calls. Includes tests.”

That kind of sentence reads like a confession and an alibi at the same time.

I pulled it locally.

git fetch origin pull/1842/head:case-1842
git checkout case-1842
pnpm test

Red.

Not catastrophic red. The worst kind: one flaky test in a suite of 2,400, failing every third run like a heartbeat you can’t trust.

And the CI logs? A ransom note written in YAML.

2) Suspects

I made a list. Every detective does.

Suspect A: The Reviewer Who Never Approves

You know the type—leaves comments like “Consider renaming x to y for clarity,” then disappears for two sprints.

Suspect B: The CI Pipeline

A labyrinth of caching, matrix builds, and secrets. Everyone blames it because it can’t defend itself.

Suspect C: The Architecture

Auth touches everything. A tiny change in token refresh can ripple into billing, sessions, notifications, and whatever runs on cron at 3 a.m.

Suspect D: The PR Author

Smart. Quiet. Writes code like a surgeon. And surgeons, I’ve learned, sometimes forget that the patient wakes up afterward.

Suspect E: The “Just One More Thing” Culture

The deadliest suspect. A PR doesn’t die from one bullet. It dies from a thousand polite suggestions.

3) The First Clue: A Race Condition With Good Manners

The bug was real. Under load, multiple requests hit an expired access token at the same time, and the system did what most systems do when surprised:

It panicked politely.

Before, the code did this:

Request fails with 401
Trigger refresh
Retry request

But if 20 requests fail together, you get:

20 refresh calls
last refresh wins
others retry with tokens that are already invalidated
chaos, intermittent failures, angry users

The PR’s solution was deduplication: one refresh in flight per user/session, everyone else awaits the result.

The core idea looked like this:

const refreshInFlight = new Map<string, Promise<TokenPair>>();

export async function getValidToken(sessionId: string): Promise<TokenPair> {
  if (!needsRefresh(sessionId)) return readTokens(sessionId);

  let p = refreshInFlight.get(sessionId);
  if (!p) {
    p = refreshTokens(sessionId).finally(() => refreshInFlight.delete(sessionId));
    refreshInFlight.set(sessionId, p);
  }
  return p;
}

Clean. Sensible. Elegant.

And yet: six months.

Elegant code doesn’t die like this unless someone helped.

4) Code Review Hell Has a Sound

I opened the PR timeline. It read like a slow-motion argument between ghosts.

“Could we avoid a global map?”
“Does this leak memory?”
“What about multi-process deployments?”
“Should this be in Redis?”
“Could we make this a generic ‘singleflight’ helper?”
“What about retries? What about backoff? What about cancellation?”
“Can we rename refreshInFlight to tokenRefreshPromisesBySessionId?”

Every comment was reasonable in isolation. Together, they were a net.

By week three, the PR author added a small cleanup. Then another. Then tests for the tests. Then a config toggle. Then docs.

By month two, the PR was no longer “Fix token refresh race condition.” It was “Fix auth forever.”

By month four, half the team had opinions, and none of them were wrong enough to dismiss.

That’s the trick with review hell: it’s built out of correct statements.

5) The Second Clue: The Flaky Test Wasn’t Flaky

I re-ran that failing test under a microscope: timestamps, concurrency, forced scheduling.

It failed when the clock moved.

Not time zones. Not locale.

Monotonic vs wall time.

Somewhere, a test assumed that Date.now() would behave like a stopwatch. In CI, on shared runners, it behaved like… a clock. NTP adjustments, VM pauses, jitter. The test’s “token expires in 60s” logic occasionally expired early in simulated time.

The PR author tried to stabilize it with larger buffers. Reviewers asked for smaller buffers. Someone suggested faking time. Someone else said “Don’t over-mock.”

The PR became a philosophical debate about time itself.

So I patched the test properly: inject a clock.

export interface Clock { now(): number }
export const SystemClock: Clock = { now: () => Date.now() };

// in tests
let t = 0;
const FakeClock: Clock = { now: () => t };

Deterministic time. Deterministic results.

The test stopped failing.

The PR still didn’t merge.

Which meant the flaky test was never the real problem.

It was a distraction. A decoy. Someone wanted the team staring at the clock while the real culprit walked out the back door.

6) The Third Clue: “Works on My Machine” Was True

I deployed the branch to staging. No 401 storms. No regressions. Metrics looked calm.

Then I did something most people forget: I checked production topology.

Production wasn’t one Node process.

It was twelve.

Across three regions.

Behind a load balancer that didn’t guarantee stickiness.

And the dedupe map? It lived in memory.

Meaning:

Request A hits instance 1 → triggers refresh
Request B hits instance 2 → triggers refresh
…repeat twelve times

Still better than twenty refreshes per user, but not the single-flight guarantee the PR promised.

A reviewer had commented on this in week one:

“This won’t work cross-instance.”

Someone else replied:

“We can accept that for now.”

And yet that single comment haunted the PR like a draft under a door.

Because it wasn’t just a technical issue.

It was an identity issue.

If the PR merged as-is, it would quietly admit something the team didn’t want to say out loud:

They were relying on a distributed system to behave like a single machine.

That’s when I realized what I was actually investigating.

Not a PR.

A fear.

7) The Trap: The “Perfect” Fix Would Take Another 6 Months

The “correct” solution was obvious:

distributed lock (Redis)
or singleflight via shared cache
or sticky sessions
or central auth service

But each option came with its own shadow:

Redis adds latency and failure modes
locks can deadlock under network splits
sticky sessions reduce resilience
centralizing auth is a migration project disguised as a bug fix

A straightforward PR had become a fork in the road for architecture.

So the PR sat there, because merging it meant choosing a direction.

And nobody wanted to be the one who chose wrong.

8) The Twist: It Merged in 12 Seconds

On September 7, I opened PR #1842 again.

It was… gone.

Not closed.

Not merged.

Just… irrelevant.

A new commit had landed on main the night before:

“Switch token refresh to client-managed OAuth flow; deprecate server refresh endpoint.”

The company had changed identity providers. The refresh race condition no longer existed in the same form. The endpoint this PR fixed was now behind a feature flag set to off for all tenants.

The PR didn’t take six months to merge.

It took six months for the world around it to move on.

And the final insult?

The author of the new change was the same person who opened PR #1842.

They never fought the review war.

They walked around it.

9) Why It Took 6 Months (Why Pull Requests Take So Long)

Here’s the autopsy, clean and clinical:

The PR touched a boundary system (auth). Boundary systems attract debate because they’re where assumptions live.
Review feedback was “correct” but unbounded. Without a firm scope, every good idea becomes a requirement.
A flaky test provided endless low-risk motion. Everyone could “help” without addressing the uncomfortable architecture question.
The solution didn’t match production topology. In-memory dedupe was a partial fix; partial fixes trigger perfection instincts.
The business changed direction. Sometimes the fastest merge is a strategic pivot.