The Future of AI is Serial

11 Feb, 2026

The dominance of the transformer architecture lies not only in its performance but also in its efficiency. Unlike its most successful predecessor, the LSTM, the transformer is simple to train. There is no iterating across the sequence and no backpropagation through time. The transformer ingests the entire sequence at once, feeding hungry GPU cores without interruption.
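To make the contrast concrete, here is a minimal NumPy sketch. It is an illustration under simplifying assumptions, not anyone's production code: the recurrent cell is a plain tanh RNN rather than a full LSTM, the attention is a single causal head with no output projection or normalization layers, and the names `rnn_forward` and `causal_attention` are made up for this post.

```python
import numpy as np

def rnn_forward(x, W_h, W_x):
    """Serial: step t cannot start until step t-1 has produced its hidden state."""
    T = x.shape[0]
    h = np.zeros(W_h.shape[0])
    states = []
    for t in range(T):  # T dependent steps, one after another
        h = np.tanh(W_h @ h + W_x @ x[t])
        states.append(h)
    return np.stack(states)

def causal_attention(x, W_q, W_k, W_v):
    """Parallel: every position is computed by the same few matrix products."""
    T = x.shape[0]
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    scores = q @ k.T / np.sqrt(k.shape[1])
    # Mask out future positions so position t only attends to positions <= t.
    future = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    # Row-wise softmax, shifted by the row maximum for numerical stability.
    scores -= scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v
```

The loop in `rnn_forward` has a chain of dependencies whose length grows with the sequence. `causal_attention` has no such chain: its depth is fixed regardless of sequence length, which is what lets a GPU process every position at once during training.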

However, this advantage may prove to be the transformer's fatal limitation.

The Serial Scaling Hypothesis

The serial scaling hypothesis formalizes this concern.

Recent Developments

Chain-of-thought (CoT) reasoning
Iterative models for ARC
Test-time training (TTT) sequence modelling architectures

The Future

Scaling along the batch dimension
Training for longer amounts of time with fewer machines
A new computing architecture?

#blog