evolution banner

Adam Klein

The Future of AI is Serial

The dominance of the transformer architecture lies not only in its performance, but also its efficiency. Unlike its most successful predecessor, the LSTM, training the transformer is simple. There is no iterating across the sequence. There is no backpropagation through time. The transformer ingests the entire sequence at once, feeding hungry GPU cores with no interruptions.

However, this advantage may prove to be the transformer's fatal limitation.

The Serial Scaling Hypothesis

The serial scaling hypothesis formalizes this concern.

Recent Developments

The Future

#blog