Before 2017, models like RNNs and LSTMs processed text one word at a time, which meant:

- Slow training
- Poor long-term memory
- No parallel processing

Everything changed with Google's paper “Attention Is All You Need”. Instead of reading sequentially, the model looks at all words at once and focuses on what matters most using Attention. This idea led to the Transformer, the foundation of modern models like GPT and BERT. In this article, we'll quickly walk through the paper and how Transformers work.
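To make that idea concrete, here is a minimal sketch of scaled dot-product attention, the core operation from the paper: Attention(Q, K, V) = softmax(QKᵀ / √d_k) V. The function names and toy shapes below are my own for illustration, not from the paper's code.

```python
# A minimal sketch of scaled dot-product attention in NumPy.
# Toy shapes and values are illustrative only.
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (seq_len, d_k) matrices of queries, keys, and values.
    d_k = Q.shape[-1]
    # Every query scores every key at once: this is the "look at all words at once" step.
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax turns scores into weights that focus on the most relevant positions.
    weights = softmax(scores, axis=-1)
    # The output is a weighted mix of the values.
    return weights @ V

# Toy example: 4 tokens, each an 8-dimensional vector.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V
print(out.shape)  # (4, 8)
```

Because every query attends to every key in a single matrix multiplication, the whole sequence is processed in parallel, which is exactly what RNNs and LSTMs could not do.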