What are LLMs?
The Emergence of Foundational Models
I want to talk a bit about LLMs and how they fit into a new vision for the future of artificial intelligence.
With GPT-4 expected to be announced soon, let’s take a look at what large language models (LLMs) are and why they will be transformative.
In fact, a celebrated writer and future analyst wrote in the intro to his GPT-4 article:
It’s been 29 months since OpenAI launched GPT-2, its large-language model which demonstrated the power of transformers-based neural networks. GPT-2 impressed with the quality of its natural text generation. Its successor GPT-3, a bigger, more complex model, delivered even more powerful results.
Since GPT-3 was released in 2020, we have witnessed a wave of new innovations and products built on it and similar models. It isn’t just about feeding a model a text prompt and having it spew out lots of plausible-sounding (but not necessarily true) copy. We’re witnessing the same prompt-based approach to images, movies and more.
The best introduction online I could find to LLMs is actually by Co:here.
Given how language is key to how humanity progressed, it’s not surprising to find large language models proving important in A.I.’s development.
Language is important. It’s how we learn about the world (e.g. news, searching the web or Wikipedia), and also how we shape it (e.g. agreements, laws, or messages). Language is also how we connect and communicate — as people, and as groups and companies.
1. Discussion of the GPT-1 paper (Improving Language Understanding by Generative Pre-Training).
2. Discussion of the GPT-2 paper (Language Models are Unsupervised Multitask Learners) and its improvements over GPT-1.
3. Discussion of the GPT-3 paper (Language Models are Few-Shot Learners) and the improvements that have made it one of the most powerful models NLP has seen to date.
Many of us expect GPT-4 to launch between February and June 2023.
Sam Altman’s rants on Twitter about Startups are almost as painful to read as Reid Hoffman’s “Newsletter” on LinkedIn, and details of GPT-4 are murky at best.
The year was 2017, and the Transformer came on the scene. The transformer neural network is a novel architecture that aims to solve sequence-to-sequence tasks while handling long-range dependencies with ease. It was first proposed in the paper “Attention Is All You Need” and is now a state-of-the-art technique in the field of NLP.
A transformer model is a neural network that learns context and thus meaning by tracking relationships in sequential data like the words in this sentence.
Transformer models apply an evolving set of mathematical techniques, called attention or self-attention, to detect subtle ways even distant data elements in a series influence and depend on each other.
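To make the idea of self-attention a little more concrete, here is a minimal sketch in plain Python. It is a simplified illustration, not how any production model is implemented: real transformers first pass each token through learned query, key, and value projections and use many attention heads, all of which are left out here. The function names and the tiny two-dimensional "embeddings" are made up for this example.

```python
import math


def softmax(scores):
    """Turn a list of raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention without learned projections.

    Assumption for illustration: each input vector serves as its own
    query, key, and value. Real models learn separate projections.
    """
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Score this token against every token in the sequence, near or far.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # The output is a weighted blend of all token vectors.
        mixed = [
            sum(w * v[i] for w, v in zip(weights, vectors))
            for i in range(dim)
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical embeddings for a three-token sequence; the first and
# third tokens are identical, the second is unrelated.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(tokens)
```

In this toy sequence the first token attends to the third token exactly as strongly as it attends to itself, and more strongly than to its immediate neighbor. The weights depend on how similar two tokens are, not on how far apart they sit, which is what lets attention pick up relationships between distant elements.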
First described in a 2017 paper from Google, transformers are among the newest and most powerful classes of models invented to date. They’re driving a wave of advances in machine learning that some have dubbed transformer AI.