Part of what makes LoRA so effective is that, like other forms of fine-tuning, it's stackable. Improvements like instruction tuning can be applied and then leveraged as other contributors add on dialogue, reasoning, or tool use. While the individual fine-tunings are low-rank, their sum need not be, allowing full-rank updates to the model to accumulate over time.
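A quick numerical illustration of that rank argument (a sketch with arbitrary dimensions, not anything from the memo itself): two independent rank-r LoRA updates almost surely sum to a rank-2r update.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # model (hidden) dimension, illustrative
r = 4   # rank of each individual LoRA update

# Two independent low-rank fine-tuning deltas, e.g. an instruction-tuning
# adapter followed by a dialogue adapter. Each is a product of d x r and
# r x d factors, so each has rank at most r.
delta_1 = rng.standard_normal((d, r)) @ rng.standard_normal((r, d))
delta_2 = rng.standard_normal((d, r)) @ rng.standard_normal((r, d))

print(np.linalg.matrix_rank(delta_1))            # 4
print(np.linalg.matrix_rank(delta_2))            # 4
print(np.linalg.matrix_rank(delta_1 + delta_2))  # 8: the stacked update is higher rank
```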
This means that as new and better datasets and tasks become available, the model can be cheaply kept up to date, without ever having to pay the cost of a full run.
By contrast, training giant models from scratch not only throws away the pretraining, but also any iterative improvements that have been made on top. In the open source world, it doesn’t take long before these improvements dominate, making a full retrain extremely costly.
We should be thoughtful about whether each new application or idea really needs a whole new model. If we really do have major architectural improvements that preclude directly reusing model weights, then we should invest in more aggressive forms of distillation that allow us to retain as much of the previous generation’s capabilities as possible.
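For context on what that might look like, here is a minimal sketch of one common form of distillation, soft-label distillation, where a new student model is trained to match the output distribution of the previous generation. This is a generic example, not the memo's proposal.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between the teacher's and the student's
    temperature-softened output distributions."""
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    # The t**2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t ** 2)
```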
Google "We Have No Moat, And Neither Does OpenAI"
from Dylan Patel