Use a small model to generate a 'draft' output, then use a larger, smarter model to score that draft, and finally apply a rejection sampling scheme that accepts the draft tokens on which the small and large models agree.
In tests, they find that speculative sampling with a draft model yields speedups ranging from 1.92X on a summarization benchmark (XSum) to 2.46X on a code generation task (HumanEval).
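To make the idea concrete, here is a minimal Python sketch of one round of the draft-then-verify loop. It is not DeepMind's implementation: `draft_probs_fn` and `target_probs_fn` are hypothetical placeholders for the two models' next-token distributions, and the target-model "scoring" pass is simulated with a simple loop rather than a real parallel forward pass.

```python
import numpy as np

def speculative_sampling_step(prefix, draft_probs_fn, target_probs_fn,
                              vocab_size, k=4, rng=None):
    """One round of speculative sampling (sketch).

    draft_probs_fn / target_probs_fn: hypothetical callables mapping a token
    sequence to a probability distribution over the next token.
    Returns the tokens accepted in this round.
    """
    rng = rng or np.random.default_rng()

    # 1. The small draft model proposes k tokens autoregressively.
    draft_tokens, draft_probs = [], []
    seq = list(prefix)
    for _ in range(k):
        p = draft_probs_fn(seq)                  # shape: (vocab_size,)
        t = int(rng.choice(vocab_size, p=p))
        draft_tokens.append(t)
        draft_probs.append(p)
        seq.append(t)

    # 2. The large target model scores every drafted position. In a real
    #    system this is one parallel forward pass over the k+1 positions.
    target_probs = [target_probs_fn(list(prefix) + draft_tokens[:i])
                    for i in range(k + 1)]

    # 3. Modified rejection sampling: accept draft token t with probability
    #    min(1, p_target(t) / p_draft(t)); on rejection, resample from the
    #    residual distribution max(0, p_target - p_draft), renormalized.
    accepted = []
    for i, t in enumerate(draft_tokens):
        p_t, p_d = target_probs[i][t], draft_probs[i][t]
        if rng.random() < min(1.0, p_t / p_d):
            accepted.append(t)
        else:
            residual = np.maximum(target_probs[i] - draft_probs[i], 0.0)
            residual /= residual.sum()
            accepted.append(int(rng.choice(vocab_size, p=residual)))
            return accepted                      # stop at the first rejection

    # 4. All k drafts accepted: sample one bonus token from the target model.
    accepted.append(int(rng.choice(vocab_size, p=target_probs[k])))
    return accepted
```

The payoff is that each call to the expensive target model can emit several tokens instead of one, while the acceptance rule keeps the output distribution matching what the large model alone would have produced.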
Fwd: Import AI 317: DeepMind speeds up language model sampling; voice cloning tech gets abused; more scaling laws for RL
from Josh Beckman