llamafile v0.8.13 Release Notes

by Mozilla

llamafile 0.8.13 supports the latest models, includes various quality improvements, and adds new commands to try: whisperfile (speech-to-text and translation) and sdfile (image generation). Embedding performance on the new HTTP server has tripled.
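
If you want to poke at the new embeddings server, here is a minimal sketch in Python. It assumes you have already started a llamafile with an embedding model listening on localhost:8080 and that the server exposes an OpenAI-compatible /v1/embeddings endpoint; the URL and model name are placeholders, so adjust them to match your setup.

```python
import json
import urllib.request

# Assumption: a llamafile HTTP server is already running locally with an
# embedding model and exposes an OpenAI-compatible /v1/embeddings endpoint
# on port 8080. Adjust the URL and model name for your own setup.
URL = "http://localhost:8080/v1/embeddings"

payload = {
    "model": "local",  # placeholder name; local servers often ignore this field
    "input": [
        "llamafile makes LLMs easy to run locally.",
        "whisperfile does speech to text.",
    ],
}

req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    result = json.load(resp)

# An OpenAI-style response carries one embedding vector per input string.
for item in result["data"]:
    print(item["index"], len(item["embedding"]), "dimensions")
```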

Other updates include:

  • Support for additional new model architectures, e.g. OpenELM, GPT-NeoX, Arctic, DeepSeek2, ChatGLM, BitNet, T5, JAIS, Poro, Viking, Tekken, and CodeShell.
  • Mistral NeMo compatibility. Get the fresh llamafiles here.
  • You can now use Gemma 2B. This model was released by Google a few weeks ago. It’s very snappy, even on CPU, thanks to the new high-quality vectorized GeLU implementation.
  • Better llamafiles for LLaMA v3.1 have been uploaded. They can now scale to the full 128k context window, so your prompt can be a whole book that you ask questions about (see the sketch after this list).

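To give a flavor of what the longer context window makes possible, here is a hedged Python sketch that sends the text of a book plus a question to a locally running llamafile through an OpenAI-compatible /v1/chat/completions endpoint. The port, model name, and book.txt path are illustrative assumptions, and the server needs to have been started with a context size large enough to hold the document.

```python
import json
import urllib.request

# Illustrative assumptions: a LLaMA v3.1 llamafile is serving an
# OpenAI-compatible API on localhost:8080, it was started with a context
# window large enough for the document, and "book.txt" is your own file.
URL = "http://localhost:8080/v1/chat/completions"

with open("book.txt", "r", encoding="utf-8") as f:
    book = f.read()

payload = {
    "model": "local",  # placeholder; local servers often ignore this field
    "messages": [
        {"role": "system", "content": "Answer questions about the provided book."},
        {"role": "user", "content": book + "\n\nQuestion: Who is the narrator?"},
    ],
}

req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    reply = json.load(resp)

print(reply["choices"][0]["message"]["content"])
```
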
Join the conversation on Discord!