Ollama

Get up and running with large language models. Run Llama 3.1, Phi 3, Mistral, Gemma 2, and other models. Customize and create your own models.

Ollama is a tool for running large language models (LLMs) locally on your machine. It serves as a wrapper around the open-source `llama.cpp` library, which implements LLMs in pure C/C++ for maximum efficiency. Ollama supports a wide range of models, including the 8-billion-parameter Llama 3 model, which can be run with the command `ollama run llama3` and requires approximately 16 GB of RAM. On machines with less RAM, smaller models such as the 3.8-billion-parameter Phi 3 (`ollama run phi3`) can be used [1].

[1] Sebastian Raschka, [Build a Large Language Model (From Scratch)](https://livebook.manning.com/raschka/chapter-7), chapter 7.



Ollama supports a library of models, available at https://ollama.com/library.

Here are some example models that can be downloaded:

| Model | Parameters | Size | Download |
| --- | --- | --- | --- |
| Llama 3.1 | 8B | 4.7GB | `ollama run llama3.1` |
| Llama 3.1 | 70B | 40GB | `ollama run llama3.1:70b` |
| Llama 3.1 | 405B | 231GB | `ollama run llama3.1:405b` |
| Phi 3 Mini | 3.8B | 2.3GB | `ollama run phi3` |
| Phi 3 Medium | 14B | 7.9GB | `ollama run phi3:medium` |
| Gemma 2 | 2B | 1.6GB | `ollama run gemma2:2b` |
| Gemma 2 | 9B | 5.5GB | `ollama run gemma2` |
| Gemma 2 | 27B | 16GB | `ollama run gemma2:27b` |
| Mistral | 7B | 4.1GB | `ollama run mistral` |
| Moondream 2 | 1.4B | 829MB | `ollama run moondream` |
| Neural Chat | 7B | 4.1GB | `ollama run neural-chat` |
| Starling | 7B | 4.1GB | `ollama run starling-lm` |
| Code Llama | 7B | 3.8GB | `ollama run codellama` |
| Llama 2 Uncensored | 7B | 3.8GB | `ollama run llama2-uncensored` |
| LLaVA | 7B | 4.5GB | `ollama run llava` |
| Solar | 10.7B | 6.1GB | `ollama run solar` |

Note: You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.