Llama 2 is the successor to Meta AI's original Llama language model. Its weights are openly available to download and run locally, and the model is free for both personal and commercial use under Meta's license. It also brings many improvements over the previous iteration. The model comes in three sizes (7B, 13B, and 70B parameters), each available as a base model and a chat-tuned variant:
|Llama 2 7B||Source – HF – GPTQ – ggml|
|Llama 2 7B Chat||Source – HF – GPTQ – ggml|
|Llama 2 13B||Source – HF – GPTQ – ggml|
|Llama 2 13B Chat||Source – HF – GPTQ – ggml|
|Llama 2 70B||Source – HF – GPTQ|
|Llama 2 70B Chat||Source – GPTQ|
Which model you can run depends on your hardware. For good results, you should have at least 10GB of VRAM for the 7B model, though 8GB is sometimes enough. The 13B model can run on GPUs like the RTX 3090 and RTX 4090. The largest model, 70B, requires very powerful hardware such as an A100 80GB.
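To see where hardware figures like these come from, here is a rough back-of-the-envelope sketch. It only estimates the memory needed to hold the weights, assuming the nominal parameter counts (7B, 13B, 70B) and a given number of bits per weight. The function name `weight_memory_gb` is made up for this illustration, and real usage will be higher because of the context cache, activations, and framework overhead.

```python
# Rough VRAM needed just to hold the model weights, in decimal GB.
# Assumption: nominal parameter counts (7B, 13B, 70B). Actual usage is
# higher because of the KV cache, activations, and framework overhead.

def weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Return the weight size in GB: parameters * bytes per parameter."""
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    for size in (7, 13, 70):
        for bits in (16, 8, 4):
            gb = weight_memory_gb(size, bits)
            print(f"Llama 2 {size}B at {bits}-bit: ~{gb:.1f} GB for weights")
```

By this estimate the 7B model needs about 14 GB for its weights at 16-bit precision, which is why quantized builds such as the GPTQ and ggml versions listed above are the ones that tend to fit on 8GB to 10GB cards.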
Installing Llama 2 Model
To download the Llama 2 model, you will need to complete the registration form on the Meta AI page. They will then send you an activation key, which is required during installation of the model. The Llama models can be used with most LLM interfaces, such as Text Generation Web UI. We will be writing a full tutorial for installing the model soon, but in short: within the Text Generation Web UI, navigate to the models page and paste the Hugging Face repository name of the Llama model you wish to use (7B is recommended, unless you have an extremely powerful GPU). For example, to install the 7B chat model, you would paste “meta-llama/Llama-2-7b-chat-hf”; for the 7B base model, use “meta-llama/Llama-2-7b-hf”. Keep in mind that the download and installation might take a few minutes.
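If you prefer to fetch the files outside of a web interface, the same Hugging Face names can be used from Python. The sketch below is one possible approach, not the official install procedure: it builds the repository name from a size and a chat flag, then calls `snapshot_download` from the `huggingface_hub` package. The helper names `llama2_repo_id` and `download_llama2` are hypothetical, and reading the access token from an `HF_TOKEN` environment variable is an assumption about your setup. The Meta repositories are gated, so the token must belong to an account that has been granted access.

```python
import os
from typing import Optional

VALID_SIZES = ("7b", "13b", "70b")


def llama2_repo_id(size: str, chat: bool = False) -> str:
    """Build the Hugging Face repository name for a Llama 2 model."""
    size = size.lower()
    if size not in VALID_SIZES:
        raise ValueError(f"size must be one of {VALID_SIZES}, got {size!r}")
    suffix = "-chat-hf" if chat else "-hf"
    return f"meta-llama/Llama-2-{size}{suffix}"


def download_llama2(size: str = "7b", chat: bool = True,
                    token: Optional[str] = None) -> str:
    """Download the model files and return the local folder path."""
    # Imported here so the helper above works without the package installed.
    from huggingface_hub import snapshot_download  # pip install huggingface_hub

    # Assumption: the access token is passed in or stored in HF_TOKEN.
    token = token or os.environ.get("HF_TOKEN")
    return snapshot_download(repo_id=llama2_repo_id(size, chat), token=token)


if __name__ == "__main__":
    # Only prints the name; call download_llama2() to start the download.
    print(llama2_repo_id("7b", chat=True))
```

Calling `download_llama2("7b", chat=True)` starts the actual download, which is several gigabytes for even the smallest model.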
Llama 2 Demo
There is also a demo of the Llama 2 70B chatbot available at the following Hugging Face space: https://huggingface.co/spaces/ysharma/Explore_llamav2_with_TGI