7B | LLaMA-7b-hf | LLaMA-7B converted to work with Transformers/HuggingFace (see the Transformers loading sketch after this table). |
Llama-2-7B-32K-Instruct | An open-source, long-context chat model fine-tuned from Llama-2-7B-32K on high-quality instruction and chat data, using the Together API. |
Llama-2-7b | Original Llama-2-7b base model. |
Llama-2-7b-hf | This is the repository for the 7B pretrained model, converted for the Hugging Face Transformers format. |
Llama-2-7b-chat | The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align to human preferences for helpfulness and safety. |
Llama-2-7b-chat-hf | This is the repository for the 7B fine-tuned model, optimized for dialogue use cases and converted for the Hugging Face Transformers format. |
7B Quantized version | LLaMA-7b-4bit | A 4-bit quantized version of LLaMA-7b. |
Llama-2-7B-GGUF | GGUF is a new format that replaces GGML. It offers numerous advantages over GGML, such as better tokenisation and support for special tokens. It also supports metadata and is designed to be extensible (see the GGUF loading sketch after this table). |
LLaMA-7b-hf-int4 | Converted to int4 via the GPTQ method; requires special, highly experimental support code. Also converted to work with Transformers/HuggingFace. |
Llama-2-7B-GPTQ | Multiple GPTQ parameter permutations are provided; this version provides details of the options, their parameters, and the software used to create them (see the GPTQ loading sketch after this table). |
Llama-2-7B-GGML | GGML files are for CPU + GPU inference. The GGML format has now been superseded by GGUF. As of August 21st 2023, llama.cpp no longer supports GGML models. |
Llama-2-7b-Chat-GPTQ | Multiple GPTQ parameter permutations are provided; this version provides details of the options, their parameters, and the software used to create them. |
Llama-2-7B-Chat-GGML | GGML files are for CPU + GPU inference. The GGML format has now been superseded by GGUF. As of August 21st 2023, llama.cpp no longer supports GGML models. |
13B | LLaMA-13b | Original LLaMA-13b model. |
LLaMA-13b-hf | LLaMA-13B converted to work with Transformers/HuggingFace. |
Llama-2-13b | Original Llama-2-13b model |
Llama-2-13b-hf | This is the repository for the 13B pretrained model, converted for the Hugging Face Transformers format. |
Llama-2-13b-chat | Llama-2-Chat models outperform open-source chat models on most benchmarks we tested, and in our human evaluations for helpfulness and safety. |
Llama-2-13b-chat-hf | This is the repository for the 13B fine-tuned model, optimized for dialogue use cases and converted for the Hugging Face Transformers format. |
13B Quantized version | LLaMa-13B-GGML | GGML files are for CPU + GPU inference. The GGML format has now been superseded by GGUF. As of August 21st 2023, llama.cpp no longer supports GGML models. |
llama-13b-hf-int4 | Converted to int4 via the GPTQ method; requires special, highly experimental support code. NOT COMPATIBLE WITH THE TRANSFORMERS LIBRARY. |
Llama-2-13B-GPTQ | Multiple quantisation parameters are provided, to allow you to choose the best one for your hardware and requirements. |
Llama-2-13B-GGML | The GGML format has now been superseded by GGUF. As of August 21st 2023, llama.cpp no longer supports GGML models. Third party clients and libraries are expected to still support it for a time, but many may also drop support. |
Llama-2-13B-chat-GGUF | GGUF is a new format that replaces GGML. It offers numerous advantages over GGML, such as better tokenisation and support for special tokens. It also supports metadata and is designed to be extensible. |
Llama-2-13B-chat-GPTQ | Multiple quantisation parameters are provided, to allow you to choose the best one for your hardware and requirements. |
65B/70B | LLaMA-65b | Original LLaMA-65b model. |
LLaMA-65b-hf | LLaMA-65B converted to work with Transformers/HuggingFace. |
Llama-2-70b | Original Llama-2-70b model. |
Llama-2-70b-hf | This is the repository for the 70B pretrained model, converted for the Hugging Face Transformers format. |
Llama-2-70b-chat | Llama-2-Chat models outperform open-source chat models on most benchmarks we tested, and in our human evaluations for helpfulness and safety. |
Llama-2-70b-chat-hf | This is the repository for the 70B fine-tuned model, optimized for dialogue use cases and converted for the Hugging Face Transformers format. |
65B/70B Quantized version | LLaMA-65B-GGUF | GGUF is a new format that replaces GGML. It offers numerous advantages over GGML, such as better tokenisation and support for special tokens. It also supports metadata and is designed to be extensible. |
LLaMA-65b-hf-int4 | Converted to int4 via the GPTQ method; requires special, highly experimental support code. NOT COMPATIBLE WITH THE TRANSFORMERS LIBRARY. |
Llama-2-70B-GPTQ | Multiple quantisation parameters are provided, to allow you to choose the best one for your hardware and requirements. |
Llama-2-70b-hf-onnx-int4 | This is the repository of INT4 weight-only quantization for the 70B pretrained model in ONNX format, powered by Intel® Neural Compressor and Intel® Extension for Transformers (see the ONNX inspection sketch after this table). |
Llama-2-70B-chat-GPTQ | Multiple quantisation parameters are provided, to allow you to choose the best one for your hardware and requirements. |
Llama-2-70b-chat-hf-onnx-int4 | This is the repository of INT4 weight-only quantization for the 70B fine-tuned chat model in ONNX format, powered by Intel® Neural Compressor and Intel® Extension for Transformers. |
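
The `-hf` rows above are ordinary Transformers checkpoints. A minimal loading sketch, assuming you have been granted access to the gated meta-llama repositories on Hugging Face and have `accelerate` installed for `device_map="auto"`:

```python
# Minimal sketch: load an HF-format Llama 2 checkpoint with Transformers.
# Assumes access to the gated meta-llama repo and enough GPU/CPU memory.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # any of the -hf rows above works the same way

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # halves memory versus float32
    device_map="auto",          # spreads layers across available devices
)

inputs = tokenizer("Llama 2 comes in three sizes:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```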
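The GGUF rows are meant for llama.cpp and its bindings rather than Transformers. A minimal sketch using llama-cpp-python, assuming you have downloaded one of the quantized `.gguf` files locally (the filename below is illustrative):

```python
# Minimal sketch: run a GGUF quantization with llama-cpp-python.
# The model path is an assumption -- substitute whichever .gguf file you downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-2-7b.Q4_K_M.gguf",  # illustrative filename
    n_ctx=2048,      # context window size
    n_gpu_layers=0,  # raise this to offload layers to the GPU
)

result = llm("Q: What format replaced GGML? A:", max_tokens=32, stop=["\n"])
print(result["choices"][0]["text"])
```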
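The GPTQ rows ship pre-quantized weights that recent Transformers versions can load directly, provided the `optimum` and `auto-gptq` packages are installed and a CUDA GPU is available (the GPTQ kernels are GPU-only). A sketch under those assumptions, using one of the repository names from the table:

```python
# Minimal sketch: load a pre-quantized GPTQ checkpoint with Transformers.
# Assumes `pip install optimum auto-gptq` alongside a recent transformers release.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Llama-2-7B-GPTQ"  # assumed hub path for the Llama-2-7B-GPTQ row

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("GPTQ quantizes model weights to", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```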
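The ONNX int4 rows are consumed through ONNX Runtime rather than PyTorch. Full autoregressive generation takes more plumbing, but a minimal sketch for opening the exported graph and inspecting its expected inputs (the filename is an assumption; use the `.onnx` file shipped in the repository):

```python
# Minimal sketch: open an exported ONNX decoder and list its input tensors.
# The filename is illustrative -- point it at the .onnx file from the repo.
import onnxruntime as ort

session = ort.InferenceSession("decoder_model.onnx", providers=["CPUExecutionProvider"])

for tensor in session.get_inputs():
    print(tensor.name, tensor.shape, tensor.type)
```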