
In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be a suitable substitute for closed-source models. We provide a detailed description of our approach to fine-tuning and safety improvements of Llama 2-Chat in order to enable the community to build on our work and contribute to the responsible development of LLMs.

| Model size | Model name | Explanation |
| --- | --- | --- |
| 7B | LLaMA-7b-hf | LLaMA-7B converted to work with Transformers/HuggingFace. |
| | Llama-2-7B-32K-Instruct | An open-source, long-context chat model fine-tuned from Llama-2-7B-32K over high-quality instruction and chat data, using the Together API. |
| | Llama-2-7b | Original base model. |
| | Llama-2-7b-hf | Repository for the 7B pretrained model, converted to the Hugging Face Transformers format. |
| | Llama-2-7b-chat | The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align to human preferences for helpfulness and safety. |
| | Llama-2-7b-chat-hf | Repository for the 7B fine-tuned model, optimized for dialogue use cases and converted to the Hugging Face Transformers format (a minimal loading sketch follows this table). |
| 7B quantized version | LLaMA-7b-4bit | A 4-bit quantized version of the model. |
| | Llama-2-7B-GGUF | GGUF is a new format that replaces GGML. It offers advantages over GGML such as better tokenization and support for special tokens; it also supports metadata and is designed to be extensible. |
| | LLaMA-7b-hf-int4 | Converted to int4 via the GPTQ method; requires special support code that is still highly experimental. Also converted to work with Transformers/HuggingFace. |
| | Llama-2-7B-GPTQ | Multiple GPTQ parameter permutations are provided; this repo details the options, their parameters, and the software used to create them. |
| | Llama-2-7B-GGML | GGML files are for CPU + GPU inference. The GGML format has been superseded by GGUF; as of August 21st 2023, llama.cpp no longer supports GGML models. |
| | Llama-2-7b-Chat-GPTQ | Multiple GPTQ parameter permutations are provided; this repo details the options, their parameters, and the software used to create them. |
| | Llama-2-7B-Chat-GGML | GGML files are for CPU + GPU inference. The GGML format has been superseded by GGUF; as of August 21st 2023, llama.cpp no longer supports GGML models. |
| 13B | LLaMA-13b | Original LLaMA-13b model. |
| | LLaMA-13b-hf | LLaMA-13B converted to work with Transformers/HuggingFace. |
| | Llama-2-13b | Original Llama-2-13b model. |
| | Llama-2-13b-hf | Repository for the 13B pretrained model, converted to the Hugging Face Transformers format. |
| | Llama-2-13b-chat | Llama-2-Chat models outperform open-source chat models on most benchmarks tested, and in human evaluations for helpfulness and safety. |
| | Llama-2-13b-chat-hf | Repository for the 13B fine-tuned model, optimized for dialogue use cases and converted to the Hugging Face Transformers format. |
| 13B quantized version | LLaMa-13B-GGML | GGML files are for CPU + GPU inference. The GGML format has been superseded by GGUF; as of August 21st 2023, llama.cpp no longer supports GGML models. |
| | llama-13b-hf-int4 | Converted to int4 via the GPTQ method; requires special support code that is still highly experimental. Not compatible with the Transformers library. |
| | Llama-2-13B-GPTQ | Multiple quantization parameters are provided, allowing you to choose the best one for your hardware and requirements. |
| | Llama-2-13B-GGML | The GGML format has been superseded by GGUF; as of August 21st 2023, llama.cpp no longer supports GGML models. Third-party clients and libraries are expected to support it for a time, but many may also drop support. |
| | Llama-2-13B-chat-GGUF | GGUF is a new format that replaces GGML. It offers advantages over GGML such as better tokenization and support for special tokens; it also supports metadata and is designed to be extensible. |
| | Llama-2-13B-chat-GPTQ | Multiple quantization parameters are provided, allowing you to choose the best one for your hardware and requirements. |
| 65B/70B | LLaMA-65b | Original LLaMA-65b model. |
| | LLaMA-65b-hf | LLaMA-65B converted to work with Transformers/HuggingFace. |
| | Llama-2-70b | Original Llama-2-70b model. |
| | Llama-2-70b-hf | Repository for the 70B pretrained model, converted to the Hugging Face Transformers format. |
| | Llama-2-70b-chat | Llama-2-Chat models outperform open-source chat models on most benchmarks tested, and in human evaluations for helpfulness and safety. |
| | Llama-2-70b-chat-hf | Repository for the 70B fine-tuned model, optimized for dialogue use cases and converted to the Hugging Face Transformers format. |
| 65B/70B quantized version | LLaMA-65B-GGUF | GGUF is a new format that replaces GGML. It offers advantages over GGML such as better tokenization and support for special tokens; it also supports metadata and is designed to be extensible. |
| | LLaMA-65b-hf-int4 | Converted to int4 via the GPTQ method; requires special support code that is still highly experimental. Not compatible with the Transformers library. |
| | Llama-2-70B-GPTQ | Multiple quantization parameters are provided, allowing you to choose the best one for your hardware and requirements. |
| | Llama-2-70b-hf-onnx-int4 | Repository of INT4 weight-only quantization for the 70B model in ONNX format, powered by Intel® Neural Compressor and Intel® Extension for Transformers. |
| | Llama-2-70B-chat-GPTQ | Multiple quantization parameters are provided, allowing you to choose the best one for your hardware and requirements. |
| | Llama-2-70b-chat-hf-onnx-int4 | Repository of INT4 weight-only quantization for the 70B fine-tuned chat model in ONNX format, powered by Intel® Neural Compressor and Intel® Extension for Transformers. |
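
The `-hf` repos in the table above load directly with the Transformers library. As a minimal sketch (assuming `transformers` and `accelerate` are installed and you have been granted access to the gated `meta-llama/Llama-2-7b-chat-hf` repo), loading and prompting the 7B chat model looks roughly like this:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # gated repo; requires approved access

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Llama-2-Chat was aligned via SFT/RLHF on an [INST] ... [/INST] prompt format.
prompt = "[INST] Explain the difference between GGML and GGUF. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The GPTQ and GGUF rows are not loaded this way: GPTQ checkpoints need a GPTQ-aware loader such as AutoGPTQ, and GGUF files target llama.cpp-based runtimes.
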
| Original model | Model name | Explanation |
| --- | --- | --- |
| LLaMA | lawyer-llama-13b-beta1.0 | Based on Chinese-LLaMA-13B; it has not undergone continual pretraining on legal corpora, uses general and legal instructions for SFT, and is equipped with a marriage-related legal retrieval module. |
| | chinese-alpaca-plus-13b-hf | Compared with the basic version, the training data has been further expanded: the LLaMA pretraining data grows to 120 GB of text, the Alpaca instruction data to 4.3M items, and data from scientific domains has been added. This model uses decapoda-research/llama-13b-hf as the base model, merges the two LoRA weights ziqingyang/chinese-llama-plus-lora-13b and ziqingyang/chinese-alpaca-plus-lora-13b, and converts them into Hugging Face-format weights. |
| | Wizard-Vicuna-30B-Uncensored-GPTQ | Wizard-Vicuna trained on a subset of the dataset from which responses containing alignment/moralizing were removed. Multiple GPTQ parameter permutations are provided. |
| | LLaVA-Lightning-MPT-7B-preview | LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna/MPT on GPT-generated multimodal instruction-following data. It is an auto-regressive language model based on the transformer architecture. Demo: https://llava-vl.github.io/ |
| | beaver-7b-v1.0-cost | The Beaver cost model is a preference model trained on the PKU-SafeRLHF dataset. Fine-tuned from: LLaMA, Alpaca. |
| | alpaca-native | A replica of Alpaca by Stanford's tatsu-lab, trained using the original instructions with a minor modification in FSDP mode. |
| Llama 2 | Taiwan-LLaMa-v1.0 | Taiwan-LLaMa is a full-parameter fine-tune of LLaMa 2 for Traditional Mandarin applications. It was pretrained on over 5 billion tokens and instruction-tuned on over 490k conversations, both in Traditional Mandarin. |
| | YuLan-Chat-2-13b-fp16 | Supports an 8k Chinese context; developed by researchers at GSAI, Renmin University of China. |
| | Chinese-LLaMA-2-13B-hf | A Chinese base model built on LLaMA and Falcon, incrementally pretrained on Chinese and Chinese-English parallel corpora to transfer its English language capabilities to Chinese. |
| | ORCA_LLaMA_70B_QLoRA | Trained on three datasets: Dolphin, Open-Platypus, and OpenOrca. |
| | LLaMA2-13B-Holomax-GPTQ | An expansion merge of Gryphe's well-praised Mythomax model (60%) with MrSeeker's KoboldAI Holodeck model (40%). The goal is to enhance story-writing capabilities while preserving Mythomax's desirable traits as much as possible (it does limit chat reply length). |
| | CodeLlama-34b-Instruct-hf | Repository for the 34B instruct-tuned version in the Hugging Face Transformers format, designed for general code synthesis and understanding. |
| | Llama-2-7b-longlora-100k-ft | An efficient fine-tuning approach that extends the context sizes of pretrained large language models (LLMs) at limited computation cost. |
| | nsql-llama-2-7B-sharded-bf16-2GB | NSQL-Llama-2-7B, a new member of the NSQL family: based on Meta's original Llama-2 7B model, further pretrained on a dataset of general SQL queries and then fine-tuned on text-to-SQL pairs. |
| | Yarn-Llama-2-7b-64k | Nous-Yarn-Llama-2-7b-64k is a state-of-the-art long-context language model, further pretrained on long-context data for 400 steps. This is the Flash Attention 2 patched version of the original model: https://huggingface.co/conceptofmind/Yarn-Llama-2-7b-64k |
| | llama-2-70b-Guanaco-QLoRA-GPTQ | A Llama-2 version of Guanaco, fine-tuned from the base Llama-2-70b model using the official training scripts from the QLoRA repo. |
| | llama-2-7b-int4-python-code-20k | Llama-2 7B fine-tuned on the python_code_instructions_18k_alpaca code-instructions dataset using 4-bit QLoRA with the PEFT library (see the QLoRA sketch after this table). |
| | Nous-Hermes-Llama2-13b | Stands out for its long responses, lower hallucination rate, and absence of OpenAI censorship mechanisms. Fine-tuned with a 4096 sequence length on an 8x A100 80GB DGX machine. |
| | llama2-13b-orca-8k-3319 | A fine-tune of Meta's Llama2 13B model with 8k context size, on a long-conversation variant of the Dolphin dataset (orca-chat). |
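
Several rows above (e.g. llama-2-7b-int4-python-code-20k, llama-2-70b-Guanaco-QLoRA-GPTQ) were produced with 4-bit QLoRA via the PEFT library. A minimal sketch of that recipe; the model id and LoRA hyperparameters here are illustrative, not the ones those repos actually used:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# QLoRA step 1: load the frozen base model in 4-bit (NF4) via bitsandbytes.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",   # illustrative base model
    quantization_config=bnb_config,
    device_map="auto",
)

# QLoRA step 2: train only small LoRA adapter matrices on top of it.
lora_config = LoraConfig(
    r=16,                                  # illustrative rank
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],   # attention projections in Llama
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a tiny fraction of weights train
```

From here the wrapped model can be passed to a standard Trainer; only the adapter weights are updated, which is what makes fine-tuning a 4-bit base affordable.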

Today, we're excited to welcome TII's Falcon 180B to HuggingFace! Falcon 180B sets a new state-of-the-art for open models. It is the largest openly available language model, with 180 billion parameters, and was trained on a massive 3.5 trillion tokens using TII's RefinedWeb dataset. This represents the longest single-epoch pretraining for an open model.

In terms of capabilities, Falcon 180B achieves state-of-the-art results across natural language tasks. It tops the leaderboard for (pre-trained) open-access models and rivals proprietary models like PaLM-2. While difficult to rank definitively yet, it is considered on par with PaLM-2 Large, making Falcon 180B one of the most capable LLMs publicly known.

| Model size | Model name | Explanation |
| --- | --- | --- |
| 7B | Falcon-7b | Falcon-7B is a 7B-parameter causal decoder-only model built by TII and trained on 1,500B tokens of RefinedWeb enhanced with curated corpora. |
| | Falcon-7b-instruct | Based on Falcon-7B and fine-tuned on a mixture of chat/instruct datasets. |
| 7B quantized version | tiiuae-falcon-7b-instruct-gguf | The original instruction-following Falcon model from TII, converted to GGUF format. |
| | Falcon-7b-instruct-GPTQ | An experimental GPTQ 4-bit model for Falcon-7B-Instruct, produced by quantizing to 4-bit with AutoGPTQ. Note that performance with this GPTQ is currently very slow with AutoGPTQ. |
| | WizardLM-Uncensored-Falcon-7B-GPTQ | An experimental GPTQ 4-bit model for Eric Hartford's WizardLM-Uncensored-Falcon-7B, produced by quantizing to 4-bit with AutoGPTQ. |
| 40B | Falcon-40b | Falcon-40B is a 40B-parameter causal decoder-only model built by TII and trained on 1,000B tokens of RefinedWeb enhanced with curated corpora. |
| | Falcon-40b-instruct | Falcon-40B-Instruct is based on Falcon-40B and fine-tuned on a mixture of Baize. |
| 40B quantized version | Falcon-40b-instruct-GPTQ | An experimental GPTQ 4-bit model for Falcon-40B-Instruct, produced by quantizing to 4-bit with AutoGPTQ. |
| | FalconLite | A quantized version of the Falcon 40B SFT OASST-TOP1 model, capable of processing long (i.e. 11K-token) input sequences while consuming 4x less GPU memory. |
| | Falcon-40b-gptq | Quantized with GPTQ (on wikitext-2, 4 bits, group size 128). |
| | WizardLM-Uncensored-Falcon-40B-GPTQ | An experimental GPTQ 4-bit model of Eric Hartford's WizardLM Uncensored Falcon 40B, produced by quantizing to 4-bit with AutoGPTQ. |
| 180B | Falcon-180B | Falcon-180B is a 180B-parameter causal decoder-only model built by TII and trained on 3,500B tokens of RefinedWeb enhanced with curated corpora. |
| | Falcon-180B-chat | Based on Falcon-180B and fine-tuned on a mixture of Ultrachat, Platypus, and Airoboros. You will need at least 400GB of memory to swiftly run inference with Falcon-180B. |
| 180B quantized version | Falcon-180B-GGUF | GGUF-format model files for Technology Innovation Institute's Falcon 180B. GGUF is a new format introduced by the llama.cpp team on August 21st 2023 (see the inference sketch after this table). |
| | Falcon-180B-Chat-GPTQ | GPTQ model files for Technology Innovation Institute's Falcon 180B Chat. |
| | Falcon-180B-Chat-GGUF | GGUF-format model files for Technology Innovation Institute's Falcon 180B Chat. GGUF is a new format introduced by the llama.cpp team on August 21st 2023. |
| Other official Falcon models | Falcon-rw-1b | Falcon-RW-1B is a 1B-parameter causal decoder-only model built by TII and trained on 350B tokens of RefinedWeb. It is intended as a research artifact, to study the influence of training on web data alone. |
| | Falcon-rw-7b | Falcon-RW-7B is a 7B-parameter causal decoder-only model built by TII and trained on 350B tokens of RefinedWeb. It is intended as a research artifact, to study the influence of training on web data alone. |
| Other official quantized Falcon models | Falcon-rw-1b-4bit | GPTQ with auto-gptq integration, intended for users who want to compress transformer-based language models without significant performance loss. |
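
The GGUF rows above are consumed by llama.cpp-based runtimes rather than Transformers. A minimal inference sketch with the `llama-cpp-python` bindings; the file name and prompt are illustrative, and you would substitute whichever quantization level (e.g. Q4_K_M) you actually downloaded:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./falcon-180b-chat.Q4_K_M.gguf",  # illustrative local file
    n_ctx=2048,       # context window
    n_gpu_layers=-1,  # offload all layers to GPU when one is available
)

out = llm("User: What is the RefinedWeb dataset?\nFalcon:", max_tokens=128)
print(out["choices"][0]["text"])
```

The same pattern applies to the Llama 2 GGUF repos listed earlier; only the model file changes.
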
| Model size | Model name | Explanation |
| --- | --- | --- |
| 7B | h2ogpt-gm-oasst1-en-2048-falcon-7b-v3 | Trained using H2O LLM Studio, with falcon-7b as the base model. |
| | falcon-7b-instruct-sharded | Resharded version of https://huggingface.co/tiiuae/falcon-7b-instruct for low-RAM environments (e.g. Colab, Kaggle), in safetensors. |
| | falcon-7b-sft-top1-696 | A fine-tune of TII's Falcon 7B LLM, trained on 11,123 top-1 (high-quality) demonstrations from the OASST dataset; checkpoint at 696 steps. |
| | falcon-7b-sft-mix-2000 | A fine-tune of TII's Falcon 7B LLM, trained on a mixture of OASST top-2 threads, Dolly-15k, and synthetic instruction datasets. |
| | gpt4all-falcon | A chatbot trained over a massive curated corpus of assistant interactions, including word problems, multi-turn dialogue, code, poems, songs, and stories. |
| | gorilla-falcon-7b-hf-v0 | Gorilla enables LLMs to use tools by invoking APIs. Given a natural-language query, Gorilla can write a semantically and syntactically correct API call to invoke. |
| | falcon-7b-openassistant-peft | A chatbot model for dialogue generation, built by fine-tuning Falcon-7B on the OpenAssistant/oasst1 dataset. The repo includes only the LoRA adapters from fine-tuning with 🤗's peft package (see the adapter-loading sketch after this table). |
| | openbuddy-falcon-7b-v5-fp16 | Built upon TII's Falcon model and Facebook's LLaMA model. OpenBuddy is a powerful open multilingual chatbot model aimed at global users, emphasizing conversational AI and seamless multilingual support for English, Chinese, and other languages. |
| | Chinese-Falcon-7B | The Linly project team uses Falcon as the base model, expands the Chinese vocabulary, and runs parallel incremental pretraining on Chinese and Chinese-English corpora to transfer the model's language capabilities to Chinese. |
| 40B | falcon-40b-code-alpaca | The full weights (16-bit) for Falcon-40b fit on the Code Alpaca dataset. |
| | falcon-40b-instruct-8bit | The Falcon-40B-Instruct model quantized using bitsandbytes; this saves around 40 GB of downloads. |
| | falcon-40b-sft-top1-560 | A fine-tune of TII's Falcon 40B LLM, trained on top-1 (high-quality) demonstrations from the OASST dataset; checkpoint at 560 steps. |
| | tiiuae-falcon-40b-instruct-w4-g128-awq | A 4-bit, group-size-128 AWQ quantized model. |
| 180B | falcon-180B-chat-asst-ds-lora | Fine-tuned version of Falcon-180B using PEFT LoRA + DeepSpeed ZeRO3 + Flash Attention + activation checkpointing. |
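
Repos like falcon-7b-openassistant-peft and falcon-180B-chat-asst-ds-lora ship only LoRA adapter weights, so the original base model must be loaded first and the adapters attached to it. A minimal sketch with PEFT; the adapter repo id is an assumption to verify against the actual model card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "tiiuae/falcon-7b"                         # base the adapters were trained on
adapter_id = "dfurman/falcon-7b-openassistant-peft"  # assumed adapter repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    device_map="auto",
    trust_remote_code=True,  # early Falcon checkpoints ship custom modeling code
)

# PeftModel loads only the small LoRA matrices and wires them into the frozen base.
model = PeftModel.from_pretrained(base, adapter_id)
```

This is also why adapter-only repos are so small: everything except the LoRA matrices is fetched from the base model repo.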