
In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be a suitable substitute for closed-source models. We provide a detailed description of our approach to fine-tuning and safety improvements of Llama 2-Chat in order to enable the community to build on our work and contribute to the responsible development of LLMs.

| Model size | Model name | Explanation |
| --- | --- | --- |
| 7B | LLaMA-7b-hf | LLaMA-7B converted to work with Transformers/HuggingFace. |
| | Llama-2-7B-32K-Instruct | An open-source, long-context chat model fine-tuned from Llama-2-7B-32K over high-quality instruction and chat data, using the Together API. |
| | Llama-2-7b | Original base model. |
| | Llama-2-7b-hf | Repository for the 7B pretrained model, converted to the Hugging Face Transformers format. |
| | Llama-2-7b-chat | The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align to human preferences for helpfulness and safety. |
| | Llama-2-7b-chat-hf | Repository for the 7B fine-tuned model, optimized for dialogue use cases and converted to the Hugging Face Transformers format (a minimal loading sketch follows this table). |
| 7B quantized version | LLaMA-7b-4bit | A 4-bit quantized version of the model. |
| | Llama-2-7B-GGUF | GGUF is a new format that replaces GGML. It offers advantages over GGML such as better tokenization and support for special tokens; it also supports metadata and is designed to be extensible. |
| | LLaMA-7b-hf-int4 | Converted to int4 via the GPTQ method; requires special support code that is still highly experimental. Also converted to work with Transformers/HuggingFace. |
| | Llama-2-7B-GPTQ | Multiple GPTQ parameter permutations are provided; this repo details the options, their parameters, and the software used to create them. |
| | Llama-2-7B-GGML | GGML files are for CPU + GPU inference. The GGML format has been superseded by GGUF; as of August 21st 2023, llama.cpp no longer supports GGML models. |
| | Llama-2-7b-Chat-GPTQ | Multiple GPTQ parameter permutations are provided; this repo details the options, their parameters, and the software used to create them. |
| | Llama-2-7B-Chat-GGML | GGML files are for CPU + GPU inference. The GGML format has been superseded by GGUF; as of August 21st 2023, llama.cpp no longer supports GGML models. |
| 13B | LLaMA-13b | Original LLaMA-13b model. |
| | LLaMA-13b-hf | LLaMA-13B converted to work with Transformers/HuggingFace. |
| | Llama-2-13b | Original Llama-2-13b model. |
| | Llama-2-13b-hf | Repository for the 13B pretrained model, converted to the Hugging Face Transformers format. |
| | Llama-2-13b-chat | Llama-2-Chat models outperform open-source chat models on most benchmarks tested, and in human evaluations for helpfulness and safety. |
| | Llama-2-13b-chat-hf | Repository for the 13B fine-tuned model, optimized for dialogue use cases and converted to the Hugging Face Transformers format. |
| 13B quantized version | LLaMa-13B-GGML | GGML files are for CPU + GPU inference. The GGML format has been superseded by GGUF; as of August 21st 2023, llama.cpp no longer supports GGML models. |
| | llama-13b-hf-int4 | Converted to int4 via the GPTQ method; requires special support code that is still highly experimental. Not compatible with the Transformers library. |
| | Llama-2-13B-GPTQ | Multiple quantization parameters are provided, allowing you to choose the best one for your hardware and requirements. |
| | Llama-2-13B-GGML | The GGML format has been superseded by GGUF; as of August 21st 2023, llama.cpp no longer supports GGML models. Third-party clients and libraries are expected to support it for a time, but many may also drop support. |
| | Llama-2-13B-chat-GGUF | GGUF is a new format that replaces GGML. It offers advantages over GGML such as better tokenization and support for special tokens; it also supports metadata and is designed to be extensible. |
| | Llama-2-13B-chat-GPTQ | Multiple quantization parameters are provided, allowing you to choose the best one for your hardware and requirements. |
| 65B/70B | LLaMA-65b | Original LLaMA-65b model. |
| | LLaMA-65b-hf | LLaMA-65B converted to work with Transformers/HuggingFace. |
| | Llama-2-70b | Original Llama-2-70b model. |
| | Llama-2-70b-hf | Repository for the 70B pretrained model, converted to the Hugging Face Transformers format. |
| | Llama-2-70b-chat | Llama-2-Chat models outperform open-source chat models on most benchmarks tested, and in human evaluations for helpfulness and safety. |
| | Llama-2-70b-chat-hf | Repository for the 70B fine-tuned model, optimized for dialogue use cases and converted to the Hugging Face Transformers format. |
| 65B/70B quantized version | LLaMA-65B-GGUF | GGUF is a new format that replaces GGML. It offers advantages over GGML such as better tokenization and support for special tokens; it also supports metadata and is designed to be extensible. |
| | LLaMA-65b-hf-int4 | Converted to int4 via the GPTQ method; requires special support code that is still highly experimental. Not compatible with the Transformers library. |
| | Llama-2-70B-GPTQ | Multiple quantization parameters are provided, allowing you to choose the best one for your hardware and requirements. |
| | Llama-2-70b-hf-onnx-int4 | Repository of INT4 weight-only quantization for the 70B model in ONNX format, powered by Intel® Neural Compressor and Intel® Extension for Transformers. |
| | Llama-2-70B-chat-GPTQ | Multiple quantization parameters are provided, allowing you to choose the best one for your hardware and requirements. |
| | Llama-2-70b-chat-hf-onnx-int4 | Repository of INT4 weight-only quantization for the 70B fine-tuned chat model in ONNX format, powered by Intel® Neural Compressor and Intel® Extension for Transformers. |
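
The `-hf` repos in the table above load directly with the Transformers library. As a minimal sketch (assuming `transformers` and `accelerate` are installed and you have been granted access to the gated `meta-llama/Llama-2-7b-chat-hf` repo), loading and prompting the 7B chat model looks roughly like this:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # gated repo; requires approved access

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Llama-2-Chat was aligned via SFT/RLHF on an [INST] ... [/INST] prompt format.
prompt = "[INST] Explain the difference between GGML and GGUF. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The GPTQ and GGUF rows are not loaded this way: GPTQ checkpoints need a GPTQ-aware loader such as AutoGPTQ, and GGUF files target llama.cpp-based runtimes.
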
| Original model | Model name | Explanation |
| --- | --- | --- |
| LLaMA | lawyer-llama-13b-beta1.0 | Based on Chinese-LLaMA-13B; it has not undergone continual pretraining on legal corpora, uses general and legal instructions for SFT, and is equipped with a marriage-related legal retrieval module. |
| | chinese-alpaca-plus-13b-hf | Compared with the basic version, the training data has been further expanded: the LLaMA pretraining data grows to 120 GB of text, the Alpaca instruction data to 4.3M items, and data from scientific domains has been added. This model uses decapoda-research/llama-13b-hf as the base model, merges the two LoRA weights ziqingyang/chinese-llama-plus-lora-13b and ziqingyang/chinese-alpaca-plus-lora-13b, and converts them into Hugging Face-format weights. |
| | Wizard-Vicuna-30B-Uncensored-GPTQ | Wizard-Vicuna trained on a subset of the dataset from which responses containing alignment/moralizing were removed. Multiple GPTQ parameter permutations are provided. |
| | LLaVA-Lightning-MPT-7B-preview | LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna/MPT on GPT-generated multimodal instruction-following data. It is an auto-regressive language model based on the transformer architecture. Demo: https://llava-vl.github.io/ |
| | beaver-7b-v1.0-cost | The Beaver cost model is a preference model trained on the PKU-SafeRLHF dataset. Fine-tuned from: LLaMA, Alpaca. |
| | alpaca-native | A replica of Alpaca by Stanford's tatsu-lab, trained using the original instructions with a minor modification in FSDP mode. |
| Llama 2 | Taiwan-LLaMa-v1.0 | Taiwan-LLaMa is a full-parameter fine-tune of LLaMa 2 for Traditional Mandarin applications. It was pretrained on over 5 billion tokens and instruction-tuned on over 490k conversations, both in Traditional Mandarin. |
| | YuLan-Chat-2-13b-fp16 | Supports an 8k Chinese context; developed by researchers at GSAI, Renmin University of China. |
| | Chinese-LLaMA-2-13B-hf | A Chinese base model built on LLaMA and Falcon, incrementally pretrained on Chinese and Chinese-English parallel corpora to transfer its English language capabilities to Chinese. |
| | ORCA_LLaMA_70B_QLoRA | Trained on three datasets: Dolphin, Open-Platypus, and OpenOrca. |
| | LLaMA2-13B-Holomax-GPTQ | An expansion merge of Gryphe's well-praised Mythomax model (60%) with MrSeeker's KoboldAI Holodeck model (40%). The goal is to enhance story-writing capabilities while preserving Mythomax's desirable traits as much as possible (it does limit chat reply length). |
| | CodeLlama-34b-Instruct-hf | Repository for the 34B instruct-tuned version in the Hugging Face Transformers format, designed for general code synthesis and understanding. |
| | Llama-2-7b-longlora-100k-ft | An efficient fine-tuning approach that extends the context sizes of pretrained large language models (LLMs) at limited computation cost. |
| | nsql-llama-2-7B-sharded-bf16-2GB | NSQL-Llama-2-7B, a new member of the NSQL family: based on Meta's original Llama-2 7B model, further pretrained on a dataset of general SQL queries and then fine-tuned on text-to-SQL pairs. |
| | Yarn-Llama-2-7b-64k | Nous-Yarn-Llama-2-7b-64k is a state-of-the-art long-context language model, further pretrained on long-context data for 400 steps. This is the Flash Attention 2 patched version of the original model: https://huggingface.co/conceptofmind/Yarn-Llama-2-7b-64k |
| | llama-2-70b-Guanaco-QLoRA-GPTQ | A Llama-2 version of Guanaco, fine-tuned from the base Llama-2-70b model using the official training scripts from the QLoRA repo. |
| | llama-2-7b-int4-python-code-20k | Llama-2 7B fine-tuned on the python_code_instructions_18k_alpaca code-instructions dataset using 4-bit QLoRA with the PEFT library (see the QLoRA sketch after this table). |
| | Nous-Hermes-Llama2-13b | Stands out for its long responses, lower hallucination rate, and absence of OpenAI censorship mechanisms. Fine-tuned with a 4096 sequence length on an 8x A100 80GB DGX machine. |
| | llama2-13b-orca-8k-3319 | A fine-tune of Meta's Llama2 13B model with 8k context size, on a long-conversation variant of the Dolphin dataset (orca-chat). |
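
Several rows above (e.g. llama-2-7b-int4-python-code-20k, llama-2-70b-Guanaco-QLoRA-GPTQ) were produced with 4-bit QLoRA via the PEFT library. A minimal sketch of that recipe; the model id and LoRA hyperparameters here are illustrative, not the ones those repos actually used:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# QLoRA step 1: load the frozen base model in 4-bit (NF4) via bitsandbytes.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",   # illustrative base model
    quantization_config=bnb_config,
    device_map="auto",
)

# QLoRA step 2: train only small LoRA adapter matrices on top of it.
lora_config = LoraConfig(
    r=16,                                  # illustrative rank
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],   # attention projections in Llama
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a tiny fraction of weights train
```

From here the wrapped model can be passed to a standard Trainer; only the adapter weights are updated, which is what makes fine-tuning a 4-bit base affordable.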

Today, we're excited to welcome TII's Falcon 180B to HuggingFace! Falcon 180B sets a new state-of-the-art for open models. It is the largest openly available language model, with 180 billion parameters, and was trained on a massive 3.5 trillion tokens using TII's RefinedWeb dataset. This represents the longest single-epoch pretraining for an open model.

In terms of capabilities, Falcon 180B achieves state-of-the-art results across natural language tasks. It tops the leaderboard for (pre-trained) open-access models and rivals proprietary models like PaLM-2. While difficult to rank definitively yet, it is considered on par with PaLM-2 Large, making Falcon 180B one of the most capable LLMs publicly known.

| Model size | Model name | Explanation |
| --- | --- | --- |
| 7B | Falcon-7b | Falcon-7B is a 7B-parameter causal decoder-only model built by TII and trained on 1,500B tokens of RefinedWeb enhanced with curated corpora. |
| | Falcon-7b-instruct | Based on Falcon-7B and fine-tuned on a mixture of chat/instruct datasets. |
| 7B quantized version | tiiuae-falcon-7b-instruct-gguf | The original instruction-following Falcon model from TII, converted to GGUF format. |
| | Falcon-7b-instruct-GPTQ | An experimental GPTQ 4-bit model for Falcon-7B-Instruct, produced by quantizing to 4-bit with AutoGPTQ. Note that performance with this GPTQ is currently very slow with AutoGPTQ. |
| | WizardLM-Uncensored-Falcon-7B-GPTQ | An experimental GPTQ 4-bit model for Eric Hartford's WizardLM-Uncensored-Falcon-7B, produced by quantizing to 4-bit with AutoGPTQ. |
| 40B | Falcon-40b | Falcon-40B is a 40B-parameter causal decoder-only model built by TII and trained on 1,000B tokens of RefinedWeb enhanced with curated corpora. |
| | Falcon-40b-instruct | Falcon-40B-Instruct is based on Falcon-40B and fine-tuned on a mixture of Baize. |
| 40B quantized version | Falcon-40b-instruct-GPTQ | An experimental GPTQ 4-bit model for Falcon-40B-Instruct, produced by quantizing to 4-bit with AutoGPTQ. |
| | FalconLite | A quantized version of the Falcon 40B SFT OASST-TOP1 model, capable of processing long (i.e. 11K-token) input sequences while consuming 4x less GPU memory. |
| | Falcon-40b-gptq | Quantized with GPTQ (on wikitext-2, 4 bits, group size 128). |
| | WizardLM-Uncensored-Falcon-40B-GPTQ | An experimental GPTQ 4-bit model of Eric Hartford's WizardLM Uncensored Falcon 40B, produced by quantizing to 4-bit with AutoGPTQ. |
| 180B | Falcon-180B | Falcon-180B is a 180B-parameter causal decoder-only model built by TII and trained on 3,500B tokens of RefinedWeb enhanced with curated corpora. |
| | Falcon-180B-chat | Based on Falcon-180B and fine-tuned on a mixture of Ultrachat, Platypus, and Airoboros. You will need at least 400GB of memory to swiftly run inference with Falcon-180B. |
| 180B quantized version | Falcon-180B-GGUF | GGUF-format model files for Technology Innovation Institute's Falcon 180B. GGUF is a new format introduced by the llama.cpp team on August 21st 2023 (see the inference sketch after this table). |
| | Falcon-180B-Chat-GPTQ | GPTQ model files for Technology Innovation Institute's Falcon 180B Chat. |
| | Falcon-180B-Chat-GGUF | GGUF-format model files for Technology Innovation Institute's Falcon 180B Chat. GGUF is a new format introduced by the llama.cpp team on August 21st 2023. |
| Other official Falcon models | Falcon-rw-1b | Falcon-RW-1B is a 1B-parameter causal decoder-only model built by TII and trained on 350B tokens of RefinedWeb. It is intended as a research artifact, to study the influence of training on web data alone. |
| | Falcon-rw-7b | Falcon-RW-7B is a 7B-parameter causal decoder-only model built by TII and trained on 350B tokens of RefinedWeb. It is intended as a research artifact, to study the influence of training on web data alone. |
| Other official quantized Falcon models | Falcon-rw-1b-4bit | GPTQ with auto-gptq integration, intended for users who want to compress transformer-based language models without significant performance loss. |
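
The GGUF rows above are consumed by llama.cpp-based runtimes rather than Transformers. A minimal inference sketch with the `llama-cpp-python` bindings; the file name and prompt are illustrative, and you would substitute whichever quantization level (e.g. Q4_K_M) you actually downloaded:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./falcon-180b-chat.Q4_K_M.gguf",  # illustrative local file
    n_ctx=2048,       # context window
    n_gpu_layers=-1,  # offload all layers to GPU when one is available
)

out = llm("User: What is the RefinedWeb dataset?\nFalcon:", max_tokens=128)
print(out["choices"][0]["text"])
```

The same pattern applies to the Llama 2 GGUF repos listed earlier; only the model file changes.
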
| Model size | Model name | Explanation |
| --- | --- | --- |
| 7B | h2ogpt-gm-oasst1-en-2048-falcon-7b-v3 | Trained using H2O LLM Studio, with falcon-7b as the base model. |
| | falcon-7b-instruct-sharded | Resharded version of https://huggingface.co/tiiuae/falcon-7b-instruct for low-RAM environments (e.g. Colab, Kaggle), in safetensors. |
| | falcon-7b-sft-top1-696 | A fine-tune of TII's Falcon 7B LLM, trained on 11,123 top-1 (high-quality) demonstrations from the OASST dataset; checkpoint at 696 steps. |
| | falcon-7b-sft-mix-2000 | A fine-tune of TII's Falcon 7B LLM, trained on a mixture of OASST top-2 threads, Dolly-15k, and synthetic instruction datasets. |
| | gpt4all-falcon | A chatbot trained over a massive curated corpus of assistant interactions, including word problems, multi-turn dialogue, code, poems, songs, and stories. |
| | gorilla-falcon-7b-hf-v0 | Gorilla enables LLMs to use tools by invoking APIs. Given a natural-language query, Gorilla can write a semantically and syntactically correct API call to invoke. |
| | falcon-7b-openassistant-peft | A chatbot model for dialogue generation, built by fine-tuning Falcon-7B on the OpenAssistant/oasst1 dataset. The repo includes only the LoRA adapters from fine-tuning with 🤗's peft package (see the adapter-loading sketch after this table). |
| | openbuddy-falcon-7b-v5-fp16 | Built upon TII's Falcon model and Facebook's LLaMA model. OpenBuddy is a powerful open multilingual chatbot model aimed at global users, emphasizing conversational AI and seamless multilingual support for English, Chinese, and other languages. |
| | Chinese-Falcon-7B | The Linly project team uses Falcon as the base model, expands the Chinese vocabulary, and runs parallel incremental pretraining on Chinese and Chinese-English corpora to transfer the model's language capabilities to Chinese. |
| 40B | falcon-40b-code-alpaca | The full weights (16-bit) for Falcon-40b fit on the Code Alpaca dataset. |
| | falcon-40b-instruct-8bit | The Falcon-40B-Instruct model quantized using bitsandbytes; this saves around 40 GB of downloads. |
| | falcon-40b-sft-top1-560 | A fine-tune of TII's Falcon 40B LLM, trained on top-1 (high-quality) demonstrations from the OASST dataset; checkpoint at 560 steps. |
| | tiiuae-falcon-40b-instruct-w4-g128-awq | A 4-bit, group-size-128 AWQ quantized model. |
| 180B | falcon-180B-chat-asst-ds-lora | Fine-tuned version of Falcon-180B using PEFT LoRA + DeepSpeed ZeRO3 + Flash Attention + activation checkpointing. |
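
Repos like falcon-7b-openassistant-peft and falcon-180B-chat-asst-ds-lora ship only LoRA adapter weights, so the original base model must be loaded first and the adapters attached to it. A minimal sketch with PEFT; the adapter repo id is an assumption to verify against the actual model card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "tiiuae/falcon-7b"                         # base the adapters were trained on
adapter_id = "dfurman/falcon-7b-openassistant-peft"  # assumed adapter repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    device_map="auto",
    trust_remote_code=True,  # early Falcon checkpoints ship custom modeling code
)

# PeftModel loads only the small LoRA matrices and wires them into the frozen base.
model = PeftModel.from_pretrained(base, adapter_id)
```

This is also why adapter-only repos are so small: everything except the LoRA matrices is fetched from the base model repo.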