Inference support for T5 and FLAN-T5 model families (#5763)

* llama : add inference support and model types for T5 and FLAN-T5 model families

* llama : add new API functions to support encoder-decoder models: llama_encode(), llama_model_has_encoder(), llama_model_decoder_start_token()

* common, llama-cli, llama-batched : add support for encoder-decoder models

* convert-hf : handle shared token embeddings tensors in T5Model

* convert-hf : add support for SentencePiece BPE tokenizer in T5Model (for Pile-T5 models)

* convert-hf : add MT5ForConditionalGeneration and UMT5ForConditionalGeneration to architectures supported by T5Model

* convert : add t5 tokenizer tests, use "slow" HF tokenizer for t5

---------

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
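
A rough sketch of how a caller might combine the three new API functions named above: check for an encoder, run it once over the prompt, then seed the decoder with the start token. Everything prefixed `mock_` is a hypothetical stand-in so the snippet compiles without llama.h. The real llama_encode() operates on a context and a batch rather than a token vector, and the fallback to BOS when no decoder start token is defined is an assumption of this sketch.

```cpp
#include <cstdint>
#include <vector>

// Stand-in for the token type that llama.h would normally provide.
using llama_token = int32_t;

// Hypothetical model description used only by this sketch.
struct mock_model {
    bool        has_encoder;          // true for T5-style encoder-decoder models
    llama_token decoder_start_token;  // -1 when the model does not define one
    llama_token bos_token;            // fallback start token (assumption)
};

// Stand-in for llama_model_has_encoder().
static bool mock_has_encoder(const mock_model * m) {
    return m->has_encoder;
}

// Stand-in for llama_model_decoder_start_token().
static llama_token mock_decoder_start_token(const mock_model * m) {
    return m->decoder_start_token;
}

// Stand-in for llama_encode(): returns 0 on success, non-zero on failure.
// Here it only rejects an empty prompt; no real computation happens.
static int32_t mock_encode(const mock_model *, const std::vector<llama_token> & prompt) {
    return prompt.empty() ? 1 : 0;
}

// Prepare the tokens the decoder should start from.
// Decoder-only models keep the prompt unchanged. Encoder-decoder models
// consume the prompt in the encoder and hand the decoder a single start token.
// Returns false if the encoder pass fails.
static bool prepare_decoder_input(const mock_model * model, std::vector<llama_token> & tokens) {
    if (!mock_has_encoder(model)) {
        return true;
    }
    if (mock_encode(model, tokens) != 0) {
        return false;
    }
    llama_token start = mock_decoder_start_token(model);
    if (start == -1) {
        start = model->bos_token; // assumed fallback when no start token is set
    }
    tokens.clear();
    tokens.push_back(start);
    return true;
}
```

With a model that reports an encoder, a three-token prompt is reduced to the single decoder start token; with a decoder-only model the prompt is passed through untouched.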

fairydreaming, 2024-07-04 15:46:11 +02:00 • committed by GitHub

parent f8c4c0738d

commit 807b0c49ff

No known key found for this signature in database

GPG key ID: B5690EEEBB952194

33 changed files with 946 additions and 31 deletions

1

models/ggml-vocab-falcon.gguf.out

@@ -31,6 +31,7 @@
 40
 4932
 23 291 18 436 12 1265 362 299 8196 207 204 42 50087 123 2727 20300 32022 133 234 17419 30137 28 7858 181 133 236
