Inference support for T5 and FLAN-T5 model families (#5763)

* llama : add inference support and model types for T5 and FLAN-T5 model families

* llama : add new API functions to support encoder-decoder models: llama_encode(), llama_model_has_encoder(), llama_model_decoder_start_token()

* common, llama-cli, llama-batched : add support for encoder-decoder models

* convert-hf : handle shared token embeddings tensors in T5Model

* convert-hf : add support for SentencePiece BPE tokenizer in T5Model (for Pile-T5 models)

* convert-hf : add MT5ForConditionalGeneration and UMT5ForConditionalGeneration to architectures supported by T5Model

* convert : add t5 tokenizer tests, use "slow" HF tokenizer for t5

---------

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
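
The three API functions named in the commit message suggest a two-phase flow for encoder-decoder models: run the prompt through the encoder once, then start decoding from the model's decoder start token. The sketch below illustrates that control flow only. It does not link against llama.cpp: `mock_model`, `mock_context`, the `mock_*` functions, and `prepare_decoder_input` are hypothetical stand-ins invented for this example, and the fallback to the BOS token when no decoder start token is defined is an assumption rather than something stated in the commit message.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins so the sketch compiles without llama.cpp.
// The real API works on llama_model / llama_context / llama_batch.
using llama_token = int32_t;

struct mock_model {
    bool        has_encoder;         // true for T5-style encoder-decoder models
    llama_token decoder_start_token; // -1 when the model defines none (assumed convention)
    llama_token bos_token;
};

struct mock_context {
    const mock_model *       model;
    std::vector<llama_token> encoded; // tokens the mock "encoder" has consumed
};

// Stand-in for llama_model_has_encoder()
static bool mock_model_has_encoder(const mock_model & m) {
    return m.has_encoder;
}

// Stand-in for llama_model_decoder_start_token()
static llama_token mock_model_decoder_start_token(const mock_model & m) {
    return m.decoder_start_token;
}

// Stand-in for llama_encode(): returns 0 on success, non-zero on failure
static int32_t mock_encode(mock_context & ctx, const std::vector<llama_token> & tokens) {
    if (tokens.empty()) {
        return -1;
    }
    ctx.encoded = tokens;
    return 0;
}

// Prepares the token list that will be fed to the decoder.
// For encoder-decoder models the prompt is encoded first and `input`
// is replaced by a single start token; otherwise `input` is left as-is.
static bool prepare_decoder_input(mock_context & ctx, std::vector<llama_token> & input) {
    assert(ctx.model != nullptr);
    const mock_model & model = *ctx.model;

    if (!mock_model_has_encoder(model)) {
        return true; // decoder-only model: the prompt goes straight to decoding
    }

    if (mock_encode(ctx, input) != 0) {
        return false; // encoding failed
    }

    llama_token start = mock_model_decoder_start_token(model);
    if (start == -1) {
        start = model.bos_token; // assumed fallback, see lead-in
    }

    input.assign(1, start);
    return true;
}
```

With a mock T5-style model, `prepare_decoder_input` consumes the whole prompt in the encoder pass and leaves exactly one token for the decoder; with a decoder-only mock, the prompt is returned unchanged.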

Author: fairydreaming
Date: 2024-07-04 15:46:11 +02:00
Committed by: GitHub

parent f8c4c0738d

commit 807b0c49ff

No known key found for this signature in database

GPG key ID: B5690EEEBB952194

33 changed files with 946 additions and 31 deletions

1

models/ggml-vocab-qwen2.gguf.out

 @@ -31,6 +31,7 @@
 284
 11385
 11 379 64848 0 2585 525 498 26525 223 937 104100 18493 22377 99257 16 18 16 19 16 20 16 35727 21216
 2928
 
 18
 18 18
