Inference support for T5 and FLAN-T5 model families (#5763)

* llama : add inference support and model types for T5 and FLAN-T5 model families

* llama : add new API functions to support encoder-decoder models: llama_encode(), llama_model_has_encoder(), llama_model_decoder_start_token()

* common, llama-cli, llama-batched : add support for encoder-decoder models

* convert-hf : handle shared token embeddings tensors in T5Model

* convert-hf : add support for SentencePiece BPE tokenizer in T5Model (for Pile-T5 models)

* convert-hf : add MT5ForConditionalGeneration and UMT5ForConditionalGeneration to architectures supported by T5Model

* convert : add t5 tokenizer tests, use "slow" HF tokenizer for t5

---------

Co-authored-by: Stanisław Szymczyk <sszymczy@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
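The three new API functions named above suggest a calling pattern for encoder-decoder models: check whether the model has an encoder, run the prompt through it, then seed the decoder with a start token. The sketch below illustrates only that control flow. Everything in it (`token_t`, `MockModel`, `MockContext`, `mock_encode`, `prepare_decoder_input`) is a hypothetical stand-in, not the llama.cpp types or signatures, and the fallback to a BOS token when no decoder start token is defined is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration only; NOT the real llama.cpp API.
using token_t = int32_t;

struct MockModel {
    bool    has_encoder;          // role of llama_model_has_encoder()
    token_t decoder_start_token;  // role of llama_model_decoder_start_token(); -1 = undefined (assumed)
    token_t bos_token;            // assumed fallback when no start token is defined
};

struct MockContext {
    std::vector<token_t> encoded; // records what the encoder received
};

// Plays the role of llama_encode(): 0 on success, non-zero on failure (assumed convention).
int mock_encode(MockContext & ctx, const std::vector<token_t> & tokens) {
    if (tokens.empty()) {
        return 1;
    }
    ctx.encoded = tokens;
    return 0;
}

// On success, `tokens` holds what the decoder should be fed first.
bool prepare_decoder_input(const MockModel & model, MockContext & ctx,
                           std::vector<token_t> & tokens) {
    if (!model.has_encoder) {
        return true; // decoder-only model: the prompt goes to the decoder unchanged
    }
    if (mock_encode(ctx, tokens) != 0) {
        return false; // encoder pass failed
    }
    token_t start = model.decoder_start_token;
    if (start == -1) {
        start = model.bos_token; // assumed fallback
    }
    tokens.assign(1, start); // decoding begins from the start token alone
    assert(tokens.size() == 1);
    return true;
}
```

With a mock encoder-decoder model, a three-token prompt ends up in `ctx.encoded` and `tokens` is reduced to the single start token; with a decoder-only mock, `tokens` is left untouched.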
2024-07-04 15:46:11 +02:00
commit 807b0c49ff
parent f8c4c0738d
33 changed files with 946 additions and 31 deletions
--- a/models/ggml-vocab-gpt-2.gguf.out
+++ b/models/ggml-vocab-gpt-2.gguf.out
@@ -31,6 +31,7 @@
 198 796
 6 6980
 15496 11 331 6 439 0 1374 389 345 30325 223 5633 22755 239 46349 111 28839 101 18040 32432 98 43291 1485 1415 24309 25465 171 121 252
+ 13896 3228
 18
 2091
 20370