Show HN: Made a batching LLM API for a project. Mistral 200 tk/s on RTX 3090 https://ift.tt/hmnDzHo
I was running into a vLLM bug that affected multi-GPU setups, and I needed a stand-in while that bug was being fixed: something that used the same API format but had better performance than the API on text-generation-webui. It's very rough (I'm not a coder by trade), but it's very fast once you have many simultaneous connections. https://ift.tt/cxsTDKM December 27, 2023 at 01:22AM
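Since the speedup from batching only shows up when many requests are in flight at once, here is a minimal sketch of a client that fires off concurrent completion requests. The endpoint path, port, model name, and payload fields are assumptions based on the OpenAI-style completions format the post mentions, not the project's actual interface.

import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = "http://localhost:8000/v1/completions"  # hypothetical local endpoint

def complete(prompt: str) -> str:
    # Build an OpenAI-style completion request (field names are assumptions).
    payload = {
        "model": "mistral-7b",  # hypothetical model name
        "prompt": prompt,
        "max_tokens": 128,
        "temperature": 0.7,
    }
    req = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["choices"][0]["text"]

prompts = [f"Write a haiku about GPU number {i}." for i in range(32)]

start = time.time()
# Many simultaneous connections give the server a chance to batch prompts together.
with ThreadPoolExecutor(max_workers=32) as pool:
    results = list(pool.map(complete, prompts))
print(f"{len(results)} completions in {time.time() - start:.1f}s")

With a single-threaded loop in place of the thread pool, the same prompts would be served one at a time and the batching would go unused, which is why throughput figures like 200 tk/s depend on concurrent load.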