Show HN: Continuous-eval – Granular evaluation of GenAI pipelines https://ift.tt/fF7pkqr

February 26, 2024

Hi HN, we are the creators of "continuous-eval", an open-source tool for testing and evaluating generative AI apps. It grew out of our efforts to measure, validate, and improve the reliability of a finance AI copilot we were developing for banks. End-to-end evaluation was not enough for us: we wanted granular evaluations that pinpoint bottlenecks and show what to improve and how. We have since developed more metrics and made the framework more flexible, so it can evaluate components such as agent tool use, code changes, and retrieval steps. Let us know what you think of our approach to GenAI app evaluation. https://ift.tt/5tA8OcK

February 26, 2024 at 12:11AM
