Why Is It Hard To Evaluate GenAI Applications?

TL;DR

If you don’t have time to read the whole article, the following four takeaways are a concise version. You can navigate to the corresponding section of the article for details.

Lack of framework: A GenAI application is not a GenAI foundation model, so the two require different evaluation frameworks, and the difference between the two evaluation tasks is often unclear.

Unstructured data: The unstructured output of a GenAI application makes it harder to evaluate than a traditional ML system.

Foundation model unpredictability: The underlying GenAI foundation model usually introduces extra unpredictability into the evaluation process.

Longer and more costly iteration: Evaluating a GenAI application is expensive and time-consuming, because building an evaluation dataset and running tests against the application both require more resources.

Introduction

I have spent the last two and a half years listening to what businesses want from GenAI, building GenAI applications, and delivering value from them. It has been an interesting journey, and I came to realize that the advent of ChatGPT constituted a paradigm shift for ML/AI practitioners like me. I started to believe that GenAI would change our lives, much as personal computers did in the 90s and the modern search engine did in the 2000s. ...

[Image: a daytime scene of Hong Kong in pixel art]

Rebuild My Website With GenAI's Assistance

Introduction

When I was in graduate school, I set up a personal blog to showcase my project and share my thoughts. I planned to keep developing that site, but it has since taken a backseat as I became busy with work. I have had more time lately and decided to pick this project back up. It turned out to be so much fun, and I want to share how I rebuilt this website with GenAI tools. If you are only interested in the part related to GenAI, check out the GenAI coding tools section. ...