In 2022 ChatGPT riveted the world's attention. Finally, a system that we can actually speak with! This breakthrough kicked off numerous projects applying GPT and other large language models (LLMs) to a vast number of use cases, including cases where the relevant data is not part of the LLM's original training corpus. An immediate example is questions about recent news, but our focus here is on data that forms a dynamic set of facts: weather data, the value of a stock portfolio, or the intricate details of an unfolding business deal or work process. In short, we are exploring what happens when LLMs meet databases. This raises many questions: Will LLMs let us chat directly with our databases? Will such systems actually work? Will such systems exploit the common-sense knowledge latent in the LLM? How do we mitigate the problem of hallucination? Can LLMs define and populate databases on the fly, and would that be useful? Can LLMs help existing databases integrate better? Should systems operate with autonomy or serve as assistants? The questions go on and on, but the overarching question is: how are we really going to get value out of LLMs in the data management context? While we do not yet have comprehensive answers, ultimately we will, and this will likely have a profound impact on our economy and society.
The course gives technical students and IT practitioners the opportunity to take a deep dive into the transformer networks that back LLMs and to explore their application to long-standing problems in data management. It will be a fun course given in an open, playful, and inquisitive spirit. We will all learn a lot.