CIPHE: People's Interpretations Central in New Framework for Evaluating AI
How can we ensure that systems based on artificial intelligence (AI) perform their tasks correctly? According to Anton Eklund at the Department of Computer Science, Umeå University, humans must always be part of the evaluation process. In his dissertation work, he has developed an evaluation framework to support organizations in precisely these kinds of processes.
— Should an article about pole vaulter Mondo Duplantis' world record be classified as sports, athletics, or pole vaulting? And if Mondo Duplantis is mentioned twice, does that automatically make it a sports article? Most people would probably say no, but an AI system can easily make that mistake, says Anton Eklund, industrial doctoral student at the Department of Computer Science, Umeå University.
Together with colleagues, he has therefore developed an evaluation framework called "Cluster Interpretation and Precision from Human Exploration" (CIPHE).
— Through CIPHE, we let people assess whether an AI system has grouped articles correctly or not. Participants in the assessment also characterize the articles based on human aspects such as emotional reaction or estimated societal impact.
Focus on Human Semantic Abilities for Quality
Anton explains that it is absolutely necessary to develop methods for evaluating AI systems so that they can be used with confidence in industry or as tools in the public sector.
— Continuous human involvement is needed somewhere in the chain, especially for tasks that lack definitive answers, such as those involving human perception, interpretation, or emotion, he says.
As an industrial doctoral student, Anton has been employed at the startup company Aeterna Labs. The company works with so-called contextual advertising, which means placing advertisements next to suitable articles based on their content. This differs from more conventional forms of advertising, where user data is analyzed and ads are targeted based on users' previous preferences.
— To automatically categorize news articles into different subjects, I have used language models similar to those that ChatGPT is built on. Since the categorization is intended to be used for placing advertisements, the quality of the categories needs to be checked by humans before they can be sold to advertisers, says Anton.
Adaptable to Environment and Context
It is becoming increasingly common to evaluate AI systems using AI itself, but this presents a challenge: there is often little insight into whether the system is doing the right thing from a human perspective. The quality of the evaluation is therefore not guaranteed, and it becomes harder to adjust and adapt the process. The new framework is designed to avoid this problem.
— In CIPHE, we can adjust what counts as an approved categorization, which makes it possible to tailor the framework to specific environments and contexts, says Anton.
About the Dissertation
On Thursday, April 3, Anton Eklund, Department of Computer Science, will defend his dissertation titled "Evaluation of Document Clusters through Human Interpretation." The defense will take place at 13:15 in UB.A.230, Lindellhallen 3.