A Dutch language assistant you can trust

SIDN Fund supports pilot project by Onze Taal

More and more people nowadays turn to AI for help with language-related questions. ChatGPT, Google AI, Claude: there are plenty of AI assistants about, and their advice sounds persuasive. But is it correct? By no means always, according to Onze Taal, a foundation with 18,000 members and a professional Dutch language advice service. With support from SIDN Fund, the foundation’s CEO, Vibeke Roeper, and its Language Advice Service Coordinator, Fieneke Jochemsen, ran a pilot project to see whether things could be improved. Could they build a reliable AI assistant for the Dutch language, drawing on Onze Taal’s own expertly curated knowledge bank?

From noise to question

Fieneke Jochemsen, coordinator of the language advisory service of the Onze Taal (Our Language) Association

Onze Taal’s Dutch language advice service has been active for many years. People can contact Onze Taal by phone, via a chat platform or through the website with questions about spelling, grammar and style. In recent years, however, Fieneke has noticed a change. “Increasingly, we find ourselves refuting AI-generated advice. People call and ask things like: ‘My AI assistant says this, but is it right?’ While that’s a source of frustration, it has also made us curious.”

After all, AI’s ability to produce a seemingly fluent answer to a language question is itself remarkable. And, if you could combine the power of AI with Onze Taal’s authoritative databank of 60,000 questions and answers, what would you get? That was the question that the project set out to explore.

How does Onze Taal’s language assistant work?

For the pilot, Onze Taal teamed up with the Dutch Language Institute to develop a prototype that works on the basis of RAG – retrieval-augmented generation. Fieneke explains: “You link a generic language model to a large database of reliable sources. Then, when prompted by a user, the model looks for the most relevant items in the database and uses them to formulate an answer.” In the case of the language assistant, the database consists of previous language-related questions and the answers to them provided by Onze Taal.
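To make the retrieval step concrete, here is a minimal Python sketch of the general RAG idea. It is not the prototype built by Onze Taal and the Dutch Language Institute: the three-item KNOWLEDGE_BANK and the functions tokenize, cosine, retrieve and build_prompt are hypothetical, and the simple word-overlap similarity stands in for the learned embeddings that real systems usually use. The final step, sending the prompt to a language model, is deliberately left out.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for a curated database of past questions and answers
# (illustrative only, not Onze Taal's actual data).
KNOWLEDGE_BANK = [
    ("Where does the expression 'van haver tot gort' come from?",
     "It is a corruption of 'van avere tot avere', 'from forefather to forefather'."),
    ("Is it 'zij hebben' or 'hun hebben'?",
     "In standard Dutch, 'zij hebben' is correct; 'hun' is an object form."),
    ("How do you spell the verb meaning 'to send an email'?",
     "The standard spelling is 'e-mailen'."),
]


def tokenize(text: str) -> Counter:
    """Lower-case the text and count its word tokens (a bag of words)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bags of words; 0.0 if either is empty."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = (math.sqrt(sum(c * c for c in a.values()))
            * math.sqrt(sum(c * c for c in b.values())))
    return dot / norm if norm else 0.0


def retrieve(question: str, k: int = 2) -> list[tuple[float, str, str]]:
    """Score every stored Q&A pair against the question and keep the top k."""
    query = tokenize(question)
    scored = [(cosine(query, tokenize(q + " " + a)), q, a) for q, a in KNOWLEDGE_BANK]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


def build_prompt(question: str, k: int = 2) -> str:
    """Prime the model with the retrieved, vetted items before the user's question."""
    context = "\n".join(f"- Q: {q}\n  A: {a}" for _, q, a in retrieve(question, k))
    return ("Answer the user's question using only the vetted sources below.\n"
            f"Sources:\n{context}\n\nQuestion: {question}")


if __name__ == "__main__":
    # The resulting prompt would then be passed to a generic language model.
    print(build_prompt("What is the origin of 'ik ken je van haver tot gort'?"))
```

Swapping in a stronger retriever would change only retrieve(); the principle of grounding a generic model in professionally vetted items stays the same.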

In essence, it’s a simple idea: instead of letting AI do its own thing, you prime it with source material that’s been vetted by professionals. “We thought, we’ve got an amazing database. How hard can it be?” adds Vibeke with a smile. Harder than expected, it turned out!

What the pilot showed

The pilot concluded that the language assistant gives better answers than ChatGPT or Google AI. However, that encouraging bottom line came with caveats. “Sometimes,” says Fieneke, “an answer would begin well, with material that we recognised from our own database, but then go off at a strange tangent at the end. Or information would be included that, while technically correct, simply wasn’t relevant to the question.”

The nature of language advice itself also presented challenges. Language isn’t like maths. “Sometimes, there’s a formal rule, but it’s disregarded in certain circumstances. Or a pragmatic solution may be more appropriate in a particular context,” Fieneke explains. A language advisor is constantly reasoning, making comparisons, drawing conclusions based on multiple sources and taking the question’s context into account. The ability to do that is difficult to automate.

Example of how the language assistant works

Question: What’s the origin of the Dutch expression ‘ik ken je van haver tot gort’ (‘I know you inside out’)?

The language assistant’s answer: The expression – which literally means ‘I know you from oats to pearl barley’ – dates back to the time when oats and pearl barley were common grain products in the Netherlands. As well as being fed to horses, oats were often eaten by people. Pearl barley is barley with the outer skin, or ‘husk’, removed. It used to be an everyday foodstuff.

Used together, as in the expression, the two terms suggest a wide range, like saying ‘from A to Z’.

People would have understood that suggestion more easily than we do now, because oats and pearl barley would have been very familiar to them in the home and in agriculture. The expression is therefore a metaphor for knowing everything about a person.

Although that answer is very persuasive, it’s completely untrue. Originally, the expression had nothing to do with oats and pearl barley. It began as ‘van avere tot avere’, an old Dutch phrase meaning ‘from forefather to forefather’ or ‘from father to son’. ‘Aver’ is an archaic Dutch word meaning ‘kinsman’, which fell out of use long ago. Because people no longer knew what the words meant, ‘van avere tot avere’ became corrupted, first to ‘van haver tot haver’, then to variants such as ‘van haver tot klaver’ (‘from oats to clover’) and ultimately to ‘van haver tot gort’.

The language assistant gave this wrong answer even though the correct explanation is readily available online.

Persuasive, but not always reliable

Vibeke Roeper, director of the Onze Taal (Our Language) Association

One of the pilot’s most striking findings didn’t relate to the quality of the answers, but to how people rated them. The researchers had the AI-generated responses assessed both by professional language advisors and by external assessors with language expertise but no experience working for Onze Taal as language advisors. The external assessors often rated answers from the generic model more highly than those given by the language assistant, even when the generic answers were less accurate.

“We think that’s down to how persuasive an answer sounds,” says Fieneke. “AI is good at saying things clearly and confidently. People aren’t good at deciding whether an answer is reliable on the basis of its content alone.” The same thing was observed with Google AI: some answers that weren’t actually correct included references to reliable sources such as onzetaal.nl. “It’s almost impossible for the average internet user to know that a seemingly reliable answer, complete with citations, is actually wrong.”

Such complications make it very difficult to build a reliable language assistant. A model that always gives an answer seems helpful, but can’t be relied upon to be right. Yet a model that answers only when it’s sure will inevitably leave users disappointed. “Striking a balance is a real challenge,” says Fieneke. “We also tried training the model to say when the sources contained no direct answer to the user’s question. But then almost every answer began with, ‘Your question isn’t explicitly answered in the source material.’ And that’s not really what users want either.”

Support from SIDN Fund

Jet Veldhuis, Project Coordinator SIDN Fund

SIDN Fund made the pilot possible through its pioneering projects programme, which is designed to assist exploratory work. After the project got started, it became clear that additional funding was needed to bring the Dutch Language Institute on board as a technical partner. Fortunately, the Taalunie agreed to step in. “SIDN Fund made the project possible, and the Taalunie was able to give additional support,” says Vibeke.

Practical input was also obtained from SIDN Fund’s network. “We attended a meeting devoted entirely to AI. It was really good to talk to people working on so many different aspects of AI and the opportunities and challenges it poses. We learnt a lot about some of the basics: things to look out for and the privacy implications. And, through the voucher scheme, we were able to engage an external expert.”

Jet Veldhuis, Project Coordinator at SIDN Fund: “Reliability is vital when it comes to getting AI to answer language questions, and the Onze Taal pilot has pointed the way to achieving reliability in practice. The development of a language assistant that uses Onze Taal’s own expertly curated knowledge bank represents an important step towards reliable AI. At the same time, the project has shown how complicated it is to integrate human expertise into AI. Our grant helped to make this exploratory work possible, and has therefore contributed to the development of reliable and responsible AI applications.”

What’s next?

The pilot has been completed and the reports have been written. Onze Taal is now in discussion with partners, including the Taalunie and Flemish counterparts, about how to interpret the results and what the next step should be. “We’re also considering who should take this forward,” says Vibeke.

Another open question is who the language assistant should ultimately be designed to help. Vibeke believes it has potential as a professional tool, similar to the AI search assistants used by lawyers. “A professional uses a tool like that to identify relevant cases and get a feel for the landscape, as a starting point for their own research.” A tool for senior school pupils or the general public would need to do more, because those user groups require explanatory information about what they’ve got in front of them, what the tool can and can’t do, and how answers should be interpreted.

It’s clear that the pilot has yielded a great deal, if only in the form of a list of new questions. “We now know what we don’t know,” says Fieneke. “And that is extremely instructive and useful.”

Read more about the project at https://www.sidnfonds.nl/projecten/de-taalassistent.