Microsoft-Backed SaqWI Project Boosts Maltese Language on AI Platforms
A new initiative, SaqWI-QA/SaqWI, will expand Maltese language capability across AI platforms with $51,000 in funding, including $15,000 in Azure Credits via Microsoft's Lingua scheme. The announcement followed a meeting between Culture Minister Owen Bonnici and Microsoft's General Manager for Malta, Greece, and Cyprus, Yana Andronopoulou.
Bonnici called the project an important step for Maltese in the digital sphere, noting continued collaboration between public and private entities to keep the language visible and useful for future generations.
What the project will build
The team will use Public Broadcasting Services (PBS) archives to create an open dataset built from local broadcasts: news, documentaries, and cultural shows. The target output includes at least 5,000 question-answer pairs and 100-150 hours of aligned transcription snippets to ensure cultural and historical relevance for Maltese users.
Transcription will be powered by Azure AI services, with human annotators refining outputs for cultural and linguistic accuracy. The dataset will be released under open licences for long-term sustainability, enabling new educational tools, audits of automated speech recognition (ASR) quality, and research on underrepresented languages.
Why it matters for developers
- Benchmark Maltese ASR performance and iterate models with clearer feedback loops (e.g., WER/CER and domain-specific error analysis; see the scoring sketch after this list).
- Build retrieval and Q&A systems grounded in local context using the curated question-answer sets.
- Stress-test models on real broadcast conditions: mixed speakers, varied intonation, and code-switching common in Malta.
- Accelerate inclusive subtitling and accessibility features for hearing-impaired users, older audiences, language learners, and the general public.
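To make the benchmarking loop concrete, here is a minimal per-domain WER/CER sketch using the open-source jiwer library (pip install jiwer). The sample records and domain labels are hypothetical placeholders; the dataset's actual schema has not been published.

```python
# Per-domain WER/CER scoring sketch; sample data is hypothetical.
from collections import defaultdict

import jiwer

# Hypothetical (reference, hypothesis, domain) evaluation triples.
samples = [
    ("reference transcript for a news clip", "asr hypothesis for a news clip", "news"),
    ("reference transcript for a documentary", "asr hypothesis for a documentary", "documentary"),
]

by_domain = defaultdict(lambda: {"refs": [], "hyps": []})
for ref, hyp, domain in samples:
    by_domain[domain]["refs"].append(ref)
    by_domain[domain]["hyps"].append(hyp)

for domain, pairs in sorted(by_domain.items()):
    # jiwer accepts lists of strings and aggregates across them.
    wer = jiwer.wer(pairs["refs"], pairs["hyps"])
    cer = jiwer.cer(pairs["refs"], pairs["hyps"])
    print(f"{domain}: WER={wer:.3f} CER={cer:.3f}")
```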
TVM pilot: real broadcast constraints, real gains
In partnership with TVM and Microsoft, the project will pilot ASR improvements on the 6pm news bulletin. The work targets code-switching, spelling variance, and intonation differences: pain points that typically degrade transcription accuracy and ripple into downstream search, summarization, and captioning pipelines.
Tooling and access
Azure AI services will drive the initial transcription workflow before human refinement. If you're standardizing your stack, review the capabilities of Azure's Speech services for batch and streaming workflows: Azure Speech to Text.
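As a starting point, here is a minimal single-file transcription sketch with the Azure Speech SDK (pip install azure-cognitiveservices-speech). The key, region, and file path are placeholders; mt-MT is Azure's locale code for Maltese, though you should check the current locale list for availability.

```python
import azure.cognitiveservices.speech as speechsdk

# Placeholder credentials; use your own Speech resource key and region.
speech_config = speechsdk.SpeechConfig(subscription="YOUR_KEY", region="YOUR_REGION")
speech_config.speech_recognition_language = "mt-MT"  # Maltese locale

audio_config = speechsdk.audio.AudioConfig(filename="bulletin_clip.wav")  # placeholder file
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

# recognize_once() suits short clips; use continuous recognition for full bulletins.
result = recognizer.recognize_once()
if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print(result.text)
else:
    print(f"Recognition failed: {result.reason}")
```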
With open licensing planned, teams can fold the dataset into training or evaluation pipelines, set up continuous benchmarking jobs, and contribute improvements back to the community via shared error analyses and model configs.
Practical next steps for engineering teams
- Prepare ingestion pipelines for aligned audio-text snippets and Q&A pairs; define clear dataset versioning and provenance tags (see the record sketch after this list).
- Stand up evaluation harnesses (WER/CER, per-domain breakdowns) to compare ASR baselines against fine-tuned variants, along the lines of the per-domain scoring sketch above.
- Prototype Maltese-focused RAG/Q&A services using the released pairs; test domain drift against live news segments (see the retrieval sketch after this list).
- Plan for code-switching: configure lexicons, punctuation rules, and post-processing to normalize mixed-language outputs (see the normalization sketch after this list).
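For the ingestion item, a hypothetical record layout illustrates the kind of versioning and provenance tags worth capturing. None of these field names come from the project itself; they are a minimal sketch of one reasonable schema.

```python
# Hypothetical versioned dataset record with provenance tags.
from dataclasses import dataclass, asdict
import json

@dataclass
class AlignedSnippet:
    snippet_id: str        # stable identifier for deduplication
    audio_uri: str         # location of the source audio segment
    transcript: str        # human-refined Maltese transcript
    start_s: float         # alignment offsets into the source broadcast
    end_s: float
    source_programme: str  # provenance: which PBS broadcast it came from
    dataset_version: str   # provenance: which dataset release it belongs to

record = AlignedSnippet(
    snippet_id="snip-000001",
    audio_uri="https://example.org/clips/snip-000001.wav",  # placeholder URI
    transcript="...",
    start_s=12.4,
    end_s=17.9,
    source_programme="TVM 6pm news bulletin",
    dataset_version="v0.1.0",
)
print(json.dumps(asdict(record), ensure_ascii=False, indent=2))
```

For the RAG/Q&A item, a small retrieval sketch over hypothetical Q&A pairs shows the basic shape. It uses scikit-learn TF-IDF for brevity; a production system would likely prefer multilingual embeddings tuned for Maltese.

```python
# Retrieval sketch over hypothetical Q&A pairs (pip install scikit-learn).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

qa_pairs = [
    ("hypothetical question about a news topic", "hypothetical answer grounded in the broadcast"),
    ("hypothetical question about a documentary", "another hypothetical answer"),
]

questions = [q for q, _ in qa_pairs]
vectorizer = TfidfVectorizer()
question_matrix = vectorizer.fit_transform(questions)

def answer(query: str) -> str:
    # Rank stored questions by cosine similarity and return the best match's answer.
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, question_matrix)[0]
    return qa_pairs[scores.argmax()][1]

print(answer("hypothetical question about a news topic"))
```

For the code-switching item, a post-processing sketch shows lexicon-based normalization plus punctuation cleanup. The lexicon entries and the sample sentence are illustrative assumptions, not the project's actual rules.

```python
# Lexicon-based normalization sketch for mixed Maltese/English ASR output.
import re

# Hypothetical lexicon mapping English spellings in mixed-language
# output to preferred normalized Maltese forms.
LEXICON = {
    "budget": "baġit",
    "committee": "kumitat",
}

def normalize(text: str) -> str:
    # Case-insensitive whole-word replacement from the lexicon.
    for src, dst in LEXICON.items():
        text = re.sub(rf"\b{re.escape(src)}\b", dst, text, flags=re.IGNORECASE)
    # Remove stray spaces before punctuation, then collapse repeated whitespace.
    text = re.sub(r"\s+([,.;:!?])", r"\1", text)
    return re.sub(r"\s{2,}", " ", text).strip()

print(normalize("Il-parlament iddiskuta il-budget ."))
```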
For developers working on ASR and transcription, see Speech-To-Text for practical guides and tools. If you're building on Azure, you may also find Microsoft AI Courses helpful for speeding up deployment.