Technology & science

Four technical work packages connect resources, reasoning, correction and audience adaptation in one iterative pipeline.

WP2 M1–M15

UOC

Corpus, terminology and data augmentation

Builds parallel and comparable life-science corpora, improves TBXTools and creates terminology databases with morphological information.

500K segments for EN–ES/CA/ET · 100K segments for Irish · 2,500 terms each for EN/ES/CA/ET · 500 Irish terms

Tasks
  • T2.1 Corpus compilation
  • T2.2 Enhancement of TBXTools
  • T2.3 Terminological databases
  • T2.4 Term-substitution augmentation
Deliverables
  • D2.1 Corpora · M6
  • D2.2 TBXTools · M7
  • D2.3 Terminology databases · M9
  • D2.4 Augmented corpora · M15
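As an illustration of the idea behind T2.4, the following is a minimal sketch of term-substitution augmentation: a term pair found on both sides of a parallel segment is swapped for another glossary pair to produce a new training segment. The function name, the example sentences and the glossary entries are hypothetical and are not taken from the project. The sketch also ignores inflection and gender agreement, which is one plausible reason the terminology databases carry morphological information.

```python
import re


def substitute_term(src, tgt, old_pair, new_pair):
    """Swap one bilingual term pair for another in a parallel segment.

    Returns the augmented (source, target) pair, or None when the old
    term is not present on both sides (hypothetical helper, not project code).
    """
    old_src, old_tgt = old_pair
    new_src, new_tgt = new_pair
    # Whole-word, case-insensitive match so "enzyme" does not hit "coenzyme".
    src_pat = re.compile(rf"\b{re.escape(old_src)}\b", re.IGNORECASE)
    tgt_pat = re.compile(rf"\b{re.escape(old_tgt)}\b", re.IGNORECASE)
    if not (src_pat.search(src) and tgt_pat.search(tgt)):
        return None
    # A callable replacement avoids backslash interpretation in the new term.
    return (
        src_pat.sub(lambda m: new_src, src),
        tgt_pat.sub(lambda m: new_tgt, tgt),
    )


augmented = substitute_term(
    "The enzyme binds to the membrane.",
    "La enzima se une a la membrana.",
    old_pair=("enzyme", "enzima"),
    new_pair=("protein", "proteína"),
)
print(augmented)
```

Substituting only when the term occurs on both sides keeps the source and target aligned; a segment where the translation used a different wording is skipped rather than corrupted.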
WP3 M1–M34

BSC

Terminology-aware machine translation

Compares Large Reasoning Models (LRMs) with instruction-tuned LLMs, using glossaries, self-reflection and document context to preserve specialist terms.

5 languages · ≥3 scientific genres · ≥20 BLEU · ≤0.4 TER

Tasks
  • T3.1 Domain datasets
  • T3.2 Reasoning capabilities
  • T3.3 Emerging terminology
  • T3.4 Comparative experiments
Deliverables
  • D3.1 Instruction and CoT data · M18
  • D3.2 LRM model · M24
  • D3.3 Benchmark results · M33
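To make the ≤0.4 TER target more concrete, here is a rough sketch of a shift-free approximation of Translation Edit Rate: the word-level edit distance between hypothesis and reference, divided by the reference length. Standard TER also allows block shifts, which can only lower the edit count, so this simplified score is an upper bound on it. The function names and the example sentences are illustrative assumptions, not the project's evaluation code.

```python
def word_edit_distance(hyp, ref):
    """Levenshtein distance over word lists (insert, delete, substitute)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis word
                            curr[j - 1] + 1,      # insert a reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def shift_free_ter(hypothesis, reference):
    """Edits per reference word, without TER's shift operation."""
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(hyp, ref) / len(ref)


score = shift_free_ter(
    "the protein binds the cell membrane",
    "the enzyme binds to the cell membrane",
)
print(round(score, 3), score <= 0.4)
```

In this example one substitution and one insertion are needed against a seven-word reference, so the score is 2/7, below the 0.4 threshold. For reported results an established implementation of the full metric would be the safer choice.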
WP4 M4–M33

University of Surrey

Quality estimation and automatic post-editing

Detects, explains and corrects terminology errors. Quality estimation (QE) and automatic post-editing (APE) are trained jointly and integrated with the translation model for continuous refinement.

≥80% F1 on terminology-span detection · ≥15% improvement in human-rated terminology accuracy

Tasks
  • T4.1 Baselines and seed data
  • T4.2 QE-adapted decoding
  • T4.3 Joint QE and APE
  • T4.4 API and integration
Deliverables
  • D4.1 QE-integrated LRM · M18
  • D4.2 QE-adapted decoding · M24
  • D4.3 Joint QE/APE · M33
  • D4.4 Integrated API · M33
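The F1 target for terminology-span detection can be illustrated with a small sketch. It assumes exact-match scoring, where a predicted span counts only if its start and end offsets both match a gold span; the source does not say whether the project uses exact or partial matching, and the function name and offsets below are hypothetical.

```python
def span_f1(predicted, gold):
    """Exact-match F1 between predicted and gold (start, end) spans."""
    pred, ref = set(predicted), set(gold)
    true_positives = len(pred & ref)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical token offsets of terminology errors in one segment.
gold_spans = [(0, 2), (5, 6), (9, 11)]
predicted_spans = [(0, 2), (5, 6), (7, 8)]

f1 = span_f1(predicted_spans, gold_spans)
print(round(f1, 3), f1 >= 0.80)
```

Here two of three predictions are correct and two of three gold spans are found, so precision, recall and F1 are all 2/3, which would fall short of the 80% target.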
WP5 M16–M28

University of Tartu

Post-translation text augmentation

Rewrites translated content for expert and non-expert audiences through simplification, terminology support, summaries and explanations.

2 use cases (scientific dissemination and public outreach) · ≥75% positive user feedback

Tasks
  • T5.1 Augmentation strategies
  • T5.2 Audience-aware implementation
Deliverables
  • D5.1 Augmentation framework · M19
  • D5.2 Implemented strategies · M28