UOC
Corpus, terminology and data augmentation
Builds parallel and comparable life-science corpora, improves TBXTools and creates terminology databases with morphological information.
Targets: 500K segments for EN–ES/CA/ET · 100K segments for Irish · 2,500 terms per language (EN, ES, CA, ET) · 500 Irish terms
Tasks
- T2.1 Corpus compilation
- T2.2 Enhancement of TBXTools
- T2.3 Terminological databases
- T2.4 Term-substitution augmentation
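T2.4 is only named here, so the following is a minimal, hypothetical sketch of what term-substitution augmentation usually involves. It assumes a bilingual termbase whose entries carry a morphological tag. The `TermPair` class, its `category` field and the `augment` function are illustrative assumptions, not TBXTools' API or the project's actual method. When a source term and its translation both occur in a segment pair, the sketch swaps them for another term pair with the same tag. This produces new synthetic pairs while keeping agreement (articles, adjectives) intact.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TermPair:
    source: str    # English term
    target: str    # its translation (e.g. Spanish)
    category: str  # hypothetical morphological tag, e.g. "noun.m.sg"


def _contains(text: str, term: str) -> bool:
    # Whole-word, case-sensitive match so "liver" does not match "livers".
    return re.search(rf"\b{re.escape(term)}\b", text) is not None


def _replace(text: str, old: str, new: str) -> str:
    # A function replacement stops backslashes in `new` being read as escapes.
    return re.sub(rf"\b{re.escape(old)}\b", lambda _m: new, text)


def augment(src: str, tgt: str, termbase: list[TermPair]) -> list[tuple[str, str]]:
    """Return synthetic (src, tgt) pairs made by swapping one matched term pair."""
    augmented = []
    for found in termbase:
        # The term must appear on both sides of the segment pair.
        if not (_contains(src, found.source) and _contains(tgt, found.target)):
            continue
        for other in termbase:
            # Only swap in terms with the same morphological category
            # (assumption: this keeps gender/number agreement in the target).
            if other != found and other.category == found.category:
                augmented.append((
                    _replace(src, found.source, other.source),
                    _replace(tgt, found.target, other.target),
                ))
    return augmented
```

With a termbase of `liver/hígado`, `kidney/riñón` and `heart/corazón` (all `noun.m.sg`) plus `skin/piel` (`noun.f.sg`), the pair "The liver is enlarged." / "El hígado está aumentado." yields two new pairs, for kidney and heart. `skin` is skipped because "piel" would need the feminine "aumentada".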
Deliverables
- D2.1 Corpora · M6
- D2.2 TBXTools · M7
- D2.3 Terminology databases · M9
- D2.4 Augmented corpora · M15