
NLP Summit, Zürich 2024

Whiteboard session, MIT

Fieldwork, Oaxaca 2023
Cross-lab workshop, Berlin

Reading room, Cambridge

Network viz lab, Stanford
Associate Professor · MIT CSAIL · Open-Source Advocate
Dr. Mara Voss
Computational Linguist · Network Scientist · Open-Source Advocate
A career built at the intersections.
Disciplines & Methods
Institutional Affiliations
The body of work.
Cross-lingual Dependency Parsing via Latent Graph Alignment
We present a novel alignment mechanism that projects dependency structures across 47 language pairs using shared latent space representations, outperforming fine-tuned mBERT by 8.4 LAS points on the Universal Dependencies benchmark.
Co-Authors
Network Topology of Citation Graphs in Endangered Language Research
Applying small-world network analysis to 23 years of endangered-language citations reveals a hub-and-spoke topology concentrated in six institutions — with implications for knowledge equity and funding allocation.
Discourse Coherence in Low-Resource Languages: A Corpus Study
Using a newly annotated 2.3M-token corpus of Wolof and Bambara, we demonstrate that discourse coherence models trained on high-resource proxies degrade by 31% without language-specific bridging features.
Co-Authors
Open Treebank Infrastructure for Collaborative Annotation
LingHub, a GitHub-integrated treebank platform now supporting 94 languages, reduces annotation onboarding time from 6 weeks to 3 days while maintaining inter-annotator agreement (Cohen's κ) above 0.87.
Co-Authors
Multilingual Coreference Resolution with Sparse Attention
A sparse-attention architecture trained jointly on 18 languages achieves state-of-the-art coreference F1 on OntoNotes while reducing compute by 40%, enabling deployment on resource-constrained research infrastructure.
Across disciplines, across borders.
14 countries · 6 continents · 31 active co-investigators
Prof. Kenji Watanabe
Professor of Computational Linguistics
Morphologically rich languages, parsing

Dr. Leila Ahmadi
Assistant Professor, Linguistics
Network analysis, citation studies

Dr. Fatou Mbaye
Research Fellow, African Languages
Low-resource NLP, Wolof corpus

Dr. Priya Krishnamurthy
Associate Professor, CS
Coreference resolution, Indic languages

Prof. Elena Marchetti
Chair, Digital Humanities
Open annotation tools, treebanks
Prof. Lars Bäckström
Professor, Machine Learning
Sparse attention, multilingual models
A record of sustained investment.
Funder
NSF Linguistics Program
Period
2025–2027
Total Award
$480,000
Key Outcomes
- Annotated corpus of 5M tokens across 8 under-resourced languages
- Open-source discourse parser achieving 0.79 F1 on PDTB-style relations
- 3 doctoral students funded through full completion
Funder
Harvard Radcliffe Institute
Period
2024
Key Outcomes
- Completed manuscript: 'The Grammar of Networks' (MIT Press, forthcoming)
- Organized cross-disciplinary working group on language and graph theory
Funder
Mellon Foundation
Period
2022–2024
Total Award
$310,000
Key Outcomes
- LingHub platform launched: 94 languages, 2,400 registered annotators
- Partnership with 7 African universities for sustainable maintenance
- 1.2M-token annotated corpus released under CC-BY
Funder
Max Planck Institute for Evolutionary Anthropology
Period
2023
Key Outcomes
- Joint project on language contact and borrowing in West African creoles
- Co-authored 2 papers with MPI evolutionary linguistics group
Funder
DARPA LORELEI
Period
2020–2023
Total Award
$1,100,000
Key Outcomes
- State-of-the-art cross-lingual parser across 47 language pairs
- 2 best-paper awards (ACL 2022, EMNLP 2023)
- Technology transfer to 3 humanitarian NLP applications
Funder
Linguistic Society of America
Period
2021
Key Outcomes
- Recognized for contributions to computational approaches to endangered language documentation
The work that needs you.
LingHub 2.0: Federated Annotation Infrastructure
Open-Source Tools · NLP / LLMs
Extending LingHub to a federated model where institutions host their own nodes while contributing to a shared annotation graph. Seeking computational infrastructure co-investigators and NLP engineers for a 2-year NSF proposal.
Currently seeking
Partners
MIT CSAIL · University of Lagos · KTH Stockholm
Proposal due: April 2026 · Project start: September 2026
Citation Ecology in Endangered Language Scholarship
Network Science · Language Documentation
Mapping the full citation graph of endangered-language research from 1980–2025 to identify structural inequities in knowledge production. Currently coding 140,000 citation edges across 23 journals.
Currently seeking
Partners
Université Cheikh Anta Diop · SOAS University of London
Data collection: Q1 2026 · Analysis: Q3 2026
Discourse Coherence Across Typological Families
Discourse Analysis · Corpus Linguistics
Building a typologically balanced corpus of discourse-annotated texts spanning SOV, VSO, and free-word-order languages to test universalist claims in coherence theory. 3 of 8 target languages annotated.
Currently seeking
Partners
IIT Bombay · University of Cape Town · UNAM
Corpus complete: June 2026 · First paper: Q4 2026
Considering a PhD at MIT CSAIL?
Dr. Voss accepts 1–2 doctoral students per admissions cycle. Strong candidates combine linguistic fieldwork experience with programming skills and have a specific language community in mind. Read the advising philosophy before reaching out.