Dr. Voss delivering a keynote lecture at a university auditorium, gesturing toward a projected network diagram

NLP Summit, Zürich 2024

Whiteboard covered in linguistic tree diagrams and handwritten equations during a research session

Whiteboard session, MIT

Fieldwork in a rural setting with a laptop open on a wooden table surrounded by handwritten notes

Fieldwork, Oaxaca 2023

Group of researchers collaborating around a conference table with papers and laptops spread out

Cross-lab workshop, Berlin

Portrait of a researcher at a desk surrounded by open books and highlighted papers in warm afternoon light

Reading room, Cambridge

Researcher presenting data visualizations on multiple screens in a dimly lit lab environment

Network viz lab, Stanford

Associate Professor · MIT CSAIL · Open-Source Advocate

Dr. Mara Voss

Computational Linguist · Network Scientist · Open-Source Advocate

Research Identity

A career built at the intersections.

Citations (Google Scholar)
h-index (as of Feb 2026)
Publications (peer-reviewed)
Countries (active collaborations)
Grant funding (total awarded)

Disciplines & Methods

Institutional Affiliations

MIT CSAIL
Max Planck Institute
Stanford NLP Group
Alan Turing Institute
University of Cape Town

Selected Publications

The body of work.

67 total publications (full list on Google Scholar)
2024

Cross-lingual Dependency Parsing via Latent Graph Alignment

Computational Linguistics · 312 citations
NLP / LLMs · Cross-lingual Transfer · Graph Theory

We present a novel alignment mechanism that projects dependency structures across 47 language pairs using shared latent space representations, outperforming fine-tuned mBERT by 8.4 LAS points on the Universal Dependencies benchmark.
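The 8.4-point margin is measured in labeled attachment score (LAS): the percentage of tokens whose predicted syntactic head and dependency label both match the gold tree. Below is a minimal Python sketch of that metric, assuming each token is encoded as a hypothetical (head index, label) pair with head 0 marking the root. The example sentence and labels are invented, and this is not the paper's evaluation code. Standard evaluations aggregate over a whole treebank, and scripts differ on punctuation handling; this version simply scores one token list.

```python
from typing import Sequence, Tuple

# Hypothetical encoding: each token is (head_index, dependency_label); head 0 marks the root.
Token = Tuple[int, str]


def labeled_attachment_score(gold: Sequence[Token], pred: Sequence[Token]) -> float:
    """Percentage of tokens whose predicted head AND label both match the gold analysis."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must cover the same tokens")
    if not gold:
        return 0.0
    # A token counts only if both the attachment (head) and the relation label are right.
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return 100.0 * correct / len(gold)


# Hypothetical three-token sentence: the parser mislabels the third token's relation.
gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "iobj")]
print(f"LAS = {labeled_attachment_score(gold, pred):.2f}")  # LAS = 66.67
```

Because LAS is a percentage, a gain of 8.4 LAS points means roughly 8.4 more tokens out of every 100 receive both the correct head and the correct label.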

2023

Network Topology of Citation Graphs in Endangered Language Research

Journal of Linguistic Geography · 189 citations
Network Science · Endangered Languages · Corpus Linguistics

Applying small-world network analysis to 23 years of endangered-language citations reveals a hub-and-spoke topology concentrated in six institutions — with implications for knowledge equity and funding allocation.
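A hub-and-spoke topology shows up as citations piling onto a few nodes. The sketch below uses a simpler measure than the small-world analysis the paper applies: it counts how often each institution is cited (its in-degree) and reports the share of all citation edges that go to the top k. The edge format, the function name `top_k_share`, and the placeholder institutions "A" to "E" are all hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical edge format: (citing institution, cited institution).
Edge = Tuple[str, str]


def top_k_share(edges: Iterable[Edge], k: int) -> Tuple[List[Tuple[str, int]], float]:
    """Return the k most-cited nodes and the fraction of all citation edges they receive."""
    # In-degree of each node = number of incoming citation edges.
    in_degree = Counter(cited for _, cited in edges)
    total = sum(in_degree.values())
    if total == 0:
        return [], 0.0
    hubs = in_degree.most_common(k)
    return hubs, sum(count for _, count in hubs) / total


# Toy graph with placeholder institutions "A" to "E": most citations point at "A".
edges = [("B", "A"), ("C", "A"), ("D", "A"), ("E", "A"),
         ("C", "B"), ("D", "B"), ("A", "E"), ("E", "C")]
hubs, share = top_k_share(edges, k=2)
print(hubs, share)  # [('A', 4), ('B', 2)] 0.75
```

A share close to 1 for a small k indicates a hub-dominated pattern; if citations were spread evenly, the top-k share would sit near k divided by the number of cited nodes.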

2023

Discourse Coherence in Low-Resource Languages: A Corpus Study

Language Resources & Evaluation · 143 citations
Discourse Analysis · Language Documentation · Sociolinguistics

Using a newly annotated 2.3M-token corpus of Wolof and Bambara, we demonstrate that discourse coherence models trained on high-resource proxies degrade by 31% without language-specific bridging features.

2022

Open Treebank Infrastructure for Collaborative Annotation

ACL Anthology · 421 citations
Open-Source Tools · Corpus Linguistics · NLP / LLMs

LingHub, a GitHub-integrated treebank platform now supporting 94 languages, reduces annotation onboarding time from 6 weeks to 3 days while keeping inter-annotator agreement (Cohen's κ) above 0.87.
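Cohen's κ corrects raw agreement for the agreement two annotators would reach by chance: κ = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected from each annotator's own label frequencies. Below is a minimal sketch for two annotators. The label sequences are hypothetical, and the abstract does not say how agreement was aggregated when more than two annotators were involved.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two annotators who labeled the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(a)
    # Observed agreement p_o: fraction of items given the same label by both annotators.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Expected agreement p_e: chance of a match if each annotator labeled independently
    # according to their own label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        # Both used one identical label everywhere; kappa is 0/0 here, and we choose to report 1.0.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical dependency-relation labels from two annotators on six tokens.
ann_1 = ["nsubj", "obj", "nsubj", "obl", "obj", "nsubj"]
ann_2 = ["nsubj", "obj", "obj", "obl", "obj", "nsubj"]
print(f"kappa = {cohens_kappa(ann_1, ann_2):.3f}")  # kappa = 0.739
```

Raw agreement in this toy case is 5 of 6 tokens (about 0.83), but κ is lower because both annotators use "nsubj" and "obj" often enough that some matches are expected by chance.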

2024

Multilingual Coreference Resolution with Sparse Attention

EMNLP Proceedings · 267 citations
NLP / LLMs · Multilingual Models · Cross-lingual Transfer

A sparse-attention architecture trained jointly on 18 languages achieves state-of-the-art coreference F1 on OntoNotes while reducing compute by 40%, enabling deployment on resource-constrained research infrastructure.


Collaboration Network

Across disciplines, across borders.

14 countries · 6 continents · 31 active co-investigators

Prof. Kenji Watanabe

Professor of Computational Linguistics

Kyoto University, Japan

Morphologically rich languages, parsing

4 projects · 9 papers

Dr. Leila Ahmadi

Assistant Professor, Linguistics

University of Toronto, Canada

Network analysis, citation studies

3 projects · 7 papers

Dr. Fatou Mbaye

Research Fellow, African Languages

Université Cheikh Anta Diop, Senegal

Low-resource NLP, Wolof corpus

2 projects · 5 papers

Dr. Priya Krishnamurthy

Associate Professor, CS

IIT Bombay, India

Coreference resolution, Indic languages

3 projects · 6 papers

Prof. Elena Marchetti

Chair, Digital Humanities

University of Bologna, Italy

Open annotation tools, treebanks

2 projects · 4 papers

Prof. Lars Bäckström

Professor, Machine Learning

KTH Stockholm, Sweden

Sparse attention, multilingual models

2 projects · 5 papers

Grants & Residencies

A record of sustained investment.


Funder: NSF Linguistics Program
Period: 2025–2027
Total award: $480,000

Key Outcomes

  • Annotated corpus of 5M tokens across 8 under-resourced languages
  • Open-source discourse parser achieving 0.79 F1 on PDTB-style relations
  • 3 doctoral students funded through full completion

Funder: Harvard Radcliffe Institute
Period: 2024

Key Outcomes

  • Completed manuscript: 'The Grammar of Networks' (MIT Press, forthcoming)
  • Organized cross-disciplinary working group on language and graph theory

Funder: Mellon Foundation
Period: 2022–2024
Total award: $310,000

Key Outcomes

  • LingHub platform launched: 94 languages, 2,400 registered annotators
  • Partnership with 7 African universities for sustainable maintenance
  • 1.2M-token annotated corpus released under CC-BY

Funder: Max Planck Institute for Evolutionary Anthropology
Period: 2023

Key Outcomes

  • Joint project on language contact and borrowing in West African creoles
  • Co-authored 2 papers with MPI evolutionary linguistics group

Funder: DARPA LORELEI
Period: 2020–2023
Total award: $1,100,000

Key Outcomes

  • State-of-the-art cross-lingual parser across 47 language pairs
  • 2 best-paper awards (ACL 2022, EMNLP 2023)
  • Technology transfer to 3 humanitarian NLP applications

Funder: Linguistic Society of America
Period: 2021

Key Outcomes

  • Recognized for contributions to computational approaches to endangered language documentation

Open Projects

The work that needs you.

01 · Recruiting

LingHub 2.0 — Federated Annotation Infrastructure

Open-Source Tools · NLP / LLMs

Extending LingHub to a federated model where institutions host their own nodes while contributing to a shared annotation graph. Seeking computational infrastructure co-investigators and NLP engineers for a 2-year NSF proposal.

Currently seeking

Co-PI (NLP/Systems) · Postdoctoral Researcher · Doctoral Student

Partners

MIT CSAIL · University of Lagos · KTH Stockholm

Proposal due: April 2026 · Project start: September 2026

02 · Active

Citation Ecology in Endangered Language Scholarship

Network Science · Language Documentation

Mapping the full 1980–2025 citation graph of endangered-language research to identify structural inequities in knowledge production. Currently coding 140,000 citation edges across 23 journals.

Currently seeking

Research Assistant (Python/NetworkX) · Visiting Scholar

Partners

Université Cheikh Anta Diop · SOAS University of London

Data collection: Q1 2026 · Analysis: Q3 2026

03 · Active

Discourse Coherence Across Typological Families

Discourse Analysis · Corpus Linguistics

Building a typologically balanced corpus of discourse-annotated texts spanning SOV, VSO, and free-word-order languages to test universalist claims in coherence theory. Three of the eight target languages are annotated so far.

Currently seeking

Doctoral Student (fieldwork experience) · Language Consultant

Partners

IIT Bombay · University of Cape Town · UNAM

Corpus complete: June 2026 · First paper: Q4 2026

Doctoral Mentorship

Considering a PhD at MIT CSAIL?

Dr. Voss accepts 1–2 doctoral students per admissions cycle. Strong candidates combine linguistic fieldwork experience with programming skills and a specific language community in mind. Read the advising philosophy before reaching out.