Sotabase
Career
· Researcher in Natural Language Processing Group, Columbia University, 2024–
· PhD Student, Columbia University, 2007–2011
· Research Scientist, Columbia University, 2003–2007
· Intern, AT&T Labs, Inc., 2002
· Education, The Ohio State University
Publications (57)
A Comparison of Features for Automatic Readability Assessment
International Conference on Computational Linguistics · 2010 · 282 citations
A Support Vector Approach to Censored Targets
Industrial Conference on Data Mining · 2007 · 154 citations
Restoring punctuation and capitalization in transcribed speech
IEEE International Conference on Acoustics, Speech, and Signal Processing · 2009 · 146 citations
Maximum Expected F-Measure Training of Logistic Regression Models
Human Language Technology - The Baltic Perspective · 2005 · 94 citations
Crowd-Sourced Speech Corpora for Javanese, Sundanese, Sinhala, Nepali, and Bangladeshi Bengali
Workshop on Spoken Language Technologies for Under-resourced Languages · 2018 · 87 citations
Open-source Multi-speaker Speech Corpora for Building Gujarati, Kannada, Malayalam, Marathi, Tamil and Telugu Speech Synthesis Systems
International Conference on Language Resources and Evaluation · 2020 · 78 citations
A Step-by-Step Process for Building TTS Voices Using Open Source Data and Frameworks for Bangla, Javanese, Khmer, Nepali, Sinhala, and Sundanese
Workshop on Spoken Language Technologies for Under-resourced Languages · 2018 · 55 citations
Google's cross-dialect Arabic voice search
IEEE International Conference on Acoustics, Speech, and Signal Processing · 2012 · 47 citations
TTS for Low Resource Languages: A Bangla Synthesizer
International Conference on Language Resources and Evaluation · 2016 · 44 citations
Information Extraction from Voicemail Transcripts
Conference on Empirical Methods in Natural Language Processing · 2002 · 41 citations
OpenFst: An Open-Source, Weighted Finite-State Transducer Library and its Applications to Speech and Language
North American Chapter of the Association for Computational Linguistics · 2009 · 40 citations
Rapid Development of TTS Corpora for Four South African Languages
Interspeech · 2017 · 39 citations
A Maximum Expected Utility Framework for Binary Sequence Labeling
Annual Meeting of the Association for Computational Linguistics · 2007 · 37 citations
Half & Half: Multiple Dispatch and Retroactive Abstraction for Java™
2002 · 29 citations
Parametric Models of Linguistic Count Data
Annual Meeting of the Association for Computational Linguistics · 2003 · 28 citations
Web-derived pronunciations
IEEE International Conference on Acoustics, Speech, and Signal Processing · 2009 · 26 citations
A Unified Phonological Representation of South Asian Languages for Multilingual Text-to-Speech
Workshop on Spoken Language Technologies for Under-resourced Languages · 2018 · 24 citations
Burmese Speech Corpus, Finite-State Text Normalization and Pronunciation Grammars with an Application to Text-to-Speech
International Conference on Language Resources and Evaluation · 2020 · 22 citations
Search by voice in Mandarin Chinese
Interspeech · 2010 · 20 citations
Proceedings of the Workshop on Software
2005 · 18 citations
Martin Jansche | Researcher Profile | Sotabase