Daniel Varab, Ph.D. | NLP/ML Research Engineer
News
- 10/10/2023 - Giving a lecture on summarization at the IT University of Copenhagen.
- 15/07/2023 - On parental leave for the remainder of the year. On the lookout for work in the new year. Give me a ping if you're working on something cool!
- 05/07/2023 - Flying out to ACL 2023 🇨🇦 Reach out if you're in town!
- 14/06/2023 - I am a doctor! 🎓
- 01/06/2023 - We are hosting a workshop on language technology and society. Visit the link for more info!
- 02/05/2023 - Our paper on generative extractive summarization was accepted at ACL 2023!
Resources
- GenX: A generative extractive summarization algorithm.
- MassiveSumm: A summarization dataset covering 92 languages and 35 scripts.
- DaNewsroom: A Danish summarization dataset (>1 million samples).
- Danish Gigaword: A billion-word corpus of Danish text, freely distributed with attribution.
- UniParse: A framework for developing reproducible graph-based dependency parsers.
Publications
- Daniel Varab and Yumo Xu. 2023. Abstractive Summarizers are Excellent Extractive Summarizers. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 330–339, Toronto, Canada. Association for Computational Linguistics.
- Dennis Ulmer, Elisa Bassignana, Max Müller-Eberstein, Daniel Varab, Mike Zhang, Rob van der Goot, Christian Hardmeier, and Barbara Plank. 2022. Experimental Standards for Deep Learning in Natural Language Processing Research. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 2673–2692, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
- Leon Strømberg-Derczynski, Manuel Ciosici, Rebekah Baglini, Morten H. Christiansen, Jacob Aarup Dalsgaard, Riccardo Fusaroli, Peter Juel Henrichsen, Rasmus Hvingelby, Andreas Kirkedal, Alex Speed Kjeldsen, Claus Ladefoged, Finn Årup Nielsen, Jens Madsen, Malte Lau Petersen, Jonathan Hvithamar Rystrøm, and Daniel Varab. 2021. The Danish Gigaword Corpus. In Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), pages 413–421, Reykjavik, Iceland (Online). Linköping University Electronic Press, Sweden.
- Daniel Varab and Natalie Schluter. 2021. MassiveSumm: a very large-scale, very multilingual, news summarisation dataset. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 10150–10161, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Daniel Varab and Natalie Schluter. 2020. DaNewsroom: A Large-scale Danish Summarisation Dataset. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 6731–6739, Marseille, France. European Language Resources Association.
- Daniel Varab and Natalie Schluter. 2019. UniParse: A universal graph-based parsing toolkit. In Proceedings of the 22nd Nordic Conference on Computational Linguistics, pages 406–410, Turku, Finland. Linköping University Electronic Press.
- Natalie Schluter and Daniel Varab. 2018. When data permutations are pathological: the case of neural natural language inference. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4935–4939, Brussels, Belgium. Association for Computational Linguistics.
Resume
- (2020-2023) Industrial Researcher, Novo Nordisk
- (2019) Research Assistant, IT University of Copenhagen
- (2018-2019) Senior Machine Learning Engineer, Karnov Group
- (2017-2018) Research Assistant, IT University of Copenhagen
- (2017-2018) Assistant Lecturer, IT University of Copenhagen
- (2016-2017) Teaching Assistant, IT University of Copenhagen
- (2015-2016) Frontend Developer, Heartbeats
- (2014-2017) Full Stack Developer, International Association of Prosecutors
Education
- (2020-2023) Ph.D. in Natural Language Processing, IT University of Copenhagen
- (2014-2017) MSc in IT, IT University of Copenhagen
- (2013) École pour l’informatique et les techniques avancées
- (2011-2014) BSc in IT, IT University of Copenhagen
Reviewing
- ACL {2022, 2023}
- EMNLP {2021, 2022, 2023}
- WNUT {2020, 2021, 2022}
- NoDaLiDa {2021, 2022, 2023}
Contact
🐣 @danielvarab 📨 danielvarab[at]gmail.com