Daniel Varab

NLP Researcher

News

15/01/2024 - 🚨 Joined DFKI Berlin 🇩🇪 to work on human-centered LLMs.
10/10/2023 - Giving a lecture on summarization at ITU.
05/07/2023 - Flying out to ACL 2023 🇨🇦. Reach out if you're in town!
14/06/2023 - I am a doctor! 🎓🎓🎓
01/06/2023 - Hosting a workshop on language technology and society.
02/05/2023 - Paper on extractive summarization accepted at ACL 2023!
02/05/2023 - New personal website!

Resume

(2020-2023) Researcher, German Research Center for Artificial Intelligence
(2020-2023) Industrial Researcher and Ph.D., Novo Nordisk
(2019) Research Assistant, IT University of Copenhagen
(2018-2019) Senior Machine Learning Engineer, Karnov Group
(2017-2018) Research Assistant, IT University of Copenhagen
(2017-2018) Assistant Lecturer, IT University of Copenhagen
(2016-2017) Teaching Assistant, IT University of Copenhagen
(2014-2017) Full Stack Developer, International Association of Prosecutors
(2015-2016) Frontend Developer, Heartbeats

Resources

GenX: An extractive summarization algorithm in the age of generative LLMs.
MassiveSumm: A summarization dataset covering 92 languages and 35 scripts.
DaNewsroom: A Danish summarization dataset (>1 million samples).
Danish Gigaword: A freely distributed billion-word corpus of Danish text.
UniParse: A library for developing reproducible graph-based dependency parsers.

Publications

Daniel Varab and Yumo Xu. 2023. Abstractive Summarizers are Excellent Extractive Summarizers. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 330–339, Toronto, Canada. Association for Computational Linguistics.
Dennis Ulmer, Elisa Bassignana, Max Müller-Eberstein, Daniel Varab, Mike Zhang, Rob van der Goot, Christian Hardmeier, and Barbara Plank. 2022. Experimental Standards for Deep Learning in Natural Language Processing Research. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 2673–2692, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
Leon Strømberg-Derczynski, Manuel Ciosici, Rebekah Baglini, Morten H. Christiansen, Jacob Aarup Dalsgaard, Riccardo Fusaroli, Peter Juel Henrichsen, Rasmus Hvingelby, Andreas Kirkedal, Alex Speed Kjeldsen, Claus Ladefoged, Finn Årup Nielsen, Jens Madsen, Malte Lau Petersen, Jonathan Hvithamar Rystrøm, and Daniel Varab. 2021. The Danish Gigaword Corpus. In Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa), pages 413–421, Reykjavik, Iceland (Online). Linköping University Electronic Press, Sweden.
Daniel Varab and Natalie Schluter. 2021. MassiveSumm: a very large-scale, very multilingual, news summarisation dataset. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 10150–10161, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
Daniel Varab and Natalie Schluter. 2020. DaNewsroom: A Large-scale Danish Summarisation Dataset. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 6731–6739, Marseille, France. European Language Resources Association.
Daniel Varab and Natalie Schluter. 2019. UniParse: A universal graph-based parsing toolkit. In Proceedings of the 22nd Nordic Conference on Computational Linguistics, pages 406–410, Turku, Finland. Linköping University Electronic Press.
Natalie Schluter and Daniel Varab. 2018. When data permutations are pathological: the case of neural natural language inference. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 4935–4939, Brussels, Belgium. Association for Computational Linguistics.

Education

(2020-2023) Ph.D. (NLP), IT University of Copenhagen
(2014-2017) MSc in IT, IT University of Copenhagen
(2013) École pour l’informatique et les techniques avancées
(2011-2014) BSc in IT, IT University of Copenhagen

Reviewing

COLM 2024
ACL {2022, 2023}
EMNLP {2021, 2022, 2023}
WNUT {2020, 2021, 2022}
NoDaLiDa {2021, 2022, 2023}

Contact

🐣 @danielvarab 📨 danielvarab[at]gmail.com