I do research in Deep Learning, Natural Language Processing, and Document Understanding
I am a Machine Learning researcher focusing primarily on Natural Language Processing and Document Understanding, currently developing new architectures and conducting experiments to tackle previously unexplored business problems end-to-end.
This research resulted in my Ph.D. thesis, “Span Identification and Key Information Extraction Beyond Sequence Labeling Paradigm,” and was conducted at Applica.ai, where I have worked since 2018, most recently as a Senior Research Scientist. The journey continues under a new banner following Applica's acquisition by Snowflake.
At the same time, I am involved in developing models to detect fake news, hoaxes, and disinformation at Adam Mickiewicz University. My previous work in this area won the SemEval 2020 propaganda detection shared task and received the Best Paper Award at that venue.
I occasionally serve as a reviewer, most recently for NeurIPS 2022, ICML 2022, CVPR 2022, ICLR 2022, NeurIPS 2021, EMNLP 2021, and COLING 2020.
Recently published papers I co-authored
Sparsifying Transformer Models with Trainable Representation Pooling (2022.04)
Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer (2021.09)
Dynamic Boundary Time Warping for Sub-Sequence Matching with Few Examples (2021.05)
From Dataset Recycling to Multi-Property Extraction and Beyond (2020.11)
Contract Discovery: Dataset and a Few-Shot Semantic Retrieval Challenge with Competitive Baselines (2020.11)
ApplicaAI at SemEval-2020 Task 11: On RoBERTa-CRF, Span CLS and Whether Self-Training Helps Them (2020.10)
Since you’ve made it this far, you might be interested in my recent activity
It has been a while since I started my career in Computer Science. At the beginning, i.e., in the mid-2000s, it was mainly software development work that I no longer find attractive. In the early 2010s, I moved into applied Machine Learning and Natural Language Processing, and a few years ago I began conducting serious research in the field. Here are some updates from that recent period of my life.
My paper on sparsifying transformer models was presented at ACL. I published a preprint on text-to-table inference and defended my Computer Science Ph.D. thesis.
I published a paper introducing the TILT model, a state-of-the-art Document Understanding solution, which won the ICDAR 2021 InfographicsVQA shared task. Subsequently, I proposed the DUE benchmark (NeurIPS) and was invited by Huawei Research to give a talk on End-to-End Document Understanding. Other work from this year tackled semantic retrieval (Expert Systems...) and a trainable top-k mechanism (AAAI).
I published papers on semantic retrieval from legal texts (EMNLP) and key information extraction (CoNLL). My SemEval 2020 solution won the propaganda detection shared task, while the related paper received the Best Paper Award at that venue.
Since 2014, I have been involved in a number of commercial and scientific Machine Learning projects. These resulted in several mildly interesting papers and my 2016 master's thesis, which focused on automating the segmentation of words into morphemes. I won the nationwide PolEval 2018 competition with my Named Entity Recognition model.
Between 2005 and 2013, I did routine software development work I had taught myself. In the meantime, I became interested in general and quantitative linguistics, which ultimately led me to Natural Language Processing.