Morphology in the Age of Pre-trained Language Models

Content: https://eprints.sztaki.hu/10875/
Archive: SZTAKI Repozitórium
Collection: Status = Submitted; Type = Thesis
Title: Morphology in the Age of Pre-trained Language Models
Creator: Ács, Judit
Date: 2025-02-06
Subject: QA Mathematics and Computer Science
Abstract:

The field of natural language processing (NLP) has adopted deep learning methods over the past 15 years. Nowadays the state of the art in most NLP tasks is some kind of neural model, often the fine-tuned version of a pre-trained language model. The efficacy of these models is demonstrated on various English benchmarks and, increasingly, on other monolingual and multilingual benchmarks. In this dissertation I explore the application of deep learning models to low-level tasks, particularly morphosyntactic tasks in multiple languages.

The first part of this dissertation (Chapters 3 and 4) explores the application of deep learning models to classical morphosyntactic tasks such as morphological analysis and generation in dozens of languages, with special focus on Hungarian.

The second part of this dissertation (Chapters 5 to 8) deals with pre-trained language models, mostly models from the BERT family; I also include some experiments on GPT-4o and GPT-4o-mini. These models show excellent performance on various tasks in English and some high-density languages. However, their evaluation in medium- and low-density languages is lacking. I present a methodology for generating morphosyntactic benchmarks in arbitrary languages, and I analyze multiple BERT-like models in detail. My main tool of analysis is the probing methodology, which I extend with perturbations, the systematic removal of certain information from the sentence. I use Shapley values to further refine my analysis.
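The perturbation-based probing described in the abstract can be illustrated with a minimal sketch: train a simple classifier (a probe) on frozen representations from a BERT-family model, then re-evaluate it on sentences where the relevant word has been masked out. Everything below is an illustrative assumption rather than material from the dissertation: the model name, the toy past-tense task, and the example sentences are all hypothetical.

```python
# Minimal sketch of probing with perturbation, assuming a BERT-family model
# from Hugging Face and a toy past-tense task (not from the dissertation).
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

MODEL = "bert-base-multilingual-cased"  # assumption: any BERT-like model works here
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL).eval()

def embed(sentences):
    """Mean-pooled last-layer representations of whole sentences."""
    batch = tokenizer(sentences, padding=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # (batch, seq, dim)
    mask = batch["attention_mask"].unsqueeze(-1)
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy()

# Toy probing task: is the verb in past tense? (1 = past, 0 = present)
train = [("She walked home.", 1), ("She walks home.", 0),
         ("They played chess.", 1), ("They play chess.", 0)]
test = [("He cooked dinner.", 1), ("He cooks dinner.", 0)]

X, y = embed([s for s, _ in train]), [l for _, l in train]
probe = LogisticRegression(max_iter=1000).fit(X, y)

# Perturbation: mask out the verb and re-probe. A large accuracy drop
# suggests the probe relied on the verb itself rather than on context.
perturbed = ["He [MASK] dinner.", "He [MASK] dinner."]
print("clean:    ", probe.score(embed([s for s, _ in test]), [l for _, l in test]))
print("perturbed:", probe.score(embed(perturbed), [l for _, l in test]))
```

In this sketch the two perturbed inputs are identical, so the probe necessarily falls to chance accuracy on them; comparing the clean and perturbed scores is the basic signal a perturbation analysis reads off.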
Language: English; Hungarian; English
Type: Thesis; NonPeerReviewed
Format: text