Seminar on Speech and Language Technology Tools

A HunCLARIN event promoting the use of language technology software tools and corpora in humanities and social sciences research

Szeged, 19 October 2018


Organizing team

Tamás Erdei

Assistant lecturer
Department of Hungarian and Applied Linguistics
Gyula Juhász Faculty of Pedagogy
University of Szeged


Kinga Jelencsik-Mátyus

User Involvement coordinator of HunCLARIN
Research Group for Language Technology
Research Institute for Linguistics
Hungarian Academy of Sciences

Eszter Simon

Senior Research Fellow
Research Group for Language Technology
Research Institute for Linguistics
Hungarian Academy of Sciences



10:00 Opening session (Erzsébet Drahota-Szabó and Valéria Juhász from the University of
          Szeged and Tamás Váradi from HunCLARIN)
10:15 Veronika Vincze: Introduction to the world of corpora and linguistic databases SLIDES 
11:00 Bálint Sass: Corpus query SLIDES 
11:45 - 12:45 Lunch
12:45 Eszter Simon: Historical corpora for Hungarian SLIDES 
13:15 Iván Mittelholcz: Introduction to e-magyar SLIDES 
13:45 Valéria Juhász: The possibilities of MAXQDA content analysis software SLIDES
14:15 - 14:45 Coffee break
14:45 Anna Babarczy: Child language corpora and resources SLIDES 
15:15 Róbert Péter: The big data challenge in Humanities: Introduction of some research
          tools in digital humanities SLIDES 
15:45 Closing session (Gábor Prószéky, head of the Hungarian Association of Applied Linguists and Language Teachers (MANYE))


Posters were also presented to introduce CLARIN, HunCLARIN and some important projects using or developing the HunCLARIN infrastructure in Szeged.

Kinga Jelencsik-Mátyus: HunCLARIN - an Introduction
Kinga Jelencsik-Mátyus: About CLARIN
Veronika Vincze: Language Resources at the University of Szeged
Ildikó Hoffmann: Detecting Mild Cognitive Impairment by Exploiting Linguistic Information from Transcripts
Csilla Horváth: Open corpora of Uralic Languages


Goal of the event

In Hungary there are several centers at universities and research institutions where digital linguistic corpora and software are developed (see the members of HunCLARIN). However, outside these centers digital linguistic services are still not very well known.

The main goal of the present event is to introduce digital linguistic services to researchers, teachers and students already working on linguistic projects. We plan to follow the conference with similar events in other cities, as well as with more practical workshops in Szeged.

Our aim is twofold. On the one hand, we want to show that there are several well-developed, state-of-the-art corpora and software tools for historical linguistics and child language research (both very popular and well-researched in Szeged) and for other fields of the humanities and social sciences (HSS), and to introduce the very basics of using them. On the other hand, we want to show that most of these services can be found at HunCLARIN, where professional help on how to use them is also available.

The first section introduces linguistic corpora, their types and their possible uses, and then presents practical examples of running simple and more complex corpus searches. In the second part, three presentations are dedicated to software tools such as e-magyar (a digital language processing toolchain), MAXQDA (content analysis software) and some tools specially developed for HSS research.



Veronika Vincze

Researcher at the HAS-SZTE Research Group on Artificial Intelligence and at the University of Szeged, Human Language Technology Group

Veronika Vincze supervises and coordinates the linguistic aspects of the projects in the groups she works in. She also takes part in the well-known research on Alzheimer's disease (AD), which aims to detect the earliest signs of AD from speech. Her research interests include corpus and ontology building, word sense disambiguation and the NLP treatment of multiword expressions. She is also interested in computational morphology, parsing and information extraction.


Bálint Sass

PhD research fellow at the Research Group for Language Technology, Research Institute for Linguistics, Hungarian Academy of Sciences

Bálint Sass's fields of research are corpus query interfaces and corpus creation. He is one of the creators of the Hungarian National Corpus, the Old Hungarian Corpus and the corpus of the Budapest Sociolinguistic Interview, as well as the creator of the query interfaces for all these corpora. He is also interested in predicate-argument structure (he built the Verb Argument Browser), in computer-aided dictionary creation and in Hungarian contracted Braille script. He has held several NLP courses for university students.


Eszter Simon

Senior research fellow at the Research Group for Language Technology, Research Institute for Linguistics, Hungarian Academy of Sciences

Her fields of research include named entity recognition, morphological analysis, corpus building, annotation, development of historical corpora and computational linguistics for Uralic languages. She has supervised and coordinated the computational linguistic work of several large-scale projects, such as the Hungarian Generative Diachronic Syntax and the Syntax of Uralic Languages. She has held numerous university courses in NLP.


Iván Mittelholcz

Software engineer at the Research Group for Language Technology, Research Institute for Linguistics, Hungarian Academy of Sciences

Iván Mittelholcz started working in human language technology in 2006. He has worked on software development tasks in several research projects. His research areas include tokenization, spell checking, ontology building and supervised machine learning. Recently he has held several university courses on logic, human language technology and programming.


Valéria Juhász

PhD, Head of the Department of Hungarian and Applied Linguistics

Her research interests include language and speech training, sociolinguistics, content analysis in media and computer-assisted communication. She is a member of the Hungarian Reading Association (HUNRA), which focuses on research into reading skills development and on the promotion of the culture of reading in Hungary. Lately she has been working with and promoting MAXQDA, a content analysis software package.


Anna Babarczy

Senior research fellow at the Research Group for Psycholinguistics, Research Institute for Linguistics, Hungarian Academy of Sciences and lecturer at the Department of Cognitive Science at BUTE

Anna Babarczy is a renowned researcher and lecturer in child language and pragmatics, with a special interest in corpus linguistic methods in these fields. Her research interests include experimental pragmatics, the psycholinguistics of abstraction and the automatic identification of literal versus non-literal meaning. She has led several large-scale projects.


Róbert Péter

Associate Professor in the Department of English at the University of Szeged

He is the general editor of the five-volume primary resource collection entitled British Freemasonry, 1717-1813 (New York: Routledge, 2016), which contains hitherto unexplored and rare masonic texts. He also has a strong interest in digital humanities, in particular the development and use of quantitative methods for exploring long-scale trends and patterns in historical and cultural processes by analysing the bibliographic data and metadata of a vast number of texts.


If you plan to organise a similar event

We had a really good experience with using a simple Google Form for registration. It not only helped us keep track of the audience (of course), but also made it easier to keep in touch with them and to inform them about the availability of the training materials (slides, posters added to the event page, proceedings, videos, next events).

However, our choice of date was less fortunate: it was the last day before a long weekend, so only some very dedicated students turned up.

Advertising the event at all the major universities and institutes dealing with linguistics in Hungary was a very good choice, as we had quite a few participants from other cities. What is more, seeing the success of this event, the University of Debrecen has already started organising a very similar seminar.



