Department of Finno-Ugric and Historical Linguistics, RIL HAS

Online Uralic morphological analyzers and word form generators

The cooperation between the Department of Finno-Ugristics of the Research Institute for Linguistics HAS and MorphoLogic Ltd. began in 2001 with the project NKFP-5/135/01: A Complex Uralic Linguistic Database. One module of this project was the development of morphological analyzers for various Uralic languages. Some of these were later developed further in the projects OTKA T 048309: Linguistic databases for Permic languages and OTKA K 60807: The morphological analysis of Nganasan by computer. The range of dialects and transcriptions covered was broadened by the project OTKA NF 71707: Ob-Ugric morphological analyzers and corpora.

The purpose of the website Online Uralic morphological analyzers and word form generators is to make it possible to test analyzers that are close to being ready for use. The analyzers can also be used as an aid in translation and language learning. Some sample texts are provided for testing, but users can also have their own texts analyzed. A virtual keyboard helps with typing texts (as of August 2010, it is available only for Nganasan). With the generator, a word form can be obtained by specifying the stem, the word class and the tags of the morphological features. A gadget helps to choose the proper tags (as of August 2010, it works only for Nganasan). The web interface was created by István Endrédy and Attila Novák and is hosted by MorphoLogic.
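The generator's input described above (a stem, a word class and a set of morphological tags) can be illustrated with a toy concatenative generator. This is a minimal sketch, not the interface of the Humor engine: the tag names `PL` and `INE`, the `GenerationRequest` type and the `generate` function are hypothetical, and the two Komi suffixes are illustrative entries only. Rejecting tags that are not valid for the given word class mimics the kind of error the tag-choosing gadget helps users avoid.

```python
from dataclasses import dataclass

# Hypothetical tag inventory with illustrative suffixes, for demonstration only;
# the real analyzers use their own tag sets and lexicons.
SUFFIXES = {
    "N": {"PL": "яс", "INE": "ын"},  # word class -> {tag: suffix}
}
# In this toy model the plural suffix precedes the case suffix.
TAG_ORDER = {"PL": 0, "INE": 1}


@dataclass(frozen=True)
class GenerationRequest:
    stem: str
    word_class: str
    tags: tuple[str, ...]


def generate(req: GenerationRequest) -> str:
    """Build a word form by appending suffixes to the stem in a fixed tag order."""
    table = SUFFIXES.get(req.word_class)
    if table is None:
        raise ValueError(f"unknown word class: {req.word_class}")
    # Reject tags that do not belong to this word class.
    unknown = [t for t in req.tags if t not in table]
    if unknown:
        raise ValueError(f"tags not valid for {req.word_class}: {unknown}")
    # Put the tags into morphotactic order regardless of how the user listed them.
    ordered = sorted(req.tags, key=TAG_ORDER.__getitem__)
    return req.stem + "".join(table[t] for t in ordered)


print(generate(GenerationRequest("керка", "N", ("INE", "PL"))))  # керкаясын
```

Plain concatenation is the simplest possible model; a real generator must also handle allomorphy and stem alternations, which is why it relies on a full morphological description rather than a suffix table.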

As of August 2010, four analyzers and generators are available.

Komi-Zyryan

An analyzer for standard Komi-Zyryan in Cyrillic orthography, based on the Humor morphological analyzer engine of MorphoLogic. To get correct results, the Cyrillic letters і (U+0456), І (U+0406), ӧ (U+04E7) and Ӧ (U+04E6) must be used rather than similar-looking Latin characters. The lexicon is derived from the Коми-роч кывчукӧр (Komi-Russian Dictionary) by L. M. Beznosikova, Ye. A. Aybabina and R. I. Kosnyreva (2000, Коми небӧг лэдзанін / Коми книжное издательство, Syktyvkar). As of August 2010, the stems are not glossed in any language; Russian glosses will be added later. The analyzer was developed by Attila Novák (technical background) and László Fejes (language specialist).
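Because the analyzer expects the Cyrillic code points listed above, text typed with Latin look-alikes can be normalized before it is submitted. The sketch below is an assumed preprocessing step, not part of the analyzer: `fix_komi_letters` is a hypothetical helper that first applies Unicode NFC normalization (so that о followed by a combining diaeresis becomes a single precomposed letter) and then replaces Latin i, I, ö and Ö with Cyrillic і, І, ӧ and Ӧ, but only inside words that already contain Cyrillic letters, so that Latin-script words are left untouched.

```python
import re
import unicodedata

# Latin look-alikes mapped to the Cyrillic letters the analyzer requires.
LATIN_TO_CYRILLIC = str.maketrans({
    "i": "\u0456",       # Latin i -> Cyrillic і
    "I": "\u0406",       # Latin I -> Cyrillic І
    "\u00f6": "\u04e7",  # Latin ö -> Cyrillic ӧ
    "\u00d6": "\u04e6",  # Latin Ö -> Cyrillic Ӧ
})

CYRILLIC = re.compile(r"[\u0400-\u04FF]")  # basic Cyrillic block
WORD = re.compile(r"\w+")                  # Unicode-aware word characters


def fix_komi_letters(text: str) -> str:
    """Replace Latin look-alikes with Cyrillic letters inside Cyrillic words."""
    # Compose base letter + combining diaeresis into one code point first.
    text = unicodedata.normalize("NFC", text)

    def fix_word(match: re.Match) -> str:
        word = match.group(0)
        # Only touch words that are (at least partly) written in Cyrillic.
        if CYRILLIC.search(word):
            return word.translate(LATIN_TO_CYRILLIC)
        return word

    return WORD.sub(fix_word, text)


print(fix_komi_letters("кывчук\u00f6р"))  # Latin ö replaced by Cyrillic ӧ
```

Restricting the replacement to words containing Cyrillic letters is a deliberate trade-off: it protects Latin-script material mixed into a text, but a word typed entirely in Latin letters is not converted.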

Acknowledgements:

  • G. V. Fedyuneva, Institute of Language, Literature and History of the Komi Science Centre of the Ural Branch of the Russian Academy of Sciences, for the electronic sources of the Komi-Russian Dictionary;
  • Nadezhda Manova and Ilya Mityushev for providing the first texts in electronic format;
  • Nikolay Kuznetsov, Tartu University, for consultation.

Mansi (WT)


An analyzer for the Northern Mansi dialect, based on the Humor morphological analyzer engine of MorphoLogic. It uses the transcription of the text collection Wogulische Texte mit einem Glossar (1976, Akadémiai Kiadó, Budapest) by Béla Kálmán. The lexicon is based on the vocabulary of the same book. The stems are glossed in English, German and Hungarian. The analyzer was developed by Attila Novák (technical background) and László Fejes (language specialist). The English translations of the glosses were added by Nóra Wenszky.

Acknowledgements:


Nganasan


The Nganasan analyzer uses the transcription of the text collection Chrestomathia Nganasanica (2002, SZTE Finnugor Tanszék -- MTA Nyelvtudományi Intézet, Szeged -- Budapest), edited by Beáta Wagner-Nagy. The lexicon is based on the vocabulary of the same book. The stems are glossed in Hungarian. The analyzer was developed by Attila Novák (grammar and implementation of the computational morphology) and Beáta Wagner-Nagy, Zsuzsa Várnai and Sándor Szeverényi (language specialists, lexicon).

Udmurt

An analyzer for standard Udmurt in Cyrillic orthography, based on the Humor morphological analyzer engine of MorphoLogic. The lexicon is derived from the Udmurt--magyar szótár (Udmurt--Hungarian Dictionary) by István Kozmács (2002, Savaria University Press, Szombathely). The stems are glossed in Hungarian. The analyzer was developed by Attila Novák (technical background) and László Fejes (language specialist).

Acknowledgements:

  • István Kozmács, Szeged University, for the electronic sources of the Udmurt--Hungarian Dictionary;
  • Jorma Luutonen, University of Turku, for providing text in electronic format;
  • the publishing house Удмуртия (Udmurtiya) and the editorial offices of Kenesh, Udmurt Dunnye, Vordskem Kyl, Invozho and Kizili for providing text in electronic format;
  • Galina Lesnikova, Elena Rodionova, Olga Ignatyeva, and Olga Urasinova for consultation.
Last modified: Friday, 3 September 2010, 10:41