Tibor M. Pintér: „Borderless” Hungarian Language – the First Structured Hungarian Language Corpus Comprising Cross-Border Varieties of Hungarian Is Ready
After three years of work, the Hungarian Language Corpus of the Carpathian Basin is ready. Although the material is only a fragment compared with the Hungarian version, it represents substantial progress in the field of Hungarian language corpora: with this work, the Linguistic Institution has created a corpus that also includes the varieties of Hungarian spoken outside the borders of Hungary, which may make comparative research possible.
The creation of the Kmmnyk does not mean, however, that the work is finished. Two issues remain open. The transcription and annotation of the living (spoken) language material have not been completed, and the harmonisation and/or unification and annotation of the records has not been done. It is uncertain whether work on the Kmmnyk material containing the cross-border varieties of Hungarian will continue in the future. Whatever the future of the grant competition may be, it can be presumed that the research stations will continue collecting material, since all four research stations have begun building regional corpora in their own regions and/or are applying for funding to prepare the Wordject project. However, if no further joint project is carried out under the supervision of the MTA Linguistic Institution, it is more likely that the material gradually collected at the individual research stations will take different forms.
The Hungarian Over the Border Language Corpus has changed compared with the initial conception. The changes concern the sub-corpora of official language and personal communication.
The collection of official texts has been ongoing, but Hungarian in a minority position is only a secondary language, and its use in official contexts is limited by the language laws. It is therefore unlikely that the intended proportions will ever be reached in the Over the Border Hungarian Language corpus: because the scientific, literary and journalistic sub-corpora are growing rapidly, the absolute numbers required keep rising and thus become unreachable.
The corpus was first presented to the public on 22 November 2005, as part of the series of presentations held for the Hungarian Scientific Day. Personally, I can only hope that it will become widely known and that many people will use it for research and educational purposes.