Tibor Pintér: What should be known about national corpora
Definition of corpus linguistics. Research areas and tools of corpus linguistics. The issue of representativeness. Douglas Biber's definition of corpus representativeness. Quality and quantity of the material included in a corpus. Computer processing of HTML-encoded material from the Internet. Characteristics of the Hungarian Word-source in Slovakia. The use of corpora in linguistics and education.
Corpus linguistics deals systematically with linguistic corpora and with the tools that store and process them. In order to understand linguistic systems and functions better, it also uses tools that were previously unavailable because computing technology was not yet sufficiently developed. Computational linguistics is the field closest to corpus linguistics; corpus linguistics can be said to border on computational linguistics as well as on descriptive linguistics and sociolinguistics.
The principal role of corpora is to serve as a sample for descriptive research and for research on living language. The most important requirement for their content and structure is therefore representativeness, i.e. in both content and structure a corpus has to reflect real language use as closely as possible. Besides the quality of the material, the quantity of material included in a corpus is also an important issue. It can vary according to the purpose of the corpus, although the view that a corpus should include as much material as possible is very common.
Corpus designers process several hundred million words with the help of computers. The Internet makes this possible, since the material there is already available in HTML format. The compilers of the Hungarian Word-source in Slovakia also chose this format.
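As a rough illustration of what processing HTML-encoded material can involve, the following minimal Python sketch strips the markup from a page and counts word forms. It is not the procedure used for the Hungarian Word-source in Slovakia, which the text does not describe; the names TextExtractor, html_to_tokens and word_frequencies are hypothetical, and the tokenisation rule (runs of letters and digits, lower-cased) is a simplifying assumption.

```python
from collections import Counter
from html.parser import HTMLParser
import re


class TextExtractor(HTMLParser):
    """Collects visible text and skips <script> and <style> content."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self.chunks.append(data)


def html_to_tokens(html):
    """Return lower-cased word tokens from one HTML page (assumed rule: \\w+)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    # In Python 3, \w matches Unicode letters, so accented Hungarian
    # characters stay inside their words.
    return re.findall(r"\w+", text.lower())


def word_frequencies(pages):
    """Count word forms across an iterable of HTML page strings."""
    counts = Counter()
    for page in pages:
        counts.update(html_to_tokens(page))
    return counts
```

A real corpus pipeline would also need to handle character encodings, duplicate pages, navigation text and annotation, none of which this sketch attempts.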
Corpora can be used not only in linguistics but also in a number of other fields (according to some linguists, wherever work is done with words), for example in education. The author hopes that corpus-oriented linguistics will also be applied in Hungarian scholarship in Slovakia, and that the opportunities offered by corpora will be used more widely in the future. Their use in education would be the most beneficial.