 
                        
Georgian morphosyntactic computational analysis and tools for the annotation of universal syntactic dependencies
Representing syntactic structure as syntactic dependencies can be considered the basis for syntactic parsing and a prerequisite for computational syntactic research from a multilingual point of view. The selection of a syntactic dependency scheme for Georgian depends, firstly, on theoretical issues of Georgian syntax and, secondly, on the availability of natural language processing resources (such as corpora, a morphosyntactic analyzer, etc.); cross-linguistically, it depends on the typological differences between the existing annotation schemes and their relation to Georgian. Constraint Grammar (CG) can be considered the theoretical framework: it belongs to the main methodological paradigm of NLP and specifies the development of a computational grammar for a specific language by means of context-oriented rules. Its practical realization is possible by means of the following resources:
•    Visual Interactive Syntax Learning (VISL), and
•    Universal Dependencies (UD) tools.
    VISL is used to produce two types of treebanks: a) small, pedagogically structured treebanks containing few sentences, and b) large treebanks over running text. A Georgian treebank does not exist yet. The one exception is an attempt to create a parallel, manually annotated treebank for Georgian, Russian, Ukrainian and German, but the amount of Georgian data is not sufficient for successful machine learning.
    The UD tools provide cross-linguistically consistent treebank annotation for the development of multilingual parsers and for cross-linguistic learning from a language typology perspective. Since there is no Georgian treebank, research on the syntactic model of universal dependencies has not yet been conducted for Georgian.
    Thus, the proposed project aims to accomplish the following tasks:
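    As a rough illustration of what UD treebank annotation looks like in practice, the sketch below reads one sentence in CoNLL-U, the ten-column tab-separated format used by UD treebanks, and checks that it has exactly one root. The sample sentence is an English toy example chosen only for illustration (no Georgian data is assumed here), and the names `Word`, `parse_conllu_sentence` and `has_single_root` are hypothetical helpers, not part of any UD tool.

```python
from typing import List, NamedTuple

# The ten CoNLL-U columns, in order.
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")


class Word(NamedTuple):
    id: int
    form: str
    lemma: str
    upos: str
    head: int      # 0 means the word is the root of the sentence
    deprel: str


def parse_conllu_sentence(block: str) -> List[Word]:
    """Parse one CoNLL-U sentence block into a list of Word tuples."""
    words = []
    for line in block.strip().splitlines():
        if line.startswith("#"):          # sentence-level comment lines
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(f"expected 10 columns, got {len(cols)}")
        row = dict(zip(FIELDS, cols))
        if not row["id"].isdigit():       # skip ranges (1-2) and empty nodes (1.1)
            continue
        words.append(Word(int(row["id"]), row["form"], row["lemma"],
                          row["upos"], int(row["head"]), row["deprel"]))
    return words


def has_single_root(words: List[Word]) -> bool:
    """Check that exactly one word has head 0 and all other heads exist."""
    ids = {w.id for w in words}
    roots = [w for w in words if w.head == 0]
    return len(roots) == 1 and all(w.head == 0 or w.head in ids for w in words)


# Toy English sentence, for illustration only.
SAMPLE = "\n".join([
    "# text = Dogs bark.",
    "\t".join(["1", "Dogs", "dog", "NOUN", "_", "Number=Plur", "2", "nsubj", "_", "_"]),
    "\t".join(["2", "bark", "bark", "VERB", "_", "_", "0", "root", "_", "_"]),
    "\t".join(["3", ".", ".", "PUNCT", "_", "_", "2", "punct", "_", "_"]),
])

if __name__ == "__main__":
    for word in parse_conllu_sentence(SAMPLE):
        print(word.id, word.form, word.upos, word.head, word.deprel)
```

    The check covers only the single-root condition and the existence of each head; it does not detect cycles, which a fuller validator would also need to rule out.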
•    The disambiguation of grammatical homonymy and the application of CG approaches to Georgian. This is crucial: a) for assigning morphosyntactic information to each token in the text according to its context; b) for providing an analysis for every string, taking into account that the text may contain orthographic errors as well as dialectal and phraseological forms; c) for maintaining alternative analyses when the ambiguity caused by grammatical homonymy cannot be resolved;
•    The development of an annotation scheme for Georgian syntactic relations, which is important: a) to determine the syntactic functions and relations of Georgian; b) to develop dependency annotation schemes; c) to implement automatic dependency analysis; and d) to compile a Georgian test treebank;
•    The development of a universal dependencies scheme for Georgian, which ensures the cross-linguistic compatibility of the annotation schemes and enriches the universal dependencies with Georgian data.
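The disambiguation task in the first item can be sketched in a few lines. The code below is a minimal illustration of the CG idea under stated assumptions, not the project's grammar: every token starts with all of its possible readings, a context-oriented rule removes readings that do not fit the context, the last remaining reading is never removed, and a string missing from the lexicon still receives a placeholder analysis. The tags (`DET`, `N`, `V`, `UNK`), the placeholder word forms, the toy lexicon, and the functions `analyze` and `remove_if_prev` are all hypothetical and make no claim about Georgian grammar.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List

Reading = FrozenSet[str]


@dataclass
class Token:
    form: str
    readings: List[Reading]


def analyze(form: str, lexicon: Dict[str, List[Reading]]) -> Token:
    """Give every string an analysis; unknown forms get a placeholder reading."""
    return Token(form, list(lexicon.get(form, [frozenset({"UNK"})])))


def remove_if_prev(tokens: List[Token], target: str, prev_tag: str) -> List[Token]:
    """Remove readings tagged `target` when the previous token is unambiguously
    `prev_tag`. If that would delete every reading, the token is left unchanged."""
    for i in range(1, len(tokens)):
        prev = tokens[i - 1]
        if not all(prev_tag in r for r in prev.readings):
            continue                      # context is not safe, rule does not apply
        kept = [r for r in tokens[i].readings if target not in r]
        if kept:                          # never remove the last reading
            tokens[i].readings = kept
    return tokens


# Hypothetical toy lexicon with placeholder forms.
LEXICON = {
    "w1": [frozenset({"DET"})],
    "w2": [frozenset({"N"}), frozenset({"V"})],   # homonymous form
}

if __name__ == "__main__":
    sentence = [analyze(f, LEXICON) for f in ["w1", "w2", "w2", "zzz"]]
    remove_if_prev(sentence, target="V", prev_tag="DET")
    for tok in sentence:
        print(tok.form, [sorted(r) for r in tok.readings])
```

In this sketch the first `w2` is disambiguated by its left context, while the second `w2` keeps both readings because no rule applies to it, which corresponds to maintaining alternative analyses when the ambiguity cannot be resolved.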