Abstract
Many Natural Language Processing applications for a given
language rely on part-of-speech (POS) tagging as a necessary
component. Developing a new language-specific POS tagger is
tedious for unstructured data because of the variation in the
text, its type, and its complexity. This variety within a
language in turn reduces tagging precision. This research
focuses on reusing a popular language-specific POS tagger: for
example, the Stanford Part-of-Speech Tagger can be employed to
tag non-English phrases. For generality, any translator can be
used to translate the sentences; in this work the well-known
“Google Translator” is used for sentence translation across
languages. For evaluation, Urdu tweets on a prominent political
issue, “Panama leaks”, are extracted from twitter.com. Accuracy
is measured using the kappa statistic together with a confusion
matrix. The precision of tagging the Urdu sentences by reusing
the Stanford Part-of-Speech Tagger is 96.05 percent. The
proposed approach can be applied generally to tag sentences in
several different languages.
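
The abstract names the kappa statistic and a confusion matrix as the evaluation measures but does not say how they are computed. The sketch below is one plausible reading, assuming Cohen's kappa between a reference (gold) tag sequence and the tagger output. The function names and the toy tag lists are hypothetical and are not data from the paper.

```python
from collections import Counter


def confusion_matrix(gold, predicted):
    """Count how often each (gold tag, predicted tag) pair occurs."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    return Counter(zip(gold, predicted))


def cohen_kappa(gold, predicted):
    """Cohen's kappa: agreement between two labelings, corrected for chance."""
    n = len(gold)
    if n == 0:
        raise ValueError("cannot compute kappa on empty sequences")
    matrix = confusion_matrix(gold, predicted)

    # Observed agreement: share of tokens on the diagonal of the matrix.
    observed = sum(count for (g, p), count in matrix.items() if g == p) / n

    # Expected chance agreement: product of the marginal tag frequencies.
    gold_totals = Counter(gold)
    pred_totals = Counter(predicted)
    expected = sum(gold_totals[t] * pred_totals[t] for t in gold_totals) / (n * n)

    if expected == 1.0:
        # Both labelings use a single identical tag; treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy example (assumed tags, not taken from the paper).
gold_tags = ["NN", "VB", "NN", "JJ", "NN", "VB"]
predicted_tags = ["NN", "VB", "NN", "NN", "NN", "VB"]
kappa = cohen_kappa(gold_tags, predicted_tags)
```

In this toy case five of six tags agree, so the observed agreement is 5/6, while kappa is lower because part of that agreement would be expected by chance given how often each tag occurs.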
Original language | English |
---|---|
Number of pages | 6 |
DOIs | |
Publication status | Published (in print/issue) - 8 Feb 2018 |
Event | 13th International Conference on Emerging Technologies 2017 - Higher Education Commission Office, Islamabad, Pakistan. Duration: 27 Dec 2017 → 28 Dec 2017 |
Conference
Conference | 13th International Conference on Emerging Technologies 2017 |
---|---|
Abbreviated title | ICET 2017 |
Country/Territory | Pakistan |
City | Islamabad |
Period | 27/12/17 → 28/12/17 |
Keywords
- Stanford-Part-of-speech Tagger
- Google-Translator
- Multi-lingual labeling