Text classification using word sequence kernel methods.

Luis Trindade, H. Wang, William Blackburn, Niall Rooney

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

3 Citations (Scopus)

Abstract

This paper presents a comparative study of two sequence kernels for text classification: the all common subsequences kernel and the sequence kernel. We consider several variations of the two kernels (kernels based on individual features, linear combinations of individual kernels, and kernels with a factored representation of features) and evaluate them in text classification by using them as similarity functions in a support vector machine. Each sentence is represented as a sequence of words along with their lemmas and part-of-speech tags. Experiments show that the sequence kernel has a clear advantage over all common subsequences. Since the main difference between the two kernels is that the sequence kernel takes the frequency of words (objects) into account whereas all common subsequences does not, we conclude that word frequency is an important factor in the successful application of kernels to text classification.
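The contrast the abstract draws, between counting shared subsequences with and without their frequency, can be illustrated with a minimal sketch. This is not the paper's formulation: the function names `distinct_kernel` and `frequency_kernel` are hypothetical, the enumeration is brute force (exponential, so only suitable for very short sentences), and no gap weighting, length limit, lemma or part-of-speech features are modelled.

```python
from collections import Counter
from itertools import combinations


def subsequences(words):
    """Yield every non-empty subsequence of `words`, one per choice of positions."""
    for n in range(1, len(words) + 1):
        # combinations() picks elements by position, so repeated words
        # produce repeated subsequences (this is what carries frequency).
        yield from combinations(words, n)


def distinct_kernel(s, t):
    """Number of distinct subsequences shared by s and t (frequency ignored)."""
    return len(set(subsequences(s)) & set(subsequences(t)))


def frequency_kernel(s, t):
    """Inner product of subsequence counts (frequency taken into account)."""
    count_s = Counter(subsequences(s))
    count_t = Counter(subsequences(t))
    return sum(count_s[u] * count_t[u] for u in count_s.keys() & count_t.keys())


# Hypothetical toy sentences: "the" occurs twice in the first one.
s = ["the", "cat", "the"]
t = ["the", "cat"]
print(distinct_kernel(s, t))   # 3: ("the",), ("cat",), ("the", "cat")
print(frequency_kernel(s, t))  # 4: "the" contributes 2 * 1 instead of 1
```

In this toy case the repeated word changes only the frequency-aware value. To use either function as a similarity function in a support vector machine, one would compute the matrix of pairwise values over the training sentences and pass it to an SVM implementation that accepts a precomputed kernel.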
Original language: English
Title of host publication: Unknown Host Publication
Publisher: IEEE
Pages: 1532-1537
Number of pages: 6
DOIs: https://doi.org/10.1109/ICMLC.2011.6016983
Publication status: Published - 10 Jul 2011
Event: International Conference on Machine Learning and Cybernetics, ICMLC 2011 - Guilin, China
Duration: 10 Jul 2011 → …

Conference

Conference: International Conference on Machine Learning and Cybernetics, ICMLC 2011
Period: 10/07/11 → …

Cite this

Trindade, L., Wang, H., Blackburn, W., & Rooney, N. (2011). Text classification using word sequence kernel methods. In Unknown Host Publication (pp. 1532-1537). IEEE. https://doi.org/10.1109/ICMLC.2011.6016983