Abstract
We present a new batch learning algorithm for text classification in the vector space of document representations. The algorithm separates the classes with an ellipsoid in the feature space, which leads to a semidefinite program. For feature extraction it uses an approximation of latent semantic indexing (LSI) based on Gram-Schmidt orthogonalization. Preliminary results indicate some potential for the approach.
| Original language | Undefined |
|---|---|
| Title of host publication | Learning Methods for Text Understanding and Mining (26-29 January 2004) |
| Publication status | Published (in print/issue), 2004 |
Bibliographical note
Event dates: 26-29 January 2004
Keywords
- Text categorization
- pattern separation
- semidefinite programming
- ellipsoid
- latent semantic indexing
- feature extraction
- bag-of-words text representation
- Gram-Schmidt orthogonalization