Pre-processing Online Financial Text for Sentiment Classification: A Natural Language Processing Approach

Sun Fan, Ammar Belatreche, SA Coleman, TM McGinnity, Yuhua Li

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

7 Citations (Scopus)

Abstract

Online financial textual information contains a large amount of investor sentiment, i.e. subjective assessment and discussion with respect to financial instruments. An effective solution to automate the sentiment analysis of such large amounts of online financial texts would be extremely beneficial. This paper presents a natural language processing (NLP) based pre-processing approach both for noise removal from raw online financial texts and for organizing such texts into an enhanced format that is more usable for feature extraction. The proposed approach integrates six NLP processing steps, including a developed syntactic and semantic combined negation handling algorithm, to reduce noise in the online informal text. Three-class sentiment classification is also introduced in each system implementation. Experimental results show that the proposed pre-processing approach outperforms other pre-processing methods. The combined negation handling algorithm is also evaluated against three standard negation handling approaches.
Language: English
Title of host publication: Unknown Host Publication
Pages: 122-129
Number of pages: 8
Publication status: Published - 27 Mar 2014
Event: IEEE Computational Intelligence for Financial Engineering and Economics
Duration: 27 Mar 2014 → …

Conference

Conference: IEEE Computational Intelligence for Financial Engineering and Economics
Period: 27/03/14 → …

Fingerprint

Processing
Syntactics
Feature extraction
Semantics

Cite this

@inproceedings{be55f90f775b4841be7a72bb0129d805,
title = "Pre-processing Online Financial Text for Sentiment Classification: A Natural Language Processing Approach",
abstract = "Online financial textual information contains a large amount of investor sentiment, i.e. subjective assessment and discussion with respect to financial instruments. An effective solution to automate the sentiment analysis of such large amounts of online financial texts would be extremely beneficial. This paper presents a natural language processing (NLP) based pre-processing approach both for noise removal from raw online financial texts and for organizing such texts into an enhanced format that is more usable for feature extraction. The proposed approach integrates six NLP processing steps, including a developed syntactic and semantic combined negation handling algorithm, to reduce noise in the online informal text. Three-class sentiment classification is also introduced in each system implementation. Experimental results show that the proposed pre-processing approach outperforms other pre-processing methods. The combined negation handling algorithm is also evaluated against three standard negation handling approaches.",
author = "Sun Fan and Ammar Belatreche and SA Coleman and TM McGinnity and Yuhua Li",
year = "2014",
month = "3",
day = "27",
language = "English",
pages = "122--129",
booktitle = "Unknown Host Publication"
}

Fan, S, Belatreche, A, Coleman, SA, McGinnity, TM & Li, Y 2014, Pre-processing Online Financial Text for Sentiment Classification: A Natural Language Processing Approach. in Unknown Host Publication. pp. 122-129, IEEE Computational Intelligence for Financial Engineering and Economics, 27/03/14.

Pre-processing Online Financial Text for Sentiment Classification: A Natural Language Processing Approach. / Fan, Sun; Belatreche, Ammar; Coleman, SA; McGinnity, TM; Li, Yuhua.

Unknown Host Publication. 2014. p. 122-129.

TY  - GEN
T1  - Pre-processing Online Financial Text for Sentiment Classification: A Natural Language Processing Approach
AU  - Fan, Sun
AU  - Belatreche, Ammar
AU  - Coleman, SA
AU  - McGinnity, TM
AU  - Li, Yuhua
PY  - 2014/3/27
Y1  - 2014/3/27
N2  - Online financial textual information contains a large amount of investor sentiment, i.e. subjective assessment and discussion with respect to financial instruments. An effective solution to automate the sentiment analysis of such large amounts of online financial texts would be extremely beneficial. This paper presents a natural language processing (NLP) based pre-processing approach both for noise removal from raw online financial texts and for organizing such texts into an enhanced format that is more usable for feature extraction. The proposed approach integrates six NLP processing steps, including a developed syntactic and semantic combined negation handling algorithm, to reduce noise in the online informal text. Three-class sentiment classification is also introduced in each system implementation. Experimental results show that the proposed pre-processing approach outperforms other pre-processing methods. The combined negation handling algorithm is also evaluated against three standard negation handling approaches.
AB  - Online financial textual information contains a large amount of investor sentiment, i.e. subjective assessment and discussion with respect to financial instruments. An effective solution to automate the sentiment analysis of such large amounts of online financial texts would be extremely beneficial. This paper presents a natural language processing (NLP) based pre-processing approach both for noise removal from raw online financial texts and for organizing such texts into an enhanced format that is more usable for feature extraction. The proposed approach integrates six NLP processing steps, including a developed syntactic and semantic combined negation handling algorithm, to reduce noise in the online informal text. Three-class sentiment classification is also introduced in each system implementation. Experimental results show that the proposed pre-processing approach outperforms other pre-processing methods. The combined negation handling algorithm is also evaluated against three standard negation handling approaches.
M3  - Conference contribution
SP  - 122
EP  - 129
BT  - Unknown Host Publication
ER  -