Sentiment Analysis of Customer Reviews: Balanced versus Unbalanced Datasets

Nicola Burns, Yaxin Bi, Hui Wang, Terry Anderson

Research output: Chapter in Book/Report/Conference proceeding > Chapter

15 Citations (Scopus)

Abstract

More people are buying products online and expressing their opinions on these products through online reviews. Sentiment analysis can be used to extract valuable information from reviews, and the results can benefit both consumers and manufacturers. This study compares two well-known machine learning algorithms, namely the dynamic language model and the naïve Bayes classifier. Experiments were carried out to determine how consistent the results are across datasets of different sizes, and to measure the effect of using a balanced or an unbalanced dataset. The experimental results indicate that both algorithms can achieve better results on a realistic unbalanced dataset than on the balanced datasets commonly used in research.
Language: English
Title of host publication: Knowledge-Based and Intelligent Information and Engineering Systems
Pages: 161-170
Publication status: Published - 2011

Fingerprint

Learning algorithms
Learning systems
Classifiers
Experiments

Cite this

Burns, N., Bi, Y., Wang, H., & Anderson, T. (2011). Sentiment Analysis of Customer Reviews: Balanced versus Unbalanced Datasets. In Knowledge-Based and Intelligent Information and Engineering Systems (pp. 161-170).
@inbook{0b14d69f3f4148a7b10d7642891170d9,
title = "Sentiment Analysis of Customer Reviews: Balanced versus Unbalanced Datasets",
abstract = "More people are buying products online and expressing their opinions on these products through online reviews. Sentiment analysis can be used to extract valuable information from reviews, and the results can benefit both consumers and manufacturers. This study compares two well-known machine learning algorithms, namely the dynamic language model and the na{\"i}ve Bayes classifier. Experiments were carried out to determine how consistent the results are across datasets of different sizes, and to measure the effect of using a balanced or an unbalanced dataset. The experimental results indicate that both algorithms can achieve better results on a realistic unbalanced dataset than on the balanced datasets commonly used in research.",
author = "Nicola Burns and Yaxin Bi and Hui Wang and Terry Anderson",
year = "2011",
language = "English",
isbn = "978-3-642-23850-5",
pages = "161--170",
booktitle = "Knowledge-Based and Intelligent Information and Engineering Systems",

}
