Sentiment analysis amidst ambiguities in YouTube comments on Yoruba language (Nollywood) movies

Document Type

Conference Proceeding

Publication Title

WWW'12 - Proceedings of the 21st Annual Conference on World Wide Web Companion

Abstract

Nollywood is the second largest movie industry in the world in terms of annual movie production. A dominant number of the movies are in the Yoruba language, which is spoken by over 20 million people across the globe. The number of Yoruba language movies uploaded to YouTube, and of their corresponding comments, is growing exponentially. However, YouTube comments made by native speakers on Yoruba movies combine English, Yoruba, and other commonly used "pidgin" Yoruba words. Since Yoruba is still a resource-constrained language, existing sentiment or subjectivity analysis algorithms perform poorly on YouTube comments made on Yoruba language movies. This is because of the ambiguities of the constrained language. In this work, we present an automatic sentiment analysis algorithm for YouTube comments on Yoruba language movies. The algorithm uses the SentiWordNet thesaurus and a lexicon of commonly used Yoruba language sentiment words and phrases. In terms of precision-recall, the algorithm outperforms a state-of-the-art sentiment analysis technique by up to 20%. Copyright is held by the author/owner(s).
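
The abstract describes combining SentiWordNet with a lexicon of Yoruba sentiment words but gives no implementation details. The following is only a minimal sketch of how a two-lexicon scorer for mixed-language comments might look, not the authors' algorithm. All names (`ENGLISH_SCORES`, `YORUBA_SCORES`, `score_comment`, `classify`) and all scores are hypothetical; the small English dictionary merely stands in for SentiWordNet lookups, and the Yoruba entries are toy placeholders written without tone marks, on the assumption that commenters often omit them.

```python
import re

# Hypothetical toy scores standing in for SentiWordNet lookups.
# These are NOT real SentiWordNet values.
ENGLISH_SCORES = {
    "good": 0.6,
    "love": 0.7,
    "nice": 0.5,
    "bad": -0.6,
    "boring": -0.5,
}

# Hypothetical toy entries standing in for a Yoruba sentiment lexicon.
# These are NOT taken from the paper's lexicon.
YORUBA_SCORES = {
    "dara": 0.6,   # roughly "good"
    "buru": -0.6,  # roughly "bad"
}


def score_comment(comment: str) -> float:
    """Sum the lexicon scores of every known token in the comment."""
    tokens = re.findall(r"[a-z']+", comment.lower())
    total = 0.0
    for token in tokens:
        # Check the Yoruba lexicon first, then fall back to English.
        if token in YORUBA_SCORES:
            total += YORUBA_SCORES[token]
        elif token in ENGLISH_SCORES:
            total += ENGLISH_SCORES[token]
    return total


def classify(comment: str) -> str:
    """Map the summed score to a coarse polarity label."""
    score = score_comment(comment)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(classify("This movie is good, o dara"))
    print(classify("So boring, buru gan"))
```

Checking the Yoruba lexicon before the English one is a design choice of this sketch: it lets a language-specific entry take precedence when a token could be read in either language, which is one simple way to approach the ambiguity the abstract mentions.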

First Page

583

Last Page

584

DOI

10.1145/2187980.2188138

Publication Date

5-21-2012
