Online communities allow users to interact using textual communication.
Insults are a form of harassment that appears in online communities. The current control
method, a word filter based on regular expressions, suffers from false-positive
errors. This paper presents alternative methods for detecting insults in texts retrieved from
online communities. The presented algorithms are based on linguistic features of the Thai
language, namely word boundaries and parts of speech, as analyzed by natural
language processing technology. The goal of the proposed algorithms is to reduce
false-positive errors in regular expression results. The proposed algorithms are
compared with regular expression matching using precision and recall
scores. Results suggest that the performance of the proposed insult detection algorithms
depends on the accuracy of the natural language processing. The insult detection algorithms have
higher precision scores but lower recall scores than the regular expression
algorithms. They also require shorter insult detection time but longer
data preprocessing time than the regular expression algorithms.
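
The contrast between substring matching and word-boundary-aware matching, and the way precision and recall capture it, can be illustrated with a minimal sketch. This is not the paper's implementation: the lexicon, the toy samples, and the function names (regex_detect, token_detect, precision_recall) are assumptions made for illustration, English words stand in for Thai, and the token lists imitate what a word segmenter might return, including one simulated segmentation error. The resulting scores are properties of this toy data only, not the paper's results.

```python
import re

# Hypothetical insult lexicon; a real list would hold Thai words.
INSULTS = {"dumb"}
PATTERN = re.compile("|".join(map(re.escape, sorted(INSULTS))))


def regex_detect(text):
    """Baseline: flag a text if any insult appears as a substring."""
    return PATTERN.search(text) is not None


def token_detect(tokens):
    """Flag a text only if a whole segmented token is an insult."""
    return any(tok in INSULTS for tok in tokens)


def precision_recall(predicted, actual):
    """Compute (precision, recall) for two parallel lists of booleans."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data (assumed): token list as a segmenter might return it, plus a label.
SAMPLES = [
    (["you", "are", "dumb"], True),
    (["lift", "the", "dumbbell"], False),  # substring match is a false positive
    (["nice", "post"], False),
    (["so", "du", "mb"], True),  # simulated segmentation error
]

# Join tokens without spaces to imitate unsegmented text such as Thai.
texts = ["".join(tokens) for tokens, _ in SAMPLES]
labels = [label for _, label in SAMPLES]

regex_pred = [regex_detect(text) for text in texts]
token_pred = [token_detect(tokens) for tokens, _ in SAMPLES]

print("regex:", precision_recall(regex_pred, labels))
print("token:", precision_recall(token_pred, labels))
```

On this toy data the substring filter flags "dumbbell" and loses precision, while the token-based filter avoids that error but misses the insult whose segmentation was wrong, which lowers its recall. This mirrors, in miniature, why detection quality depends on the accuracy of the language processing step.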