TITLE DETECTION SYSTEM FOR THAI INSULT BASED ON LINGUISTIC FEATURE ANALYSIS
AUTHOR JIRAPON TANASANTI
DEGREE MASTER OF SCIENCE PROGRAM IN TECHNOLOGY OF INFORMATION SYSTEM MANAGEMENT
FACULTY FACULTY OF ENGINEERING
ADVISOR PISIT PHOKHARATKUL
CO-ADVISOR VLADIMIR BUNTILOV
BUDSABA KANOKSILPATHAM
 
ABSTRACT
Online communities let users interact through text. Insults are a form of harassment that appears in these communities. The current control method, a word filter based on regular expressions, suffers from false-positive errors. This paper presents alternative algorithms for detecting insults in text retrieved from online communities. The algorithms are based on two linguistic features of the Thai language, word boundaries and parts of speech, both obtained through natural language processing. Their goal is to reduce the false-positive errors found in regular expression results. The proposed algorithms are compared with the regular expression approach using precision and recall scores. Results suggest that the performance of the proposed insult detection algorithms depends on the accuracy of the natural language processing. The insult detection algorithms achieve higher precision but lower recall than the regular expression algorithms. They also require less time for insult detection but more time for data preprocessing.
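The contrast between a regular expression word filter and word-boundary matching can be illustrated with a minimal sketch. It is not the thesis's implementation and rests on several assumptions: the blocklist and sample messages are invented, English text stands in for Thai, and simple letter-run splitting stands in for Thai word segmentation, which in practice requires a natural language processing tool because Thai is written without spaces between words. The function names are hypothetical. The toy data was chosen to show the kind of trade-off described above, not to reproduce the reported results.

```python
import re

# Hypothetical blocklist; the insult lexicon used in the thesis is not given here.
BLOCKLIST = {"ass", "hell"}

def regex_filter(text):
    """Flag a message if any blocklisted string occurs anywhere in it."""
    pattern = re.compile("|".join(re.escape(w) for w in sorted(BLOCKLIST)))
    return pattern.search(text.lower()) is not None

def boundary_filter(text):
    """Flag a message only if a whole token is blocklisted.

    Splitting on runs of letters is an assumed stand-in for a Thai word
    segmenter; its mistakes would carry over into the detection result.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return any(token in BLOCKLIST for token in tokens)

def precision_recall(predicted, actual):
    """Return (precision, recall) for two parallel lists of booleans."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(not p and a for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Invented labelled messages: (text, is_insult).
SAMPLES = [
    ("you are an ass", True),
    ("go to hell", True),
    ("the class passed the exam", False),  # "class" contains "ass"
    ("hello everyone", False),             # "hello" contains "hell"
    ("dumbass move", True),                # insult inside an unlisted token
]

actual = [label for _, label in SAMPLES]
regex_scores = precision_recall([regex_filter(t) for t, _ in SAMPLES], actual)
boundary_scores = precision_recall([boundary_filter(t) for t, _ in SAMPLES], actual)

print("regex    precision, recall:", regex_scores)
print("boundary precision, recall:", boundary_scores)
```

On this toy data the substring filter flags "class" and "hello" as insults, which lowers its precision, while the token-based filter avoids those false positives but misses the insult embedded in a token that is not on the blocklist, which lowers its recall.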
KEYWORD REGULAR EXPRESSION / WORD FILTER / NATURAL LANGUAGE PROCESSING / ONLINE COMMUNITY / LINGUISTIC FEATURES
 
FACULTY OF GRADUATE STUDIES. MAHIDOL UNIVERSITY. THAILAND