EVALUATION OF ML AND DEEP LEARNING MODELS FOR AUTOMATED SENTIMENT CLASSIFICATION OF YOUTUBE COMMENTS
DOI: https://doi.org/10.84761/7xtc8q84
Abstract
YouTube comments provide a wealth of sentiment information across video categories, but the sheer volume of user-generated content and the presence of noise make manual analysis of audience sentiment increasingly challenging and time-consuming. This paper proposes an efficient approach to automated sentiment analysis of YouTube comments based on natural language processing (NLP), machine learning (ML), and deep learning (DL) algorithms. To train the models, a dataset of 50,000 YouTube comments from categories such as education, entertainment, music, news, and gaming was collected using the YouTube Data API. The comments were preprocessed through tokenisation, lemmatisation, stopword removal, and slang conversion, and the classes were balanced with SMOTE. ML algorithms, including Random Forest (RF) and SVM, and DL models, including CNN, BiLSTM, and BERT, were then evaluated. Among the individual models, BERT achieved the highest accuracy and macro F1, at 92.5% and 0.92 respectively, outperforming the other algorithms. Hyperparameter tuning was vital for boosting accuracy, and the highest accuracy overall (93.8%) was achieved by the combined BERT + RF model.
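The text-cleaning steps named in the abstract can be illustrated with a minimal Python sketch. It is not the study's implementation: the slang map, the stopword list, and the suffix-stripping `crude_lemma` helper are all hypothetical stand-ins chosen so the example runs with the standard library alone, and a real pipeline would more plausibly use an NLP library's tokeniser and lemmatiser. SMOTE is left out because it operates on numeric feature vectors after vectorisation, not on raw tokens.

```python
import re

# Hypothetical resources for illustration only; the abstract does not
# list the slang dictionary or stopword list actually used.
SLANG = {"u": "you", "gr8": "great", "luv": "love", "thx": "thanks"}
STOPWORDS = {"a", "an", "the", "is", "are", "this", "that",
             "it", "to", "of", "and", "so"}


def crude_lemma(token):
    """Toy suffix stripper standing in for a real lemmatiser."""
    if token.endswith("ss"):          # leave words like "class" alone
        return token
    for suffix, repl in (("ies", "y"), ("ing", ""), ("ed", ""), ("s", "")):
        # Only strip when at least three characters of stem remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: len(token) - len(suffix)] + repl
    return token


def preprocess(comment):
    """Tokenise, expand slang, drop stopwords, then lemmatise."""
    tokens = re.findall(r"[a-z0-9']+", comment.lower())  # tokenisation
    tokens = [SLANG.get(t, t) for t in tokens]           # slang conversion
    tokens = [t for t in tokens if t not in STOPWORDS]   # stopword removal
    return [crude_lemma(t) for t in tokens]              # lemmatisation (toy)


if __name__ == "__main__":
    print(preprocess("Thx, u are gr8! Luv the videos"))
    # ['thank', 'you', 'great', 'love', 'video']
```

Slang is expanded before stopword removal so that an expansion which happens to be a stopword is filtered out in the same pass; the order of the remaining steps is a design choice of this sketch, not something the abstract specifies.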




