VIETNAMESE MULTI-DOCUMENT SUMMARIZATION BASE UNSUPERVISED LEARNING METHODS

  • Nguyen Hoang Diep, Hung Yen University of Technology and Education
  • Nguyen Thi Hai Nang, Hung Yen University of Technology and Education
  • Do Thi Thu Trang, Hung Yen University of Technology and Education
  • Ngo Thanh Huyen, Hung Yen University of Technology and Education
  • Trinh Thi Nhi, Hung Yen University of Technology and Education
Keywords: Text summary, machine learning, learning to rank, unsupervised learning method, NLP, CNN, LSTM

Abstract

Recently, English summarization has achieved impressive results, while Vietnamese summarization remains at an early stage with limited results. This paper proposes an approach to summarizing Vietnamese text using unsupervised learning.

This article reports the results of applying unsupervised learning methods to document summarization. The authors compare these methods with supervised ones, including CNN and LSTM. The comparison demonstrates the effectiveness of unsupervised learning methods for summarization.

Unsupervised learning methods give promising empirical results for two reasons. First, they rely on ranking mechanisms to pick high-scoring sentences, which ensures that important sentences are selected. Second, selecting sentences with low correlation ensures that the summary does not overlap with the remaining sentences that are left out of it.
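
The two ideas above, ranking sentences by score and keeping only sentences with low mutual correlation, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the scoring function or the correlation measure, so the sketch assumes term-frequency vectors, cosine similarity to the centroid of all sentences as the score, and a hypothetical `max_sim` threshold for the correlation check. The function names are illustrative only. For multi-document input, the sentences of all documents would simply be pooled into one list.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    # Lowercased word tokens; \w is Unicode-aware in Python 3,
    # so Vietnamese diacritics are preserved.
    return re.findall(r"\w+", sentence.lower())


def cosine(a, b):
    # Cosine similarity between two term-frequency Counters.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(sentences, k=2, max_sim=0.5):
    """Pick up to k high-scoring sentences with low pairwise similarity.

    Assumptions (not taken from the paper): the score is the cosine
    similarity to the centroid of all sentences, and max_sim is an
    illustrative redundancy threshold.
    """
    vectors = [Counter(tokenize(s)) for s in sentences]

    # Centroid: summed term frequencies over all sentences.
    centroid = Counter()
    for v in vectors:
        centroid.update(v)

    # Step 1: rank sentences by score, highest first.
    scores = [cosine(v, centroid) for v in vectors]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)

    # Step 2: greedily keep a sentence only if it is not too similar
    # to any sentence already chosen.
    chosen = []
    for i in ranked:
        if len(chosen) == k:
            break
        if all(cosine(vectors[i], vectors[j]) <= max_sim for j in chosen):
            chosen.append(i)

    # Return the selected sentences in their original order.
    return [sentences[i] for i in sorted(chosen)]
```

With an input that contains two near-duplicate sentences and one unrelated sentence, the sketch keeps only one of the near-duplicates and fills the second slot with the unrelated sentence, which is the redundancy-avoiding behaviour the second reason describes.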

Published
2020-01-14
How to Cite
Nguyen Hoang Diep, Nguyen Thi Hai Nang, Do Thi Thu Trang, Ngo Thanh Huyen, & Trinh Thi Nhi. (2020). VIETNAMESE MULTI-DOCUMENT SUMMARIZATION BASE UNSUPERVISED LEARNING METHODS. UTEHY Journal of Science and Technology, 24, 66-71. Retrieved from http://tapchi.utehy.edu.vn/index.php/jst/article/view/326