LSTM-based Siamese neural network for Urdu news story segmentation
Article written by: Bhatti, Muhammad Nauman Ahmed; Siddiqi, Imran; Moetesum, Momina
Abstract: News story segmentation is a challenging task, mainly because of the dynamic range of topics, smooth transitions between stories, and the varied duration of each story. This paper presents a technique to segment stories in Urdu news bulletins. The technique relies on a Long Short-Term Memory (LSTM)-based Siamese neural network trained on positive (belonging to the same story) and negative (belonging to different stories) pairs of sentences. Once trained, the model identifies the transition between stories by detecting dissimilarity between adjacent sentences of a given text. For algorithmic development and experimental study, we employ two datasets: a dataset of Urdu news and transcriptions of news bulletins from multiple news channels. Experiments report promising results in identifying story boundaries, validating the ideas put forward in this study.
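The two steps described in the abstract (building same-story and different-story sentence pairs for training, then flagging a story boundary wherever adjacent sentences are dissimilar) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The trained LSTM Siamese encoder is replaced by a hypothetical bag-of-words function `toy_encode`. Adjacent sentences are compared with cosine similarity against an assumed threshold of 0.5. Positive pairs are assumed to be consecutive sentences of one story, with one negative pair per story drawn from a different story. All of these choices are illustrative assumptions.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def make_pairs(stories: Sequence[Sequence[str]], seed: int = 0) -> List[Tuple[str, str, int]]:
    """Build (sentence_a, sentence_b, label) pairs for Siamese training.

    label 1: both sentences come from the same story (positive pair).
    label 0: the sentences come from different stories (negative pair).
    Assumption: positives are consecutive sentences; one negative per story.
    """
    rng = random.Random(seed)
    pairs: List[Tuple[str, str, int]] = []
    for i, story in enumerate(stories):
        # Positive pairs: consecutive sentences within the same story.
        for a, b in zip(story, story[1:]):
            pairs.append((a, b, 1))
        # Negative pair: last sentence of this story vs. a sentence from another story.
        others = [j for j in range(len(stories)) if j != i and stories[j]]
        if story and others:
            other_story = stories[rng.choice(others)]
            pairs.append((story[-1], rng.choice(other_story), 0))
    return pairs


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def find_boundaries(sentences: Sequence[str],
                    encode: Callable[[str], Vector],
                    threshold: float = 0.5) -> List[int]:
    """Return indices i where a new story is assumed to start at sentences[i]."""
    vecs = [encode(s) for s in sentences]
    return [i for i in range(1, len(vecs))
            if cosine(vecs[i - 1], vecs[i]) < threshold]


def toy_encode(sentence: str) -> Vector:
    """Hypothetical stand-in for the trained LSTM encoder: word counts over a tiny vocabulary."""
    vocab = ["cricket", "match", "team", "rain", "flood", "city"]
    words = sentence.lower().split()
    return [float(words.count(w)) for w in vocab]


if __name__ == "__main__":
    text = [
        "The cricket team won the match",
        "The team celebrated the match",
        "Heavy rain caused a flood",
        "Flood water filled the city after rain",
    ]
    print(find_boundaries(text, toy_encode))  # [2]
```

With a real model, `encode` would map each sentence through the trained encoder branch of the Siamese network, and the threshold would be tuned on held-out bulletins rather than fixed at 0.5.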
Language: English