

We apply a simple Markov Random Walk (MRW) model to rank senses for each domain. The sense similarity scores, which are used to decide the importance of senses, are obtained by using distributed representations of WordNet. To examine the effectiveness of our detection method, we apply the results to text classification on a dataset collected from the Reuters corpus. The result obtained by a Convolutional Neural Network (CNN) shows that our model gains a large improvement over the result obtained without assigning Domain-Specific Senses (DSS). Moreover, we compared our DSS model with Context2vec, one of the word sense disambiguation (WSD) techniques, in the text classification task, and the results demonstrate that the DSS approach gains a large improvement over the WSD approach. Several authors have attempted different techniques to solve the WSD problem; some of them have used specific domain knowledge and shown that it improves WSD performance compared with generic supervised WSD.
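The MRW ranking step described above can be sketched as a PageRank-style power iteration over a sense-similarity graph. The similarity matrix below is a hypothetical toy example, not data from the paper:

```python
def mrw_rank(sim, damping=0.85, iters=100):
    """Rank senses with a Markov Random Walk over a sense-similarity
    graph, where sim[i][j] is the similarity between senses i and j."""
    n = len(sim)
    # Row-normalise similarities into transition probabilities.
    trans = []
    for row in sim:
        total = sum(row)
        trans.append([v / total if total else 1.0 / n for v in row])
    # Power iteration with uniform teleportation (PageRank-style).
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            (1 - damping) / n
            + damping * sum(scores[j] * trans[j][i] for j in range(n))
            for i in range(n)
        ]
    return scores

# Hypothetical similarities among three senses of one word in one domain:
# senses 0 and 1 are strongly related, sense 2 is weakly connected.
sim = [
    [0.0, 0.8, 0.1],
    [0.8, 0.0, 0.2],
    [0.1, 0.2, 0.0],
]
scores = mrw_rank(sim)
best = max(range(len(scores)), key=scores.__getitem__)
```

Because the transition matrix is row-stochastic, the scores remain a probability distribution, and the weakly connected sense receives the lowest rank.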
In this paper, we focus on domain-specific senses of nouns and verbs and propose a method for detecting the predominant sense in each domain/category. Our model employs distributed representations of words learned by using Word2Vec and thus does not require manual annotation of sense-tagged data.
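The Word2Vec representations are compared with cosine similarity. A minimal sketch with made-up toy vectors (real Word2Vec vectors are learned from a corpus and typically have 100-300 dimensions):

```python
import math

def cosine(u, v):
    """Cosine similarity between two word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Toy 4-dimensional "embeddings" (hypothetical values for illustration).
vec = {
    "election": [0.9, 0.1, 0.0, 0.2],
    "vote":     [0.8, 0.2, 0.1, 0.1],
    "lawsuit":  [0.1, 0.9, 0.3, 0.0],
}
# Distributionally similar words end up with similar vectors, so
# cosine("election", "vote") exceeds cosine("election", "lawsuit").
```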

There are five noun senses of “party” in WordNet. The first sense of “party” is “an organization to gain political power”, and its usage would be found more frequently in the “politics” domain than in the “law” domain. On the other hand, the fifth sense of “party”, that is, “a person involved in legal proceedings”, tends to be used in the “law” domain. Predominant sense detection, also called domain-specific sense identification, is an approach for detecting the sense of a word that is most frequently used in a specific domain.
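The idea can be illustrated with per-domain frequency counts for the senses of “party”; the counts below are invented for illustration, not measured data:

```python
# Hypothetical counts of two WordNet senses of "party" per domain:
# party.n.01 = political organization, party.n.05 = person in legal proceedings.
sense_counts = {
    "politics": {"party.n.01": 120, "party.n.05": 4},
    "law":      {"party.n.01": 10,  "party.n.05": 95},
}

def predominant_sense(domain):
    """Return the sense used most frequently in the given domain."""
    counts = sense_counts[domain]
    return max(counts, key=counts.get)
```

With these counts, the predominant sense of “party” is `party.n.01` in the “politics” domain and `party.n.05` in the “law” domain.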

The First Sense Heuristic (FSH) is often used as a baseline for supervised word sense disambiguation (WSD) systems because it is powerful, especially for words with highly skewed sense distributions. However, a disadvantage of the FSH obtained from WordNet is the small size of the SemCor corpus, which causes data sparseness problems; that is, we cannot apply the FSH to senses that do not appear in the SemCor corpus. Furthermore, the FSH is not based on the domain but on simple frequency counts over the SemCor corpus.
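The FSH itself reduces to always picking the first-listed sense, ignoring both context and domain. A minimal sketch, using a hypothetical inventory ordered by SemCor-style frequency:

```python
# Hypothetical sense inventory, ordered by corpus frequency
# (WordNet orders noun senses roughly by SemCor frequency).
inventory = {
    "party": [
        "party.n.01 (political organization)",
        "party.n.02 (social gathering)",
        "party.n.05 (person in legal proceedings)",
    ],
}

def first_sense(word):
    """First Sense Heuristic: return the first-listed sense of a word,
    regardless of the context or domain it appears in."""
    senses = inventory.get(word)
    return senses[0] if senses else None
```

The `None` branch makes the sparseness problem concrete: a word (or sense) absent from the tagged corpus gets no answer at all.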

This paper focuses on the domain-specific senses of words and proposes a method for detecting the predominant sense in each domain. Our Domain-Specific Senses (DSS) model works in an unsupervised manner and detects predominant senses in each domain. We apply a simple Markov Random Walk (MRW) model to rank senses for each domain. It decides the importance of a sense within a graph by using the similarity of senses. The similarity of senses is obtained by using distributional representations of words from gloss texts in the thesaurus. These representations capture a large semantic context, and thus our method does not require manual annotation of sense-tagged data. We used the Reuters corpus and WordNet in the experiments. We applied the results of domain-specific senses to text classification and examined how DSS affects the overall performance of the text classification task. We compared our DSS model with Context2vec, one of the word sense disambiguation (WSD) techniques, and the results demonstrate that our domain-specific sense approach gains a 0.053 F1 improvement on average over the WSD approach.

Determining the proper sense of a word in a context is a challenging task in Natural Language Processing (NLP), as the most frequent words generally have more senses than less frequent words. The benefits of disambiguating the sense of a word can be utilized in many applications, such as Machine Translation, Question Answering, and Text Classification. Various techniques have been applied to identify the actual sense of a word. The simplest is to apply the First Sense Heuristic (FSH), which chooses the first, or predominant, sense of a word.
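The gloss-based sense similarity described above can be sketched as: represent each sense by the average of its gloss words' vectors, then compare senses with cosine similarity. All vectors and glosses below are toy values invented for illustration:

```python
import math

# Toy 2-dimensional word vectors (hypothetical; real ones would be
# learned distributional representations).
vec = {
    "political": [1.0, 0.0], "power": [0.9, 0.2], "organization": [0.8, 0.3],
    "legal": [0.0, 1.0], "proceedings": [0.1, 0.9], "person": [0.3, 0.7],
}

def sense_vector(gloss):
    """Average the vectors of the gloss words we have vectors for."""
    vecs = [vec[w] for w in gloss.split() if w in vec]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

# Gloss texts for two senses of "party" (paraphrased WordNet-style glosses).
s1 = sense_vector("organization to gain political power")
s5 = sense_vector("person involved in legal proceedings")
```

Edge weights in the MRW sense graph can then be set from these cosine scores; here the two glosses share no content words, so their sense vectors are dissimilar.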
