Semantic Web for Effective Healthcare Systems
1.3 Motivation
Feature extraction from product or service review documents typically involves several steps: data pre-processing, document indexing, dimension reduction, model training, testing, and evaluation. A labeled collection of documents is used to train the model. The trained model is then used to identify unlabeled concept instances in a new set of documents. Document indexing is the most critical and complex task in text analysis. It decides the set of key features that represent a document, and it improves the relevance between a word (or feature) and the document. Indexing needs to be very effective, because it determines both the storage space required and the query processing time for the documents.
The Ontology-based or semantic-based approach retrieves concepts from documents by establishing contextual relationships. In the content-based approach, the BagOfWords model is used to represent the text; because it uses terms as indexes, synonymy and polysemy cannot be resolved. In context-based semantic approaches such as topic modeling techniques, however, concepts are used to extract and categorize information. These approaches capture the contextual relationships among the terms present in the documents.
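The limitation of term-based indexing can be illustrated with a minimal sketch. It is not the chapter's implementation; the `bag_of_words` and `term_overlap` helpers and the four review sentences are hypothetical, chosen only to show why indexing on surface terms cannot resolve synonymy or polysemy.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, split on non-letters, and count term frequencies."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def term_overlap(bag_a, bag_b):
    """Return the set of index terms shared by two bags."""
    return set(bag_a) & set(bag_b)


# Hypothetical review sentences (assumed for illustration only).
review_a = bag_of_words("The doctor was kind")
review_b = bag_of_words("The physician was kind")
review_c = bag_of_words("A cold room")
review_d = bag_of_words("I caught a cold")

# Synonymy: "doctor" and "physician" are separate index terms, so a
# query for "doctor" does not match the second review.
synonym_hit = "doctor" in review_b

# Polysemy: "cold" is a single index term even though the two reviews
# use it in different senses (temperature vs. illness).
polysemy_overlap = term_overlap(review_c, review_d)

print(synonym_hit)
print(sorted(polysemy_overlap))
```

In this sketch the two reviews about the same concept share no term for it, while the two reviews about different concepts share the term "cold". A concept-level index is intended to address both cases.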
To utilize the strength of Ontology in the analysis of user-generated content, this chapter proposes a domain Ontology-based Semantic Indexing (OnSI) technique for product or service reviews written by customers on social media platforms. The OnSI method shows how a topic modeling technique can be integrated with Ontology learning; it is generic and applicable to any domain.