Semantic Web for Effective Healthcare Systems (various authors)
1.1 Introduction
Text analysis is the process of deriving structured data from unstructured text. Text analytics techniques can retrieve additional information, such as customer insight into a product or service, from unstructured data sources. These techniques have many applications, including insurance claims assessment, competitor analysis and sentiment analysis, and many industries use text analytics to improve their business. In recent years, social media has brought tremendous changes to industries such as product businesses [1, 2], tourism [3, 4], and healthcare services [5].
Retrieving and summarizing web data that is dispersed across many web pages is a difficult and complex process, and it consumes considerable manual effort and time. No standard data model exists for web documents, which increases the need to annotate the huge number of text documents on the World Wide Web (WWW). Extracting and collating information from these texts is a complex task. Unlike numerical datasets, text documents contain a very large number of features. Representing the documents with only the most needed, non-redundant features reduces the resources required to represent a large dataset. Classification or clustering algorithms may be used to identify these features. The documents are then analyzed, modeled and used for business improvement or for personal interest. Annotated text thus improves automated decision-making, which in turn reduces the manual effort and time required for text analysis.
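One common way to keep only the most informative, non-redundant terms is to weight them by TF-IDF (term frequency times inverse document frequency) and retain the top k. This is a feature-weighting technique rather than the classification or clustering approach the text mentions. The following is a minimal, standard-library-only sketch; the tokenizer, the scoring rule (summing TF-IDF over all documents) and the function names are assumptions for illustration, not the chapter's method.

```python
import math
from collections import Counter


def tokenize(text):
    # Simplistic tokenizer (an assumption): lowercase, letters only.
    return "".join(c.lower() if c.isalpha() else " " for c in text).split()


def tfidf_top_features(docs, k):
    """Return the k terms with the highest TF-IDF weight summed over all docs."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = Counter()
    for toks in tokenized:
        for term, count in Counter(toks).items():
            tf = count / len(toks)
            idf = math.log(n / df[term])  # 0 for terms found in every document
            scores[term] += tf * idf
    # Highest score first; ties broken alphabetically for a stable result.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


docs = [
    "The doctor was friendly and the staff was helpful",
    "The wait time was long but the doctor was friendly",
    "The billing was confusing and the wait was long",
]
print(tfidf_top_features(docs, 4))  # ['billing', 'confusing', 'helpful', 'staff']
```

Words that occur in every review, such as "the" and "was", receive an IDF of zero and drop to the bottom of the ranking, which is exactly the redundancy the paragraph above aims to remove.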
A report from the British Columbia Safety and Quality Council states that engaging patients and healthcare service entities on online platforms leads to greater improvement in the healthcare services offered. Improvement in healthcare services becomes visible when insights from patients' experiences are analyzed [5]. It is therefore necessary to consolidate the opinions of customers or clients in order to improve business and decision-making and to increase revenue. Figure 1.1 gives an overview of the decision-making process from online product or service reviews, using different information extraction and text analysis techniques.
Analyzing social media text or other user-generated content poses many challenges. In languages like English, the same word can have multiple meanings (polysemy), and different words can have the same meaning (synonymy). People also vary in their vocabulary and use heterogeneous words to express their views, which complicates the processing of textual data. Most feature extraction techniques do not consider the semantic relationships between terms. The subjectivity inherent in text processing techniques adds further complexity, which in turn affects the evaluation of results. In addition, the scarcity of gold-standard or annotated text data for different domains poses further challenges for text analysis [6]. Hence, identifying and applying suitable Natural Language Processing (NLP) techniques is the main research focus in text data analysis.
Figure 1.1 Decision-making process from social media reviews.
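The synonymy problem described above can be softened by mapping surface words to shared concepts before counting features. The following is a minimal sketch, assuming a small hand-written synonym table (`SYNONYMS`, hypothetical) that stands in for the domain ontology or thesaurus a real system would use; `concept_counts` is likewise an illustrative name.

```python
from collections import Counter

# Hypothetical synonym table standing in for a domain ontology or thesaurus:
# each surface word maps to one canonical concept.
SYNONYMS = {
    "doctor": "doctor",
    "physician": "doctor",
    "doc": "doctor",
    "price": "price",
    "fee": "price",
    "cost": "price",
}


def concept_counts(text):
    """Count concept mentions, merging synonyms into a single feature."""
    # Lowercase, split on whitespace and strip trailing punctuation.
    tokens = (t.strip(".,!?;:") for t in text.lower().split())
    return Counter(SYNONYMS[t] for t in tokens if t in SYNONYMS)


review = "The physician was kind, but the fee was high and the doctor rushed."
print(concept_counts(review))  # Counter({'doctor': 2, 'price': 1})
```

A plain bag of words would count "physician" and "doctor" as two unrelated features; here they collapse into one. The lookup does nothing for polysemy, though: each word maps to exactly one concept regardless of context, so ambiguous words would still need disambiguation.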
Text analytics must match the context of the reader with that of the writer. This challenge can be managed if the different vocabularies of features and their relationships are well represented in the data model. For example, content-based contextual analysis of user feedback helps users decide whether to buy a new product or avail a service by highlighting the best features of that product or service. Many challenges in information retrieval can be overcome if ontology representation and topic modeling techniques are used to model the text documents. This chapter focuses on extracting relevant features from a set of documents and building a domain ontology for them. The ontology, combined with suitable information retrieval (IR) techniques and a contextual representation of the data, helps in building predictive or sentiment analysis models that enable automated decision-making before buying a new product or availing a new service, as shown in Figure 1.2.
Figure 1.2 User-generated content analysis (UCA) model.
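To illustrate how a domain ontology can feed a sentiment analysis model, here is a minimal lexicon-based, aspect-level sketch. The ontology (`ONTOLOGY`), the polarity lexicon (`POLARITY`), the naive sentence splitter and the per-feature averaging rule are all hypothetical simplifications chosen for illustration; this is not the chapter's UCA model, which the text describes only at the level of Figure 1.2.

```python
import re
from collections import defaultdict

# Hypothetical mini-ontology: each feature (concept) lists the terms that
# express it. A real ontology would also encode relations such as "is-a".
ONTOLOGY = {
    "staff": {"staff", "doctor", "physician", "nurse"},
    "cost": {"cost", "price", "fee", "bill", "billing"},
    "waiting": {"wait", "waiting", "queue", "delay"},
}

# Hypothetical polarity lexicon: +1 for positive words, -1 for negative words.
POLARITY = {
    "friendly": 1, "helpful": 1, "kind": 1, "quick": 1,
    "rude": -1, "long": -1, "high": -1, "confusing": -1,
}


def aspect_sentiment(reviews):
    """Average sentence polarity for each ontology feature the reviews mention."""
    totals = defaultdict(int)
    mentions = defaultdict(int)
    for review in reviews:
        # Naive sentence split on terminal punctuation.
        for sentence in re.split(r"[.!?]+", review.lower()):
            words = re.findall(r"[a-z]+", sentence)
            if not words:
                continue
            score = sum(POLARITY.get(w, 0) for w in words)
            # Credit the sentence score to every feature whose terms appear in it.
            for feature, terms in ONTOLOGY.items():
                if terms.intersection(words):
                    totals[feature] += score
                    mentions[feature] += 1
    return {feature: totals[feature] / mentions[feature] for feature in mentions}


reviews = [
    "The physician was friendly and helpful. The wait was long.",
    "Billing was confusing and the fee was high!",
    "The nurse was kind.",
]
print(aspect_sentiment(reviews))  # {'staff': 1.5, 'waiting': -1.0, 'cost': -2.0}
```

Because "physician" and "nurse" both belong to the `staff` concept, their sentences are pooled into one feature score. Crediting a whole sentence's score to every feature it mentions is the simplest possible design; it ignores negation and sentences that mix praise and complaint, which a practical model would need to handle.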