1.7 Results and Discussions
The Ontology-based Semantic Indexing (OnSI) model builds a domain Ontology for the product/service review documents using the features selected by the CFSLDA model. Protégé software is used to build and query the Ontology model. The top five terms selected by the LDA model for each topic of the dataset are shown in Table 1.4. Each topic is manually labeled according to the first term in its list, since it is difficult to carry out human annotation for all the terms grouped under a topic.
Table 1.4 List of top five terms by LDA model.
Topics/features of DS1 | Feature terms by FSLDA model (DS1)
Topic 1—Cost | cost, test, money, charge, day
Topic 2—Medicare | doctor, nurse, team, treatment, bill
Topic 3—Staff | staff, patient, child, problem, face
Topic 4—Infrastructure | hospital, people, room, experience, surgery
Topic 5—Time | time, operation, hour, service, check
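As a point of reference, the top terms in Table 1.4 are the kind of output a standard LDA implementation produces. A minimal sketch with gensim follows; the library choice, the tiny placeholder corpus review_texts, and the parameter values are illustrative assumptions, not the chapter's actual setup.

    # Sketch: list the top five terms per topic, as in Table 1.4.
    # gensim is assumed; `review_texts` is a hypothetical list of tokenized,
    # pre-processed review documents (e.g. PoS-filtered nouns).
    from gensim import corpora, models

    review_texts = [
        ["doctor", "treatment", "bill", "room"],   # placeholder documents
        ["staff", "time", "appointment", "cost"],
    ]

    dictionary = corpora.Dictionary(review_texts)
    bow_corpus = [dictionary.doc2bow(doc) for doc in review_texts]

    lda = models.LdaModel(bow_corpus, id2word=dictionary,
                          num_topics=5, passes=10, random_state=1)

    # Top five terms and their term-topic probabilities for each topic.
    for topic_id in range(lda.num_topics):
        top_terms = lda.show_topic(topic_id, topn=5)
        print(topic_id, [(term, round(prob, 4)) for term, prob in top_terms])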
For example, the term “bill” is one of the top terms under the topic “medicare”; however, it is more related to the topics “cost,” “infrastructure,” or “time.” Similarly, the term “appointment” does not appear in any of the top 5 or top 10 term lists, although it is more appropriate to the topics “time” and “medicare.” To alleviate this problem, the CFSLDA model selects the representative terms of each topic with reference to the first term in the list (the term with the highest term-topic probability in that topic), using correlation analysis. As in the previous example, the term “bill” is not related to the term “doctor,” but it is highly correlated with the terms “cost,” “hospital,” and “time.” The correlation values of these terms are shown in Table 1.5. For example, the term-topic probability Φtw of “room” is 0.0134 and its correlated value c with “cost” is 0.0222. Likewise, the term “appointment” is highly correlated with the terms “doctor” and “time,” so it is grouped under the topics “medicare” and “time,” as shown in Table 1.5. As another example, the term “disease” is related to “doctor” and “hospital” but not to the terms “cost,” “time,” and “staff,” as shown in Table 1.5.
Table 1.5 Sample correlated terms selected by CFSLDA.
Features | Cost | Medicare | Staff | Infrastructure | Time
High probable terms | cost | doctor | staff | hospital | time
Term-topic probability (Φtw) | 0.0923 | 0.2132 | 0.2488 | 0.3152 | 0.1247
Correlated value (c) | 1 | 1 | 1 | 1 | 1
Sample terms modeled by CFSLDA (each cell gives Φtw / c)
Room | 0.0134 / 0.0222 | 0.0004 / 0.1378 | 0.0005 / 0.2392 | 0.0471 / 0.0402 | 0.0134 / 0.0222
Disease | 0.0004 / -0.0347 | 0.0222 / 0.0547 | 0.0005 / -0.0462 | 0.0004 / 0.0948 | 0.0004 / 0.0408
Appointment | 0.0134 / -0.0343 | 0.0092 / 0.1802 | 0.0005 / -0.0462 | 0.0004 / -0.0414 | 0.0004 / 0.0477
Patient | 0.0177 / 0.0042 | 0.0004 / 0.1415 | 0.1247 / 0.1429 | 0.0004 / 0.1502 | 0.0005 / 0.2238
Bill | 0.0004 / 0.0468 | 0.0265 / -0.0015 | 0.0001 / 0.1614 | 0.0121 / 0.1176 | 0.0005 / 0.2111
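The correlated values c in Table 1.5 can be computed from how often terms co-occur across the review documents. The sketch below illustrates the idea with a Pearson correlation over a document-term matrix; the pandas/scikit-learn choice, the tiny docs list, and the variable names are assumptions for illustration, not the chapter's implementation.

    # Sketch: correlate a candidate term with the top probable term of each topic,
    # using term-occurrence vectors over the review documents.
    import pandas as pd
    from sklearn.feature_extraction.text import CountVectorizer

    docs = [                                # hypothetical mini-corpus
        "the doctor explained the bill and the treatment",
        "waiting time for the appointment was long",
        "hospital room was clean but the cost was high",
    ]
    top_terms = {"cost": "cost", "medicare": "doctor", "staff": "staff",
                 "infrastructure": "hospital", "time": "time"}

    vectorizer = CountVectorizer()
    counts = vectorizer.fit_transform(docs)
    dtm = pd.DataFrame(counts.toarray(),
                       columns=vectorizer.get_feature_names_out())

    candidate = "bill"
    for topic, head in top_terms.items():
        if candidate in dtm.columns and head in dtm.columns:
            c = dtm[candidate].corr(dtm[head])   # Pearson, like "c" in Table 1.5
            print(topic, head, round(c, 4))

A term is then attached to the topics whose head term it is positively correlated with, which is how “bill” ends up under “cost,” “infrastructure,” and “time” rather than “medicare.”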
Table 1.6 shows the list of feature terms selected by the CFSLDA model. Among the pre-processed and PoS tagged nouns, 68 terms are selected for the topic “cost,” 110 for “medicare,” 112 for “staff,” 101 for “infrastructure,” and 73 for “time.”
Table 1.6 List of correlated feature terms selected by CFSLDA model.
Features of DS1 | Number of terms selected by CFSLDA | Correlated feature terms by CFSLDA model (DS1)
Cost | 68 | cost, test, money, charge, day, case, department, patient, room, pay, bill, ... |
Medicare | 110 | doctor, discharge, medicine, treatment, appointment, admission, disease, option, pain, reply, duty, test, meeting, … |
Staff | 112 | staff, patient, medicine, problem, report, manner, management, treatment, complaints, … |
Infrastructure | 101 | hospital, room, facility, meals, rate, … |
Time | 73 | time, service, hour, operation, day, bill, ... |
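The candidate pool behind Table 1.6 consists of pre-processed, PoS-tagged nouns. A minimal sketch of that noun-filtering step, assuming NLTK, is shown below; the example sentence is made up, and the nltk.download resource names may differ across NLTK versions.

    # Sketch: keep only PoS-tagged nouns from a review before topic modeling.
    import nltk

    # One-time downloads for the tokenizer and PoS tagger models.
    nltk.download("punkt", quiet=True)
    nltk.download("averaged_perceptron_tagger", quiet=True)

    review = "The staff handled the billing problem and the discharge report quickly."
    tokens = nltk.word_tokenize(review)
    tagged = nltk.pos_tag(tokens)

    # NN* tags cover singular/plural common and proper nouns.
    nouns = [word.lower() for word, tag in tagged if tag.startswith("NN")]
    print(nouns)   # candidate noun terms passed on to CFSLDA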
Thus, the feature extraction process of the CFSLDA model selects not only the terms with a high term-topic probability but also the terms that are highly correlated with the topmost term under each topic and are therefore contextually equivalent. Only the terms that are positively correlated with the top probable term are added to the list. The topic name, the set of terms, and their LDA scores are given to the Ontology builder tool to build the repository. Figure 1.11 shows the spring view of the domain Ontology built for the healthcare service reviews (DS1).
Figure 1.11 Spring view of domain ontology (DS1).
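Although the chapter builds and queries the Ontology in Protégé, the same topic/term/score repository could be written programmatically and then opened in Protégé. The sketch below uses rdflib as an assumed library; the namespace IRI, the property names, and the truncated features dictionary are illustrative only.

    # Sketch: serialize topic names, their terms, and the LDA scores into an OWL
    # file that Protégé can open and query.
    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import OWL, RDF, RDFS, XSD

    EX = Namespace("http://example.org/healthcare-reviews#")   # illustrative IRI
    g = Graph()
    g.bind("ex", EX)

    # Minimal schema: Topic and Term classes, plus the linking properties.
    g.add((EX.Topic, RDF.type, OWL.Class))
    g.add((EX.Term, RDF.type, OWL.Class))
    g.add((EX.hasTerm, RDF.type, OWL.ObjectProperty))
    g.add((EX.ldaScore, RDF.type, OWL.DatatypeProperty))

    # Truncated CFSLDA output: topic -> [(term, term-topic probability), ...]
    features = {"Cost": [("cost", 0.0923)],
                "Medicare": [("doctor", 0.2132)],
                "Time": [("time", 0.1247)]}

    for topic, terms in features.items():
        g.add((EX[topic], RDF.type, EX.Topic))
        for term, score in terms:
            g.add((EX[term], RDF.type, EX.Term))
            g.add((EX[term], RDFS.label, Literal(term)))
            g.add((EX[term], EX.ldaScore, Literal(score, datatype=XSD.decimal)))
            g.add((EX[topic], EX.hasTerm, EX[term]))

    g.serialize("healthcare_reviews.owl", format="xml")   # RDF/XML for Protégé

The saved .owl file can then be visualized in Protégé with a spring-style graph layout like the one shown in Figure 1.11.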
Figure 1.12 shows the precision vs. recall curve and the F-measure values obtained when the OnSI model is applied to different numbers of query documents. Recall improves continuously as the number of query documents grows. When the document set size is 500, the OnSI model achieves a precision of 0.61, a recall of 0.53, and an F-measure of 0.57.
Figure 1.12 Precision vs recall curve for the Dataset DS1.
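The reported values are consistent with the standard harmonic-mean definition of the F-measure; a quick check for the 500-document case:

    # Sketch: F-measure as the harmonic mean of precision and recall,
    # checked against the values reported for 500 query documents.
    def f_measure(precision: float, recall: float) -> float:
        """F1 score: harmonic mean of precision and recall."""
        return 2 * precision * recall / (precision + recall)

    precision, recall = 0.61, 0.53
    print(round(f_measure(precision, recall), 2))   # 0.57, matching the text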