1.7 Results and Discussions
The Ontology-based Semantic Indexing (OnSI) model builds a domain Ontology for the product/service review documents using the features selected by the CFSLDA model. Protégé software is used to build and query the Ontology model. The top five terms selected by the LDA model for each topic of the dataset are shown in Table 1.4. Each topic is manually labeled according to the first term in its list, since it is difficult to carry out human annotation for all the terms grouped under a topic.
Table 1.4 List of top five terms by LDA model.
Topics/features of DS1 | Feature terms by FSLDA model (DS1)
Topic 1—Cost | cost, test, money, charge, day
Topic 2—Medicare | doctor, nurse, team, treatment, bill
Topic 3—Staff | staff, patient, child, problem, face
Topic 4—Infrastructure | hospital, people, room, experience, surgery
Topic 5—Time | time, operation, hour, service, check
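As a point of reference, the top terms in Table 1.4 are the kind of output a standard LDA implementation produces. A minimal sketch with gensim follows; the library choice, the tiny placeholder corpus review_texts, and the parameter values are illustrative assumptions, not the chapter's actual setup.

    # Sketch: list the top five terms per topic, as in Table 1.4.
    # gensim is assumed; `review_texts` is a hypothetical list of tokenized,
    # pre-processed review documents (e.g. PoS-filtered nouns).
    from gensim import corpora, models

    review_texts = [
        ["doctor", "treatment", "bill", "room"],   # placeholder documents
        ["staff", "time", "appointment", "cost"],
    ]

    dictionary = corpora.Dictionary(review_texts)
    bow_corpus = [dictionary.doc2bow(doc) for doc in review_texts]

    lda = models.LdaModel(bow_corpus, id2word=dictionary,
                          num_topics=5, passes=10, random_state=1)

    # Top five terms and their term-topic probabilities for each topic.
    for topic_id in range(lda.num_topics):
        top_terms = lda.show_topic(topic_id, topn=5)
        print(topic_id, [(term, round(prob, 4)) for term, prob in top_terms])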
For example, the term “bill” is one of the top terms under the topic “medicare”; however, it is more related to the topics “cost,” “infrastructure,” or “time.” Similarly, the term “appointment” does not appear in any of the top 5 or top 10 term lists, although it is more appropriate to the topics “time” and “medicare.” To alleviate this problem, the CFSLDA model selects the representative terms of each topic with reference to the first term in the list (the term with the highest term-topic probability in that topic), using correlation analysis. As in the previous example, the term “bill” is not related to the term “doctor,” but it is highly correlated with the terms “cost,” “hospital,” and “time.” The correlation values of these terms are shown in Table 1.5. For example, the term-topic probability Φtw of “room” is 0.0134 and its correlated value c with “cost” is 0.0222. Likewise, the term “appointment” is highly correlated with the terms “doctor” and “time,” so it is grouped under the topics “medicare” and “time,” as shown in Table 1.5. As another example, the term “disease” is related to “doctor” and “hospital” but not to the terms “cost,” “time,” and “staff,” as shown in Table 1.5.
Table 1.5 Sample correlated terms selected by CFSLDA.
Features | Cost | Medicare | Staff | Infrastructure | Time
High probable terms | cost | doctor | staff | hospital | time
Term-topic probability (Φtw) | 0.0923 | 0.2132 | 0.2488 | 0.3152 | 0.1247
Correlated value (c) | 1 | 1 | 1 | 1 | 1
Sample terms modeled by CFSLDA (each cell gives Φtw / c)
Room | 0.0134 / 0.0222 | 0.0004 / 0.1378 | 0.0005 / 0.2392 | 0.0471 / 0.0402 | 0.0134 / 0.0222
Disease | 0.0004 / -0.0347 | 0.0222 / 0.0547 | 0.0005 / -0.0462 | 0.0004 / 0.0948 | 0.0004 / 0.0408
Appointment | 0.0134 / -0.0343 | 0.0092 / 0.1802 | 0.0005 / -0.0462 | 0.0004 / -0.0414 | 0.0004 / 0.0477
Patient | 0.0177 / 0.0042 | 0.0004 / 0.1415 | 0.1247 / 0.1429 | 0.0004 / 0.1502 | 0.0005 / 0.2238
Bill | 0.0004 / 0.0468 | 0.0265 / -0.0015 | 0.0001 / 0.1614 | 0.0121 / 0.1176 | 0.0005 / 0.2111
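The correlated values c in Table 1.5 can be computed from how often terms co-occur across the review documents. The sketch below illustrates the idea with a Pearson correlation over a document-term matrix; the pandas/scikit-learn choice, the tiny docs list, and the variable names are assumptions for illustration, not the chapter's implementation.

    # Sketch: correlate a candidate term with the top probable term of each topic,
    # using term-occurrence vectors over the review documents.
    import pandas as pd
    from sklearn.feature_extraction.text import CountVectorizer

    docs = [                                # hypothetical mini-corpus
        "the doctor explained the bill and the treatment",
        "waiting time for the appointment was long",
        "hospital room was clean but the cost was high",
    ]
    top_terms = {"cost": "cost", "medicare": "doctor", "staff": "staff",
                 "infrastructure": "hospital", "time": "time"}

    vectorizer = CountVectorizer()
    counts = vectorizer.fit_transform(docs)
    dtm = pd.DataFrame(counts.toarray(),
                       columns=vectorizer.get_feature_names_out())

    candidate = "bill"
    for topic, head in top_terms.items():
        if candidate in dtm.columns and head in dtm.columns:
            c = dtm[candidate].corr(dtm[head])   # Pearson, like "c" in Table 1.5
            print(topic, head, round(c, 4))

A term is then attached to the topics whose head term it is positively correlated with, which is how “bill” ends up under “cost,” “infrastructure,” and “time” rather than “medicare.”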
Table 1.6 shows the list of feature terms selected by the CFSLDA model. Among the pre-processed and PoS tagged nouns, 68 terms are selected for the topic “cost,” 110 for “medicare,” 112 for “staff,” 101 for “infrastructure,” and 73 for “time.”
Table 1.6 List of correlated feature terms selected by CFSLDA model.
Features of DS1 | Number of terms selected by CFSLDA | Correlated feature terms by CFSLDA model (DS1)
Cost | 68 | cost, test, money, charge, day, case, department, patient, room, pay, bill, ... |
Medicare | 110 | doctor, discharge, medicine, treatment, appointment, admission, disease, option, pain, reply, duty, test, meeting, … |
Staff | 112 | staff, patient, medicine, problem, report, manner, management, treatment, complaints, … |
Infrastructure | 101 | hospital, room, facility, meals, rate, … |
Time | 73 | time, service, hour, operation, day, bill, ... |
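The candidate pool behind Table 1.6 consists of pre-processed, PoS-tagged nouns. A minimal sketch of that noun-filtering step, assuming NLTK, is shown below; the example sentence is made up, and the nltk.download resource names may differ across NLTK versions.

    # Sketch: keep only PoS-tagged nouns from a review before topic modeling.
    import nltk

    # One-time downloads for the tokenizer and PoS tagger models.
    nltk.download("punkt", quiet=True)
    nltk.download("averaged_perceptron_tagger", quiet=True)

    review = "The staff handled the billing problem and the discharge report quickly."
    tokens = nltk.word_tokenize(review)
    tagged = nltk.pos_tag(tokens)

    # NN* tags cover singular/plural common and proper nouns.
    nouns = [word.lower() for word, tag in tagged if tag.startswith("NN")]
    print(nouns)   # candidate noun terms passed on to CFSLDA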
Thus, the feature extraction process of the CFSLDA model selects not only the terms with a high term-topic probability but also the terms that are highly correlated with the topmost term under each topic and are therefore contextually equivalent. Only the terms that are positively correlated with the top probable term are added to the list. The topic name, the set of terms, and their LDA scores are given to the Ontology builder tool to build the repository. Figure 1.11 shows the spring view of the domain Ontology built for the healthcare service reviews (DS1).
Figure 1.11 Spring view of domain ontology (DS1).
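Although the chapter builds and queries the Ontology in Protégé, the same topic/term/score repository could be written programmatically and then opened in Protégé. The sketch below uses rdflib as an assumed library; the namespace IRI, the property names, and the truncated features dictionary are illustrative only.

    # Sketch: serialize topic names, their terms, and the LDA scores into an OWL
    # file that Protégé can open and query.
    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import OWL, RDF, RDFS, XSD

    EX = Namespace("http://example.org/healthcare-reviews#")   # illustrative IRI
    g = Graph()
    g.bind("ex", EX)

    # Minimal schema: Topic and Term classes, plus the linking properties.
    g.add((EX.Topic, RDF.type, OWL.Class))
    g.add((EX.Term, RDF.type, OWL.Class))
    g.add((EX.hasTerm, RDF.type, OWL.ObjectProperty))
    g.add((EX.ldaScore, RDF.type, OWL.DatatypeProperty))

    # Truncated CFSLDA output: topic -> [(term, term-topic probability), ...]
    features = {"Cost": [("cost", 0.0923)],
                "Medicare": [("doctor", 0.2132)],
                "Time": [("time", 0.1247)]}

    for topic, terms in features.items():
        g.add((EX[topic], RDF.type, EX.Topic))
        for term, score in terms:
            g.add((EX[term], RDF.type, EX.Term))
            g.add((EX[term], RDFS.label, Literal(term)))
            g.add((EX[term], EX.ldaScore, Literal(score, datatype=XSD.decimal)))
            g.add((EX[topic], EX.hasTerm, EX[term]))

    g.serialize("healthcare_reviews.owl", format="xml")   # RDF/XML for Protégé

The saved .owl file can then be visualized in Protégé with a spring-style graph layout like the one shown in Figure 1.11.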
Figure 1.12 shows the precision vs. recall curve and the F-measure values obtained when the OnSI model is applied to different numbers of query documents. Recall improves continuously as the number of query documents grows. When the document set size is 500, the OnSI model achieves a precision of 0.61, a recall of 0.53, and an F-measure of 0.57.
Figure 1.12 Precision vs recall curve for the Dataset DS1.
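The reported values are consistent with the standard harmonic-mean definition of the F-measure; a quick check for the 500-document case:

    # Sketch: F-measure as the harmonic mean of precision and recall,
    # checked against the values reported for 500 query documents.
    def f_measure(precision: float, recall: float) -> float:
        """F1 score: harmonic mean of precision and recall."""
        return 2 * precision * recall / (precision + recall)

    precision, recall = 0.61, 0.53
    print(round(f_measure(precision, recall), 2))   # 0.57, matching the text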