Computational Intelligence and Healthcare Informatics
2.3 Existing Models
Models proposed in the past fall mainly into two types: ensemble models, and hybrid and pre-trained models. Ensemble models either classify all fourteen pathologies or target a limited set of abnormalities such as cardiomegaly, edema, pneumonia, or COVID-19. In pre-trained models, the parameters of a deep learning network are initialized from the ImageNet dataset, and the network is then fine-tuned for the targeted pathologies. This section discusses, in chronological order of implementation, the various existing models in the literature, the issues they address related to x-ray images, the datasets used for training, and the types of pathologies each model detects.
In [4], a deep learning model named Decaf, trained on the non-medical ImageNet dataset, is applied to detect pathologies in a medical CXR dataset. Each image is treated as a Bag of Visual Words (BoVW). The model combines CNN features, the GIST descriptor, and BoVW for feature extraction, first on the ImageNet dataset and then on the medical images. Once the model is trained, an SVM classifies CXR pathologies, obtaining AUCs in the range 0.87 to 0.97. The authors further show that feature extraction can be improved by fusing Decaf variants such as Decaf5, Decaf6, and GIST. In [41], the pre-trained GoogLeNet model is employed to classify chest radiograph reports into normal and five chest pathologies, namely, pleural effusion, consolidation, pulmonary edema, pneumothorax, and cardiomegaly, through natural language processing techniques. Sentences from each report are separated into "inclusion" and "exclusion" keywords, and the report is classified into one of the six classes, including the normal class.
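The transfer-learning pipeline just described, fixed features from a network pre-trained on non-medical images feeding an SVM, can be sketched as follows. The feature extractor here is a random-projection stand-in, not the real Decaf network, and the data are synthetic.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

def extract_features(images, proj):
    """Stand-in for a frozen pre-trained CNN feature extractor."""
    flat = images.reshape(len(images), -1)
    return np.maximum(flat @ proj, 0.0)   # ReLU-like nonlinearity

# Toy "CXR" data: two classes separated by a mean shift.
imgs = rng.normal(size=(200, 16, 16))
labels = np.repeat([0, 1], 100)
imgs[labels == 1] += 0.8

proj = rng.normal(size=(256, 64))         # fixed (not fine-tuned) weights
feats = extract_features(imgs, proj)

# SVM performs the final pathology classification on the fixed features.
clf = LinearSVC(C=1.0).fit(feats, labels)
```

The key design choice mirrored here is that only the classifier is trained on the target domain; the feature extractor's weights stay frozen.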
Given the popularity of deep learning, four models based on AlexNet [34] and GoogLeNet [65] are applied to thoracic image analysis for detecting TB from CXR images: two are initialized from ImageNet and two are trained from scratch. The parameters of AlexNet-T and GoogLeNet-T are initialized from ImageNet, whereas AlexNet-U and GoogLeNet-U are trained from scratch. Comparing all four models shows that the pre-trained versions achieve better accuracy than the untrained ones [35].
Another model focuses on only eight thoracic pathologies [70]. A weakly supervised DCNN is applied to a large set of images, where a single image may contain more than one pathology. A model pre-trained on ImageNet is adopted, excluding its fully connected and final classification layers; in their place, a transition layer, a global pooling layer, a prediction layer, and a loss layer are inserted after the last convolutional layer. Weights are taken from the pre-trained model, except for the transition and prediction layers, which are trained from scratch; these two layers help find the plausible location of disease. Also, instead of the conventional softmax function, three different loss functions are used, namely, Hinge loss, Euclidean loss, and Cross-Entropy loss, to handle the imbalance between images with and without pathologies. The global pooling and prediction layers generate a heatmap that maps the presence of a pathology with maximum probability. Moreover, with a ResNet50 [21] backbone, cardiomegaly and pneumothorax are recognized better than the other pathologies.
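The heatmap mechanism above, where prediction-layer weights combine the last convolutional feature maps into a per-class localization map, can be sketched as follows; the shapes and weights are illustrative, not values from [70].

```python
import numpy as np

rng = np.random.default_rng(1)

def class_heatmap(feature_maps, class_weights):
    """Weight the C feature maps of the last conv layer by the
    prediction-layer weights for one class and sum over channels."""
    return np.tensordot(class_weights, feature_maps, axes=([0], [0]))

fmaps = rng.random((8, 7, 7))   # C x H x W activations after the transition layer
w = rng.random(8)               # prediction-layer weights for one pathology
heat = class_heatmap(fmaps, w)  # 7 x 7 heatmap

# The most probable lesion location is the heatmap's argmax.
y, x = np.unravel_index(heat.argmax(), heat.shape)
```

Upsampling this coarse map back to input resolution gives the kind of disease-location overlay the paper describes.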
In [28], three datasets, namely, the Indiana, JSRT, and Shenzhen datasets, are used to evaluate the proposed deep model. The Indiana dataset consists of 7,284 frontal and lateral CXR images annotated for the pathologies cardiomegaly, pulmonary edema, opacity, and effusion. JSRT consists of 247 CXRs, 154 with a lung nodule and 93 without. The Shenzhen dataset consists of 662 frontal CXR images, of which 336 are TB cases and the rest normal. Features from one layer of the pre-defined models are extracted and used with a binary classifier layer to detect abnormality; specifically, features are taken from the second fully connected layer of the AlexNet, VGG16, and VGG19 networks. It is observed that dropout benefits shallow networks in terms of accuracy but hampers the performance of deeper networks. Shallow DCNs are generally suited to detecting small objects in an image. Ensemble models perform better for spatially spread-out abnormalities such as cardiomegaly and pulmonary edema, whereas small pointed features like nodules are not easily located by ensemble models.
Subsequently, a three-branch attention-guided CNN (AG-CNN) is proposed based on two observations. First, although thoracic pathologies are located in a small region, the complete CXR image is given as input for training, which adds irrelevant noise to the network. Second, irregular borders arising from poor alignment of the CXR obstruct network performance [19]. ResNet50 and DenseNet121 serve as backbones for two versions of AG-CNN, in which a global CNN uses the complete image and a mask is created to crop the disease-specific region from the heatmap generated by the global CNN. A local CNN is then trained on the disease-specific part of the image, and the last pooling layers of both CNNs are concatenated to fine-tune the fused branch. For classifying chest pathologies, conventional and deep learning approaches are compared on the basis of error rate, accuracy, and training time [2]. The conventional models are a Back-Propagation Neural Network (BPNN) and a Competitive Neural Network (CpNN); the deep learning model is a simple CNN. The deep CNN generalizes better than the BPNN and CpNN but requires more iterations because it extracts features at multiple layers.
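A minimal sketch of AG-CNN's mask-and-crop step: threshold the global branch's heatmap, crop the bounding region for the local branch, and concatenate the two pooled feature vectors for the fusion branch. The threshold and shapes are assumptions for illustration, not values from [19].

```python
import numpy as np

def crop_from_heatmap(image, heatmap, tau=0.7):
    """Binarize the global-branch heatmap at tau * max and crop the
    bounding box of the mask, i.e., the disease-specific region."""
    mask = heatmap >= tau * heatmap.max()
    ys, xs = np.where(mask)
    return image[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

rng = np.random.default_rng(2)
image = rng.random((224, 224))
heatmap = np.zeros((224, 224))
heatmap[60:90, 100:140] = 1.0      # pretend the global CNN fired here

local_patch = crop_from_heatmap(image, heatmap)   # 30 x 40 crop

# Fusion branch: concatenate global and local pooled features
# (stand-ins for the two networks' last pooling layers).
global_feat = image.mean(axis=1)[:16]
local_feat = local_patch.mean(axis=1)[:16]
fused = np.concatenate([global_feat, local_feat])
```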
A pre-defined CNN for binary classification of chest radiographs, assessed on a custom dataset obtained from the U.S. National Institutes of Health, is presented in [18]. Before the deep learning models are applied, the dataset is separated into categories and labeled manually by two different radiologists; their labels are compared and conflicting images are discarded. Normal images without any pathology are removed, leaving 200,000 images for training. Models are trained on varying numbers of these images, and their performance is recorded as an AUC score. It is observed that modestly sized training sets already achieve good accuracy for binary classification into normal and abnormal chest radiographs. Such automated image analysis will be useful in resource-poor areas.
The CheXNet deep learning algorithm detects 14 pathologies in chest radiographs using the densely connected 121-layer DenseNet architecture [49]. An ensemble network is generated by training multiple networks on the training set and selecting those with the lowest average prediction error. The parameters of each network are initialized from an ImageNet pre-trained network. The input image size is 512 × 512, and the Adam optimizer is used to train the network parameters with a batch size of 8 and a learning rate of 0.0001. The network is saved after every epoch, and to deal with overfitting, training is stopped early.
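The training loop just described, checkpoint every epoch and stop when the validation loss no longer improves, can be sketched as follows; the patience value is an assumption.

```python
def train_with_early_stopping(val_losses, patience=2):
    """Scan per-epoch validation losses; remember the best epoch
    ("save the network after every epoch") and stop once the loss
    has failed to improve for `patience` consecutive epochs."""
    best, best_epoch, wait = float("inf"), -1, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0   # checkpoint this epoch
        else:
            wait += 1
            if wait >= patience:                      # early stopping
                break
    return best_epoch, best

# Loss improves, then degrades: training halts before the late dip is seen.
result = train_with_early_stopping([1.0, 0.8, 0.9, 0.95, 0.7])  # -> (1, 0.8)
```

The ensemble selection in [49] then keeps only the checkpoints with the lowest validation error.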
Considering the severity of TB, the fifth leading cause of death worldwide with 10 million new cases and 1.5 million deaths per year, DL models are proposed to detect it from CXR. Because the disease is one of the world's biggest threats yet comparatively easy to cure, the World Health Organization (WHO) recommends systematic and broad screening to extirpate it. Posteroanterior chest radiography, despite its low specificity and difficulty of interpretation, unfortunately remains one of the preferred TB screening methods. Since TB is primarily a disease of poor countries, clinical officers trained to interpret these CXRs are often scarce. In such circumstances, an automated algorithm for TB diagnosis could be an inexpensive and effective way to make widespread TB screening a reality. Consequently, the problem has attracted the attention of the machine learning community [9, 27, 28, 30, 33, 35, 38, 40, 42, 68], which has tackled it with methods ranging from hand-crafted algorithms to support vector machines and convolutional neural networks. Given the rank of TB among causes of death worldwide, deep learning models are implemented for fast TB screening [46]. The results are encouraging, as some of these methods achieve nearly human sensitivities and specificities. Considering the limited availability of powerful, costly hardware and the large number of learning parameters, a simple deep CNN model has been proposed for CXR TB screening rather than the complex machine learning pipelines used in [30, 40, 42, 68]. Saliency maps and grad-CAMs are used, for the first time in this context, to provide better visualization. As radiologists have a deeper perspective on chest abnormalities, this model is helpful in providing them a second opinion. The architecture consists of five convolutional blocks followed by a global average pooling layer and a fully connected softmax layer.
A max-pooling layer is inserted between the convolutional blocks; the overall arrangement is similar to AlexNet. Each convolutional layer uses batch normalization to mitigate overfitting. After training, saliency maps and grad-CAM are used for visualization: saliency maps generate a heatmap at the same resolution as the input image, while grad-CAM provides better localization but at lower resolution due to pooling. The NIH Tuberculosis Chest X-ray dataset [29] and the Belarus Tuberculosis Portal dataset [6] are used for experimentation. The model is observed to give clinical practitioners a clearer visualization of the presence or absence of TB. Subsequently, considering the severity of pneumonia, a novel ensemble of two models, RetinaNet and Mask R-CNN, is proposed in [61] and tested on the Kaggle pneumonia detection competition dataset of 26,684 images. Transfer learning is applied to initialize the weights from models trained on the Microsoft COCO challenge. RetinaNet is used first to detect objects, and Mask R-CNN is employed as a supplementary model. Both models independently predict pneumonia regions; if their predicted bounding boxes overlap, the boxes are averaged with a weight ratio of 3:1, otherwise the prediction is used unchanged by the ensemble. The ensemble model obtains a recall score of 0.734.
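The ensembling rule from [61], average overlapping RetinaNet and Mask R-CNN boxes with a 3:1 weight ratio and keep non-overlapping predictions unchanged, can be sketched as:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def ensemble_boxes(retina_box, mask_rcnn_box, w=(3.0, 1.0)):
    """Weighted average (3:1) when the two predictions overlap;
    otherwise keep the primary (RetinaNet) prediction unchanged."""
    if iou(retina_box, mask_rcnn_box) > 0:
        s = w[0] + w[1]
        return tuple((w[0] * p + w[1] * q) / s
                     for p, q in zip(retina_box, mask_rcnn_box))
    return retina_box

merged = ensemble_boxes((0, 0, 4, 4), (0, 0, 8, 8))  # -> (0.0, 0.0, 5.0, 5.0)
```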
A model named ChestNet is proposed for detecting consolidation, a kind of lung opacity, in pediatric CXR images [5]. Consolidation is a critical abnormality whose detection aids early prediction of pneumonia. Before the model is applied, three pre-processing steps address specific issues: checking for confounding variables in the image, searching for consolidation patterns instead of relying on histogram features, and learning to detect sharp edges such as ribs and spines rather than having the CNN detect consolidation patterns directly. The ChestNet model consists of convolutional layers, a batch normalization layer after each convolutional layer, and two classifier layers at the end. Only two max-pooling layers are used, in contrast to the five of VGG16 and DenseNet121, in order to preserve the image region over which the consolidation pattern spreads. Small convolutional kernels such as 3 × 3 learn undesirable features, so the authors used 7 × 7 kernels to learn the widely spread consolidation pattern.
A multi-attention framework addressing issues like class imbalance, shortage of annotated images, and diversity of lesion areas is developed in [41], with the ChestX-ray14 dataset used for experiments. The authors implement three modules: a feature attention module, a space attention module, and a hard-example attention module. The feature attention module detects interdependencies among pathologies, using the ResNet101 structure as its base; because a Squeeze-and-Excitation (SE) block [35] can model the channel interdependencies of modules, one SE block is inserted into each ResNet block. The feature map generated by this module contains considerable noise and reflects global information rather than concentrating on the small disease-related region; the space attention module is introduced to counter this. In it, global average pooling is applied to the feature map obtained from ResNet101 [39], which carries global image information into each pixel and benefits both classification and localization. In the hard-example attention module, positive and negative images are separated into two sets, and the model is trained on each set to obtain a threshold prediction score per set. A set C, combining both sets with an increased proportion of positive samples, is then created, and the model is retrained on it to distinguish the 14 classes of thoracic disease. This resolves the issue of the large gap between positive and negative samples.
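The SE block inserted into each ResNet block can be sketched in NumPy as follows; the channel count and reduction ratio are illustrative, not values from [41].

```python
import numpy as np

rng = np.random.default_rng(3)

def squeeze_excite(fmaps, w1, w2):
    """SE block: global-average-pool each channel (squeeze), pass through a
    small FC -> ReLU -> FC -> sigmoid bottleneck (excite), and rescale
    the channels by the resulting attention weights."""
    z = fmaps.mean(axis=(1, 2))                                 # squeeze: (C,)
    s = 1.0 / (1.0 + np.exp(-(np.maximum(z @ w1, 0.0) @ w2)))   # excite: (C,)
    return fmaps * s[:, None, None]                             # channel reweighting

C, r = 16, 4
fmaps = rng.random((C, 8, 8))
w1 = rng.normal(size=(C, C // r))   # reduction FC
w2 = rng.normal(size=(C // r, C))   # expansion FC
out = squeeze_excite(fmaps, w1, w2)
```

The bottleneck (reduction ratio r) keeps the added parameter count small while still letting channels modulate one another.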
A multiple-feature-extraction technique is used in [23] for the classification of thoracic pathologies. Classifiers such as Gaussian discriminant analysis (GDA), KNN, Naïve Bayes, SVM, Adaptive Boosting (AdaBoost), Random Forest, and ELM are compared with a pre-trained DenseNet121, which is used for localization by generating a Class Activation Map (CAM). The integrated results of shallow and deep feature-extraction algorithms, namely, the Scale-Invariant Feature Transform (SIFT), GIST, Local Binary Patterns (LBP), and the Histogram of Oriented Gradients (HOG), combined with different classifiers, are used for the final classification of various lung abnormalities. The ELM is observed to have a better F1-score than DenseNet121.
Two asymmetric networks, ResNet and DenseNet, which extract complementary features from the input image, are used to design a new ensemble model known as DualCheXNet [10]. It is the first attempt to exploit the complementarity of dual asymmetric subnetworks for thoracic disease classification. The two networks work simultaneously in a Feature-Level Fusion (FLF) module, and selected features from both are combined in a Decision-Level Fusion (DLF) module, on which two auxiliary classifiers assign the image to one of the pathologies.
The problem of poor alignment and noise in non-lesion areas of CXR images, which hinders network performance, is overcome by the three-branch attention-guided CNN (AG-CNN) discussed in [20], which helps identify thoracic diseases. AG-CNN, with ResNet50 as its backbone, works the way a radiologist does: first browsing the complete image, then gradually narrowing the focus to a small lesion-specific region, such as a nodule. AG-CNN has three branches: local, global, and fusion. If the lesion region is distributed throughout the image, pathologies missed by the local branch through loss of information, as in pneumonia, are captured by the global branch. The global and local branches are then fused to fine-tune the CNN before the final decision is drawn. AG-CNN is trained in different orders: G_LF (global branch, then local and fusion together), GL_F (global and local together, followed by fusion), GLF (all together), and G_L_F (global, local, and fusion separately, one after another).
Lack of annotated images is a major hindrance to deep learning models for localization or segmentation [53]. To deal with this issue, a novel loss function is proposed and a conditional random field layer is added to a ResNet50 backbone [22] whose last two layers are excluded and whose weights are initialized from ImageNet. To make the CNN shift-invariant, a low-pass anti-aliasing filter, as proposed by [73], is inserted before down-sampling; this improves accuracy across many models. The NIH ChestX-ray14 dataset, which has very few annotated images, is used: only 984 images with bounding boxes for detecting 8 chest pathologies, and 11,240 images with labels only. Furthermore, the chest x-ray dataset contains many images with uncertain labels. To address this, label smoothing regularization [44, 66] is adopted in the ensemble models proposed in [47], which average the outputs of the pre-trained models DenseNet-121, DenseNet-169, DenseNet-201 [25], Inception-ResNet-v2 [64], Xception [12], and NASNetLarge [74]. A sigmoid function is used for activation instead of ReLU, and label smoothing applied to uncertain sample images helps improve the AUC score.
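A sketch of label smoothing for uncertain labels in the spirit of [47]: confident labels stay at 0/1 while uncertain entries (coded -1) are mapped into a soft-positive interval. The interval bounds here are assumptions for illustration, not the values used in the paper.

```python
import numpy as np

def smooth_uncertain(labels, low=0.55, high=0.85, seed=0):
    """Replace uncertain labels (-1) with a random soft target in
    [low, high]; certain 0/1 labels are left untouched."""
    rng = np.random.default_rng(seed)
    out = labels.astype(float)          # astype returns a copy
    mask = labels == -1
    out[mask] = rng.uniform(low, high, mask.sum())
    return out

y = np.array([1, 0, -1, -1, 1])         # -1 marks an uncertain finding
y_smooth = smooth_uncertain(y)
```

Training against these soft targets with a sigmoid output penalizes overconfident predictions on ambiguous radiographs.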
Multiple-instance learning (MIL), which ensures good localization and multi-classification performance even when few annotated images are available, is discussed in [37]. The latest version of the residual network, pre-act ResNet [22], is employed to correctly locate the site of disease. Initially, the model learns the class and location information of all images. Then each annotated input image is divided into four patches and the model is trained on each patch. For an image with a bounding-box annotation, the learning task becomes fully supervised, since the disease label of each patch can be computed from the overlap between the patch and the bounding box. The task is formulated as a MIL problem in which at least one patch in a diseased image belongs to that disease, while all patches must be disease-free if there is no illness in the picture.
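The patch-labeling rule just described, a patch is positive when it overlaps the annotated bounding box, can be sketched as follows, using a 2 × 2 grid to match the four-patch split; the image size is an assumption.

```python
def patch_labels(bbox, img_size=512, grid=2):
    """For an annotated image, label each of the grid x grid patches
    positive (1) iff it overlaps the disease bounding box (x1, y1, x2, y2).
    This turns a bounding-box annotation into per-patch supervision."""
    step = img_size // grid
    labels = []
    for row in range(grid):
        for col in range(grid):
            px1, py1 = col * step, row * step
            px2, py2 = px1 + step, py1 + step
            overlap_x = min(px2, bbox[2]) - max(px1, bbox[0])
            overlap_y = min(py2, bbox[3]) - max(py1, bbox[1])
            labels.append(1 if overlap_x > 0 and overlap_y > 0 else 0)
    return labels

# A lesion confined to the top-left quadrant marks only that patch positive.
print(patch_labels((10, 10, 100, 100)))  # -> [1, 0, 0, 0]
```

For unannotated images, only the MIL bag constraint applies: at least one positive patch if the image is diseased, none otherwise.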
Considering orientation, rotation, and tilting problems in images, a hybrid deep learning framework called VDSNet, combining VGG, data augmentation, and a spatial transformer network (STN) with a CNN, is presented in [7] for detecting lung diseases such as asthma, TB, and pneumonia from the NIH CXR dataset. Compared with CapsNet, vanilla RGB, vanilla gray, and a hybrid CNN-VGG, VDSNet achieves a better accuracy of 73% but is time-consuming. In [67], pre-defined deep CNNs, namely, AlexNet, VGG16, ResNet18, Inception-v3, and DenseNet121, with weights either initialized from the ImageNet dataset or randomly from scratch, are adopted for classifying chest radiographs into normal and abnormal classes. Pre-trained ImageNet weights perform better than random initialization from scratch. Deeper CNNs work better for detection or segmentation tasks than for binary classification, and ResNet outperforms training from scratch for moderately sized datasets (e.g., 8,500 rather than 18,000 images).
A customized U-Net-based CNN model is developed in [8] for the detection and localization of cardiomegaly, one of the 14 thoracic pathologies. The experiments use the ChestX-ray8 database, which contains 1,010 cardiomegaly images. A modified Low-Contrast Adaptive Histogram Equalization (LC-AHE) is applied to enhance and sharpen image features: the brightness of low-intensity pixels in a small selected region is amplified from the intensities of all neighbouring pixels, which sharpens the low-intensity regions of the image. Using the medical fact that cardiomegaly can be located simply by observing significant thickening of the cardiac ventricular walls, the authors developed a customized mask to locate it and separated out the affected region as an image. This achieved an accuracy of 93%, better than the VGG16, VGG19, and ResNet models.
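As a simplified stand-in for LC-AHE (global rather than locally adaptive, so only an illustration of the idea), plain histogram equalization shows how low-intensity, low-contrast regions get stretched:

```python
import numpy as np

def hist_equalize(img):
    """Global histogram equalization for an 8-bit image: remap intensities
    through the normalized CDF so dark, low-contrast regions spread out
    over a wider intensity range."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum().astype(float)
    cdf = (cdf - cdf.min()) * 255.0 / (cdf.max() - cdf.min())
    return cdf[img].astype(np.uint8)

# A dark gradient occupying only intensities 40-79 is stretched
# toward the full 0-255 range.
img = np.tile(np.arange(40, 80, dtype=np.uint8), (64, 1))
out = hist_equalize(img)
```

LC-AHE instead computes such mappings per local region, which preserves global structure while amplifying faint local detail.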
Thoracic pathology detection is not restricted to CXR images; it can also be done from lung sonography video. A deep learning approach for detecting COVID-19-related pathologies from lung ultrasonography is developed in [51]. It builds on two facts: augmented Lung Ultrasound (LUS) images improve a network's ability to distinguish healthy from ill patients [62], and keeping perturbed and original images consistent yields a more robust and generalized network [52, 55]. To this end, a Regularized Spatial Transformer Network (Reg-STN) is developed, and the CNN and spatial transformer network (STN) are jointly trained with the Adam optimizer. Lung sonography videos of 35 patients from various clinical centers in Italy were captured and divided into 58,924 frames. COVID-19 pathologies are localized through the STN, based on the observation that the pathologies occupy only a very small portion of the image, so the complete image need not be considered.
A three-layer Fusion High-Resolution Network (FHRNet) applied for feature extraction, with a fusion CNN adopted for classifying pathologies in CXR, is presented in [26]. FHRNet helps reduce noise and highlight the lung region. It has three branches: a local feature extractor, a global feature extractor, and a feature fusion module, where the local and global feature extraction networks output probabilities over the 14 classes. The local feature extractor receives a small lung region obtained by applying a mask generated by the global feature extractor. Two HRNets are tuned to obtain prominent features from the lung region and the whole image, and the HRNet is connected to the global feature extraction layer through a feature fusion layer ending in a softmax classifier that assigns the input image to one of the 14 pathologies. Another 121-layer deep CNN is developed to detect five chest pathologies, Consolidation, Mass, Pneumonia, Nodule, and Atelectasis [43]; cross entropy is used as the loss function, and the model achieves better AUC-ROC values for Pneumonia, Nodule, and Atelectasis than the model by Wang et al. [70].
Recently, owing to the cataclysmic outbreak of COVID-19, researchers have found that the lesions caused by the viral infection spread more and more as time passes. The biggest challenge in this situation is that analysing each x-ray image and extracting important findings demands much valuable time and the field presence of medical specialists. Software assistance is therefore necessary to help medical practitioners identify COVID-19 cases from x-ray images, and researchers have applied their expertise to designing deep learning models on data shared worldwide to capture different perspectives on the spread of the virus in the chest. Because not enough CXR images are available, many authors have created augmented data and applied deep learning models for detecting pneumonia caused by the COVID virus versus other virus-induced pneumonia. One group designed a new two-stage deep learning model to detect COVID-induced pneumonia [31]. In the first stage, clinical CXRs are given to a ResNet50 architecture for classification into virus-induced pneumonia, bacterial pneumonia, and normal cases. Since COVID-19-induced pneumonia is viral, all identified viral pneumonia cases are then differentiated by a ResNet101 architecture in the second stage, classifying the input image into COVID-19-induced pneumonia versus other viral pneumonia. This two-stage strategy is intended as a fast, rational, and consistent computer-aided solution. A binary classifier for sorting CXR images into COVID and non-COVID categories, together with a multiclass model for COVID, non-COVID, and pneumonia classes, is proposed in [45]. The authors adopted the DarkNet model as a base and proposed an ensemble model known as DarkCovidNet, with 17 convolutional layers and 5 max-pooling layers using varying numbers of filters such as 8, 16, and 32.
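The two-stage strategy of [31] can be sketched as a simple cascade; the classifier functions here are stubs standing in for the trained ResNet50 and ResNet101 models, and the label names are illustrative.

```python
def two_stage_diagnose(image, stage1, stage2):
    """Stage 1 (ResNet50 stand-in) separates viral pneumonia from bacterial
    pneumonia and normal; stage 2 (ResNet101 stand-in) refines viral cases
    into COVID-19 vs. other viral pneumonia."""
    label = stage1(image)
    if label != "viral_pneumonia":
        return label            # bacterial pneumonia or normal: done
    return stage2(image)        # covid19 or other_viral

# Stub classifiers for illustration only.
stage1 = lambda img: "viral_pneumonia" if img["viral"] else "normal"
stage2 = lambda img: "covid19" if img["covid"] else "other_viral"

print(two_stage_diagnose({"viral": True, "covid": True}, stage1, stage2))   # covid19
print(two_stage_diagnose({"viral": False, "covid": False}, stage1, stage2)) # normal
```

The cascade means the harder COVID-vs-other-viral decision is only ever made on images already screened as viral, which is what makes the pipeline fast and consistent.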
Each convolutional layer is followed by BatchNorm and LeakyReLU operations, where LeakyReLU prevents neurons from dying. The Adam optimizer is used for weight updates with a cross-entropy loss function. The same model serves for binary as well as multiclass classification, with a reported binary accuracy of 98.08% and multiclass accuracy of 87.02%. Another CNN with a softmax classifier is implemented for classifying the ChexNet dataset into COVID-19, Normal, and Pneumonia classes and is compared with Inception-v3, Xception, and ResNeXt models [32]. To handle irregularities in x-ray images, a DeTraC (Decompose-Transfer-Compose) model is proposed [1], consisting of three phases: deep local feature extraction, training based on gradient descent optimization, and a class refinement layer for final classification into COVID-19 and normal classes. DeTraC achieves an accuracy of 98.23% using a VGG19 CNN pre-trained on ImageNet.