Advances in Neurolaryngology (multiple authors)
Introduction
The principle of using flashing lights to examine rapid but periodic oscillation by stroboscopy is well known. Videostroboscopy (VSL) is one of the standard methods for examining the moving vocal folds and is used extensively for the analysis of vocal fold motion in the voice clinic. The qualitative aspects of interpreting stroboscopic motion were formulated in the initial book on videostroboscopy by Schönharl [1]. Stroboscopic signs associated with benign mucosal lesions can be systematically rated using a rating form [2]. Modern VSL, with its immediate image capture and playback, is able to reveal a number of abnormalities. These include abnormalities of laryngeal closure, absence of vibration, and vibratory asymmetry. The pliability of the vocal fold after surgical intervention can be assessed qualitatively by observing the vibratory characteristics using VSL [3]. Today, most clinical applications of VSL depend on subjective interpretation using parameters defined in prior publications (Bless and Hirano). Table 1 is a summary of target frequency and amplitude for four tokens of phonation in males and females commonly used in clinical practice.
Table 1. Target frequency and amplitude values for the four tokens for males and females
Objective measurement of vocal fold vibration patterns is an attractive concept because it allows for objective documentation and study. To date, practical clinical application of objective measurements from stroboscopy has not been realized. This is because of multiple factors related to the complexities of the vocal fold vibratory image. The initial high-speed cinematography recordings of vocal fold vibration from the works of Timcke, von Leden, and Moore [4–7] were studied by tedious frame-by-frame tracings to extract the trajectories of vocal fold motion and glottal area. This allowed for study of each fold's trajectory, motion, and position over time. Done as a research tool, it established the parameters of vocal fold oscillation that are important in understanding the normal glottal cycle. These parameters include open quotient, opening and closing speed index, and closed phase.
With the availability of digital image processing, automated image extraction and quantification is now practical. High-speed imaging of vocal fold vibration and its clinical applications are now readily available within the time and cost constraints of a clinical practice [8, 9]. Digital kymography (DKG) from high-speed video (HSV) images can be routinely captured, analyzed, and compared. These tools allow investigators to capture and quantify vocal fold vibratory function in normal and diseased states for their investigative needs.
Sustained voice that is quasi-periodic can be studied by stroboscopy. During each vocal gesture of sustained phonation, normal human subjects should be able to sustain phonation within a narrow limit of frequency and amplitude. The steady state of phonation can be analyzed by evaluation of the glottal cycle by using videostroboscopy. The stroboscopy image is a montage made up of many hundreds of glottal cycles. Although not a true representation of each glottal cycle, if the vocal folds are oscillating within the norms of quasi-periodic vibration, the assembled stroboscopy video approximates the true glottal cycle.
Videostroboscopy is well established in clinical applications for evaluation of the dysphonic patient. It is most useful for identification of small mass lesions and for verification of stiffness [10]. Identification of laryngeal asymmetry points to asymmetric rheological changes in the vocal fold cover resulting from mass, tension, or stiffness. Often, these asymmetrical changes are the indicators for recommending surgical intervention [11]. After surgery, VSL examination often shows improvement in the vocal fold edge, configuration, and phase closure, and return of amplitude and mucosal wave [12]. Objective measures of vibratory capability based on VSL would be desirable. However, practical applications based on comparing VSL before and after treatment have proved challenging. The flash-timing laryngeal videostroboscopy image is difficult to standardize from examination to examination. Control of the patient’s phonation volume, frequency, and even size of the laryngeal image on the monitor must be standardized [13]. Some authors have recommended overlaying images from the prior examination over the current examination using a transparency tracing in order to standardize the distance of the endoscope from the vocal folds and the size of the laryngeal image [14]. Such an approach is often not practical and is reserved for the research laboratory.
If the VSL is recorded with steady phonation at the same loudness and frequency, one can use the composite video images from stroboscopy as a representation of the glottal cycle. Using the standard stroboscopic flash rate of 1.5 Hz above the fundamental frequency, a video frame rate of 30 frames per second will result in a complete glottal cycle in 20 video frames. If the patient can hold steady phonation for 2 full seconds at the same fundamental frequency and loudness, then a montage of video frames can be acquired that is representative of the glottal cycle for that token. For the male, a 2-s phonation yields three apparent glottal cycles on the strobe video, assembled from approximately 250 actual glottal cycles, while for the female, this would be approximately 500 glottal cycles. This montage can be analyzed. Figure 1 is a montage of two glottal cycles obtained by capturing every frame of the video cycle from a stroboscopy video. By limiting the video frame of interest to the vocal folds, one can clip this area from each frame and assemble the clips into a single image made of many video frames within the area of interest. This montage of video frames can be clipped into the patient’s chart as a summary of the glottal cycle. One can appreciate that the vibratory pattern is regular, and the completed glottal cycle shows the characteristic pattern of open and closed phases with the phase difference between the upper and lower lip of the vocal fold.
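The frame and cycle counts quoted above follow from simple arithmetic, which can be sketched as below. This is only an illustration: the function name and the fundamental frequencies of 125 Hz (male) and 250 Hz (female) are assumptions chosen to reproduce the approximate 250 and 500 cycles mentioned in the text, not values stated by the authors.

```python
def strobe_montage_numbers(f0_hz, offset_hz=1.5, video_fps=30.0, duration_s=2.0):
    """Basic counts for a stroboscopy recording (hypothetical helper).

    f0_hz      : fundamental frequency of phonation (assumed value)
    offset_hz  : difference between flash rate and f0 (1.5 Hz in the text)
    video_fps  : video frame rate (30 frames per second in the text)
    duration_s : length of steady phonation (2 s in the text)
    """
    return {
        # Video frames needed to show one apparent (slow-motion) glottal cycle
        "frames_per_apparent_cycle": video_fps / offset_hz,
        # Apparent cycles visible on the strobe video during the recording
        "apparent_cycles": offset_hz * duration_s,
        # Actual glottal cycles that occurred during the recording
        "true_cycles": f0_hz * duration_s,
        # Total video frames captured
        "total_frames": video_fps * duration_s,
    }


male = strobe_montage_numbers(125.0)    # assumed male f0
female = strobe_montage_numbers(250.0)  # assumed female f0
print(male)
print(female)
```

With these assumed inputs, one apparent cycle spans 20 frames and a 2-s token contains three apparent cycles, matching the figures given in the paragraph.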
Fig. 1. Two glottal cycles can be assembled as a montage of video frames by clipping the area of interest and arranging the clips as a series of glottal images. This is a montage of two glottal cycles obtained by capturing every frame of the video cycle from a stroboscopy video.
From the glottal cycle, some simple measures can be obtained. The phase of glottal closure has been estimated to be 40–50% in normal subjects, depending on gender [15]. From Figure 1, it is a simple matter to count the number of open frames and the number of closed frames and obtain a percentage of open phase. For modal phonation at 72- to 75-dB output, the phase closure is considered to be open phase dominant if the glottal cycle is greater than 60% open and closed phase dominant if it is greater than 60% closed. An open phase dominant pattern may then be further characterized as due to phase asymmetry, poor closure, or stiffness with reduced amplitude or mucosal wave. Figure 6 is an example of a glottal cycle from a subject with vocal keratosis showing an open phase dominant pattern due to unilateral vocal fold stiffness.
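The frame-counting rule can be written out as a short sketch. The function name, the per-frame open/closed flags, and the label "balanced" for cycles that meet neither 60% criterion are assumptions made for illustration; only the 60% thresholds come from the text.

```python
def classify_phase(open_flags, threshold=0.60):
    """Classify a glottal cycle from per-frame open/closed judgments.

    open_flags : sequence of booleans, True where the glottis is judged open
    threshold  : fraction above which a phase is called dominant (60% in the text)
    Returns (open_fraction, label).
    """
    n = len(open_flags)
    if n == 0:
        raise ValueError("at least one frame is required")
    open_count = sum(1 for flag in open_flags if flag)
    open_fraction = open_count / n
    closed_fraction = (n - open_count) / n
    if open_fraction > threshold:
        label = "open phase dominant"
    elif closed_fraction > threshold:
        label = "closed phase dominant"
    else:
        label = "balanced"  # assumed label, not a term from the source
    return open_fraction, label


# Hypothetical 20-frame cycle: 14 frames open, 6 frames closed
fraction, label = classify_phase([True] * 14 + [False] * 6)
print(fraction, label)
```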
Fig. 2. The image has been brightened and contrast enhanced and changed to a black and white image. Now, the glottal area is quite sharp and ready for automated edge detection and area extraction.
Quantification of the stroboscopic image derives from the initial manual measurements of the glottal cycle from high-speed cinematography [16]. One can trace out the edge of the vocal fold by eye to obtain the elliptical shape of the vocal fold margin. By dividing the area measured by the length of the vocal folds, one obtains a normalized measure of the glottal area function. When the glottal area function is assembled over the glottal cycle, the result is called the glottal area waveform (GAW); the GAW can be plotted versus time.
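As a minimal sketch of this normalization, assume each video frame has already been reduced to a binary mask in which 1 marks a pixel inside the glottal opening, and that the glottal length has been measured in pixels. The mask representation and the function name are assumptions for illustration, not the authors' software.

```python
def glottal_area_waveform(masks, glottal_length_px):
    """Build a GAW as (glottal pixel count) / (glottal length) per frame.

    masks             : list of frames; each frame is a list of rows of 0/1,
                        where 1 marks a pixel inside the glottal opening
    glottal_length_px : vocal fold (glottal) length in pixels
    """
    if glottal_length_px <= 0:
        raise ValueError("glottal length must be positive")
    return [
        sum(sum(row) for row in mask) / glottal_length_px
        for mask in masks
    ]


# Three tiny hypothetical frames: closed, partly open, fully open
closed = [[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
partly = [[0, 1, 0], [0, 1, 0], [0, 0, 0], [0, 0, 0]]
opened = [[0, 1, 0], [1, 1, 1], [0, 1, 0], [0, 1, 0]]

gaw = glottal_area_waveform([closed, partly, opened], glottal_length_px=4)
print(gaw)
```

Dividing by length makes frames recorded at different endoscope distances more comparable, which is the purpose of the normalization described above.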
The GAW is a measure of the glottal area function throughout the glottal cycle. The normal GAW has features that can be measured. These include the open and closed phase of vocal fold oscillation. The maximum glottal area and a minimum glottal area can also be measured. The rate of vocal fold opening and closing during all vocal fold oscillation can be calculated by measuring the slope of the GAW. The ability to rapidly change the configuration of the vocal fold or the opening and closing speed index is an indication of vocal fold pliability and has been shown to change before and after phonosurgery [12].
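The measurable features listed above can be illustrated with a small sketch operating on a GAW given as one value per video frame. The slopes here are plain frame-to-frame differences; this is an assumed simplification and not the published definition of the speed index. The function name and the sample values are hypothetical.

```python
def gaw_features(gaw, closed_level=0.0):
    """Extract simple descriptive features from a glottal area waveform.

    gaw          : list of normalized glottal area values, one per frame
    closed_level : area at or below which a frame is counted as closed
    """
    if len(gaw) < 2:
        raise ValueError("at least two frames are required")
    # Frame-to-frame differences approximate the slope of the GAW
    slopes = [later - earlier for earlier, later in zip(gaw, gaw[1:])]
    closed_frames = sum(1 for value in gaw if value <= closed_level)
    return {
        "max_area": max(gaw),
        "min_area": min(gaw),
        "closed_fraction": closed_frames / len(gaw),
        "max_opening_slope": max(slopes),  # steepest rise in area
        "max_closing_slope": min(slopes),  # steepest fall in area (negative)
    }


# Hypothetical single cycle sampled in 10 frames
features = gaw_features([0, 0, 0, 1, 3, 4, 3, 1, 0, 0])
print(features)
```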
In normal subjects, significant differences between the sexes were found in the GAW, peak glottal area, closed period, closing slope, and the size and incidence of the posterior glottal gap. Intensity and frequency changes significantly affected the GAW. Intensity variations affected the steepness of the closing phase and the duration of the closed period. Frequency changes affected the open period and the relative duration of opening versus closing [15].
The glottal gap area, amplitude, and degree of bowing over time can be tracked either manually or by digitizing tablet. Measurement of the glottal gap is probably the easiest measure. Estimates of the size of the glottal gap before and after surgical intervention have been obtained [17]. These studies do not require videostroboscopic analysis and are primarily based on a digitized image of the vocal folds at their most closed phase. By measuring the glottal length and the number of dark pixels within the glottal gap, one can obtain a ratio of gap pixels to vocal fold length. This ratio can then be compared before and after surgical treatment. Such normalized glottal gap estimates have been used in studies to estimate the risk of aspiration in patients with vocal cord paralysis [18].
Image processing techniques can be applied to stroboscopic images to automate GAW extraction. These techniques include contrast enhancement, edge detection, and image gradient analysis. Histogram equalization followed by maximum histogram gradient shift was found to be most effective for edge detection in a semiquantitative method for detecting the vocal fold vibratory pattern. This can reduce subjective bias. Such an algorithm was used for the study referred to above [15]. Software has become available for the analysis of the videostroboscopy image. One commonly used package is the KSIP software (Pentax Medical, Montvale, NJ, USA).
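To make the contrast enhancement step concrete, the following sketch applies textbook histogram equalization to a small grayscale image and then marks dark pixels with a fixed threshold. The fixed threshold is a deliberately simple stand-in: the source does not define the "maximum histogram gradient shift" step, so it is not reproduced here, and this is not the KSIP implementation. All names and pixel values are hypothetical.

```python
def equalize(gray, levels=256):
    """Histogram-equalize a 2-D list of integer gray values in [0, levels - 1]."""
    flat = [pixel for row in gray for pixel in row]
    hist = [0] * levels
    for pixel in flat:
        hist[pixel] += 1
    # Cumulative distribution of gray levels
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    total = len(flat)
    if total == cdf_min:  # constant image: nothing to stretch
        return [row[:] for row in gray]
    # Lookup table mapping each old gray level to its equalized level
    lut = [
        max(0, round((c - cdf_min) / (total - cdf_min) * (levels - 1)))
        for c in cdf
    ]
    return [[lut[pixel] for pixel in row] for row in gray]


def dark_mask(gray, threshold):
    """Return a 0/1 mask with 1 where the pixel is darker than the threshold."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray]


# Hypothetical low-contrast patch: two dark glottal pixels in brighter tissue
patch = [
    [100, 102, 101],
    [100,  60, 101],
    [100,  61, 102],
]
enhanced = equalize(patch)
mask = dark_mask(enhanced, threshold=64)  # threshold chosen for illustration
print(enhanced)
print(mask)
```

After equalization the gray values span the full range, so the dark glottal pixels separate more clearly from the surrounding tissue before any edge or area extraction.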
Fig. 3. The glottal area has been traced out using the automated edge detection algorithm and outlined for extraction. Note that the tracing is imperfect, as mucus on the edge has been erroneously excluded from the vocal fold edge and will not be included in the area. These areas may need operator intervention to correct the deficiencies.
Fig. 4. The image analysis software has extracted the vocal margin for pixel count. This corresponds to the vocal fold margin. The operator can change the edge tracing by eye if needed.
By using image analysis algorithms based on edge tracking and image threshold shift, and by limiting the edge detection to the area of the vocal fold edge, one can automatically track the glottal area in each frame and assemble the areas into a GAW. The GAW represents the change in the glottal area over time for the glottal cycle. In addition, by defining the midline, the software can track the contribution of the right versus the left fold area to the total glottal area. Figure 5 is a plot of the GAW from a normal male phonation at 108 Hz and 74 dB output.
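The midline split can be sketched as follows, assuming the glottal area of one frame is available as a binary mask and the midline is a vertical pixel column chosen by the operator. Which image side corresponds to the patient's right or left fold depends on the orientation of the endoscopic view, so the code only reports the two image sides. The function name and mask values are hypothetical.

```python
def split_area_by_midline(mask, midline_col):
    """Split glottal area into the parts on either side of a vertical midline.

    mask        : list of rows of 0/1, where 1 marks a glottal pixel
    midline_col : column index of the midline; columns below it count toward
                  the image-left side, the rest toward the image-right side
    Returns (image_left_area, image_right_area) in pixels.
    """
    left_area = sum(sum(row[:midline_col]) for row in mask)
    right_area = sum(sum(row[midline_col:]) for row in mask)
    return left_area, right_area


# Hypothetical frame where one fold opens less than the other
frame = [
    [0, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
]
left_area, right_area = split_area_by_midline(frame, midline_col=2)
print(left_area, right_area, left_area / right_area)
```

Repeating this for every frame gives two per-side area tracings whose sum is the total glottal area for that frame.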
We will illustrate how the montage of images can be converted to a GAW tracing from this stroboscopy montage.
Fig. 5. The extracted glottal area waveform (GAW) is the tracing in yellow and is depicted for two glottal cycles from a stroboscopy video from Figure 1. The frame number is plotted as the x axis and the pixel/glottal length is in the y axis. Note that the cycle-to-cycle repeatability is good with well-defined opening and closing speed, closed phase, and peak glottal area. These measures can be reliably obtained in normal patients. The other tracing is the plot of the right (downward) and the left (upward) contribution to the vocal fold area as defined by the midline. Thus, the amplitude of the GAW is made of the sum of the right and left vocal fold movement from the midline.
The video frames are converted to AVI format for automated image extraction. Once the image is selected, the frame that represents the vocal fold in its most closed phase is used for defining the beginning of the glottal cycle sequence. Using the software, the cursor is used to define the area of interest for automated image extraction. By limiting the cursor to the area of vocal folds, the image analysis routine can be best utilized to analyze the changes in the GAW function. Typically, 40 frames of video image representing two glottal cycles are used for analysis. A minimum of 2 glottal cycles is used to verify repeatability of the glottal cycle measures. Figure 1 is an example of a normal male glottal cycle from a videostroboscopy examination that has been captured and assembled in a photomontage for image analysis.
Once the color image montage has been assembled, it is usually necessary to enhance the image prior to automated edge detection. Figure 2 shows the image after it has been converted to black and white and subjected to image brightening and contrast enhancement. The edge detection algorithm is applied to automatically trace the glottal edge based on maximum histogram gradient shift. Figure 3 shows the automated edge tracing that has been applied. During the image analysis, the operator can visually check the area specified for image analysis and position the cursor on the glottal area to be analyzed. The extracted vocal margin is shown in Figure 4. The edge tracing of the glottal area is then assembled as a GAW with the frame number on the x axis. In addition to the GAW plot, right versus left vocal fold area relative to the defined midline can also be plotted. The GAW, together with the right and left fold area tracings, is shown in Figure 5. Note that the open and closed phases are well defined, and the left (upward black tracing) fold area approximates that of the right (downward red tracing). Approximately 50% of the glottal cycle is spent in the closed position, with the right and left folds showing equal area during opening and closing.
Fig. 6. This is a montage of two glottal cycles assembled from a stroboscopy examination of a female with hoarse voice. She has a right vocal fold keratosis with reduced amplitude. The subjective impression is open phase dominant pattern with loss of amplitude on the right.
As an example of abnormal vocal fold analysis, we consider a patient with steady phonation but an amplitude difference between the two folds. This is illustrated in Figure 6. The patient is a female with voice disturbance due to right vocal fold keratosis and right vocal fold stiffness. Differences in amplitude can be appreciated on the montage shown in Figure 6. With the image analyzed, the GAW is shown in Figure 7. Unlike the normal GAW (Fig. 5), there are several changes in the GAW plot derived from Figure 6. Note that there is a very short closed phase. The right and left area contributions to the glottal area differ, with the left fold contributing more than the right. This corresponds to the reduced amplitude noted during subjective observation. The right side is not completely stiff; its amplitude is approximately half that of the opposite, more pliable fold. In this way, GAW and area analysis may be used to analyze pre- and postoperative stroboscopy images to objectively document changes in vibratory function.
An inherent limitation of videostroboscopy and of GAW analysis is that not all patients have a perfect examination. Some patients will have tilting of the epiglottis that obscures the anterior commissure, while others have arytenoid hooding that obscures the posterior commissure. This makes it necessary to manually edit the edge tracking by eye to make the images useful for extraction of the GAW. Even with this intervention, one can see that there are some irregularities in the smoothness of the GAW. This is one of the deficiencies of GAW analysis using stroboscopic illumination as it exists today.
Although objective evaluation is difficult due to the composite nature of video recordings, some authors have tried to use the strobe image for quantification [19]. Provided the images are carefully collected and standardized, mucosal wave propagation can be identified for tracking, and some information regarding mucosal pliability can be estimated.
Despite the availability of objective image analysis software, only limited literature supports its use. This is due to imperfect automated image extraction that requires user interaction. Development of new automated image analysis routines is ongoing. For automated analysis, standardized recording of videostroboscopy and laryngeal image analysis algorithms are prerequisites to achieving objective measures of phonatory function before and after surgery [20].
Stroboscopy will continue to be an easy clinical tool useful for evaluation of the patient with dysphonia. With the new generation of high-definition video and stroboscopy combined with special image filters such as narrow band imaging and fluorescence imaging, the role of imaging of the vocal fold is continuing to expand. As researchers come to some agreement as to which of the parameters from the stroboscopy image are most relevant in the evaluation of vocal vibratory function, there is now a robust set of tools that could be applied. What is needed is a set of automated software tools that can assess the video, perform a preliminary analysis of its quality, and extract key parameters without extensive user input.
Fig. 7. The GAW plot shows the amplitude differences between the right and left fold well and shows the glottal cycle to be mostly in the opening and closing position with a short closed phase.