Advances in Neurolaryngology

High-Speed Video

To correct the deficiencies of stroboscopy, high-speed imaging has been of interest since the 1970s.

High-speed cinematography was used from the 1930s to the 1960s to record vocal fold vibration [21], but it is not practical for clinical purposes. High-speed video (HSV) using charge-coupled device (CCD) chip cameras is lower in cost and can be adapted for laryngeal visualization. The initial report using high-speed digital imaging was by Hirose [22, 23]. This system was limited to 100 × 100 pixels and had poor detail resolution. Nevertheless, it made possible many new observations of vibratory asymmetry and of sources of vibration that could not be observed with standard videostroboscopy. Other authors have used high-resolution high-speed cameras to study aperiodicity and asymmetry [24]. As the cost of CCD cameras continued to fall, a commercial high-speed video system became available in early 2000. This system initially had a resolution of 256 × 256 pixels in black and white; today such a system is available at 512 × 512 pixels in color (Pentax Medical, Montvale, NJ, USA). HSV is available at even greater resolutions and speeds, but its application in laryngeal imaging is still limited.

HSV overcomes some of the problems of stroboscopy by capturing thousands of images of the vibrating vocal folds per second. When more than 10 images can be captured per glottal cycle, each individual glottal cycle is imaged, so there is no concern about viewing a composite image as in stroboscopy. If the images are to be quantified, 20 frames per glottal cycle are preferred to permit frequency analysis; by the Nyquist theorem, this places the highest analyzable frequency at 10 times the fundamental. Therefore, since the vocal folds vibrate at about 100–200 Hz (males and females, respectively), a sampling rate of at least 2,000 frames per second is necessary, and 4,000 frames per second is preferred. Rigid endoscopes are normally used for HSV because of the need to illuminate the larynx adequately at these high capture rates, and a 300-W xenon light source is necessary.
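
The frame-rate arithmetic above can be checked with a few lines of Python. This is a minimal sketch; the function names are hypothetical, and it assumes 200 Hz as the upper end of the speaking F0 range cited in the text. The second function applies the Nyquist limit (half the frame rate) to show how many harmonics of F0 fall below it.

```python
import math


def required_frame_rate(f0_hz: float, frames_per_cycle: int) -> float:
    """Frames per second needed to capture `frames_per_cycle` images of every glottal cycle."""
    return f0_hz * frames_per_cycle


def highest_resolvable_harmonic(frame_rate: float, f0_hz: float) -> int:
    """Largest harmonic number k whose frequency k * f0 lies strictly below the Nyquist limit."""
    nyquist = frame_rate / 2.0
    return math.ceil(nyquist / f0_hz) - 1


F0_MAX = 200.0  # assumed upper F0 (female speaking voice), as in the text

minimum_fps = required_frame_rate(F0_MAX, 10)    # enough to view each cycle individually
preferred_fps = required_frame_rate(F0_MAX, 20)  # preferred for frequency analysis
print(minimum_fps, preferred_fps)                          # 2000.0 4000.0
print(highest_resolvable_harmonic(preferred_fps, F0_MAX))  # 9
```

At 20 frames per cycle the Nyquist limit sits exactly at the 10th harmonic, so harmonics 1 through 9 remain available for spectral analysis.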

An emerging technology using fiberscopes coupled to a high-speed camera has also been described [25]. Flexible high-speed endoscopy has the added advantage of allowing evaluation of changes in the resonating structures of the supraglottic larynx and pharynx without the need to hold the tongue, thereby giving the researcher a more physiologic assessment of vocal fold and aerodigestive tract function during phonation, deglutition, and respiration.

In the study of normal voicing and singing gestures, HSV can be used to image the onset, steady state, and offset of the laryngeal gesture [26].

Clinically, HSV is useful in voice-disordered patients with: (1) voice breaks; (2) undefined tremor, wobble, or tremolo in the singing or speaking voice; (3) spasmodic dysphonia and voice onset delay [27]; and (4) diplophonia [28].

After video capture, kymographic analysis of the video can be carried out off-line. In one clinical system, the Wolf HS Endocam 5560, up to 8,000 grayscale images of 256 × 256 pixels can be stored, and the camera can acquire a maximum of 4,000 images per second [29]. In contrast to stroboscopy, aperiodic movements of the vocal folds can be visualized. A recording lasts 2 to 4 s, depending on the capture speed. Because of this short sampling time, the examiner must time the capture to coincide with the event of interest. In our clinic, we capture 8 s of data at 2,000 frames per second, which records voice onset, offset, and steady-state phonation in 2–3 tokens.
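
The 2–4 s recording window follows directly from the buffer size and the capture rate. The sketch below (function names are hypothetical) reproduces those figures from the Endocam numbers quoted above and shows the frame count implied by the 8-s clinical protocol.

```python
def recording_duration_s(buffer_frames: int, frames_per_second: float) -> float:
    """Seconds of phonation that fit in a fixed-size frame buffer."""
    return buffer_frames / frames_per_second


def frames_for_duration(duration_s: float, frames_per_second: float) -> int:
    """Number of frames a capture of the given length produces."""
    return round(duration_s * frames_per_second)


# Buffer size quoted for the Wolf HS Endocam 5560.
BUFFER_FRAMES = 8000
print(recording_duration_s(BUFFER_FRAMES, 4000))  # 2.0 s at the maximum rate
print(recording_duration_s(BUFFER_FRAMES, 2000))  # 4.0 s at half the rate

# The clinic protocol described above: 8 s at 2,000 frames per second.
print(frames_for_duration(8, 2000))  # 16000
```

By this arithmetic, the 8-s protocol produces 16,000 frames, twice the 8,000-frame buffer quoted for the Endocam, so it implies a system with a larger buffer.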

Initial HSV observations of laryngeal physiology showed that combining HSV with other measures, such as electroglottography, is very helpful in visualizing voice onset and offset, singing gestures, and extremely high phonation. HSV has also made it possible to analyze various singing styles [30, 31]. In a clinical report, HSV identification of a diplophonia pattern consistent with vocal fold paresis was important in identifying a patient with previously undiagnosed vocal fold paresis [32].

In recent years, some of the focus has been on automating the analysis of HSV. The reason is that acquisition runs at 4,000 frames per second while playback is usually at 20 frames per second: 2 s of phonation yields 8,000 frames, which take 400 s, or close to 7 min, to review. Another drawback of high-speed systems is that the image sequence must first be acquired and only then reviewed; if the sample time proves insufficient, the examination must be repeated. System improvements and software that detects video frames containing areas of interest are expected to evolve, making review and analysis easier.
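
The review-time figure is simple arithmetic: frames captured divided by the playback rate. A minimal sketch, with a hypothetical function name:

```python
def review_time_s(acquisition_s: float, capture_fps: float, playback_fps: float) -> float:
    """Time needed to watch every captured frame at the playback rate."""
    frames = acquisition_s * capture_fps  # 2 s x 4,000 frames/s = 8,000 frames
    return frames / playback_fps


t = review_time_s(acquisition_s=2, capture_fps=4000, playback_fps=20)
print(f"{t:.0f} s ({t / 60:.1f} min)")  # 400 s (6.7 min)
```

Equivalently, playback is 4,000 / 20 = 200 times slower than real time, which is why frame-selection and automated analysis tools matter.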

High-speed images can be analyzed in the same way as stroboscopic images. The major advantage is that the images are captured in real time rather than assembled into a montage. Because the images are acquired over such a brief period under steady illumination, they are very stable for image analysis.


Fig. 8. The figure shows four DKG tracings of the beginning of phonation from a high-speed video of a normal male phonating at 125 Hz with 76 dB output. Note that the vocal folds are brought into adduction without complete closure. The oscillation starts with both folds contributing to flow modulation. Steady state is reached in 4–5 glottal cycles.

Vocal fold movements can also be measured either with high-speed imaging or with short-interval, color-filtered double-strobe flash stroboscopy. In the latter, the two strobe flashes are color filtered and separated by a brief interval, creating a double exposure in each video frame. Opening and closing velocities can then be visualized in real time over the entire length of the vocal fold, from anterior to posterior, and quantified off-line after image calibration [33]. Another method of reducing the data from high-speed recordings is multiline kymography. In this technique, the video is acquired in high-speed mode, lines of interest crossing the vocal folds at several sites (for example, at the mid-membranous vocal fold) are selected for digital kymography (DKG) extraction, and a videokymographic plot is generated. This technique is more practical than the line-scanning camera because the lines of interest can be defined after the high-speed images have been acquired [34]. Using multislice kymography, asymmetries and breaks can be compared and measured [35]. Mapping high-speed movies of vocal fold vibration into 2-D diagrams has also been proposed and termed phonovibrography [36].
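
The core of DKG extraction is to take the same image line from every high-speed frame and stack those lines in time order. The sketch below assumes frames are plain lists of grayscale rows and that the glottis runs vertically in the image, so that a horizontal row crosses both folds; the function names and the toy frames are hypothetical, not the interface of any commercial system.

```python
from typing import Dict, List, Sequence

Frame = List[List[int]]  # one grayscale image indexed as frame[row][column]


def extract_kymogram(frames: Sequence[Frame], row: int) -> List[List[int]]:
    """Stack one image row from every frame; result[t] is that row at frame t,
    so time runs down the vertical axis, as in a kymogram."""
    return [list(frame[row]) for frame in frames]


def extract_kymograms(frames: Sequence[Frame], rows: Sequence[int]) -> Dict[int, List[List[int]]]:
    """Multiline kymography: one kymogram per selected line of interest."""
    return {row: extract_kymogram(frames, row) for row in rows}


def synthetic_frame(gap_half_width: int, width: int = 9, height: int = 3) -> Frame:
    """Toy image: bright tissue (200) with a dark glottal gap (20) centred on the midline."""
    center = width // 2
    line = [20 if abs(c - center) < gap_half_width else 200 for c in range(width)]
    return [list(line) for _ in range(height)]


# One toy glottal cycle: closed, opening, maximally open, closing, closed.
frames = [synthetic_frame(w) for w in (0, 1, 2, 1, 0)]
kymograms = extract_kymograms(frames, rows=[0, 2])
for line in kymograms[2]:
    print("".join("#" if v < 100 else "." for v in line))
```

Printed with `#` for dark pixels, the kymogram for row 2 shows the gap opening and closing down the time axis. Because rows are chosen after capture, any number of lines can be extracted from one recording, which is the practical advantage over a line-scanning camera noted above.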

The initial impetus for videokymography was to use the line-scanning mode of a conventional camera but to image only one video line at a high rate. In videokymography, images from a single transverse line across the vocal folds are recorded, and successive line images are shown in real time on a monitor, with the time dimension displayed vertically. Videokymography, using a modified CCD video camera, works in two modes: standard and high speed. In standard mode, the vocal folds are displayed on a video monitor in the usual way at 50 images per second (60 in the National Television System Committee (NTSC) system); this mode is used for routine laryngoscopy and stroboscopic examination of the larynx. In high-speed mode (nearly 8,000 lines per second), the scanning line is placed on the area of interest in the laryngeal image, and only that line is displayed on the x axis of the monitor, while the y axis represents time. The technique was initially reported by Schutte and Svec [37]. They reported that all vocal fold vibrations, including those producing pathologically rough, breathy, hoarse, or diplophonic voice, can be observed [28]. Videokymography was able to detect small left-right asymmetries, open quotient differences along the glottis, lateral propagation of mucosal waves, and movements of the upper margin of the vocal folds.


Fig. 9. An abnormal DKG acquired during voice onset in a subject with vocal fold scar. In this tracing, there is a prolonged delay between approximation of the vocal folds and the beginning of just-noticeable vocal fold vibration. There are also visible differences in vibratory characteristics between the posterior and anterior tracings, and between the stiff left side and the pliable right side.


Fig. 10. Edge detection software has been applied to the DKG tracing from a normal subject, with the right and left edges traced. The resulting tracing, plotted on the right as a function of time, is the vibrogram waveform. Note that it is made of two waveforms representing right and left fold excursion as a function of time. These two waveforms will be analyzed by fast Fourier transform (FFT) signal analysis.


Fig. 11. Power spectral plot of the DKG derived from the normal subject shown in Figure 8.

With the high-speed video camera, the clinician can specify the site of the kymogram and assemble a digital kymogram; multiple digital kymograms can be assembled by specifying multiple lines of interest. An example of digital kymograms from two sites in a normal subject is shown in Figure 8. This tracing captures voice onset at the beginning of vocal fold oscillation, in a male phonating at 125 Hz with 76 dB output. The onset of vibration is preceded by adduction of the vocal folds into a prephonatory set position, without complete closure. The vibration starts small and ramps up gradually to full amplitude over 4 or 5 cycles, depending on the amplitude and frequency of the vocal gesture. Both vocal folds are in phase with each other, and once vibration is established, the kymogram shows each glottal cycle to be quite regular, with a well-defined closed (contact) phase and well-defined opening and closing phases.
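
The ramp-up to steady state can be expressed as a cycle count once per-cycle peak excursions have been read from an edge-detected DKG waveform. A minimal sketch, assuming "steady state" means the median of the last few cycles and "reached" means 90% of that value; both thresholds, the function name, and the sample values are hypothetical and not taken from the text.

```python
from statistics import median
from typing import Optional, Sequence


def cycles_to_steady_state(cycle_peaks: Sequence[float], fraction: float = 0.9,
                           tail: int = 5) -> Optional[int]:
    """Count glottal cycles until the per-cycle peak excursion first reaches
    `fraction` of the steady-state value (median of the last `tail` cycles)."""
    if not cycle_peaks:
        return None
    steady = median(cycle_peaks[-tail:])
    for index, peak in enumerate(cycle_peaks):
        if peak >= fraction * steady:
            return index + 1  # 1-based cycle count
    return None


# Illustrative (made-up) peak excursions for a gradual onset.
normal_onset = [0.1, 0.3, 0.55, 0.8, 0.95, 1.0, 1.02, 0.98, 1.0, 1.01]
print(cycles_to_steady_state(normal_onset))  # 5
```

The same count applied to an onset like the scarred case in Figure 9 would be expected to come out larger, giving a number to compare rather than a visual impression.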

Figure 9 is an abnormal DKG in a subject with vocal fold scar. In this tracing, there is a prolonged delay between approximation of the vocal folds and the beginning of just-noticeable vibration, and it also takes longer to reach steady-state oscillation. Once vibration is established, the pattern becomes regular. The posterior DKG tracing (line 1) differs from the anterior tracing (line 2), and the more pliable right anterior fold shows an extra dicrotic notch in its DKG tracing compared with the stiffer left side. This would not be appreciated using stroboscopy.

Next, we will demonstrate some tools that can be used to measure the DKG waveform.

Once the video has been acquired, the cursor is placed on the mid-membranous vocal fold; multiple lines may be placed to obtain multiple DKG lines. Once the videokymographic display has been generated, the image can be converted to a DKG waveform using edge detection software. Figure 8 is the line tracing of the DKG waveform based on edge detection. Figure 10 is the vibrogram waveform extracted by edge detection from the same subject as in Figure 8, during steady phonation at the same loudness and frequency. Two lines can be seen, one representing the right and the other the left vocal fold. The vibratory patterns are transformed into a numerical waveform plot using edge detection software (KSIP, Pentax Medical, Montvale, NJ, USA). The waveform values are time- and frame-locked and can be used for signal analysis and display (Fig. 10). One way to analyze the signal is to apply a fast Fourier transform (FFT) to the waveform and compute its power spectrum, giving a plot of power versus frequency (Fig. 11). Again there are two lines, one representing the left and the other the right vocal fold. Certain features are typical of the normal power spectral plot, as studied in a small number of normal subjects [38]. A normal spectral plot of the digital kymogram waveform obtained at modal frequency and amplitude should have the following characteristics: the energy is limited to the fundamental frequency and its harmonics, there is little subharmonic or interharmonic energy, and the spectral plot is symmetric between the two vocal folds (Fig. 11).
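
The chain from kymogram to spectrum can be sketched end to end. This is a minimal illustration, not the KSIP algorithm (whose internals are not described here): it assumes a plain intensity threshold as the edge detector, measures each fold's excursion from a fixed midline, and uses a direct DFT in place of a library FFT. Function names, the threshold, the midline convention, and the synthetic 200 Hz data are all hypothetical.

```python
import cmath
import math
from typing import List, Optional, Sequence, Tuple

Edges = Optional[Tuple[int, int]]  # (first, last) dark column, or None when closed


def detect_edges(kymogram: Sequence[Sequence[int]], threshold: int) -> List[Edges]:
    """Threshold edge detector: the glottal gap is the run of pixels darker than `threshold`."""
    edges: List[Edges] = []
    for line in kymogram:
        dark = [c for c, value in enumerate(line) if value < threshold]
        edges.append((dark[0], dark[-1]) if dark else None)
    return edges


def fold_waveforms(edges: Sequence[Edges], midline: float) -> Tuple[List[float], List[float]]:
    """Lateral excursion of each fold edge from the midline (0 when the glottis is closed)."""
    left = [max(0.0, midline - e[0]) if e else 0.0 for e in edges]
    right = [max(0.0, e[1] + 1 - midline) if e else 0.0 for e in edges]
    return left, right


def power_spectrum(x: Sequence[float], fs: float) -> List[Tuple[float, float]]:
    """(frequency, power) pairs from a direct DFT of the mean-removed signal.
    O(n^2): fine for a short illustration; use an FFT library for real recordings."""
    n = len(x)
    mean = sum(x) / n
    centred = [v - mean for v in x]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(centred))
        spectrum.append((k * fs / n, abs(acc) ** 2 / n))
    return spectrum


def peak_frequency(spectrum: Sequence[Tuple[float, float]]) -> float:
    """Frequency of the strongest non-DC component."""
    return max(spectrum[1:], key=lambda item: item[1])[0]


# Synthetic kymogram: 200 Hz vibration sampled at 4,000 frames/s for 0.1 s.
FS, F0, WIDTH = 4000, 200, 8
kymogram = []
for t in range(400):
    half = round(3 * max(0.0, math.sin(2 * math.pi * F0 * t / FS)))
    kymogram.append([20 if WIDTH // 2 - half <= c < WIDTH // 2 + half else 200
                     for c in range(WIDTH)])

left, right = fold_waveforms(detect_edges(kymogram, threshold=100), midline=WIDTH / 2)
print(peak_frequency(power_spectrum(left, FS)), peak_frequency(power_spectrum(right, FS)))
```

For this symmetric toy vibration, both fold waveforms peak at the 200 Hz fundamental, with smaller peaks at its harmonics, which is the normal pattern described above.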


Fig. 12. This is a DKG tracing of steady-state phonation in a female with essential tremor. By eye, the DKG looks quite normal. Spectral analysis of the DKG (see Figure 13) provides further detail on the frequency and amplitude of this tracing.


Fig. 13. Edge detection has been applied to extract the vibrogram from the DKG tracing in Figure 12.

Normal vocal fold vibration can be characterized by its spectral shape, its spectral slope, and the degree of vibrogram asymmetry between the right and left folds. In a small series of normal adults, there was good agreement in spectral characteristics among subjects [38]. Further study is needed to determine how spectral shape changes as acoustic amplitude (loudness) varies.
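
Two of the normal-spectrum criteria lend themselves to simple numbers: how much of the signal power sits at F0 and its harmonics (versus interharmonic or subharmonic energy), and how evenly the two folds share the energy at F0. The metrics below are hypothetical illustrations of those ideas, not the measures used in [38]; spectral slope is not computed. A single-bin DFT is used, which is exact when the analysis window holds whole cycles.

```python
import cmath
import math
from typing import Sequence


def amplitude_at(x: Sequence[float], freq: float, fs: float) -> float:
    """Amplitude of the sinusoidal component at `freq` (single-bin DFT).
    Exact when the window holds a whole number of cycles of `freq`."""
    n = len(x)
    acc = sum(v * cmath.exp(-2j * math.pi * freq * t / fs) for t, v in enumerate(x))
    return 2 * abs(acc) / n


def harmonic_energy_fraction(x: Sequence[float], f0: float, fs: float,
                             n_harmonics: int = 5) -> float:
    """Share of the mean-removed signal power found at F0 and its first harmonics."""
    n = len(x)
    mean = sum(x) / n
    total_power = sum((v - mean) ** 2 for v in x) / n  # mean square of the AC part
    harmonic_power = sum(amplitude_at(x, k * f0, fs) ** 2 / 2
                         for k in range(1, n_harmonics + 1))
    return harmonic_power / total_power


def asymmetry_index(left: Sequence[float], right: Sequence[float], f0: float, fs: float) -> float:
    """(P_L - P_R) / (P_L + P_R) at F0: 0 for symmetric folds, +/-1 if one fold is silent."""
    p_left = amplitude_at(left, f0, fs) ** 2
    p_right = amplitude_at(right, f0, fs) ** 2
    return (p_left - p_right) / (p_left + p_right)


FS, F0, N = 4000, 200, 400  # 0.1 s window: 20 whole cycles of F0
tone = [math.sin(2 * math.pi * F0 * t / FS) for t in range(N)]
weak = [0.5 * v for v in tone]  # a fold vibrating at half the amplitude
noisy = [v + math.sin(2 * math.pi * 300 * t / FS) for t, v in enumerate(tone)]  # 1.5 x F0

print(round(harmonic_energy_fraction(tone, F0, FS), 3))   # 1.0
print(round(harmonic_energy_fraction(noisy, F0, FS), 3))  # 0.5
print(round(asymmetry_index(tone, weak, F0, FS), 3))      # 0.6
```

A normal pair of waveforms would score near 1 on the first measure and near 0 on the second; interharmonic energy or a weaker fold moves them away from those values.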

Figure 12 is the DKG tracing of a patient with vocal tremor. Although the tracing looks near normal, the edge detection algorithm can be used to transform the DKG into a vibrogram (Fig. 13) and then into a vibrogram waveform (Fig. 14). Spectral (FFT) analysis of the vibrogram waveform is shown in Figure 15. Compared with normal (Fig. 11), this spectrum shows splaying, with spectral energy spread more widely around the fundamental frequency and its harmonics. This is an example of how frequency analysis can be used to analyze vibratory anomalies in patients with neurolaryngological voice disorders.
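
One simple way to see why tremor splays the spectral peak is to treat the tremor as a slow amplitude modulation of an otherwise steady vibration. Modulating a 200 Hz vibration at a rate f_t moves energy into sidebands at 200 ± f_t Hz, widening the peak. The 5 Hz rate and 0.4 modulation depth below are assumed values; a real tremor may also modulate frequency, which would spread energy across more sidebands.

```python
import cmath
import math
from typing import Sequence


def amplitude_at(x: Sequence[float], freq: float, fs: float) -> float:
    """Amplitude of the component at `freq` via a single-bin DFT (whole cycles assumed)."""
    n = len(x)
    acc = sum(v * cmath.exp(-2j * math.pi * freq * t / fs) for t, v in enumerate(x))
    return 2 * abs(acc) / n


FS, N = 2000, 2000          # 1 s at 2,000 frames/s gives 1 Hz frequency resolution
F0 = 200                    # vibration frequency (assumed)
TREMOR_HZ, DEPTH = 5, 0.4   # assumed tremor rate and modulation depth

steady = [math.sin(2 * math.pi * F0 * t / FS) for t in range(N)]
tremor = [(1 + DEPTH * math.sin(2 * math.pi * TREMOR_HZ * t / FS)) * v
          for t, v in enumerate(steady)]

for f in (F0 - TREMOR_HZ, F0, F0 + TREMOR_HZ):
    print(f, round(amplitude_at(steady, f, FS), 3), round(amplitude_at(tremor, f, FS), 3))
# steady: energy only at 200 Hz; tremor: sidebands of DEPTH / 2 = 0.2 at 195 and 205 Hz
```

The sideband amplitude follows from the identity (1 + m sin a) sin b = sin b + (m/2)[cos(b − a) − cos(b + a)], so each sideband carries half the modulation depth.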


Fig. 14. The vibrogram tracings are ready for analysis by FFT.

It is helpful to go beyond visual inspection of the DKG plot to better understand differences in vibratory characteristics between normal and pathological cases. We illustrate this by comparing spectral analyses of normal subjects with those of pathological changes in mass, stiffness, and tension, represented by polyp, scar, and paralysis, respectively.

Figure 16 shows power spectral plots of the DKG waveforms of three normal subjects. All show symmetric spectral peaks, with equal peak energy from both folds. Both folds have energy at the fundamental frequency of vibration, with decreasing energy in the first three harmonics. There is little interharmonic energy.

Figure 17 shows three spectral plots from subjects with right vocal fold polyp, left vocal fold paralysis, and right vocal fold scar. All three differ from normal: in each, the spectral peaks are asymmetric between the right and left folds, energy is concentrated at the fundamental frequency with little higher-harmonic energy, and, compared with normal, there is energy in the low-frequency region and a large amount of interharmonic energy. In the case of unilateral scar, the side with normal vibration has a good power spectral peak, while the opposite vocal fold has almost no energy at the fundamental. The side with the polyp shows reduced amplitude, with lower peak energy at the fundamental frequency. In the subject with paralysis, the paralyzed side not only has lower energy at the fundamental but also contributes an extra spectral peak, because the paralyzed fold vibrates at a different frequency. This is believed to be the source of the patient's perceived diplophonia. In this patient, a clear subharmonic peak is also detectable, indicating in- and out-of-phase vibration that creates a subharmonic of vocal fold oscillation. When the two vocal folds vibrate at different fundamental frequencies, diplophonia may result, with phase interactions producing subharmonics below the fundamental frequency of each fold: during sustained phonation, the folds come in and out of phase, resulting in subharmonic energy. Using such analysis, diplophonia may be revealed either as discrete vocal fold oscillations or as subharmonics of the fundamental frequency.
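
The phase-interaction mechanism can be illustrated with a toy model. Each fold's excursion is modeled as a sinusoid at its own frequency (assumed here to be 200 Hz and 180 Hz), and the glottal width as their sum, clipped at zero because the folds collide at closure. The linear sum contains energy only at the two fold frequencies; after clipping, energy also appears at the 20 Hz difference frequency, the rate at which the folds drift in and out of phase, which is simultaneously a subharmonic of both 200 Hz and 180 Hz. This is a hypothetical sketch of the idea, not the analysis used for Figure 17.

```python
import cmath
import math
from typing import Sequence


def amplitude_at(x: Sequence[float], freq: float, fs: float) -> float:
    """Amplitude of the component at `freq` via a single-bin DFT (whole cycles assumed)."""
    n = len(x)
    acc = sum(v * cmath.exp(-2j * math.pi * freq * t / fs) for t, v in enumerate(x))
    return 2 * abs(acc) / n


FS, N = 4000, 4000          # 1 s at 4,000 frames/s
F_LEFT, F_RIGHT = 200, 180  # assumed: the paretic right fold vibrates more slowly

left = [math.sin(2 * math.pi * F_LEFT * t / FS) for t in range(N)]
right = [math.sin(2 * math.pi * F_RIGHT * t / FS) for t in range(N)]

# Linear sum of the two excursions vs. a glottal width that cannot go below zero;
# the clipping at closure is the only nonlinearity in this model.
linear = [a + b for a, b in zip(left, right)]
glottal_width = [max(0.0, s) for s in linear]

beat = F_LEFT - F_RIGHT  # 20 Hz: the folds drift in and out of phase at this rate
for name, signal in (("linear", linear), ("clipped", glottal_width)):
    print(name, [round(amplitude_at(signal, f, FS), 2) for f in (beat, F_RIGHT, F_LEFT)])
```

Only the clipped signal shows a 20 Hz component, which is why contact between two folds vibrating at different frequencies can generate subharmonic energy that neither fold produces on its own.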


Fig. 15. The FFT plot shows symmetric splaying of the fundamental frequency, with energy spread widely around her fundamental frequency (F0).

Using the above set of tools, applications of HSV to a variety of neurolaryngological disorders can be anticipated. These include the study of diplophonia in patients with presbyphonia, of vocal tremor types, of paresis and paralysis, and of voice-onset disorders such as spasmodic dysphonia.


Fig. 16. Power spectral plot of the digital kymography waveform of 3 normal subjects.


Fig. 17. Three power spectral plots from subjects with right vocal fold polyp, left vocal fold paralysis, and right vocal fold scar.
