Vocal Tract Measurements Overview




In the past, vocal tract measurements were made from 2-D sagittal-plane X-ray film images from the 1960s. The cross-sectional areas were inferred from the anterior-posterior width seen on the films. In addition, plastic molds of a cadaver's nasal cavities were made and the areas calculated from the molds. These area functions were used to build electrical circuits that produced sounds (1).

More recently, CT and MRI studies have been used to obtain vocal tract areas. Both modalities can be used in conjunction with computer programs to produce 3-D images of the vocal tract, and measurements may be taken from these shapes. MRI is the modality of choice because it exposes subjects to no harmful radiation. Its disadvantage is that only static images of speech can be produced, not images of running speech.

Along with the images of the subject, the speech wave is recorded and analysed so that the sound of the speech simulation can be compared with the sound produced by the human subject.

Here at the University of Iowa, MRI scans of human subjects are acquired and speech recordings are made. The vocal tract measurements are made in VIDA's Tube Geometry Analysis (TGA), and these measurements are then run through a wave-reflection model program outside of VIDA. A brief description of the measurements follows.

First, it is necessary to segment the air within the airway from the surrounding tissue. The segmented vocal tract is then interpolated using multi-object shape-based interpolation. Next, Tube Geometry Analysis is used to calculate cross-sectional areas from oblique sections computed to be perpendicular to the local long axis of the airway. Prior to computing the center line, the voxel intensities within the piriform sinuses are changed to a value lower than the brightest, so that the airway analysis algorithm ignores them. For additional measurements, the Region of Interest (ROI) program is also used to gather cross-sectional areas. The nasal tract and the nostrils are each measured separately using TGA. The piriform sinus measurements are made separately using ROI. For detailed information, see Image Display and Analysis Protocol.
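
As a rough illustration of what a cross-sectional area measurement on one oblique section involves, the sketch below counts the pixels segmented as air and multiplies by the area of one pixel. This is an assumption made for illustration only; it is not a description of how VIDA's TGA or ROI programs compute areas. The function name, the mask layout, and the pixel spacing are all hypothetical.

```python
def cross_sectional_area_cm2(mask, pixel_width_cm, pixel_height_cm):
    """Estimate the airway area in one section by pixel counting.

    mask is a 2-D list of 0/1 values, where 1 marks a pixel that the
    segmentation step labelled as air (hypothetical representation).
    """
    air_pixels = sum(1 for row in mask for value in row if value)
    return air_pixels * pixel_width_cm * pixel_height_cm


# Hypothetical 4 x 4 section with five air pixels and 0.1 cm pixel spacing.
section = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
]
area = cross_sectional_area_cm2(section, 0.1, 0.1)
print(f"area = {area:.3f} cm^2")
```

Pixel counting is only as accurate as the segmentation and the pixel spacing allow, which is one reason the oblique sections need to be perpendicular to the local airway axis before areas are taken.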

The cross-sectional areas (gathered from VIDA) are then run through a separate wave-reflection program. This program requires that the vocal tract be divided into equal-length sections, which may not happen when using Tube Geometry because the vocal tract is processed in sections. Additional measurements may be made in ROI to compensate. Typical choices for tract length were:

40 sections = 15.87 cm
42 sections = 16.66 cm
44 sections = 17.46 cm
46 sections = 18.25 cm
48 sections = 19.05 cm
50 sections = 19.84 cm

The area functions are normalized to the closest discrete length listed above, and the normalized areas are then used in the wave-reflection model program. This program presently does not take into account soft-tissue variables such as yielding walls, skin radiation, frequency-dependent viscous loss, and multiple nasal branches. The wave-reflection model program seems to replicate static speech sounds below 2500 Hz fairly well. For frequencies above 2500 Hz, the correlation between simulated speech and human speech was poor.
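
The normalization step can be sketched as follows, under stated assumptions. The table of lengths comes from this overview; everything else is hypothetical: the function names, the use of linear interpolation at section midpoints, and the sample measurements. The last function shows the reflection coefficient usually associated with a junction between two tube sections in concatenated-tube models, r = (A_k - A_k+1) / (A_k + A_k+1). Sign conventions vary between formulations, and the program used here may differ.

```python
# Discrete tract lengths quoted in this overview: section count -> length (cm).
TRACT_LENGTHS_CM = {40: 15.87, 42: 16.66, 44: 17.46,
                    46: 18.25, 48: 19.05, 50: 19.84}


def closest_section_count(measured_length_cm):
    """Return the section count whose listed length is nearest the measurement."""
    return min(TRACT_LENGTHS_CM,
               key=lambda n: abs(TRACT_LENGTHS_CM[n] - measured_length_cm))


def resample_area_function(positions_cm, areas_cm2, n_sections):
    """Linearly interpolate areas at the midpoints of n equal-length sections.

    positions_cm must be strictly increasing and contain at least two points.
    The measured span is divided into n_sections equal parts, which amounts
    to rescaling the tract to the chosen discrete length.
    """
    if len(positions_cm) < 2 or len(positions_cm) != len(areas_cm2):
        raise ValueError("need matching position/area lists with >= 2 points")
    start = positions_cm[0]
    span = positions_cm[-1] - start
    resampled = []
    j = 0  # index of the measured interval that contains the midpoint
    for i in range(n_sections):
        x = start + span * (i + 0.5) / n_sections
        while j < len(positions_cm) - 2 and positions_cm[j + 1] < x:
            j += 1
        x0, x1 = positions_cm[j], positions_cm[j + 1]
        t = (x - x0) / (x1 - x0)
        resampled.append(areas_cm2[j] + t * (areas_cm2[j + 1] - areas_cm2[j]))
    return resampled


def reflection_coefficients(areas_cm2):
    """Junction coefficients (A_k - A_k+1) / (A_k + A_k+1) for adjacent sections."""
    return [(a0 - a1) / (a0 + a1)
            for a0, a1 in zip(areas_cm2, areas_cm2[1:])]


# Hypothetical measurements: distance along the tract (cm) and area (cm^2).
positions = [0.0, 4.0, 9.0, 13.0, 17.3]
areas = [0.4, 1.2, 2.5, 1.0, 3.0]

n = closest_section_count(positions[-1] - positions[0])
normalized = resample_area_function(positions, areas, n)
coeffs = reflection_coefficients(normalized)
print(n, "sections,", len(coeffs), "junctions")
```

Sampling at section midpoints keeps every interpolated value inside the measured range, so no extrapolation is needed at the ends of the tract.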

References

1) Fant, G. The Acoustic Theory of Speech Production, Mouton, The Hague, 1960.






©1994-99 Division of Physiologic Imaging, Dept. of Radiology, Univ. of Iowa





Last modified: Fri Jun 4 12:47:21 CDT