Andrien J. Wang,
and Joseph M. Reinhardt
Department of Electrical Engineering and the Bioengineering Program
Division of Physiologic Imaging
Department of Radiology
University of Iowa College of Medicine
Iowa City, Iowa 52242
Medical imaging scanners now exist that can generate 4D cardiac images (time sequences of 3D volumes). Since the heart is an organ that exhibits motion, examining its image characteristics with a 4D image can give useful information about its condition. For multi-dimensional image segmentation, semi-automatic methods have many advantages over manual segmentation. This paper describes a procedure for performing semi-automatic image segmentation and analysis upon a 4D cardiac image. This procedure involves the input of user-defined information (cues) at certain time points of the sequence. These cues are then automatically interpolated or extrapolated for the remaining time points. The analysis system interprets the completed sequence of cues to generate a list of image processing functions that can subsequently segment and analyze the 4D image. This paradigm has been implemented using INTERSEG, an existing 3D cue-based analysis system. This cue-based analysis procedure permits 4D cardiac image segmentation with a small amount of user interaction. Performance of the proposed 4D image analysis system compares favorably to results generated by defining cues on each individual volume, as well as manual techniques. Further, the 4D approach requires significantly less interaction time than a 3D-only approach.
Keywords: image segmentation, cue-based image analysis, multi-dimensional image processing, medical imaging, 3D imaging, cardiac imaging, interpolation, 4D image processing.
Medical imaging scanners now exist that can generate 4D cardiac images (e.g., the Dynamic Spatial Reconstructor[1] and Imatron). A 4D cardiac image is a time sequence of 3D cardiac volumetric images. Much work has been done in the world of 4D imaging. Kriete et al. outlined 4D microscopy resources and visualization techniques in ref. kriete92, and ANALYZE[3] and VIDA[4] are medical image analysis packages that can be used for visualization and display of 4D images. In addition, studies on image enhancement using 4D mathematical morphology and 4D morphological filters [5,6,7,8] have shown their utility; one can remove artifacts that may be present at one time point, but not at the time points surrounding it. However, this paper is concerned, not with visualization or filtering, but with the problem of 4D image segmentation, in which regions of interest are extracted from the 4D image. Since the heart has motion and exhibits an inherently dynamic nature, it is beneficial to examine images of this organ from a 4D (time) point of view. Thus, the goal of this research is the development of a procedure to extract various regions of interest from 4D heart sequences.
Several studies on 4D cardiac analysis have already been completed [9,10,11,12,13]. This paper continues this type of work by proposing a basic procedure for segmenting regions of 4D images. Also, although the problem of analyzing cardiac images will often be mentioned, the procedure described is applicable to generic 4D images.
In the world of medical image segmentation, manual segmentation has been the prevailing standard.[14] Manual segmentation is a procedure in which a skilled operator uses slice tracing, region painting, etc., to define regions of interest. The use of this technique is necessitated by the complex structures present in medical image volumes; unfortunately, manual segmentation suffers from several drawbacks. It is extremely time consuming, is subject to both intra/inter-operator variability and human error, and doesn't use the full multi-dimensional image data. On the other end of the spectrum, there are digital image processing algorithms available for automated image segmentation which can generate reproducible results fairly quickly while using full 3D/4D images. These algorithms, however, are usually problem-specific, and an image-processing expert is needed to determine which image-processing functions are best suited for a given segmentation task.
Semi-automatic image segmentation, the strategy advocated in this paper, combines manual interaction with automated processing to solve segmentation problems. One specific form of semi-automatic segmentation that has been developed is cue-based analysis.[14,15,16,17] In this procedure, the user defines image information (cues) and then a software system interprets these cues to generate a list of image-processing functions (a process) that can subsequently enhance, segment, and analyze the image. Semi-automatic segmentation has many advantages over manual and strictly automated techniques. It is much quicker, gives reproducible results, uses the complete image data set, and is minimally affected by human inconsistency and error. In addition, the user can harness the power of automated image-processing algorithms without being an image-processing expert. The user can also easily introduce problem-specific information to the segmentation task. INTERSEG[18] is the cue-based analysis system which we have implemented for segmentation of 3D volumes. The strategy INTERSEG uses can be summarized as follows[7,17]:
In the existing INTERSEG system,[17] only one 3D volume is analyzed at a time, and a set of cues must be specified for each volume to be analyzed. Since a 4D image can be viewed as a set of its 3D component volumes (time points), it is possible to analyze 4D images using INTERSEG (i.e., analyzing each 3D time point image individually). However, this involves a fair amount of human interaction to define the cues. Also, a 3D (x,y,z) approach does not exploit the fourth dimension (time). This paper describes a 4D paradigm for performing cue-based image segmentation and analysis upon a 4D cardiac image. This procedure involves the user-specification of cues at only selected time points of the sequence. These cues are then automatically interpolated or extrapolated for the remaining time points. 4D object interpolation/extrapolation algorithms have been developed to perform the cue-sequence completion. The 4D analysis system interprets the completed sequence of cues to generate a list of image-processing functions that will subsequently segment and analyze the 4D image.
The 4D image analysis paradigm is presented in Section 2. Section 3 presents notation, formally states the problem of 3D object interpolation and describes the implemented algorithm for cue interpolation. Results are presented in Section 4. Conclusions and future research directions are given in Section 5.
This section presents the 4D image analysis paradigm. Also discussed are issues such as the assumptions made about the images to be analyzed, the rationale for which cues are used for cue-sequence completion, and guidelines for user cooperation with respect to the analysis procedure. The 4D image-analysis paradigm is similar to the cue-based analysis methodology outlined in Section 1 and consists of four phases. The procedure has been implemented using INTERSEG[7,17] and is outlined below.
In the first phase, the user will:
After the user has specified cues and global image parameters for the 4D image, INTERSEG will then:
After using INTERSEG to generate the image-analysis process (phase 2), the user can apply the process to a 4D volume. In this third phase of the paradigm, the user directs INTERSEG to begin automatic image analysis. The first step of the processing is to perform any requested 4D filtering operations. Next, each time point of the sequence is segmented (by running IMPROMPTU) using the appropriate cues at each time point. Appropriate input, output, and cue images are used for each run of the process. In the fourth phase of the 4D image analysis paradigm, the user can peruse the results of the automatic analysis. The system-generated cues can be viewed and edited, if desired, and a new IMPROMPTU segmentation process can be created and executed (or the old process can be re-run with the new cues).
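The control flow of the third phase can be sketched as follows. This is only an illustration of the order of operations described above (optional 4D filtering first, then one segmentation run per time point with that time point's cues). The function name `run_4d_analysis` and the callables `filter_4d` and `segment_3d` are hypothetical placeholders and do not reflect the actual INTERSEG or IMPROMPTU interfaces.

```python
def run_4d_analysis(volumes, cues, filter_4d, segment_3d):
    """Filter the whole sequence, then segment each time point with its own cues.

    volumes    -- list of 3D volumes, one per time point
    cues       -- list of cue sets, one per time point (already completed)
    filter_4d  -- hypothetical callable acting on the whole sequence, or None
    segment_3d -- hypothetical callable: (volume, cue_set) -> segmented volume
    """
    if len(volumes) != len(cues):
        raise ValueError("need exactly one cue set per time point")
    # Step 1: any requested 4D filtering sees the complete sequence at once.
    if filter_4d is not None:
        volumes = filter_4d(volumes)
    # Step 2: each time point is segmented separately with its own cues.
    return [segment_3d(volume, cue_set) for volume, cue_set in zip(volumes, cues)]


# Toy stand-ins: numbers play the role of volumes and cues.
result = run_4d_analysis(
    [1, 2, 3],
    [10, 20, 30],
    filter_4d=lambda seq: [2 * v for v in seq],
    segment_3d=lambda volume, cue_set: volume + cue_set,
)
print(result)
```

Passing the filter and the segmenter in as callables keeps the sketch independent of any particular 3D segmentation method, which mirrors the fact that the paradigm reuses existing 3D methods at each time point.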
When developing the 4D image-analysis paradigm, we made several
assumptions concerning the 4D images that were to be segmented. It is
assumed that the regions to be extracted from the volume sequence do
not change drastically across time points. For example, in the DSR
cardiac sequence analyzed in Section 4, the main region of interest,
the left ventricular (LV) chamber, is generally enclosed by the heart
muscle (myocardium), and the myocardium is surrounded by air and
tissue in the chest cavity. Also, we assume that the properties of
the regions of interest should remain relatively constant over all
volumes of the time sequence. For example, if the LV chamber is a
single connected component at one time point, it should remain so in
all others. Another assumption is that the same regions are present
(and should thus be specified) for all time points. For example, if
the LV chamber and the myocardium are to be segmented at one time
point, it is unrealistic for the system to segment the rib cage and
air at other time points.
Because of these assumptions, symbolic cues (which describe region properties) are not interpolated (extrapolated, replicated) across time points. They are specified once for the entire volume sequence. Thus, the only cues which are interpolated are biopsy, inclusionary, exclusionary, and preclusionary iconic cues.
Although the 3D object interpolation and extrapolation algorithms
given in Section 3 were implemented in such a way as to be able to
operate on a fairly wide array of cue shapes and sizes, the proposed
system does assume that the user will cooperate to some degree.
Basically, this entails that the user follow two main guidelines.
The first guideline is that cues should be defined with the time variations of the volume kept in mind. For example, in the 16-volume DSR cardiac sequence that is analyzed in Section 4, the heart begins at end-diastole (ED) at time point 1. The heart contracts during time points 2--7 to end-systole (ES) at time point 8, and expands during time points 9--15 back to ED at time point 16. Time points 1 and 16 are nearly identical. So, the LV chamber starts out ``large,'' ``shrinks,'' and then grows ``large'' again. It is assumed that the user is aware of this trend and thus would specify cues for the LV chamber at a minimum of 3 time points --- one ``early'' time point (at ED), one ``middle'' time point (during ES), and one ``late'' time point (again at ED). It would be unreasonable for the user to specify cues only at time points 1 and 16 and expect the interpolated cues for the intermediate time points to be valid.
The second cue-definition guideline is that the user should define the iconic cues in a conservative manner. Specifically, cues should be specified according to the following criteria:
As mentioned previously, one of the key aspects of the 4D cue-based segmentation procedure is automatic cue sequence completion. This involves the interpolation, extrapolation, and/or replication of the user-specified cues defined at a number of selected time points into a full set of cues which range across all of the time points. Cue replication is a trivial task; the cue data is simply copied from the specified time point to the other time points. Cue interpolation and extrapolation, however, are somewhat more complex.
This section is devoted to the mathematical problem statements and algorithms of 4D cue interpolation and extrapolation. It first defines notation and formally states a general binary-object interpolation problem. It then gives the 4D cue-interpolation algorithm. The notation parallels that of ref. higgins93c as much as is feasible.
The use of these algorithms falls within the scope of the 4D image-analysis paradigm. The exact procedure of determining which cues are interpolated, extrapolated, and/or replicated is specified by the user. These points were outlined in Section 2.
Let $s$ denote a 4D image, or volume sequence, defined over 4D Euclidean space $\mathbb{R}^4$. Let $(x, y, z, t)$ denote a 4D image point, or hypervoxel, and let $s(x, y, z, t)$ denote the gray-scale value of 4D image $s$ at hypervoxel $(x, y, z, t)$. For a typical radiological imaging scanner, the volume sequence $s$ is usually defined over a finite-extent lattice of $\mathbb{R}^4$. So, let
$$x = i\,\Delta, \quad y = j\,\Delta, \quad z = k\,\Delta z, \quad t = l\,\Delta t, \qquad 0 \le i, j \le N-1, \quad 0 \le k \le K-1, \quad 0 \le l \le L-1, \qquad (1)$$
where $\Delta$, $\Delta z$, and $\Delta t$ are the sampling intervals and $N$, $K$, and $L$ are positive integers greater than 1. The above notation takes into account that the sampling intervals in the $x$ and $y$ directions are equal in most scanners --- hence, the same $\Delta$ for both $x$ and $y$. Therefore, $s$ is defined over the finite-extent lattice shown below:
$$\{ (i\,\Delta,\ j\,\Delta,\ k\,\Delta z,\ l\,\Delta t) \;:\; 0 \le i, j \le N-1,\ 0 \le k \le K-1,\ 0 \le l \le L-1 \}. \qquad (2)$$
As mentioned in the Introduction, we generally view a given 4D image as a time sequence of 3D volumes; i.e.,
$$s = \{ s_l,\ l = 0, 1, \ldots, L-1 \}, \qquad (3)$$
where $s_l$ denotes a 3D image, or volume, defined over 3D Euclidean space $\mathbb{R}^3$. Also, since $s_l$ contains all of the image information of $s$ for $t = l\,\Delta t$, $s_l$ is considered the $l$th time point of the 4D volume sequence, or
$$s_l(x, y, z) = s(x, y, z, l\,\Delta t). \qquad (4)$$
Similar to the 4D image described earlier, let $(x, y, z)$ denote a 3D image point, or voxel, and let $s_l(x, y, z)$ be the gray-scale value of 3D image $s_l$ at voxel $(x, y, z)$. Since these 3D volumes are the time points comprising the 4D image $s$, as defined by (1--4), each $s_l$ is defined over a 3D lattice $\{ (i\,\Delta,\ j\,\Delta,\ k\,\Delta z) \}$, consistent with (2).
As an example for the DSR, a typical cardiac 4D volume sequence $s$ over one heart cycle may have dimensions $N=90$, $K=95$, $L=16$. This is equivalent to a set of sixteen 3D volumes of size 90 x 90 x 95, where the image points and corresponding gray-scale values of the 4D image at, say, time $t = l\,\Delta t$, i.e., $s(x, y, z, l\,\Delta t)$, are equivalent to the image points and corresponding gray-scale values of the 3D volume $s_l$.
For cue interpolation, we track the change in shape and size of a 3D object $O$ over time, where $O$ appears at various time points in 4D image $s$. This object can be represented by a binary image $I$, defined as follows
$$I(x, y, z, t) = \begin{cases} 1, & (x, y, z, t) \in O \\ 0, & \text{otherwise.} \end{cases} \qquad (5)$$
Furthermore, let $\partial O$ refer to the boundary of $O$ and let $O(t)$ refer to the collection of voxels in object $O$ at time $t$; i.e., the collection of the ``1''-valued hypervoxels of the volume $I(x, y, z, t)$. (Alternately, it is the collection of the ``1''-valued voxels of $I_l$ --- see below). In this discussion, $O(t)$ will generally be a single connected component, since it is either a series of biopsy cues or a large exclusionary cue. This restriction, however, is not required by the cue interpolation algorithm.
The 4D image $I$ is also only defined over a discrete finite space, consistent with the 4D sampling lattice of its gray-scale counterpart $s$. Also, similar to gray-scale volume sequence $s$, the 4D binary image $I$ can be viewed as a sequence of 3D binary volumes
$$I = \{ I_l,\ l = 0, 1, \ldots, L-1 \}, \qquad (6)$$
where $I_l$ denotes a component 3D (time point) binary image of $I$, defined over 3D Euclidean space $\mathbb{R}^3$; i.e., $I_l(x, y, z) = I(x, y, z, l\,\Delta t)$.
$I$ might only be defined for a few select time points; e.g., if $L=16$, $I$ might only be specified for $l=0,8,12$. The goal is to interpolate and extrapolate the data from these known time points, $I_l$, $l=0,8,12$, to obtain the complete 4D time sequence, $I_l$, $l = 0, 1, \ldots, 15$.
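So, the object interpolation and extrapolation problem formally stated is
$$\text{Given } I_l \text{ at a subset of the time points } l \in \{0, 1, \ldots, L-1\}, \text{ determine } I_l \text{ for all } l = 0, 1, \ldots, L-1. \qquad (7)$$
This problem can be viewed as a series of interpolations and extrapolations. In the above example, two interpolations must be done --- one using $I_0$ and $I_8$ as the end points, and another using $I_8$ and $I_{12}$ as the end points. One extrapolation must also be done --- using information from $I_8$ and $I_{12}$, binary time points $I_{13}$, $I_{14}$, and $I_{15}$ are extrapolated. The interpolation subproblem that must be solved can be stated as
$$\text{Given } I_{l_1} \text{ and } I_{l_2},\ l_1 < l_2, \text{ determine } I_l \text{ for all } l_1 < l < l_2. \qquad (8)$$
Two similar subproblems exist for the extrapolation of the binary object.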
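The decomposition into interpolation and extrapolation subproblems can be sketched in a few lines. This is a minimal illustration, not the system's code: the function name `plan_cue_completion` and its return format are hypothetical, and the choice to extrapolate from the two nearest known time points is an assumption.

```python
def plan_cue_completion(known, num_time_points):
    """Split cue-sequence completion into interpolation and extrapolation tasks.

    known           -- 0-based indices of time points that carry user-defined cues
    num_time_points -- total number of time points L in the sequence
    Returns (interpolations, extrapolations) where each interpolation is
    (start, end, missing) and each extrapolation is (sources, missing).
    """
    known = sorted(set(known))
    if not known:
        raise ValueError("at least one time point must carry a cue")
    if known[0] < 0 or known[-1] >= num_time_points:
        raise ValueError("known time point lies outside the sequence")

    # One interpolation per pair of consecutive known time points with a gap.
    interpolations = []
    for start, end in zip(known, known[1:]):
        missing = list(range(start + 1, end))
        if missing:
            interpolations.append((start, end, missing))

    # Extrapolate before the first and after the last known time point,
    # using (by assumption) the two nearest known time points as sources.
    extrapolations = []
    before = list(range(0, known[0]))
    after = list(range(known[-1] + 1, num_time_points))
    if before:
        extrapolations.append((known[:2], before))
    if after:
        extrapolations.append((known[-2:], after))
    return interpolations, extrapolations


interpolations, extrapolations = plan_cue_completion([0, 8, 12], 16)
print(interpolations)
print(extrapolations)
```

For the example with L=16 and cues at l=0,8,12, this yields two interpolation tasks (time points 1 to 7 and 9 to 11) and one extrapolation task (time points 13 to 15).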
Since the objects (cues) to be interpolated and extrapolated are strictly binary (rather than gray-scale) images, shape-based information must be used in order to facilitate the interpolation and extrapolation. To do this, we will use an algorithm for calculating the 3D Euclidean distances of the base objects in combination with linear interpolation and thresholding. This algorithm is an extension of the 2D slice-interpolation procedure described in ref. higgins93c.
The basic shape-based interpolation method involves finding the 3D Euclidean distances of the object at the two given time points, interpolating the distance measures for the intermediate time points, and thresholding the interpolated 3D distance values to obtain the interpolated object at the intermediate time points. There are many ways to compute the distance measure. In this paper, 3D distances were determined using a 3D extension of a fast method proposed by Vincent and Soille [21].
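To make the idea concrete, the following sketch applies the same three operations (signed distance, linear interpolation, thresholding) to a one-dimensional object. It is an illustration only: the 1D setting, the function names, and the sign convention (positive inside, zero on the boundary, negative outside, threshold at zero) are assumptions made for this example, and the brute-force distance stands in for the fast method cited above.

```python
def signed_distance_1d(cells, length):
    """Signed distance of every lattice point to the object boundary.

    cells  -- set of integer positions occupied by the object
    length -- number of lattice points
    Positive inside the object, zero on boundary cells, negative outside.
    """
    boundary = {p for p in cells if (p - 1) not in cells or (p + 1) not in cells}
    distances = []
    for p in range(length):
        d = min(abs(p - b) for b in boundary)
        distances.append(d if p in cells else -d)
    return distances


def interpolate_1d(cells_a, cells_b, frac, length):
    """Object at fraction `frac` (0..1) of the way from time a to time b."""
    dist_a = signed_distance_1d(cells_a, length)
    dist_b = signed_distance_1d(cells_b, length)
    # Linear interpolation of the distance values, then threshold at zero.
    mixed = [(1 - frac) * a + frac * b for a, b in zip(dist_a, dist_b)]
    return {p for p, value in enumerate(mixed) if value >= 0}


large = set(range(2, 9))  # cells 2..8, e.g. a wide object at the first time point
small = set(range(4, 7))  # cells 4..6, e.g. a narrow object at the second time point
middle = interpolate_1d(large, small, 0.5, 11)
print(sorted(middle))  # [3, 4, 5, 6, 7]
```

Halfway between the two inputs the object occupies cells 3 to 7, which is intermediate in size between the two end points. This is the behavior that makes the approach suitable for binary cues, where there are no gray-scale values to interpolate directly.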
Since the objects (cues) to be interpolated may have samples that are widely spread apart between the two known time points (i.e., the object is spatially located in one portion of the volume at time $t_1$ and is located in an entirely different portion of the volume at time $t_2$), centroid information is incorporated into the interpolation. The general idea is to translate the objects at the given time points such that their centroids are equal, obtain the interpolated objects as described above, return the two endpoint objects to their original locations, determine interpolated centroids for the interpolated objects, and translate the interpolated objects to their correct positions based on the interpolated centroids.
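Before presenting the interpolation algorithm, we define some additional notation. As stated above, let $O(t)$ denote the collection of voxels comprising object $O$ at time $t$. Let $c(t)$ denote the 3D centroid of $O(t)$ with respect to the 3D time point volume $I(t)$, where $I(t)$ is the binary volume of $I$ at time $t$. The cue interpolation algorithm to solve the cue interpolation subproblem, as stated in (8), for two known time points $t_1 < t_2$, is given below:
1. Compute the centroids, $c(t_1)$ and $c(t_2)$, of the object at time points $t_1$ and $t_2$.
2. Translate $I(t_1)$ and $I(t_2)$, denote them $P(t_1)$ and $P(t_2)$ which are similar to $I(t_1)$ and $I(t_2)$, but where the objects have been translated such that their centroids are equal. Collectively, the volumes (time points) of $P(t)$ comprise a 4D image $P$, similar to $I$. Also, denote the translated objects $Q(t_1)$ and $Q(t_2)$ and let $Q$ denote the 4D collection of the translated objects.
3. Compute the signed distance $D$ for $P(t_1)$ and $P(t_2)$:
$$D(x, y, z, t) = \begin{cases} d(x, y, z, t), & (x, y, z) \in \operatorname{int} Q(t) \\ 0, & (x, y, z) \in \partial Q(t) \\ -d(x, y, z, t), & \text{otherwise.} \end{cases}$$
For a particular time $t$, $\partial Q(t)$ is the boundary of object $Q(t)$. $\operatorname{int} Q(t)$ represents the interior of $Q(t)$. The quantity $d(x, y, z, t)$ is the 3D Euclidean distance of point $(x, y, z, t)$ to $\partial Q(t)$.
4. Generate new volumes $P(t)$, $t_1 < t < t_2$, at the intermediate time points, by interpolating the distance measures and then by thresholding. Specifically, the binary values of the lattice points for the interpolated volumes of $P(t)$, $t_1 < t < t_2$, are given by
$$P(x, y, z, t) = \begin{cases} 1, & \hat{D}(x, y, z, t) \ge 0 \\ 0, & \text{otherwise.} \end{cases}$$
The quantity $\hat{D}(x, y, z, t)$, a univariate function of $t$ passing through the known values of $D$, interpolates the distance $D(x, y, z, t)$ for point $(x, y, z, t)$. Many interpolation methods could be used, but we selected a linear function; i.e., for $t_1 < t < t_2$,
$$\hat{D}(x, y, z, t) = D(x, y, z, t_1) + \frac{t - t_1}{t_2 - t_1} \left[ D(x, y, z, t_2) - D(x, y, z, t_1) \right].$$
5. Return $P(t_1)$ and $P(t_2)$ to their original locations; i.e., translate the centered objects $Q(t_1)$ and $Q(t_2)$ back to $O(t_1)$ and $O(t_2)$.
6. Compute the centroids $c(t)$, $t_1 < t < t_2$, of the interpolated objects. Assume that the new centroids lie on a line segment connecting $c(t_1)$ and $c(t_2)$;
$$c(t) = c(t_1) + \frac{t - t_1}{t_2 - t_1} \left[ c(t_2) - c(t_1) \right].$$
7. Translate the interpolated objects $Q(t)$ of $P(t)$ to their proper locations using the corresponding centroid computed in step 6 --- this gives $I(t)$, $t_1 < t < t_2$.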
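The centroid-aligned interpolation described in this section can be sketched as below. This is a simplified illustration under several assumptions, not the implemented algorithm: objects are stored as sets of integer voxel coordinates, translations are rounded to whole voxels, boundary voxels are defined by 6-connectivity, the threshold is taken at zero with distances positive inside the object, and a brute-force distance replaces the fast distance method. All function names are hypothetical.

```python
import math

# 6-connected neighbourhood used to decide which voxels lie on the boundary.
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def centroid(voxels):
    """Mean (x, y, z) position of a set of integer voxel coordinates."""
    n = len(voxels)
    return tuple(sum(v[i] for v in voxels) / n for i in range(3))


def translate(voxels, shift):
    """Move every voxel by an integer (dx, dy, dz) offset."""
    return {tuple(c + s for c, s in zip(v, shift)) for v in voxels}


def signed_distance(voxels, points):
    """Brute-force signed Euclidean distance from each point to the boundary.

    Positive inside the object, zero on boundary voxels, negative outside.
    """
    boundary = [v for v in voxels
                if any(tuple(a + b for a, b in zip(v, n)) not in voxels
                       for n in NEIGHBORS)]
    return {p: (1 if p in voxels else -1) * min(math.dist(p, b) for b in boundary)
            for p in points}


def interpolate_object(obj_a, obj_b, frac):
    """Object at fraction `frac` (0..1) between two binary objects."""
    # Centre both objects on the origin; with integer shifts the two
    # centroids agree to within half a voxel.
    c_a, c_b = centroid(obj_a), centroid(obj_b)
    q_a = translate(obj_a, tuple(-round(c) for c in c_a))
    q_b = translate(obj_b, tuple(-round(c) for c in c_b))

    # Interpolate the signed distances linearly and threshold at zero.
    # A point outside both objects is negative at both ends, so only
    # points in the union of the two centred objects need testing.
    candidates = q_a | q_b
    d_a = signed_distance(q_a, candidates)
    d_b = signed_distance(q_b, candidates)
    centred = {p for p in candidates if (1 - frac) * d_a[p] + frac * d_b[p] >= 0}

    # Move the result to the linearly interpolated centroid. The inputs are
    # never modified, so the end-point objects stay at their original places.
    target = tuple(round((1 - frac) * a + frac * b) for a, b in zip(c_a, c_b))
    return translate(centred, target)


cube7 = {(x, y, z) for x in range(7) for y in range(7) for z in range(7)}
cube3 = {(x, y, z) for x in range(12, 15) for y in range(2, 5) for z in range(2, 5)}
mid = interpolate_object(cube7, cube3, 0.5)
print(len(cube7), len(mid), len(cube3), centroid(mid))
```

In this example a large cube shrinks and moves along x toward a small cube. The halfway object has a voxel count between those of the two end points and a centroid midway between theirs. For volumes of realistic size, the brute-force distance would be replaced by a fast distance transform.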
The first 4D cardiac image analyzed (``rxa'') is a time sequence of a
canine heart over one heart cycle obtained from the
DSR[1]. It is composed of sixteen time points (3D
volumes) situated 33 msec apart. Each is of size 90 x 90 x 95, where each voxel represents a cube of approximately
$(0.9\,\mathrm{mm})^3$ in volume. As mentioned earlier, the beginning and end time
points (1 and 16) contain volume data of the heart near ED and the
middle time points (around 8) contain volume data of the heart near
ES. In this image, a Roentgen contrast agent has been injected into
the heart in order to make the LV chamber appear as a bright
object[22]. The goal is to extract the LV chamber from
this image. Cues were defined for the bright LV chamber, the gray
myocardium surrounding the chamber, and the surrounding dark air.

The first result shows the interpolation of LV-chamber biopsy cues. Biopsy cues have been defined at the first and eighth time points of ``rxa.'' At time point 1, a biopsy cue of the LV chamber (the bright area) has been defined as a series of gray disks on slices 19--23. Figure 1(a) shows the portion of this cue on slice 23. At time point 8, the same cue has been defined as a series of disks on slices 34--36. Figure 1(d) shows the portion of the biopsy cue on slice 36. (This second cue is a little difficult to discern due to the contrast of the figure).
Biopsy cues for time points 2 through 7 were interpolated using time points 1 and 8 as end points. Figures 1(b) and 1(c) show portions of the interpolated cues at time points 3 and 6. At time point 3, the interpolated biopsy cue spans slices 23--26; Figure 1(b) shows the portion on slice 26. At time point 6, the interpolated biopsy cue spans slices 30--32; Figure 1(c) shows the portion on slice 30. As expected, the centroid of the biopsy cue moves with the change in time. The interpolated biopsy cues are, as desired, valid samples of the LV chamber at the intermediate time points.
These interpolated cues were then used in the 4D augmented version of INTERSEG to extract the LV chamber from the complete 4D image. Figure 2 is a graph comparing the performance of the 4D analysis methodology against manual segmentation, segmentation via the turnkey semi-automatic method discussed in ref. higgins90, and 3D cue-based watershed analysis from ref. higgins93a. The 4D image ``rxa'' was segmented to extract the left ventricular (LV) chamber. For the 4D analysis method, the 4D image was segmented using 3D cue-based relaxation labeling[16] in conjunction with a mild (8-connected) 4D close-open morphological filter[6], used for image enhancement and region smoothing. (Unfortunately, it is a little difficult to differentiate the lines for the 3D turnkey method and the 4D method --- for reference, the turnkey line has a value of 53885 voxels at time point 1).
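Chamber volumes reported as voxel counts can be converted to physical units with a small helper. The sketch below assumes the approximate (0.9 mm) cubic voxel size stated earlier for ``rxa'' and uses the fact that 1 mL equals 1000 cubic millimeters; the function name `voxels_to_ml` is hypothetical, and the result is only as accurate as the approximate voxel size.

```python
def voxels_to_ml(voxel_count, dx_mm, dy_mm, dz_mm):
    """Convert a voxel count to millilitres (1 mL = 1000 cubic millimetres)."""
    return voxel_count * dx_mm * dy_mm * dz_mm / 1000.0


# Turnkey result at time point 1 of "rxa": 53885 voxels of about (0.9 mm)^3 each.
volume_ml = voxels_to_ml(53885, 0.9, 0.9, 0.9)
print(round(volume_ml, 1))  # 39.3
```

Under these assumptions, the turnkey value of 53885 voxels at time point 1 corresponds to roughly 39 mL.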
Manual segmentation is considered to be the gold standard. We assume that the manually segmented volumes are ``correct'' values. Notice that all of the curves follow the same general trend: large volumes at ED (time points 1 and 16) and small volumes at ES (time point 8). 4D analysis provides results nearly identical to the other methodologies, but requires much less user interaction. All semi-automatic methods lie below the manual results. This is because: (1) the operator has difficulty precisely defining the 3D LV chamber's shape and (2) this particular human operator may be predisposed to being more conservative[23].
Figure 3 is a graph showing the sensitivity of the 4D segmentation when specifying cues at different time points. Three trials were run, where 4D filtering and relaxation labeling were used as the enhancement and grayscale segmentation techniques. In all three trials, identical cues were defined at time points 1 and 16. The only difference between the three runs was the ES time point at which cues were specified --- this was varied between time points 7, 8, and 9. The cues for the remaining time points were obtained via interpolation. Note that the third set of cues was always specified on a time point near time point 8. This was necessary because cues had to be defined at both the ES and ED portions of the cardiac cycle. Defining cues at more varied time points (for example, time points 1, 3, and 16) produced invalid results.
The line on the graph representing segmentation based on user-defined cues at time points 1, 8, and 16 is identical to the ``4D With 4D Filtering'' line of Figure 2. Defining cues at time point 7 or 9 instead of 8 yielded slightly different volume values for the LV chamber, but the overall trend remained intact. Note that because the same cues were used for time points 1 and 16 in all three cases, the LV chamber volume at these time points is identical for all three lines.

The sensitivity of the 4D analysis system when specifying cues at the same three time points, but with varied shapes and sizes of the cues at those time points, was also studied. Cues were defined at time points 1, 8, 16 for ten runs of 4D analysis upon ``rxa.'' For each run, the biopsy cues specified for the LV chamber were altered. No other iconic or symbolic cues were changed. Table 1 shows the average, standard deviation, and minimum and maximum volume values for each time point. Note that the standard deviation of the volume for all of the time points is relatively small. As long as the cues are properly defined, INTERSEG will generate accurate and reproducible results.

Figure 5: Segmented volumes for time points 3, 6, and 12: (a)--(c) manual segmentation; (d)--(f) 4D analysis.
Figure 4 shows 3D shaded surface displays of the segmented LV chamber at time points 1, 8, and 16. The segmented volumes were obtained via both manual segmentation and the presented 4D analysis procedure. At these time points, the iconic cues were user-defined. Note the similarity between the pairs of segmented volumes. Figure 5 shows a few more 3D shaded surface displays of the segmented LV chamber from the same result --- these are from time points 3, 6, and 12. At these time points, the iconic cues used by the 4D analysis method were obtained using cue interpolation. The manually segmented volumes and those generated by the 4D cue-based procedure are still comparable, even with the automatically-generated cues. Hence, the visual results corroborate the numerical results of Figure 2.
For the next set of results, three 4D Imatron EBCT sequences of a dog heart (``5150,'' ``5151,'' and ``5152'') under three different physiological conditions were segmented via the 4D analysis method. The three original images were of size 136 x 138 x 8 x 10, where each voxel represents a volume of size (0.585mm x 0.585mm x 8.0mm). These images were cropped of extraneous information and non-linear global-sigma interpolation[24] was used to obtain additional slices. The final images to be used by the segmentation were of size 136 x 136 x 83 x 10, where each voxel represents a volume of size (0.585mm x 0.585mm x 0.585mm).
The three Imatron images were segmented into the following regions: the left and right ventricular (LV/RV) chambers, the myocardium, and air. Biopsy cues were defined for all four regions, and inclusionary cues were defined for the LV and RV chambers. These iconic cues were designated at time points 1 (ED), 6 (ES), and 10 (ED) --- the cues for the other time points were interpolated. 3D watershed analysis[14] was used to segment the images, and a maximum homogeneity filter was used for noise reduction. It was difficult to do exhaustive tests on the Imatron data for several reasons: (1) there are no benchmark values with which to compare the generated results (unlike the ``rxa'' image), and (2) the aforementioned very low z-resolution of the original images. However, the results obtained were promising.

Time point 1 of the Imatron data (following the truncation and interpolation explained previously) is shown in Figure 6. The same time point after segmentation is shown in Figure 7. In both images, every 11th slice of the 83-slice volumes is shown. The segmented regions are the left and right ventricular chambers, the myocardium, and air. The two dark areas are the left and right ventricular chambers. The LV chamber is the rounder area on the right, and the RV chamber is the crescent-shaped area on the left. The results for the other time points (2--9) and sequences (``5151'' and ``5152'') were similar[7].

Table 2: Imatron data: Measured volumes of segmented LV and RV chambers.
Table 2 shows the chamber volume for the segmented LV and RV chambers of the three Imatron image sequences studied. Notice that the trend in chamber volume is similar to that of ``rxa'' --- large LV volumes at the end diastole time points (1 and 10), and small LV volumes at the end systole time points (near 6).
The first goal of this research was to implement a system for analyzing 4D medical images based on user-defined cues. This was accomplished by augmenting the existing INTERSEG system with capabilities to operate on volume sequences. Additionally, 4D morphological filtering was added to the system as a means of image enhancement for 4D volume sequences.
The second goal was to implement a method of obtaining a full set of cues (i.e., cues at all time points) from a limited set of given cues. A method of interpolating, extrapolating, and replicating cues was constructed and implemented. The 3D object-interpolation algorithm works effectively. Results for both generic test objects and actual biopsy and exclusionary (inclusionary, preclusionary) cues have shown the interpolation algorithm to be viable[7]. Extrapolation and replication are also possible mechanisms for cue sequence completion; however, they should be used sparingly since the cues that these two methods generate are more likely to be incorrect.
The cue-based 4D analysis system operates as expected --- as long as the cues specified by the user are ``reasonable,'' the interpolation scheme will also generate valid cues and the INTERSEG system will segment the volume sequence properly. Validation studies were performed on cardiac sequences obtained via the DSR scanner. The performance of the 4D system compares favorably with its 3D predecessor as well as manual techniques. Further, the 4D approach requires significantly less interaction time than the 3D-only approach. Sensitivity experiments have shown that defining cues at three time points of a cardiac sequence --- the ``beginning'' and ``end'' time points, and a ``middle'' time point --- is sufficient for proper segmentation. A study of the 4D Imatron cardiac sequence was also completed, with promising initial results. A critical point to realize, though, is that our 4D analysis paradigm utilized 3D image-segmentation methods. Possible future enhancements for the 4D analysis system include the addition of true 4D segmentation methods and inclusion of other types of 4D filters.