Virtual Gel Profiling System
FIELD OF THE INVENTION
The present invention relates in general to a computerized system for
analyzing profiles of biological sample signals created on a gel. More
specifically, it relates to a computerized system that is capable of reading,
refining, quantifying, comparing, organizing, storing and retrieving gel
signals of biological substances such as nucleic acids or proteins derived
from a plurality of samples of various origins, to qualitatively and
quantitatively characterize the biological states of such samples.
BACKGROUND OF THE INVENTION
The advancement of molecular biology techniques in the last few
decades has made it possible to elucidate a living organism's physiological
states in terms of the molecular composition of its tissues and cells. That
is, for example, a certain disease state can be characterized by a set of
regulatory and structural proteins expressed at certain levels and the
corresponding set of genes turned on, while the normal state is
characterized by a different set of proteins expressed at different levels
with the correspondingly different set of genes turned on. To take an
accurate "snapshot," the existing protein, deoxyribonucleic acid (DNA)
and ribonucleic acid (RNA) species in the tissue or cell need to be
identified, and their amounts need to be quantified. The separation of
these macromolecules is achieved by molecular sieving in conjunction
with charge separation. Agarose and cross-linked polymers of acrylamide
are commonly used for this purpose. The evolution of such gel separation
systems has resulted in today's high throughput and high resolution in the
identification of protein and nucleic acid fragments, as evidenced by the
performance of nucleic acid sequencing gels.
Such advancement, in the meantime, creates a pressing need to
develop comparable gel reading and analysis systems that can accurately
extract and analyze the signals produced on the gel, particularly in the
context of quantitative studies on biological states by taking "snapshots"
of, or "profiling," the molecular components in tissues or cells. To effectively
generate comprehensive profiles of various biological states based on a
gel system, overall high throughput, high resolution and high flexibility
must be achieved by a gel profiling system.
Numerous commercial programs are available today to process gel
images created by a gel scanner. However, these programs have a
number of defects that render them insufficient for many profiling studies.
Adjusting for the background is often done in an ad hoc fashion, without
reliable standards. The distortion of lanes in a DNA or RNA gel can
severely reduce the accuracy in reading band signals. There is no
mechanism to safeguard against artifacts in one lane as compared with
other lanes. Problems like these make it difficult to establish
accurate biological profiles based on the signals on the gel. In addition,
such programs, when used in gene expression studies, for example, build
only limited profiles from data gathered beforehand. The disconnection
between data generation (e.g., gel electrophoresis), data processing (e.g.,
gel scanning, gel reading), and data analysis (e.g., profile computing) on
the one hand and data organizing and storage (e.g., profile database) on
the other hand is a serious barrier to effective profiling studies.
Furthermore, the lack of intelligent, user-interactive monitoring and
controlling systems in such studies has led to non-coordinated, manual
work which threatens the consistency of the results and creates
additional difficulties in interpreting them.
SUMMARY OF THE INVENTION
To resolve the above problems, the present invention is directed to
a computerized system for processing gel signals and establishing
biological profiles for samples of different origins. In particular, the
present invention relates to a set of improved methods and processes for:
(1) reading and refining gel signals; (2) constructing profiles directly from
the refined signals; (3) storing such profile information into a connected
database to allow for interactive and timely retrieval; and (4) monitoring
and controlling such profile analyses by accepting the user's input in real
time.
Briefly summarized, the computerized gel profiling system
comprises a number of elements according to the present invention. A
gel separating system identifies and separates macromolecules, e.g., a
polyacrylamide gel electrophoresis (PAGE) system that separates
deoxyribonucleic acid (DNA) or ribonucleic acid (RNA) fragments. A gel
scanner is provided to generate gel images. A computerized image
processing system processes these images through a series of steps to
extract and refine the signals. A selector facilitates the selection of
sample signals of interest. A profile compiler is provided to
construct profiles for any selected sample signals. Finally, a database
connector provides support for information storage and retrieval.
In accordance with one aspect of the present invention, the
computerized image processing system begins with an image reader that
reads and identifies sample signals presented by the gel image, e.g., lanes
and bands on a PAGE gel, and that records the lane and band information,
preferably in a file. A trace generator then creates a trace for each lane
based upon this lane and band information, and this trace is in turn
aligned with all other traces. The background signals are then subtracted
from the aligned traces. Additionally, signals in each lane are normalized
to allow for cross-lane comparison. Finally, the intensity value of each
band is read and recorded according to the aligned, background-adjusted
and normalized traces. In the meantime, a gel simulator is used to
construct virtual lane and band images from these aligned, background-
adjusted, and normalized traces.
In accordance with another aspect of the present invention, the
selector generates a set of selected signals in one of two ways: (1)
the user identifies sample signals directly; or (2) the selector highlights
the sample signals on the gel image satisfying preconfigured criteria,
which are derived from the sample information accessible through the
database connector.
In accordance with yet another aspect of the present invention, the
profile compiler is capable of constructing biological profiles in one of
two ways. First, it can create a corresponding profile by
organizing the intensity values for all detectable signals according to one
of the three formats supported by the gel profiling system. Alternatively,
it can create a corresponding profile by organizing the intensity values for
selected signals, e.g., bands selected by the user, according to one of the
three formats supported by the gel profiling system. The three supported
formats are (1) text file, (2) tabular form, and (3) an appropriate data
structure registered in a connected database.
In accordance with a fourth aspect of the present invention, the
database connector provides a channel through which the profile
information is exported to a connected database, using a set of database
write methods to update relevant fields in the database. In parallel, the
sample information is retrieved through the database connector from a
connected database, using a set of database read methods for each
corresponding field in the database.
In accordance with a fifth aspect of the present invention, the
computerized gel profiling system further comprises a graphical interface
that permits the user to monitor and control the activities of the system.
This graphical interface includes three different windows or viewers. An
image viewer allows the user to view the original images. In this viewer,
the user can magnify the images as needed, and can circle or box a band
of interest to select it. A virtual gel viewer allows the user to view the
simulated gel images constructed based on the processed original images.
This viewer enables zooming in/out and scrolling up/down for optimal
viewing. And a trace viewer allows the user to view the traces of all the
lanes, which are drawn based on the processed original image. This
viewer similarly enables zooming and scrolling.
In accordance with a sixth aspect of the present invention, the
graphical interface further includes a profile editor. This editor allows the
user to adjust the boundaries of a band while computing its intensity
value to be included in a profile. In addition, the user can modify intensity
values of a band within a profile, and the user can also delete a profile
altogether through use of this profile editor. Lastly, this editor takes the
user's input to establish a threshold value for determining differential
profiles, which are profiles containing differential intensity values. It then
uses this threshold value to identify those profiles that are distinguishable
from others by the differential criteria.
The present invention, therefore, brings significant improvements in
biological profiling studies based on a gel system. The gel profiling
system in accordance with the present invention achieves high
throughput, high resolution, high reliability and high flexibility.
Further features, objects, and advantages of the present invention
are apparent in the drawings and in the detailed description that follows.
The features of novelty that characterize this invention are pointed out
in the claims annexed to and forming a part of this specification.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 is a block diagram presenting a general overview of a Gel
Profiling System 120 designed in accordance with the present invention;
Fig. 2 is a block diagram of the elements and processes in the
Computerized Image Processing System 104 in Fig. 1;
Fig. 3 is an illustration of band counts in each lane computed by
the Band Counter 206 in Fig. 2;
Figs. 4(a), 4(b), and 4(c) are illustrations of traces in the Trace
Viewer 702 in Fig. 7: Fig. 4(a) illustrates raw traces, Fig. 4(b) illustrates
aligned traces, and Fig. 4(c) illustrates aligned, background-adjusted, and
normalized traces;
Fig. 5 is an illustration demonstrating a normalizing reference trace
in the Trace Viewer 702 in Fig. 7;
Fig. 6 is a screen shot of a display of the Virtual Gel Viewer 704 in
Fig. 7;
Fig. 7 is a block diagram presenting various components and their
interactions in the Graphical Interface 106 in Fig. 1;
Fig. 8 is a block diagram of the processes and their relationships
within the Trace Viewer 702 in Fig. 7;
Fig. 9 is a block diagram of the processes and their relationships
within the Virtual Gel Viewer 704 in Fig. 7;
Fig. 10 is an illustration of an original gel image presented by the
Image Viewer 706 in Fig. 7;
Fig. 11 is a block diagram of the components and operations within
the Profile Editor 708 in Fig. 7;
Fig. 12 is a screen shot of a display of the Profile Editor 708 in
Fig. 7;
Fig. 13 is a block diagram of elements and processes in the
Selector 108 in Fig. 1;
Fig. 14 is a block diagram of the elements and processes in the
Profile Compiler 110 in Fig. 1; and
Fig. 15 is a block diagram illustrating the two-way data flow within
the Database Connector 112 in Fig. 1.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
Fig. 1 presents an overview of a Gel Profiling System 120 designed
in accordance with the teachings of the present invention, and
constituting a preferred embodiment of the invention. Referring to Fig. 1,
a Gel Separation System 100 separates and thus identifies biologically
active macromolecules from a set of samples of various origins. These
samples can be prepared in many different ways and for different
purposes. They can be normal or diseased tissue samples, body fluids
taken from patients at various progression stages of a particular disease,
cultured cells treated with various inhibitory or stimulatory agents as to a
specific cellular activity, or nucleic extracts of certain cell types
throughout the developmental stages of a test animal, etc. In the case of
DNA and RNA fragments, for example, the Gel Separation System 100
may be an electrophoresis system (not shown), according to one
preferred embodiment of this invention.
After a set of samples is separated by the system 100, the gel is
then put through a Scanner 102, which generates a gel image. The
preferred format of the gel image is a compressed (with no loss of
information) TIFF file or a JPEG file. This image file is then sent to an
Image Processing System 104, which extracts and refines the molecular
signals contained in the gel through a series of processing steps.
The Image Processing System 104 is connected to a Selector 108,
which selects signals of interest for the construction of profiles. The
profiles are then built by a Profile Compiler 110, which accepts the input
from the Selector 108 and which outputs profile information to a
Database Connector 112. The Database Connector 112 serves as a
channel, or two-way communication pipeline, between the Gel Profiling
System 120 and a connected Database 130. The Database 130 manages
and stores relevant information on the biological samples run on the gel,
along with other background information that is needed to aid one in
interpreting the profile information generated by the system 120. The
Database 130 may be a typical genomics or proteomics database or
knowledge base system, in accordance with the needs and resources of a
particular profiling project. The communication between the Database
Connector 112 and the Database 130 is two-way because the Database
Connector 112 transfers the profile information from the Profile Compiler
110 to the Database 130, while in the meantime, other relevant
information on each sample run on the gel is retrieved from the Database
130 and made available to the system 120.
The Image Processing System 104 also directly connects to and
interacts with the Profile Compiler 110, as is illustrated in Fig. 1. The
system 104 computes the intensity values of the signals of interest, and
thereby provides the basis for profile construction. In addition, the
Graphical Interface 106 directly connects to the Image Processing System
104. The Graphical Interface 106 provides the user with a mechanism to
control and to monitor the activities of the Gel Profiling System 120, and
it provides a number of synchronized window viewers. The detailed
workings of the Graphical Interface 106 are discussed later. Aside from the
Image Processing System 104, the other modules such as the Selector
108, the Profile Compiler 110 and the Database Connector 112 are also
tied in with the Graphical Interface 106, to present the system state and
to accept the user's input, as demonstrated in Fig. 1. An additional
connection between the Selector 108 and the Database Connector 112
allows the Selector 108 to incorporate relevant biological background
information on the samples in identifying sample signals to be included in
the selected set.
The Image Processing System 104 includes a series of modules and
processes as illustrated in Fig. 2. An Image Reader 200 reads signals
from a gel image and records the resulting signal information, preferably in
a file. For nucleic acid samples, for example, the Image Reader 200
identifies the lanes and bands of DNA or RNA fragments, and creates data
files to store data representing the lanes and bands, respectively. The
band data includes pixel values and the x, y coordinates of each band.
The lane data includes coordinates for a given number of points at a
predetermined interval along the length of each lane. In addition,
information defining the marker bands is similarly extracted and stored in
a designated file. This information, including the position of each marker
band and its size with respect to the number of bases, is then directed to
a Band Size Calibrator 204, which calculates the number of bases and
thus calibrates the size of each sample band according to the
corresponding marker bands. In the meantime, the information in the lane
data file and the band data file is sent to a Band Counter 206 and a
Trace Generator 202. The Band Counter 206 counts the number of
identifiable bands in each lane, as is shown in Fig. 3. The lanes 302 and
304 are marker lanes, and 300 is a size measure with respect to the
number of bases according to the marker band information. The lanes
310, 312, 314, 316, 318, 320, 322 and 324 are sample lanes with
bands identified through the Image Reader 200. The number of bands in
each sample lane is then counted. The Trace Generator 202 creates a
trace for each lane in the gel image. Each trace is a two-dimensional
curve, where the X axis represents the gel shift and the Y axis represents
the intensity at each X position, as illustrated by curve 402 in Fig. 4(a).
These raw traces in Fig. 4(a) are then aligned in an Aligning module 208
to correct for vertical distortions in the gel, which results in the aligned
traces, such as curves 412, 414 and 416 presented in Fig. 4(b). The
background noise is then subtracted from each of the aligned traces by
means of a Background Subtraction module 210 to produce aligned,
background-adjusted traces. To allow for cross-lane comparisons, a
reference trace is then built in a Normalization module 212 by taking the
average intensity values on the Y axis of all traces over the entire length
of each lane on the X axis. Fig. 5 demonstrates an example of a reference
trace, curve 502. Each aligned, background-adjusted trace is then
normalized against the reference trace, which creates aligned,
background-adjusted and normalized traces, such as curves 422, 424 and
426 shown in Fig. 4(c). The information on these aligned, background-
adjusted and normalized traces is subsequently used by an Intensity
Computing module 216 to compute an intensity score 220 for each band
by taking the area underneath the peak corresponding to the band. In
parallel, a Gel Simulator 214 uses the aligned, background-adjusted and
normalized traces to create virtual lanes and bands which are displayed in
the Virtual Gel Viewer 704 in the Graphical Interface 106, as is shown in
Fig. 6.
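The trace refinement and scoring steps described above can be sketched as follows. This is a minimal illustration in Python, not the implementation of the preferred embodiment. The specification does not state how the background is estimated or how a trace is scaled against the reference trace, so two assumptions are made here: the background is taken to be the constant minimum of each trace, and normalization scales each trace so that its total signal equals that of the reference trace. All function names are hypothetical. The area under a peak is computed with the trapezoidal rule at unit spacing.

```python
from statistics import mean


def subtract_background(trace):
    """Assumed baseline model: treat the trace minimum as constant background."""
    floor = min(trace)
    return [y - floor for y in trace]


def reference_trace(traces):
    """Point-wise average of all traces (equal lengths assumed)."""
    return [mean(column) for column in zip(*traces)]


def normalize(trace, reference):
    """Assumed rule: scale the trace so its total signal matches the reference."""
    total = sum(trace)
    if total == 0:
        return list(trace)
    scale = sum(reference) / total
    return [y * scale for y in trace]


def band_intensity(trace, start, end):
    """Area under the peak between X indices start and end (trapezoidal rule)."""
    return sum((trace[i] + trace[i + 1]) / 2.0 for i in range(start, end))


# Two already-aligned lanes carrying the same band at different loadings.
lanes = [
    [2, 2, 6, 2, 2],
    [1, 1, 9, 1, 1],
]
adjusted = [subtract_background(t) for t in lanes]
ref = reference_trace(adjusted)
normalized = [normalize(t, ref) for t in adjusted]
scores = [band_intensity(t, 1, 3) for t in normalized]
```

In this toy input both lanes yield the same score after normalization, which is the intended effect of making lanes comparable across the gel.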
The Image Processing System 104 uses the following pairwise
alignment algorithm for aligning the traces. Each lane is aligned against a
reference lane. The first lane is usually used as the alignment reference
lane. The reference trace (the trace of the alignment reference lane) is
kept fixed during the alignment process. In aligning a trace, its X-
dimension is incrementally reshaped so that eventually the peaks
(representing band features) over the Y-dimension match those on the
reference trace. During the alignment process, the Aligning module 208
keeps track of (1) the incrementally changed trace and (2) the mapping
function that can be used to transform the X-coordinates from the original
trace to the new X-coordinates after the alignment. During phase I of the
alignment, a starting point is first selected near the center of the
reference trace. Then, through exhaustive shifting, enlarging, and/or
shrinking of the X axis, the point on the trace being aligned that corresponds
to the starting point is identified. This corresponding point is defined by
taking the minimum of the cost computed from intensity differences and
first derivative value differences in a window around the alignment
starting point. During phase II of the alignment, the alignment is extended
in both directions by minimizing the alignment cost in a sliding window
that is moved incrementally in two directions towards the start and the
end of the trace, respectively. The alignment algorithm is implemented in
Java in the preferred embodiment of this invention.
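Phase I of the alignment can be illustrated with the following sketch, written in Python for brevity although the preferred embodiment is implemented in Java. It is a simplified illustration under stated assumptions: only integer shifts are searched (enlarging and shrinking of the X axis are omitted), the cost is assumed to be the unweighted sum of absolute intensity differences and absolute first-derivative differences inside the window, and the window is assumed to lie within the reference trace. The function names and parameters are hypothetical.

```python
def derivative(trace):
    """Forward-difference first derivative (traces of two or more points assumed)."""
    d = [trace[i + 1] - trace[i] for i in range(len(trace) - 1)]
    return d + d[-1:]  # repeat the last slope so lengths match


def best_shift(reference, trace, center, half_width, max_shift):
    """Exhaustively test integer shifts; return the one with minimum window cost."""
    d_ref, d_trace = derivative(reference), derivative(trace)
    window = range(center - half_width, center + half_width + 1)
    best = None
    for shift in range(-max_shift, max_shift + 1):
        # Skip shifts that would push the window off either end of the trace.
        if window[0] + shift < 0 or window[-1] + shift >= len(trace):
            continue
        cost = sum(
            abs(reference[x] - trace[x + shift])
            + abs(d_ref[x] - d_trace[x + shift])
            for x in window
        )
        if best is None or cost < best[0]:
            best = (cost, shift)
    if best is None:
        raise ValueError("no shift keeps the window inside the trace")
    return best[1]


# The second trace carries the same peak displaced two positions to the right.
reference = [0, 0, 0, 0, 1, 5, 1, 0, 0, 0, 0]
distorted = [0, 0, 0, 0, 0, 0, 1, 5, 1, 0, 0]
shift = best_shift(reference, distorted, center=5, half_width=2, max_shift=3)
```

Under these assumptions the mapping function for phase I is simply x_aligned = x_original - shift. Phase II would repeat the same minimization in windows moved step by step toward each end of the trace.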
In the Image Processing System 104, the Gel Simulator 214
creates virtual lanes and bands which are displayed in the Virtual Gel
Viewer 704 within the Graphical Interface 106. As mentioned above, the
Graphical Interface 106 consists of a number of windows with different
displays. Referring to Fig. 7, the whole interface system 106 represented
by the dashed block has four main components: (1) a Trace Viewer 702,
(2) a Virtual Gel Viewer 704, and (3) an Original Image Viewer 706,
which are interconnected with each other; and (4) a Profile Editor 708
which is connected to the Virtual Gel Viewer 704.
The first element, the Trace Viewer 702, displays traces derived
from a gel image, as shown in Figs. 4(a) - 4(c). Raw traces (402, 404
and 406), aligned traces (412, 414 and 416), and aligned, background-
adjusted, normalized traces (422, 424 and 426) are illustratively shown in
Figs. 4(a), 4(b), and 4(c), respectively. Referring to Fig. 8, the Trace
Viewer 702 has built-in scrolling and zooming capabilities, introduced by a
Scrolling module 800 and a Zooming module 802, respectively. In
addition, portions of the traces can be highlighted through a Highlighting
module 804 according to the user's selection, to enable subsequent
analysis. The shaded areas 432 and 434 in Fig. 4(c) demonstrate
highlighted parts of certain traces. The Trace Viewer 702 thus provides a
dynamic and responsive representation of traces.
The second element, the Virtual Gel Viewer 704, displays simulated
bands and lanes which are straightened and calibrated through the series
of processes in the Image Processing System 104, as discussed above.
Referring to Fig. 9, the central unit of the Virtual Gel Viewer 704 is a
Painting module 900, which paints simulated lane and band images in this
window based on the aligned, background-adjusted and normalized
traces. Scrolling 902 and Zooming 904 are also enabled in this viewer.
Fig. 6 is a screen shot of the Virtual Gel Viewer 704, where two scrolling
bars (602 and 604) and buttons for zooming (606 and 608) are shown.
The fragment size measure 610 is constructed from the marker lane
information extracted by the Image Reader 200 within the Image
Processing System 104. Additional buttons on the bottom represent
ancillary functions of the viewer, e.g., adjusting the brightness, saving
the file, etc.
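How a painting step might turn a refined trace into a simulated lane can be sketched as follows. This Python fragment is an illustration only: the grayscale mapping (stronger signal painted darker, scaled to the strongest point in the lane) and the fixed lane width are assumptions, and `paint_lane` is a hypothetical name rather than part of the described system.

```python
def paint_lane(trace, lane_width=3, max_gray=255):
    """Return one pixel row per trace point; darker pixels mean stronger signal."""
    peak = max(trace)
    rows = []
    for y in trace:
        # Scale the intensity to 0..max_gray relative to the lane's strongest point.
        level = 0 if peak == 0 else round(max_gray * y / peak)
        rows.append([max_gray - level] * lane_width)
    return rows


# A three-point trace: empty gel, a strong band, then a weaker band.
pixels = paint_lane([0, 6, 2], lane_width=2)
```

Because every lane is painted from an aligned trace, the simulated bands line up straight across lanes even when the original gel was distorted.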
The third element, the Image Viewer 706, displays the original gel
images generated by the Scanner 102. Fig. 10 presents an example of
such an original gel image. The lanes 1002 and 1012 are marker lanes,
and lanes 1004, 1006, 1008 and 1010 are sample lanes. A horizontal
scrolling bar 1020 and a vertical scrolling bar 1030 are included. The
user can click on a band of interest to select it. For example, clicking on
the band 1050 will include that band in the subsequent profile analysis.
To facilitate precise band selection, the user can use a magnifying
function of the Image Viewer 706 to enlarge the image.
The fourth element, the Profile Editor 708, allows the user to edit
the profiles. As shown in Fig. 11, the user selects the band for which the
profile is to be edited, and the band boundary position is then adjusted as
desired in a Boundary Adjustment module 1100. The intensity value of
the band can be modified through an Intensity Modification module 1102.
Finally, a profile can be deleted as needed in a Profile Deletion module
1104. The edited profiles are then sent to a Comparative Filter 1108
which identifies profiles with differential patterns. This is accomplished
by using the threshold number (control value) input by the user in a
Differential Threshold Input module 1106. For example, given a set of
profiles containing the intensity values of five samples and one control, if
the user inputs "10" as the threshold number, the Comparative Filter
1108 will then go through each profile and identify the ones containing an
intensity value that differs from the control by more than 10 units. Fig.
12 is a screen shot produced by the Profile Editor 708 according to a
preferred embodiment of this invention. The identifier of a selected band
is input into the text field 1202 to initialize profile editing. The user enters
the threshold number into the text field 1204.
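The threshold example above can be expressed as a short sketch. The Python below is illustrative only; the layout of a profile as a dictionary with a "control" value and a list of "samples" is an assumption made for the example, as are the band identifiers.

```python
def differential_profiles(profiles, threshold):
    """Return identifiers of profiles in which any sample differs from the
    control by more than `threshold` intensity units."""
    selected = []
    for band_id, profile in profiles.items():
        control = profile["control"]
        if any(abs(value - control) > threshold for value in profile["samples"]):
            selected.append(band_id)
    return selected


# Each profile holds five sample intensities and one control, as in the text.
profiles = {
    "band_A": {"control": 50.0, "samples": [48.0, 52.0, 55.0, 49.0, 51.0]},
    "band_B": {"control": 50.0, "samples": [50.0, 75.0, 47.0, 53.0, 52.0]},
}
flagged = differential_profiles(profiles, 10)
```

Here only the second profile is flagged, because one of its samples lies 25 units from the control while every sample in the first profile stays within 10 units.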
In the preferred embodiment of this invention, the displays
presented by the Image Viewer 706, the Trace Viewer 702 and the
Virtual Gel Viewer 704 are simultaneous and synchronized. As shown in
Fig. 7, two-way communications exist between all three viewers.
Highlighting a certain portion of the lanes in the Trace Viewer 702 will
automatically highlight the same portion of the lanes in the Virtual Gel
Viewer 704. Likewise, marking certain bands in the Image Viewer 706
will automatically mark the same bands in the Virtual Gel Viewer 704 as
being selected. The Image Processing System 104 directly interacts with
each of the three viewers in its series of image processing steps, from
extracting the raw data (Image Reader 200), creating the traces (Trace
Generator 202), aligning the traces (Aligning module 208), adjusting the
background noise for all traces (Background Subtraction 210), and
normalizing each trace with respect to a reference trace (Normalization
212), to eventually constructing the virtual images (Gel Simulator 214).
The profile analysis
is also tied in with all three viewers, which is reflected by the two-way
connections shown in Fig. 7 between the Profile Compiler 110 and each
of the viewers. The details of the Profile Compiler 110 will be described
below. The Profile Editor 708 in the Graphical Interface 106 interacts
with the Virtual Gel Viewer 704 directly, as mentioned above.
Additionally, it connects to the Database Connector 112 and the Selector
108 to enable the finalized profile information for the selected sample
bands to be recorded in the connected Database 130.
Referring to Fig. 13, the Selector 108 has two modes: (1) selection
based upon the user's input through the Graphical Interface 106, which is
done within a User Selection module 1300; and (2) automatic selection
based upon preconfigured criteria, which is done within a System
Selection module 1302. The preconfigured criteria are established based
upon sample information 1310, which is imported from the connected
Database 130 through the Database Connector 112. Selected sets of
bands 1312 are then sent to the Profile Compiler 110 which creates the
profiles.
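The two selection modes can be sketched as follows. This Python illustration assumes a simple representation that the specification does not prescribe: each band is a dictionary carrying an identifier and a lane number, sample information is keyed by lane, and the preconfigured criteria are attribute/value pairs that must all match. The function names are hypothetical.

```python
def user_selection(bands, chosen_ids):
    """Mode 1: keep the bands the user marked in the graphical interface."""
    chosen = set(chosen_ids)
    return [band for band in bands if band["id"] in chosen]


def system_selection(bands, sample_info, criteria):
    """Mode 2: keep bands whose lane's sample information meets every criterion."""
    selected = []
    for band in bands:
        info = sample_info.get(band["lane"], {})
        if all(info.get(key) == value for key, value in criteria.items()):
            selected.append(band)
    return selected


bands = [
    {"id": "b1", "lane": 1},
    {"id": "b2", "lane": 2},
    {"id": "b3", "lane": 2},
]
# Sample information as it might arrive through the database connector.
sample_info = {
    1: {"tissue": "liver", "state": "normal"},
    2: {"tissue": "liver", "state": "diseased"},
}
by_user = user_selection(bands, ["b1", "b3"])
by_system = system_selection(bands, sample_info, {"state": "diseased"})
```

Either mode returns the same kind of result, a selected set of bands, so the profile compiler does not need to know which mode produced it.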
Referring to Fig. 14, the Profile Compiler 110 takes three pieces of
information: (1) selected sets of bands 1312 from either the User
Selection 1300 or the System Selection 1302, (2) sample information
1310 retrieved from the Database 130, and (3) band intensity score 220
calculated in the Image Processing System 104. Using this information,
the Profile Compiler 110 builds profiles within its central Build Profile
module 1400. According to a preferred embodiment of this invention, a
typical profile 1402 contains the relevant sample information such as
tissue type, disease progression state, patient age, gender, treatment
regime, etc., and the intensity scores for a set of selected samples that
are being tested. This information is organized into one of three formats,
according to the user's preference: (1) textual format 1408, (2) table
format 1406, or (3) a data structure format 1404 that is acceptable to
the connected Database 130 through the Database Connector 112. Fig.
14 illustrates the data flow between the Profile Compiler 110, the
Database Connector 112, and the Database 130. Data originates from
the Database Connector 112; sample information 1310 is then derived
and constructed; the profiles are subsequently built in one of the above
three formats (1404, 1406 or 1408) within the Build Profile module
1400 and are eventually sent back to the Database Connector 112.
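The three output formats can be illustrated with a small sketch. The Python below is a hypothetical rendering: the specification names the formats (textual, tabular, data structure) but not their layout, so the line format of the text output, the two-column table, and the nested dictionary are all assumptions, as is the function name `build_profile`.

```python
def build_profile(sample_info, scores, fmt="structure"):
    """Organize sample information and band intensity scores in one format."""
    if fmt == "structure":
        # Nested mapping, standing in for a database-ready data structure.
        return {"sample": dict(sample_info), "intensity_scores": dict(scores)}
    if fmt == "table":
        rows = [[band, score] for band, score in sorted(scores.items())]
        return [["band", "intensity"]] + rows
    if fmt == "text":
        lines = [f"{key}: {value}" for key, value in sorted(sample_info.items())]
        lines += [f"{band}\t{score}" for band, score in sorted(scores.items())]
        return "\n".join(lines)
    raise ValueError(f"unsupported format: {fmt}")


sample = {"tissue": "liver", "gender": "F"}
scores = {"b2": 6.0, "b1": 3.5}
as_text = build_profile(sample, scores, "text")
as_table = build_profile(sample, scores, "table")
as_structure = build_profile(sample, scores)
```

Keeping one builder with a format switch mirrors the single Build Profile module that emits any of the three formats.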
Referring to Fig. 15, the Database Connector 112 includes
communication channels in two directions: an Importer 1500 and an
Exporter 1502. The sample information 1310 is retrieved from the
connected Database 130 through the Importer 1500, while the profile
information 1402 (in the various formats, 1404, 1406 or 1408) is sent to
the Database 130 through the Exporter 1502. The Importer 1500 uses
computer programs to read data from the Database 130, which include a
set of database retrieving functions, one for each of the fields for which
data is needed. The Exporter 1502 uses computer programs to write data
into the Database 130, which include database insert or update
functions for profiles in each of the three formats, textual 1408, tabular
1406 or data structure 1404. It also includes such functions for any
other fields for which data needs to be updated as a result of analysis
runs by the Gel Profiling System 120.
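The two-way role of the database connector can be sketched with an in-memory SQLite database from the Python standard library. This is an illustration under assumptions: the specification does not name a database product, and the table names, column names, and the `DatabaseConnector` class with its two methods are hypothetical. The export step uses SQLite's `INSERT OR REPLACE` to stand in for the insert or update functions mentioned above.

```python
import json
import sqlite3


class DatabaseConnector:
    """Hypothetical connector with an import (read) and an export (write) path."""

    def __init__(self, connection):
        self.conn = connection

    def import_sample(self, sample_id):
        """Importer: read the sample fields needed for profiling."""
        row = self.conn.execute(
            "SELECT tissue, state FROM samples WHERE id = ?", (sample_id,)
        ).fetchone()
        return None if row is None else {"tissue": row[0], "state": row[1]}

    def export_profile(self, sample_id, profile):
        """Exporter: insert the profile, or update it if one already exists."""
        self.conn.execute(
            "INSERT OR REPLACE INTO profiles (sample_id, body) VALUES (?, ?)",
            (sample_id, json.dumps(profile, sort_keys=True)),
        )
        self.conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (id TEXT PRIMARY KEY, tissue TEXT, state TEXT)")
conn.execute("CREATE TABLE profiles (sample_id TEXT PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO samples VALUES ('s1', 'liver', 'diseased')")

connector = DatabaseConnector(conn)
info = connector.import_sample("s1")
connector.export_profile("s1", {"b1": 3.5})
connector.export_profile("s1", {"b1": 4.0})  # second write updates the same row
stored = conn.execute("SELECT body FROM profiles").fetchall()
```

Serializing the profile as JSON text is only one possible choice for the data structure format; a production schema would more likely store scores in typed columns.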
In summary, the main characteristics of the gel profiling system in
accordance with the present invention are as follows.
(1) The gel profiling system uses a gel separation system to
separate biologically active macromolecules. Such a separation system
includes, but is not limited to, a polyacrylamide gel electrophoresis (PAGE)
system or agarose gel system to separate nucleic acid molecules (DNA
and RNA), and a 1-dimensional or 2-dimensional protein denaturing sodium
dodecyl sulfate (SDS) gel system (e.g., SDS-PAGE) to separate protein
molecules and the like.
(2) The gel profiling system uses a scanner to scan the gels
produced by the separating system and generate image files, such as
TIFF or JPEG images.
(3) The gel profiling system uses a computerized image processing
system to extract, refine and analyze signals carried on each gel.
(4) In an image processing system for nucleic acid fragment
analysis, an image reader first identifies lanes and bands on the gel,
including the marker lanes and marker bands, and records those data.
Then a band counter uses those data to count the number of bands in
each lane, and a band size calibrator calibrates the size of each band with
respect to the number of bases in the fragment according to the marker
band information.
(5) In an image processing system for nucleic acid fragment
analysis, a trace generator builds a trace for each lane on the gel based
on the lane and band data from the image reader. All traces are
subsequently aligned using a pairwise alignment algorithm which
designates a reference lane and incrementally aligns each trace against the
trace of the reference lane. The aligned traces are then adjusted to
subtract the background noise. The resulting aligned, background-
adjusted traces are subjected to normalization according to a normalizing
reference trace. This is to allow for cross-lane comparison. The
normalizing reference trace is created by taking the average intensity
values on the Y axis of all traces in a gel image over the length of each
lane on the X axis.
(6) In an image processing system for nucleic acid fragment
analysis, the intensity score for each band is computed based on the
aligned, background-adjusted and normalized trace by taking the area
underneath the peak corresponding to the band.
(7) In an image processing system for nucleic acid fragment
analysis, a gel simulator uses the data of the aligned, background-
adjusted and normalized traces to create virtual lanes and bands. The
intensity value of each peak (band) on the Y axis determines the intensity
of the simulated band. The virtual images are displayed in a virtual gel
viewer within the graphical interface in the gel profiling system.
(8) The gel profiling system uses a selector to establish a set of
selected sample signals for profiling analysis. The selector works in two
modes. In the first mode, it allows the user to identify sample signals (e.g.,
bands) to be included in the set. In the second mode, it automatically
highlights the sample signals that satisfy preconfigured criteria in the
system. The preconfigured criteria are established based on the sample
information retrieved from the connected database.
(9) The gel profiling system uses a profile compiler to create
profiles. For nucleic acid profiling analysis, a profile consists of lane
numbers, band pixel coordinates, intensity scores and related information
on any qualifying attributes of samples corresponding to each band.
Three profile formats are used in the profile compiler: textual, tabular or a
data structure recognizable by the connected database. The profile
compiler works in two modes. In the first mode, the intensity scores for
all detectable signals in a gel image are compiled into one of the above
three valid formats along with the sample information. In the second
mode, the intensity scores for the selected sample signals (as determined
by the selector) are compiled into one of the above three valid formats
along with the sample information.
(10) The gel profiling system uses a database connector to
communicate with a connected biological database or knowledge base. This
database system stores background information on all samples run on the
gel, and information about known genes, proteins, and certain diseases as
in a typical genomics or proteomics database. The database connector
provides communications in two directions: an importer retrieves sample
information from the database, and an exporter sends the profile
information to the database. Each of the two modules consists of a
respective set of computer programs for accessing the database.
(11) The gel profiling system has a graphical interface allowing the
user to monitor and control processes in the system. The graphical
interface has three windows, a trace viewer, a virtual gel viewer and an
original image viewer, which display, in synchrony, the traces, the virtual
lanes and bands, and the original gel images, respectively. Additionally, a
profile editor provides a control panel for profile editing. The profile editor
directly interacts with each of the three viewers in its operation.
(12) In the graphical interface, the trace viewer displays traces of
all lanes. Zooming and scrolling are supported in this viewer. The trace
viewer has a characteristic sliding window that highlights portions of
traces for cross-lane comparisons.
(13) In the graphical interface, the virtual gel viewer displays virtual
lane and band images created by the gel simulator. Zooming and scrolling
are also supported in this viewer.
(14) In the graphical interface, the image viewer displays the
original gel images. The user can select certain bands by properly
marking them in this viewer. Zooming and scrolling are supported in this
viewer.
Although the preferred embodiment of the present invention has
been described and illustrated in detail, it is to be understood that the
same is by way of illustration and example only, and is not to be taken by
way of limitation. The features of novelty which characterize the present
invention, as well as its scope, are defined and limited only by the terms of
the claims appended to and forming a part of this specification.