WO2020037574A1 - Method for constructing a sequencing template based on an image, and base recognition method and device - Google Patents

Method for constructing a sequencing template based on an image, and base recognition method and device

Info

Publication number
WO2020037574A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
bright spot
images
pixel
registration
Prior art date
Application number
PCT/CN2018/101819
Other languages
English (en)
French (fr)
Inventor
李林森
徐伟彬
金欢
姜泽飞
周志良
颜钦
Original Assignee
深圳市真迈生物科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市真迈生物科技有限公司
Priority to EP18930701.0A (patent EP3843033A4)
Priority to US17/270,418 (patent US11170506B2)
Priority to PCT/CN2018/101819 (patent WO2020037574A1)
Publication of WO2020037574A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformation in the plane of the image
    • G06T3/40 Scaling the whole image or part thereof
    • G06T3/4038 Scaling the whole image or part thereof for image mosaicing, i.e. plane images composed of plane sub-images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G06T5/70
    • G06T5/73
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T7/0012 Biomedical image inspection
    • G06T7/0014 Biomedical image inspection using an image reference approach
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
    • G06T7/337 Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods involving reference images or patches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/772 Determining representative reference patterns, e.g. averaging or distorting patterns; Generating dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/69 Microscopic objects, e.g. biological cells or cellular parts
    • G06V20/695 Preprocessing, e.g. image segmentation
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/04 Recognition of patterns in DNA microarrays

Definitions

  • The present invention relates to the field of image processing and information recognition, and in particular to a method for constructing a sequencing template based on an image, a base recognition method, a device for constructing a sequencing template based on an image, a base recognition device, and a computer program product.
  • Related technologies include sequencing platforms in which an imaging system acquires images of nucleic acid molecules (templates) in a biochemical reaction multiple times in order to determine the nucleotide sequence of the nucleic acid molecules.
  • How to process and correlate the images acquired at different time points, and the information they contain, so as to effectively and accurately obtain the nucleotide composition and sequence of at least a portion of the nucleic acid template is a problem worthy of attention.
  • the embodiments of the present invention aim to solve at least one of the technical problems in the related art or provide at least one optional practical solution.
  • A method for constructing a sequencing template based on images is provided. The images include a first image, a second image, a third image, and a fourth image of the same field of view, corresponding to the four types of base extension reactions A, T/U, G, and C, respectively.
  • During the base extension reactions, the field of view contains multiple nucleic acid molecules with optically detectable labels, and at least a portion of the nucleic acid molecules appear as bright spots on the images. Carrying out the four types of base extension reactions sequentially and/or simultaneously is defined as one round of sequencing reaction.
  • The first image includes images M1 and M2, the second image includes images N1 and N2, the third image includes images P1 and P2, and the fourth image includes images Q1 and Q2. Images M1 and M2 come from two rounds of sequencing reactions, respectively, as do images N1 and N2, images P1 and P2, and images Q1 and Q2. The method includes: combining any two of images M1, M2, N1, N2, P1, P2, Q1, and Q2 for bright spot matching, such that image M1, image N1, image N2, image P1, image P2, image Q1, and image Q2 each participate in a combination at least once, to obtain multiple combined images containing first coincident bright spots, where two or more bright spots on a combined image that are less than a first predetermined pixel distance apart count as one first coincident bright spot; and merging the first coincident bright spots on the multiple combined images to obtain a bright spot set corresponding to a sequencing template.
  • an apparatus for constructing a sequencing template based on an image is provided, and the apparatus is configured to implement all or part of the steps of the method for constructing a sequencing template based on an image in the foregoing embodiment of the present invention.
  • The images include a first image, a second image, a third image, and a fourth image of the same field of view, corresponding to the four types of base extension reactions A, T/U, G, and C, respectively.
  • Carrying out the four types of base extension reactions sequentially and/or simultaneously constitutes one round of sequencing reaction.
  • The first image includes images M1 and M2, the second image includes images N1 and N2, the third image includes images P1 and P2, and the fourth image includes images Q1 and Q2. Images M1 and M2 come from two rounds of sequencing reactions, respectively.
  • Images N1 and N2 come from two rounds of sequencing reactions, respectively.
  • Images P1 and P2 come from two rounds of sequencing reactions, respectively.
  • Images Q1 and Q2 come from two rounds of sequencing reactions, respectively.
  • The device includes: a combination unit for combining any two of images M1, M2, N1, N2, P1, P2, Q1, and Q2 for bright spot matching, such that image M1, image N1, image N2, image P1, image P2, image Q1, and image Q2 each participate in a combination at least once, to obtain multiple combined images containing first coincident bright spots, where two or more bright spots on a combined image that are less than a first predetermined pixel distance apart count as one first coincident bright spot; and a merging unit for merging the first coincident bright spots on the multiple combined images from the combination unit to obtain a bright spot set corresponding to a sequencing template.
  • a computer-readable storage medium for storing a program for execution by a computer, and executing the program includes performing a method for constructing a sequencing template based on an image in any of the foregoing embodiments.
  • the computer-readable storage medium includes, but is not limited to, read-only memory, random access memory, magnetic disks, or optical disks.
  • A terminal and a computer program product are also provided.
  • The product includes instructions that, when the program is executed by a computer, cause the computer to perform the method for constructing a sequencing template based on an image.
  • The sequencing template constructed using the above method, device, computer-readable storage medium, and/or computer program product is a bright spot set corresponding to the sequencing template.
  • The bright spot set can effectively, accurately, and comprehensively reflect the information of the sequencing template.
  • This facilitates subsequent accurate base calling, that is, accurate identification of the nucleotide sequence of at least a portion of the template nucleic acid.
  • A base recognition method is provided. It includes matching the bright spots on an image obtained from a base extension reaction to the bright spot set of the corresponding sequencing template, and performing base recognition according to the matched bright spots.
  • The field of view corresponding to the image obtained from the base extension reaction contains multiple nucleic acid molecules with optically detectable labels, and at least a part of the nucleic acid molecules appear as bright spots on that image.
  • The bright spot set corresponding to the sequencing template is constructed by the method, device, computer-readable storage medium, and/or computer program product for constructing a sequencing template based on an image in the embodiments of the present invention.
  • A base recognition device is provided for implementing the above base recognition method in the embodiments of the present invention.
  • The device is configured to match the bright spots on an image obtained from a base extension reaction to the bright spot set of the corresponding sequencing template, and to perform base recognition according to the matched bright spots.
  • Nucleic acid molecules appear as bright spots on the image obtained from the base extension reaction, and the bright spot set corresponding to the sequencing template is constructed by the method and/or device for constructing a sequencing template based on an image in the embodiments of the present invention described above.
  • a computer-readable storage medium for storing a program for execution by a computer, and executing the program includes performing the base identification method in any one of the foregoing embodiments.
  • the computer-readable storage medium includes, but is not limited to, read-only memory, random access memory, magnetic disks, or optical disks.
  • a computer program product is also provided.
  • the product includes instructions for implementing base identification.
  • When executed, the instructions cause the computer to perform the base identification method in the embodiments of the present invention.
  • In this way, the type of base bound to the template nucleic acid during a base extension reaction can be identified, which can be used to accurately determine the template nucleic acid sequence.
  • FIG. 1 is a schematic flowchart of a method for constructing a sequencing template based on an image in a specific embodiment of the present invention.
  • FIG. 2 is a schematic diagram of combining and merging the images Repeat1, Repeat5, Repeat6, and Repeat7 to construct a sequencing template based on bright spots in a specific embodiment of the present invention.
  • FIG. 3 is a schematic diagram of a correction process and a correction result in a specific embodiment of the present invention.
  • FIG. 4 is a schematic diagram of corresponding matrices of candidate bright spots and pixels together in a specific embodiment of the present invention.
  • FIG. 5 is a schematic diagram of pixel values in a range of m1 * m2 centered on a central pixel of a pixel matrix in a specific embodiment of the present invention.
  • FIG. 6 is a schematic diagram of comparison of bright spot detection results before and after determination according to a second bright spot detection threshold in a specific embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of an apparatus for constructing a sequencing template based on an image in a specific embodiment of the present invention.
  • an embodiment of the present invention provides a method for constructing a sequencing template based on an image.
  • The images are acquired from the same field of view and include a first image, a second image, a third image, and a fourth image corresponding to the four types of base extension reactions A, T/U, G, and C, respectively.
  • During the base extension reactions, the field of view contains multiple nucleic acid molecules with optically detectable labels, and at least a portion of the nucleic acid molecules appear as bright spots on the images.
  • The first image includes images M1 and M2.
  • The second image includes images N1 and N2.
  • The third image includes images P1 and P2.
  • The fourth image includes images Q1 and Q2. Images M1 and M2 come from two rounds of sequencing reactions, respectively, as do images N1 and N2, images P1 and P2, and images Q1 and Q2.
  • The method includes: S10, combining any two of images M1, M2, N1, N2, P1, P2, Q1, and Q2 for bright spot matching, to obtain multiple combined images containing first coincident bright spots.
  • This method can obtain the bright spot set corresponding to the template nucleic acid molecule by first taking the intersection and then the union of the bright spots on multiple images.
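The "intersection first, then union" idea can be sketched in a few lines. This is only an illustration under stated assumptions, not the patented implementation: bright spots are assumed to be given as (x, y) coordinates per registered image, coincidence is tested by nearest-neighbour distance, a coincident pair is represented by its midpoint, and the function names `coincident_spots`, `union_spots`, and `build_template` are hypothetical.

```python
import itertools
import numpy as np

def coincident_spots(a, b, max_dist):
    """Intersection step: midpoints of spot pairs (one from a, one from b)
    that lie closer together than max_dist."""
    a = np.asarray(a, dtype=float).reshape(-1, 2)
    b = np.asarray(b, dtype=float).reshape(-1, 2)
    if len(a) == 0 or len(b) == 0:
        return np.empty((0, 2))
    # Pairwise Euclidean distances, shape (len(a), len(b)).
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    nearest = dist.argmin(axis=1)
    keep = dist[np.arange(len(a)), nearest] < max_dist
    # Representing a coincident pair by its midpoint is an assumption.
    return (a[keep] + b[nearest[keep]]) / 2.0

def union_spots(spot_lists, max_dist):
    """Union step: pool spots from all combined images, skipping any spot
    that is already represented within max_dist."""
    merged = []
    for spots in spot_lists:
        for p in spots:
            if all(np.linalg.norm(p - q) >= max_dist for q in merged):
                merged.append(p)
    return np.array(merged).reshape(-1, 2)

def build_template(images, max_dist=1.05):
    """Combine every pair of images, then merge the coincident spots."""
    pairs = itertools.combinations(images, 2)
    combined = [coincident_spots(a, b, max_dist) for a, b in pairs]
    return union_spots(combined, max_dist)
```

A spot that appears in only one image never survives the pairwise step, which is how the intersection suppresses isolated noise, while the union keeps molecules that light up in only some of the image pairs.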
  • The sequencing template obtained by this method is a bright spot set corresponding to the sequencing template.
  • The bright spot set can effectively, accurately, and comprehensively reflect the information of the sequencing template.
  • The bright spot set obtained can be further used for accurate base identification (base calling), that is, to accurately obtain the nucleotide sequence of at least a portion of a template nucleic acid.
  • A round of sequencing reaction, in which the four types of base extension reactions are carried out sequentially and/or simultaneously, can be realized with the four types of base reaction substrates (such as nucleotide analogs/base analogs) present at the same time in one base extension reaction system.
  • Alternatively, two types of base analogs can be used in one base extension reaction system and the other two types of reaction substrates in the next base extension reaction system, to achieve one round of sequencing reaction.
  • Alternatively, one type of base analog may be added to each base extension reaction system, with the four types of base analogs added to four consecutive base extension reaction systems in order, to realize one round of sequencing reaction. It can be seen that the first image, the second image, the third image, and the fourth image can be collected from two or more base extension reactions.
  • a base extension reaction may include one image acquisition or multiple image acquisitions.
  • In one example, a round of sequencing reaction includes multiple base extension reactions, as in monochrome sequencing, where the reaction substrates (nucleotide analogs) corresponding to the four types of bases all carry the same fluorescent dye.
  • The round of sequencing reaction includes four base extension reactions.
  • For one field of view, one base extension reaction includes one image acquisition.
  • Images M1, N1, P1, and Q1 come from the same field of view in the four base extension reactions of one round of sequencing reaction, respectively.
  • In another example, one round of sequencing reaction includes two base extension reactions, with two types of base reaction substrates carrying different dyes combined in each base extension reaction. For one field of view, one base extension reaction includes two image acquisitions at different excitation wavelengths. Images M1, N1, P1, and Q1 come from the same field of view at the two excitation wavelengths of the two base extension reactions of one round of sequencing reaction, respectively.
  • In yet another example, a round of sequencing reaction includes a single base extension reaction, as in the two-color sequencing reaction of a second-generation sequencing platform. The four types of base reaction substrates (such as nucleotide analogs) carry dye a, dye b, both dye a and dye b, and no dye, respectively, and dyes a and b have different excitation wavelengths. The four types of reaction substrates realize one round of sequencing reaction in the same base extension reaction, and one base extension reaction includes two image acquisitions at different excitation wavelengths.
  • The first image is the same as the third image, the second image is the same as the fourth image, and images M1 and N1 come from the same field of view in different rounds of sequencing reactions or at different excitation wavelengths in the same round of sequencing reaction.
  • S20 merges the first coincident bright spots on the multiple combined images, including matching the first coincident bright spots in different combined images one or more times, to obtain a bright spot set corresponding to a sequencing template.
  • When each base extension reaction contains only one type of nucleotide analog, S is preferably greater than 1, and more preferably greater than 2. This helps avoid or reduce the interference that noise from biochemical factors causes when constructing the sequencing template from images, and so helps determine the template effectively and accurately.
  • FIG. 2 is a schematic diagram of this process.
  • The top four images in FIG. 2 are Repeat1, Repeat5, Repeat6, and Repeat7.
  • The middle images show the process and result of coincident bright spot matching between Repeat1 and Repeat5.
  • The bottom image shows the result of coincident bright spot matching across images Repeat1, Repeat5, Repeat6, and Repeat7.
  • the size of the electronic sensor in the imaging system is 6.5 μm
  • the magnification of the microscope is 60 times
  • the smallest size that can be seen is 0.1 μm.
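The figures above are consistent with a simple calculation, shown here as a sketch. It assumes that the 6.5 μm figure is the pixel pitch of the sensor, which the text does not state explicitly; the names are illustrative.

```python
# Assumption: 6.5 um is the pixel pitch of the electronic sensor.
SENSOR_PIXEL_UM = 6.5
MAGNIFICATION = 60.0

# Size of one image pixel projected back onto the sample plane.
object_pixel_um = SENSOR_PIXEL_UM / MAGNIFICATION  # about 0.108 um

def pixels_to_um(n_pixels):
    """Convert a distance in image pixels to micrometres on the sample."""
    return n_pixels * object_pixel_um
```

Under this assumption one image pixel covers roughly 0.1 μm of the sample, so a pixel distance such as 1.05 pixels corresponds to a little over 0.11 μm.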
  • the size of the bright spots corresponding to the nucleic acid molecule is generally less than 10 * 10 pixels.
  • The first predetermined pixel distance is, in one example, 1.05 pixels.
  • Two first coincident bright spots more than 1.85 pixels apart are treated as two separate first coincident bright spots.
  • A coincident bright spot that is more than 1.05 pixels but less than 1.85 pixels from another coincident bright spot is discarded. This helps construct an accurate sequencing template.
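The two thresholds can be read as a three-way decision on the distance between a candidate spot and its nearest already accepted spot. The sketch below illustrates that reading only; how distances exactly equal to 1.05 or 1.85 pixels are handled is not stated in the text and is an assumption here, as are the names `classify_distance` and `merge_spots`.

```python
import numpy as np

SAME_MAX = 1.05      # first predetermined pixel distance (from the text)
DISTINCT_MIN = 1.85  # beyond this, spots are kept as separate spots

def classify_distance(d):
    """Classify a spot by its distance (in pixels) to the nearest accepted spot."""
    if d <= SAME_MAX:
        return "same"      # already represented by the accepted spot
    if d > DISTINCT_MIN:
        return "distinct"  # a separate coincident bright spot
    return "discard"       # ambiguous zone between the two thresholds

def merge_spots(template, candidates):
    """Add candidate spots to the template, keeping only clearly distinct ones."""
    merged = [np.asarray(p, dtype=float) for p in template]
    for c in candidates:
        c = np.asarray(c, dtype=float)
        nearest = min((float(np.linalg.norm(c - t)) for t in merged),
                      default=np.inf)
        if classify_distance(nearest) == "distinct":
            merged.append(c)
    return np.array(merged).reshape(-1, 2)
```

Discarding the spots in the intermediate band trades a small loss of spots for fewer ambiguous assignments between neighbouring molecules.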
  • the image is a registered image. In this way, it is beneficial to accurately obtain the bright spot set corresponding to the sequencing template.
  • Image registration is performed as follows. First, a first registration is performed on the image to be registered based on a reference image. The reference image and the image to be registered correspond to the same object and each include multiple bright spots. The first registration includes determining a first offset between a predetermined area on the image to be registered and the corresponding predetermined area on the reference image, and moving all bright spots on the image to be registered by the first offset to obtain the image to be registered after the first registration.
  • Second, based on the reference image, a second registration is performed on the image to be registered after the first registration. This includes merging the image to be registered after the first registration with the reference image to obtain a merged image, and performing a calculation on the merged image.
  • This image registration method performs two linked registrations, which can be loosely called coarse registration and fine registration, with the fine registration using the bright spots on the image. It can quickly achieve high-precision image correction from a small amount of data, and is especially suitable for scenarios that require high-precision image correction.
  • One such scenario is single-molecule-level image detection, for example images of sequencing reactions from third-generation sequencing platforms.
  • Single molecule level refers to the size of a single molecule or a few molecules, such as 10, 8, 5, 4, or fewer molecules.
  • the image to be registered is from a sequencing platform that uses the principle of optical imaging for sequence determination.
  • sequencing also called sequence determination, refers to nucleic acid sequence determination, including DNA sequencing and / or RNA sequencing, including long-sequence sequencing and / or short-sequence sequencing, and sequencing biochemical reactions include base extension. Sequencing can be performed through a sequencing platform.
  • The sequencing platform may be selected from, but is not limited to, the HiSeq/MiSeq/NextSeq sequencing platforms from Illumina, the Ion Torrent platform from Thermo Fisher/Life Technologies, and the BGISEQ platform and single-molecule sequencing platform from BGI. The sequencing method may be single-end sequencing or paired-end sequencing. The sequencing results/data obtained are the fragments read out, which are called reads, and the length of a read is called the read length.
  • the so-called "bright spots" correspond to the optical signals of extended bases or clusters of bases.
  • the predetermined area on the so-called image may be the entire image or a part of the image.
  • the predetermined area on the image is a part of the image, such as a 512 * 512 area in the center of the image.
  • the so-called image center is the center of the field of view.
  • the intersection between the optical axis of the imaging system and the imaging plane can be referred to as the image center point, and the area centered on the center point can be regarded as the image center area.
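Selecting the predetermined area can be as simple as slicing the central block out of the image array. The sketch below assumes that the optical center coincides with the geometric center of the array, which need not hold on a real instrument; `center_crop` is an illustrative name.

```python
import numpy as np

def center_crop(image, size=512):
    """Return the size x size block around the geometric center of the image.

    Assumption: the image center point (optical axis) coincides with the
    center of the pixel array.
    """
    height, width = image.shape[:2]
    if height < size or width < size:
        raise ValueError("image is smaller than the requested crop")
    top = (height - size) // 2
    left = (width - size) // 2
    return image[top:top + size, left:left + size]
```

Working on a 512 * 512 block instead of the full frame keeps the offset calculation cheap, and the center of the field of view is usually the best-focused part of the image.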
  • In one example, the image to be registered comes from a nucleic acid sequencing platform.
  • The platform includes an imaging system and a nucleic acid sample carrying system.
  • The nucleic acid molecules to be tested, which carry optically detectable labels, are fixed in a reactor, also called a chip.
  • The chip is mounted on a movable stage, which moves the chip so that images can be acquired of the nucleic acid molecules located at different positions (different fields of view) of the chip.
  • The movement accuracy of the optical system and/or the movable stage is limited. For example, there is a deviation between the position a movement is commanded to reach and the position the mechanical structure actually reaches, which matters especially in application scenarios that require high accuracy.
  • The reference image is obtained by construction. It can be constructed when the image to be registered is registered, or it can be constructed in advance and saved for use when needed.
  • constructing the reference image includes: obtaining a fifth image and a sixth image, the fifth image and the sixth image corresponding to the same object as the image to be registered; and performing coarse registration on the sixth image based on the fifth image, including determining An offset between the sixth image and the fifth image, and the sixth image is moved based on the offset to obtain a sixth image after coarse registration; the fifth image and the sixth image after coarse registration are combined to obtain a reference image
  • the fifth image and the sixth image each include multiple bright spots.
  • Using multiple images to construct the reference image helps the reference image capture complete bright spot information of the corresponding nucleic acid molecules and facilitates bright-spot-based image correction.
  • the fifth image and the sixth image are from the same field of view at different times of the nucleic acid sequence determination reaction (sequencing reaction), respectively.
  • In one example, a round of sequencing reaction includes multiple base extension reactions, as in monochrome sequencing, where the reaction substrates (nucleotide analogs) corresponding to the four types of bases all carry the same fluorescent dye.
  • The round of sequencing reaction includes four base extension reactions. For one field of view, one base extension reaction includes one image acquisition, and the fifth image and the sixth image come from the same field of view in different base extension reactions. In this way, the reference image obtained by processing and collecting the information of the fifth image and the sixth image is used as the reference for correction, which is conducive to more accurate image correction.
  • In another example, a single-molecule two-color sequencing reaction uses two types of bases (nucleotide analogs) carrying one fluorescent dye and the other two types carrying a fluorescent dye with a different excitation wavelength.
  • One round of sequencing reaction includes two base extension reactions, with two types of base reaction substrates carrying different dyes combined in each base extension reaction. For one field of view, one base extension reaction includes two image acquisitions at different excitation wavelengths, and the fifth and sixth images come from the same field of view in different base extension reactions or at different excitation wavelengths in the same base extension reaction. In this way, the reference image obtained by processing and collecting the information of the fifth image and the sixth image is used as the reference for correction, which is conducive to more accurate image correction.
  • In yet another example, a round of sequencing reaction includes a single base extension reaction, as in the two-color sequencing reaction of a second-generation sequencing platform. The four types of base reaction substrates (such as nucleotide analogs) carry dye a, dye b, both dye a and dye b, and no dye, respectively, and dyes a and b have different excitation wavelengths. The four types of reaction substrates realize one round of sequencing reaction in the same base extension reaction, and the fifth image and the sixth image come from the same field of view in different rounds of sequencing reactions or at different excitation wavelengths in the same round of sequencing reaction.
  • the reference image obtained by processing and collecting the information of the fifth image and the sixth image is used as a reference for correction, which is conducive to more accurate image correction.
  • the fifth image and / or the sixth image may be one image or multiple images.
  • the fifth image is a first image and the sixth image is a second image.
  • Constructing the reference image further includes using a seventh image and an eighth image. The image to be registered, the fifth image, the sixth image, the seventh image, and the eighth image come from the same field of view of a sequencing reaction.
  • The fifth image, the sixth image, the seventh image, and the eighth image correspond to that field of view during the four types of base extension reactions A, T/U, G, and C, respectively.
  • During the base extension reactions the field of view contains a plurality of nucleic acid molecules with optically detectable labels, and at least a portion of the nucleic acid molecules appear as bright spots on the images. Constructing the reference image further includes coarsely registering the seventh image based on the fifth image, including determining an offset between the seventh image and the fifth image.
  • The first offset can be determined by frequency-domain registration using a Fourier transform.
  • Specifically, the two-dimensional discrete Fourier transform in the phase-only correlation function of Kenji Takita et al., IEICE Trans. Fundamentals, Vol. E86-A, No. 8, August 2003, can be used to determine the first offset and the offsets between the sixth and fifth images, the seventh and fifth images, and/or the eighth and fifth images.
  • the first registration / coarse registration can achieve an accuracy of 1 pixel. In this way, the first offset can be determined quickly and accurately, and / or a reference image that facilitates accurate correction can be constructed.
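A whole-pixel offset of this kind can be estimated with a basic phase-only correlation built on NumPy's FFT routines. The sketch below is a generic textbook version and not the exact procedure of the cited paper; it assumes two equally sized 2-D arrays related by a translation, and the function name `phase_correlation_offset` is illustrative.

```python
import numpy as np

def phase_correlation_offset(reference, moving):
    """Estimate the integer (row, column) offset that, added to coordinates
    in `moving`, gives the matching coordinates in `reference`."""
    f_ref = np.fft.fft2(reference)
    f_mov = np.fft.fft2(moving)
    cross = f_ref * np.conj(f_mov)
    # Keep only the phase; the small floor avoids division by zero.
    cross = cross / np.maximum(np.abs(cross), 1e-12)
    correlation = np.fft.ifft2(cross).real
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)
    # Peaks past the midpoint correspond to negative shifts (FFT wrap-around).
    offset = [int(p) if p <= n // 2 else int(p) - n
              for p, n in zip(peak, correlation.shape)]
    return tuple(offset)
```

Because the correlation peak is located on the pixel grid, this estimate is only good to about one pixel, which matches the accuracy stated for the coarse registration; the residual error is what the second, spot-based registration is meant to remove.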
  • the reference image and the image to be registered are binarized images. In this way, it is beneficial to reduce the amount of calculation and quickly correct the deviation.
  • the image to be corrected and the reference image are both binary images, that is, each pixel in the image is not a or b, for example, a is 1, b is 0, and a pixel mark of 1 is brighter than a pixel mark of 0. , Or high intensity;
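As a small illustration of such a binary image, the sketch below marks pixels above an intensity threshold as 1 and all others as 0. The text does not say how the binarization is done, so the fixed `threshold` parameter and the function name `binarize` are assumptions made for illustration only.

```python
import numpy as np

def binarize(image, threshold):
    """Return an image of 0s and 1s: 1 where the pixel is brighter than
    `threshold` (a hypothetical, user-chosen value), 0 elsewhere."""
    return (np.asarray(image) > threshold).astype(np.uint8)
```

Storing one small integer per pixel keeps the Fourier-based offset calculation and the spot merging cheap compared with working on full-depth intensity images.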
  • the reference image is constructed using the repeat1, repeat2, repeat3, and repeat4 images of the four base extension reactions of a sequencing reaction, and the fifth and sixth images are selected from any of the images repeat1-4 , Two or three.
  • the fifth image is image repeat1, the images repeat2, repeat3, and repeat4 are sixth images.
  • the images repeat2-4 are coarsely registered in order to obtain the coarsely registered images repeat2-4, respectively;
  • the image repeat1 and the repeat registered image repeat2-4 are combined to obtain a reference image.
  • the so-called merged image is an overlapping bright spot in the merged image. It is mainly based on the size of the bright spots of the corresponding nucleic acid molecule and the resolution of the imaging system. In one example, two bright spots with a distance of no more than 1.5 pixels on the two images are set as coincident bright spots.
  • The central region of the image synthesized from the four repeats is used as the reference image. First, this gives the reference image enough bright spots for subsequent registration; second, bright spot information detected and located in the central region of an image is relatively more accurate, which facilitates accurate registration.
  • In one example, the image is corrected by the following steps:
  • 1) Coarse correction: the image repeat5, a binarized image of a field of view acquired in a base extension reaction of another round of sequencing, is coarsely corrected. A central region of the image, for example 512 * 512, and the corresponding 512 * 512 central region of the reference image synthesized from repeat1-4 are each subjected to a two-dimensional discrete Fourier transform, and frequency-domain registration yields the offset (x0, y0). This achieves coarse image registration; x0 and y0 are accurate to 1 pixel.
  • 2) The coarsely registered image and the reference image are merged based on the bright spots on the images, including calculating the offset (x1, y1) of the coincident bright spots between the repeat5 image and the reference image; two bright spots separated by no more than 1.5 pixels on the two images are treated as coincident bright spots.
  • 3) In summary, the offset of a field-of-view image (fov) in different cycles is (x0, y0) - (x1, y1). For a bright spot (peak), the corrected coordinates can be expressed as curRepeatPoints + (x0, y0) - (x1, y1), where curRepeatPoints is the original coordinates of the bright spot, that is, its coordinates in the image before correction.
  • The correction result obtained in this way is highly accurate, with a correction error of no more than 0.1 pixel.
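  • A minimal sketch of the two-stage correction follows. It assumes that (x1, y1) is the mean residual between coarsely shifted bright spots and their coincident reference bright spots; the name `fine_correct` and this averaging rule are illustrative assumptions rather than the stated procedure.

```python
import math

def fine_correct(points, reference_points, coarse, max_dist=1.5):
    """Apply curRepeatPoints + (x0, y0) - (x1, y1) to every bright spot.

    `coarse` is the integer offset (x0, y0) from coarse registration.
    (x1, y1) is taken as the mean residual over coincident bright spots,
    which is an assumption of this sketch."""
    x0, y0 = coarse
    shifted = [(x + x0, y + y0) for x, y in points]
    residuals = []
    for sx, sy in shifted:
        nearest = min(reference_points,
                      key=lambda r: math.dist(r, (sx, sy)), default=None)
        if nearest is not None and math.dist(nearest, (sx, sy)) <= max_dist:
            residuals.append((sx - nearest[0], sy - nearest[1]))
    if not residuals:
        return shifted, (0.0, 0.0)
    x1 = sum(dx for dx, _ in residuals) / len(residuals)
    y1 = sum(dy for _, dy in residuals) / len(residuals)
    return [(sx - x1, sy - y1) for sx, sy in shifted], (x1, y1)

# Hypothetical spots: the current image is displaced by (-2.7, +4.8) pixels.
reference_spots = [(10.0, 10.0), (50.0, 20.0), (30.0, 40.0)]
current_spots = [(x - 2.7, y + 4.8) for x, y in reference_spots]
corrected, fine = fine_correct(current_spots, reference_spots, coarse=(3, -5))
```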
  • Figure 3 shows the correction process and results. Image C is corrected based on image A.
  • The circles in image A and image C indicate bright spots, and bright spots with the same numeric label are coincident bright spots. Image C->A shows the correction result, that is, image C aligned to image A.
  • In some embodiments, performing image registration further includes identifying bright spots, including detecting bright spots on the image using a k1 * k2 matrix, where a matrix whose central pixel value is not less than any of its non-central pixel values is determined to correspond to a candidate bright spot.
  • The image is at least one of the image to be registered and the images used to construct the reference image.
  • Detecting bright spots in this way quickly and effectively finds the bright spots (spots or peaks) on an image, especially on images acquired from nucleic acid sequencing reactions.
  • The method places no special restrictions on the images to be detected, that is, on the original input data, and is applicable to the processing and analysis of images generated by any platform that determines nucleic acid sequences by optical detection, including but not limited to second- and third-generation sequencing. It is efficient and extracts more sequence-representative information from the image, and it is especially advantageous for random images and for signal recognition with high accuracy requirements.
  • the image is derived from a nucleic acid sequence determination reaction.
  • the nucleic acid molecule is provided with an optically detectable label, such as a fluorescent label.
  • the fluorescent molecule can be excited to emit fluorescence under laser irradiation at a specific wavelength, and the image is acquired by an imaging system.
  • The acquired images include light spots (bright spots) that may correspond to the locations of the fluorescent molecules. Understandably, when the sample is at the focal position, the bright spot corresponding to a fluorescent molecule in the acquired image is small and bright; when it is out of focus, the bright spot is larger and dimmer.
  • Here, a single molecule means a small number of molecules, for example no more than 10 molecules, such as one, two, three, four, five, six, eight, or ten.
  • In some examples, the central pixel value of the matrix is greater than a first preset value, every non-central pixel value of the matrix is greater than a second preset value, and the first and second preset values are related to the average pixel value of the image.
  • A k1 * k2 matrix can be slid over the image to traverse it, with the first preset value and/or the second preset value set in relation to the average pixel value of the image.
  • the pixel values are the same as the grayscale values.
  • k1 * k2 matrix, k1 and k2 may be equal or unequal.
  • In one example, the relevant parameters of the imaging system are: a 60x objective lens and an electronic sensor size of 6.5 μm; in the image formed by the microscope and captured by the electronic sensor, the smallest resolvable size is 0.1 μm. The acquired or input image can be a 16-bit grayscale or color image of 512 * 512, 1024 * 1024, or 2048 * 2048.
  • A color image can be converted into a grayscale image before bright spot detection to reduce the computation and complexity of the detection process.
  • Based on statistics from a large amount of image processing, the inventors take the first preset value as 1.4 times the average pixel value of the image and the second preset value as 1.1 times the average pixel value of the image, which eliminates interference and yields bright spot detection results that originate from the optically detectable label.
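  • The k1 * k2 traversal with the 1.4x and 1.1x thresholds can be sketched in plain Python as follows. The name `detect_candidates`, the list-of-lists image, and the skipping of border pixels are assumptions of this sketch; k1 and k2 are assumed to be odd.

```python
def detect_candidates(image, k1=3, k2=3, center_factor=1.4, neighbor_factor=1.1):
    """Return (row, col) centres of k1 * k2 windows that qualify as candidates:
    the centre is not less than any neighbour, the centre exceeds
    center_factor * mean, and every neighbour exceeds neighbor_factor * mean."""
    rows, cols = len(image), len(image[0])
    mean = sum(map(sum, image)) / (rows * cols)
    t_center, t_neighbor = center_factor * mean, neighbor_factor * mean
    r1, r2 = k1 // 2, k2 // 2
    candidates = []
    for y in range(r1, rows - r1):          # border pixels are skipped
        for x in range(r2, cols - r2):
            center = image[y][x]
            neighbors = [image[j][i]
                         for j in range(y - r1, y + r1 + 1)
                         for i in range(x - r2, x + r2 + 1)
                         if (j, i) != (y, x)]
            if (center > t_center and center >= max(neighbors)
                    and min(neighbors) > t_neighbor):
                candidates.append((y, x))
    return candidates

# Hypothetical 7 * 7 image: flat background of 10 with one spot at (3, 3).
image = [[10] * 7 for _ in range(7)]
for j in range(2, 5):
    for i in range(2, 5):
        image[j][i] = 40
image[3][3] = 100
candidates = detect_candidates(image)
```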
  • The size, similarity, and/or intensity of an ideal bright spot can be used to further screen candidate bright spots.
  • In some examples, the size of the connected domain corresponding to a candidate bright spot is used as a quantitative measure of the size of the candidate bright spot on the image, in order to screen candidates and decide whether each is a desired bright spot.
  • For this purpose, the connected pixels in a k1 * k2 matrix that are larger than the average pixel value are defined as the connected domain corresponding to a candidate bright spot. In this way, bright spots that correspond to labeled molecules and are suitable for subsequent sequence recognition can be obtained effectively, and nucleic acid sequence information can be derived from them.
  • In one example, the average pixel value of the image is used as the reference, and two or more adjacent pixels that are not less than the average pixel value are called connected pixels (connectivity). In FIG. 4, the bold, enlarged value marks the center of the matrix corresponding to the candidate bright spot, and the thick-lined frame marks the 3 * 3 matrix corresponding to the candidate bright spot.
  • The third preset value can be determined from the connected-domain sizes of all candidate bright spots on the image. For example, the connected-domain size of each candidate bright spot on the image is calculated, and the average connected-domain size, which represents a characteristic of the image, is taken as the third preset value. Alternatively, the connected-domain sizes of the candidate bright spots on the image are sorted in ascending order, and the size at the 50th, 60th, 70th, 80th, or 90th percentile is taken as the third preset value. In this way, bright spot information is obtained effectively, which benefits subsequent recognition of the nucleic acid sequence.
  • In some examples, candidate bright spots are screened using statistically set parameters that quantitatively reflect and compare their intensity characteristics.
  • The fourth preset value can be determined from the scores of all candidate bright spots on the image. For example, when the number of candidate bright spots on the image is large enough to meet statistical requirements, for example greater than 30, the Score values of all candidate bright spots on the image are calculated and sorted in ascending order, and the fourth preset value is set to the Score value at the 50th, 60th, 70th, 80th, or 90th quantile. Candidate bright spots whose Score is below that quantile are thereby excluded, which helps to obtain the target bright spots effectively and to identify the base sequence accurately afterwards.
  • The rationale for this screening is that bright spots with a large difference in intensity (pixel value) between center and edge, and with a converging profile, are generally considered to correspond to the positions of the molecules to be detected.
  • the number of candidate bright spots on the image is greater than 50, greater than 100, or greater than 1,000.
  • candidate bright spots are screened for morphology and intensity / brightness.
  • The connected pixels larger than the average pixel value in a k1 * k2 matrix are defined as the connected domain corresponding to a candidate bright spot.
  • In the score of a candidate bright spot, CV denotes the central pixel value of the matrix corresponding to the candidate bright spot, and EV denotes the sum of the non-central pixel values of that matrix. A candidate bright spot whose connected-domain size is greater than the third preset value and whose score is greater than the fourth preset value is determined to be a bright spot.
  • The third preset value and/or the fourth preset value can be set as described in the foregoing embodiments.
  • In some embodiments, the image registration method further includes bright spot detection, including: preprocessing an image to obtain a preprocessed image, the image being at least one of the first, second, third, fourth, fifth, sixth, seventh, and eighth images; and determining a critical value to simplify the preprocessed image, including assigning a first preset value to pixels of the preprocessed image whose pixel values are less than the critical value and a second preset value to pixels whose pixel values are not less than the critical value, to obtain a simplified image.
  • The method further includes determining a first bright spot detection threshold c1 based on the preprocessed image, and identifying candidate bright spots based on the preprocessed image and the simplified image, including determining a pixel matrix that satisfies at least two of the following conditions a)-c) to be a candidate bright spot:
  • a) the pixel matrix can be expressed as r1 * r2, where r1 and r2 are both odd numbers greater than 1 and the r1 * r2 pixel matrix contains r1 * r2 pixels;
  • b) in the simplified image, the pixel value of the central pixel of the pixel matrix is the second preset value and the connected pixels of the pixel matrix number more than 2/3 * r1 * r2; and
  • c) in the preprocessed image, the pixel value of the central pixel of the pixel matrix is greater than a third preset value and g1 * g2 > c1 is satisfied, where g1 is the correlation coefficient of the two-dimensional Gaussian distribution over the m1 * m2 range centered on the central pixel of the pixel matrix, g2 is the pixels in that m1 * m2 range, m1 and m2 are both odd numbers greater than 1, and the m1 * m2 range contains m1 * m2 pixels.
  • The method then determines whether each candidate bright spot is a bright spot.
  • Detecting bright spots with this method relies on judgment conditions, or combinations of judgment conditions, that the inventors determined through training on a large amount of data. It quickly and effectively detects the bright spots on an image, especially on images acquired from nucleic acid sequencing reactions.
  • The method places no special restrictions on the images to be detected, that is, on the original input data, and is applicable to the processing and analysis of images generated by any platform that determines nucleic acid sequences by optical detection, including but not limited to second- and third-generation sequencing. It is efficient and extracts more sequence-representative information from the image, and it is especially advantageous for random images and for signal recognition with high accuracy requirements.
  • the pixel values are the same as the grayscale values. If the image is a color image and one pixel of the color image has three pixel values, the color image can be converted into a grayscale image and then bright spot detection can be performed to reduce the calculation amount and complexity of the image detection process. You can choose, but are not limited to, converting non-grayscale images to grayscale images using floating-point algorithms, integer methods, shifting methods, or average methods.
  • In some examples, preprocessing the image includes: determining the background of the image using an open operation; converting the image into a first image using a top hat operation based on the background; performing Gaussian blur on the first image to obtain a second image; and sharpening the second image to obtain the preprocessed image.
  • The open operation is a morphological operation: erosion followed by dilation. Erosion shrinks the foreground (the part of interest), and dilation enlarges it.
  • The open operation can be used to eliminate small objects.
  • The size p1 * p2 of the structuring element (the basic template used to process the image) for the open operation is not particularly limited, provided p1 and p2 are odd numbers.
  • In one example, the structuring element p1 * p2 is 15 * 15, 31 * 31, or the like, which yields a preprocessed image well suited to subsequent processing and analysis.
  • The top hat operation is often used to separate patches that are brighter than their neighbors (bright spots). In an image with a large background and small, fairly regular objects, the top hat operation can be used for background extraction.
  • Performing a top hat transformation on an image includes first performing an open operation on the image and then subtracting the result of the open operation from the original image to obtain the first image, that is, the top-hat-transformed image.
  • The inventors believe that the open operation enlarges cracks and local low-brightness regions, so subtracting the result of the open operation from the original image highlights regions that are brighter than their surroundings in the original image.
  • The operation depends on the size of the selected kernel, which can be considered related to the expected size of the bright spots. If the bright spots are not of the expected size, the processing produces many small bumps across the whole image; see, for example, out-of-focus images, in which the bright spots are smeared into halos. In one example, the expected size of the bright spot, that is, the size of the selected kernel, is 3 * 3, and the resulting top-hat-transformed image is well suited to subsequent denoising.
  • Gaussian blur, also known as Gaussian filtering, is a linear smoothing filter that is suitable for removing Gaussian noise and is widely used in image denoising.
  • Gaussian filtering is a weighted averaging of the entire image: the value of each pixel is replaced by a weighted average of itself and the other pixel values in its neighborhood.
  • Concretely, a template (also called a convolution kernel or mask) scans each pixel in the image, and the weighted average gray value of the pixels in the neighborhood covered by the template replaces the value of the pixel at the center of the template.
  • In one example, Gaussian blur is applied to the first image using the GaussianBlur function in OpenCV, with the Gaussian distribution parameter sigma set to 0.9 and a 3 * 3 two-dimensional filter matrix (convolution kernel).
  • Viewed as an image, Gaussian blur smooths the small bumps on the first image and leaves the image edges smooth.
  • The second image, that is, the Gaussian-filtered image, is then sharpened, for example by two-dimensional Laplacian sharpening. Viewed as an image, this sharpens the edges and restores detail lost to the Gaussian blur.
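  • The preprocessing chain described above (open operation, top hat, Gaussian blur, Laplacian sharpening) can be sketched with NumPy alone. This is an illustrative re-implementation rather than the OpenCV calls mentioned in the text; the helper names, the edge padding, the square structuring element, and the 4-neighbour Laplacian kernel are all assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def _window_filter(img, size, func):
    """Apply func (np.min or np.max) over a size * size neighbourhood."""
    padded = np.pad(img, size // 2, mode="edge")
    return func(sliding_window_view(padded, (size, size)), axis=(2, 3))

def opening(img, size):
    """Morphological opening: erosion (min) followed by dilation (max)."""
    return _window_filter(_window_filter(img, size, np.min), size, np.max)

def convolve3(img, kernel):
    """Filter with a symmetric 3 * 3 kernel, replicating the border."""
    windows = sliding_window_view(np.pad(img, 1, mode="edge"), (3, 3))
    return np.einsum("ijkl,kl->ij", windows, kernel)

def gaussian_kernel3(sigma=0.9):
    g = np.exp(-np.array([-1.0, 0.0, 1.0]) ** 2 / (2 * sigma ** 2))
    kernel = np.outer(g, g)
    return kernel / kernel.sum()

def preprocess(img, open_size=15, sigma=0.9):
    img = img.astype(np.float64)
    background = opening(img, open_size)          # background estimate
    top_hat = img - background                    # "first image"
    blurred = convolve3(top_hat, gaussian_kernel3(sigma))   # "second image"
    laplacian = np.array([[0.0, 1.0, 0.0], [1.0, -4.0, 1.0], [0.0, 1.0, 0.0]])
    sharpened = blurred - convolve3(blurred, laplacian)     # Laplacian sharpening
    return background, sharpened

# Hypothetical frame: flat background of 50 with one bright pixel.
frame = np.full((32, 32), 50.0)
frame[16, 16] = 200.0
background, sharpened = preprocess(frame)
```

  • With OpenCV available, the same steps would typically map onto its morphology, GaussianBlur, and Laplacian functions; the NumPy version is used here only to keep the sketch dependency-light.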
  • In some examples, simplifying the preprocessed image includes: determining a critical value based on the background and the preprocessed image; and comparing the pixel value of each pixel of the preprocessed image with the critical value, assigning a first preset value to pixels whose values are less than the critical value and a second preset value to pixels whose values are not less than the critical value, to obtain the simplified image.
  • Simplifying the preprocessed image in this way, for example by binarization, facilitates accurate detection of bright spots, accurate base identification, and high-quality data.
  • In some examples, obtaining the simplified image includes: dividing the sharpened result obtained by preprocessing by the result of the open operation to obtain a set of values corresponding to the image pixels; and determining from this set of values the critical value for binarizing the preprocessed image.
  • In one example, the set of values is sorted in ascending order, and the value at the 20th, 30th, or 40th percentile is taken as the binarization critical value (threshold). The resulting binarized image facilitates accurate detection and identification of bright spots.
  • In one example, the structuring element of the open operation used in preprocessing is p1 * p2. Dividing the preprocessed image (the sharpened result) by the result of the open operation yields a set of arrays (matrices) of the same size as the structuring element, p1 * p2. Within each array, the p1 * p2 values are sorted in ascending order, and the value at the 30th percentile is taken as the binarization critical value (threshold) of that region (numerical matrix). Thresholds are thus determined region by region to binarize each region of the image, and the resulting binarization highlights the required information while removing noise, which facilitates accurate detection of bright spots.
  • In some examples, the first bright spot detection threshold is determined using the Otsu method, also called the maximum between-class variance method. It segments the image at the threshold that maximizes the between-class variance, which means a low probability of misclassification and high accuracy.
  • Let T(c1) be the threshold separating foreground from background in the preprocessed image, w0 the proportion of foreground pixels in the whole image with average gray level μ0, and w1 the proportion of background pixels in the whole image with average gray level μ1. With μ the overall average gray level of the image to be processed and var the between-class variance:
  • $\mu = w_0 \mu_0 + w_1 \mu_1$ and $\mathrm{var} = w_0 (\mu_0 - \mu)^2 + w_1 (\mu_1 - \mu)^2$.
  • Traversal is used to find the segmentation threshold T that maximizes the between-class variance, which is the first bright spot detection threshold c1.
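  • The traversal can be sketched with NumPy as follows, using the standard Otsu definitions μ = w0·μ0 + w1·μ1 and var = w0·(μ0 − μ)² + w1·(μ1 − μ)². The function name `otsu_threshold` and the 256-bin histogram are assumptions of this sketch.

```python
import numpy as np

def otsu_threshold(image, bins=256):
    """Return the threshold T that maximizes the between-class variance."""
    hist, edges = np.histogram(image, bins=bins)
    p = hist / hist.sum()
    centers = (edges[:-1] + edges[1:]) / 2
    best_t, best_var = edges[0], -1.0
    for i in range(1, bins):                      # traverse candidate thresholds
        w1, w0 = p[:i].sum(), p[i:].sum()         # background / foreground weights
        if w0 == 0 or w1 == 0:
            continue
        mu1 = (p[:i] * centers[:i]).sum() / w1    # background mean gray level
        mu0 = (p[i:] * centers[i:]).sum() / w0    # foreground mean gray level
        mu = w0 * mu0 + w1 * mu1                  # overall mean gray level
        var = w0 * (mu0 - mu) ** 2 + w1 * (mu1 - mu) ** 2
        if var > best_var:
            best_var, best_t = var, edges[i]
    return best_t

# Hypothetical image: 900 background pixels at 10 and 100 foreground pixels at 200.
pixels = np.concatenate([np.full(900, 10.0), np.full(100, 200.0)])
c1 = otsu_threshold(pixels)
```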
  • In some examples, identifying candidate bright spots based on the preprocessed image and the simplified image includes determining a pixel matrix that simultaneously satisfies all three conditions a)-c) to be a candidate bright spot. This effectively improves the accuracy of subsequent nucleic acid sequence determination based on the bright spot information and the quality of the output data.
  • Specifically, in one example, the conditions for determining candidate bright spots include a), where k1 and k2 may be equal or unequal.
  • In one example, the relevant parameters of the imaging system are: a 60x objective lens and an electronic sensor size of 6.5 μm; in the image formed by the microscope and captured by the electronic sensor, the smallest resolvable size is 0.1 μm. The image can be a 16-bit grayscale or color image of 512 * 512, 1024 * 1024, or 2048 * 2048.
  • The values of k1 and k2 are both greater than 1 and less than 10.
  • In one example, the conditions for determining candidate bright spots include b).
  • That is, in the simplified image, the pixel value of the central pixel of the pixel matrix is the second preset value and the connected pixels of the pixel matrix number more than 2/3 * k1 * k2; in other words, the pixel value of the central pixel is larger than the critical value and the connected pixels make up more than two thirds of the matrix.
  • Here, two or more adjacent pixels whose pixel values are the second preset value are called connected pixels (connectivity).
  • For example, the simplified image is a binarized image in which the first preset value is 0 and the second preset value is 1. As shown in FIG.
  • the pixel point matrix does not satisfy the condition b), and is not a candidate bright spot.
  • In one example, the conditions for determining candidate bright spots include c).
  • Here g2 is the pixels in the m1 * m2 range after correction, that is, the sum of the pixel values in the m1 * m2 range after correction.
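  • One way to picture the g1 * g2 test is sketched below. It assumes that g1 is the Pearson correlation between the m1 * m2 patch and a two-dimensional Gaussian template and that g2 is the plain sum of the patch; the template sigma, the function names, and the omission of the "correction" applied before summing are assumptions of this sketch.

```python
import numpy as np

def gaussian_template(m1, m2, sigma=1.0):
    """Two-dimensional Gaussian sampled on an m1 * m2 grid (m1, m2 odd)."""
    ys = np.arange(m1) - m1 // 2
    xs = np.arange(m2) - m2 // 2
    return np.exp(-(ys[:, None] ** 2 + xs[None, :] ** 2) / (2 * sigma ** 2))

def gaussian_score(image, y, x, m1=5, m2=5, sigma=1.0):
    """Return g1 * g2 for the m1 * m2 patch centred on (y, x).

    A perfectly flat patch has no defined correlation and yields NaN."""
    patch = image[y - m1 // 2:y + m1 // 2 + 1,
                  x - m2 // 2:x + m2 // 2 + 1].astype(float)
    template = gaussian_template(m1, m2, sigma)
    g1 = np.corrcoef(patch.ravel(), template.ravel())[0, 1]
    g2 = patch.sum()
    return g1 * g2

# Hypothetical patches: a Gaussian-shaped spot and an inverted (ring-like) one.
spot = np.zeros((11, 11))
spot[3:8, 3:8] = 100 * gaussian_template(5, 5)
ring = np.zeros((11, 11))
ring[3:8, 3:8] = 100 * (1 - gaussian_template(5, 5))
spot_score = gaussian_score(spot, 5, 5)
ring_score = gaussian_score(ring, 5, 5)
```

  • A candidate centered on (y, x) would then pass condition c) when its score exceeds c1 and its central pixel exceeds the third preset value.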
  • In some examples, determining whether a candidate bright spot is a bright spot further includes: determining a second bright spot detection threshold based on the preprocessed image, and determining a candidate bright spot whose pixel value is not less than the second bright spot detection threshold to be a bright spot.
  • In one example, the pixel value at the coordinates of the candidate bright spot is used as the pixel value of the candidate bright spot.
  • The center-of-gravity method can be used to obtain the coordinates of candidate bright spots, including sub-pixel coordinates.
  • The gray value at the coordinates of the candidate bright spot is then calculated by bilinear interpolation.
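  • The center-of-gravity and bilinear interpolation steps can be sketched in plain Python. The function names and the 3 * 3 centroid window are assumptions; the window is assumed to lie inside the image and to have a positive total intensity.

```python
import math

def centroid(image, y, x, radius=1):
    """Intensity-weighted centre of gravity of the window around (y, x)."""
    total = sum_y = sum_x = 0.0
    for j in range(y - radius, y + radius + 1):
        for i in range(x - radius, x + radius + 1):
            value = image[j][i]
            total += value
            sum_y += value * j
            sum_x += value * i
    return sum_y / total, sum_x / total

def bilinear(image, y, x):
    """Gray value at sub-pixel coordinates (y, x) by bilinear interpolation."""
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1 = min(y0 + 1, len(image) - 1)
    x1 = min(x0 + 1, len(image[0]) - 1)
    fy, fx = y - y0, x - x0
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

# Hypothetical 5 * 5 image with two equally bright adjacent pixels.
grid = [[0.0] * 5 for _ in range(5)]
grid[2][2] = 4.0
grid[2][3] = 4.0
cy, cx = centroid(grid, 2, 2)
value = bilinear(grid, cy, cx)
```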
  • In some examples, determining whether a candidate bright spot is a bright spot includes: dividing the preprocessed image into a set of regions (blocks) of a predetermined size, and sorting the pixel values of the pixels in each region to determine the second bright spot detection threshold corresponding to that region; and, for a candidate bright spot located in a region, determining the candidate bright spot to be a bright spot if its pixel value is not less than the second bright spot detection threshold corresponding to that region. In this way, differences between regions of the image, such as an overall fall-off in light intensity, are taken into account and bright spots are detected region by region, which helps to identify bright spots accurately and to obtain more of them.
  • the so-called pre-processed image is divided into a set of blocks of a predetermined size, and there may or may not be overlap between the blocks. In one example, there is no overlap between blocks.
  • the size of the pre-processed image is not less than 512 * 512, such as 512 * 512, 1024 * 1024, 1800 * 1800, or 2056 * 2056, etc., and the area of the predetermined size may be set to 200 * 200. In this way, it is beneficial to quickly calculate and identify bright spots.
  • In one example, the pixel values of the pixels in each block are sorted in ascending order, and p10 + (p10 - p1) * 4.1 is taken as the second bright spot detection threshold of the block, that is, the background of the block, where p1 is the pixel value at the 1st percentile and p10 is the pixel value at the 10th percentile.
  • This threshold is a relatively stable value that the inventors obtained through training and testing on a large amount of data, and it eliminates a large number of bright spots that lie on the background. Understandably, when the optical system is adjusted and the overall pixel distribution of the image changes, the threshold may need to be adjusted accordingly.
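  • The block-wise background threshold can be sketched with NumPy as follows, reading p1 and p10 as the 1st and 10th percentiles of each block. The function names, non-overlapping blocks, and integer spot coordinates are assumptions of this sketch.

```python
import numpy as np

def block_thresholds(image, block=200, factor=4.1):
    """Map each block index (row, col) to p10 + (p10 - p1) * factor."""
    height, width = image.shape
    thresholds = {}
    for y in range(0, height, block):
        for x in range(0, width, block):
            tile = image[y:y + block, x:x + block]
            p1, p10 = np.percentile(tile, [1, 10])
            thresholds[(y // block, x // block)] = p10 + (p10 - p1) * factor
    return thresholds

def filter_spots(image, spots, block=200, factor=4.1):
    """Keep candidate spots whose pixel value is not less than the
    threshold of the block in which they lie."""
    thresholds = block_thresholds(image, block, factor)
    return [(y, x) for y, x in spots
            if image[y, x] >= thresholds[(y // block, x // block)]]

# Hypothetical 10 * 20 image split into two 10 * 10 blocks.
left = np.arange(100.0).reshape(10, 10)
right = np.full((10, 10), 50.0)
picture = np.hstack([left, right])
limits = block_thresholds(picture, block=10)
kept = filter_spots(picture, [(2, 0), (9, 9), (0, 15)], block=10)
```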
  • FIG. 6 compares the bright spot detection results before and after this processing, that is, before and after the regional background is excluded.
  • The upper half of FIG. 6 shows the bright spot detection results after the processing, and the lower half shows the results before the processing.
  • The cross marks are candidate bright spots or bright spots.
  • An embodiment of the present invention further provides a base recognition method, which includes matching the bright spots on an image obtained from a base extension reaction to the bright spot set of the corresponding sequencing template, and performing base recognition based on the matched bright spots.
  • In one example, the second predetermined pixel is 2. In this way, accurate base recognition can be achieved, and a partial base sequence (read) of the template can be obtained.
  • A "computer-readable storage medium" may be any apparatus that can contain, store, communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Examples of computer-readable storage media include: an electrical connection (electronic device) with one or more wires, a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), a fiber optic device, and portable compact disc read-only memory (CD-ROM).
  • The computer-readable storage medium may even be paper or another suitable medium on which the program is printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner, and then stored in computer memory.
  • An embodiment of the present invention further provides an apparatus 100 for constructing a sequencing template based on images, as shown in FIG. 7, which is used to implement the method for constructing a sequencing template based on images in any of the embodiments of the present invention. The images include a first image, a second image, a third image, and a fourth image corresponding to the same field of view in the four types of base extension reactions A / U, T, G, and C, respectively. The field of view during a base extension reaction contains a plurality of nucleic acid molecules carrying optically detectable labels, and at least some of the nucleic acid molecules appear as bright spots on the images. One round of sequencing reaction is defined as the four types of base extension reactions carried out sequentially and/or simultaneously.
  • The first image includes images M1 and M2, the second image includes images N1 and N2, the third image includes images P1 and P2, and the fourth image includes images Q1 and Q2. Images M1 and M2 come from two rounds of sequencing reactions, respectively, as do images N1 and N2, images P1 and P2, and images Q1 and Q2.
  • The apparatus includes: a combination unit 110, configured to combine any two of the images M1, M2, N1, N2, P1, P2, Q1, and Q2 and perform bright spot matching on them, such that image M1, image N1, image N2, image P1, image P2, image Q1, and image Q2 each participate in at least one combination, to obtain a plurality of combined images containing first coincident bright spots, two or more bright spots on a combined image separated by less than a first predetermined pixel being one first coincident bright spot; and a merging unit 130, configured to merge the first coincident bright spots on the plurality of combined images from the combination unit to obtain the bright spot set corresponding to the sequencing template.
  • merging the first overlapping bright spots on multiple combined images includes performing one or more matching on the first overlapping bright spots in different combined images to obtain a bright spot set corresponding to a sequencing template.
  • In some examples, images M1, N1, P1, and Q1 are acquired sequentially, and images M2, N2, P2, and Q2 are acquired sequentially. The combining unit 130 is configured to: combine the images M1, M2, N1, N2, P1, P2, Q1, and Q2 in pairs at an interval of S images to obtain K combined images, match the bright spots on each combined image, and discard the non-coincident bright spots on the combined image.
  • the image is a registered image.
  • In some examples, the apparatus 100 further includes a registration unit 108 for image registration. The registration unit includes a first registration module and a second registration module. The first registration module performs a first registration of the image to be registered based on a reference image, the reference image and the image to be registered corresponding to the same field of view, including: determining a first offset between a predetermined region of the image to be registered and the corresponding predetermined region of the reference image, and moving all bright spots on the image to be registered based on the first offset to obtain the image to be registered after the first registration.
  • The second registration module performs a second registration of the image to be registered after the first registration based on the reference image, including: merging the image to be registered after the first registration with the reference image to obtain a merged image; calculating the offsets of all second coincident bright spots in a predetermined region of the merged image to determine a second offset, two or more bright spots on the merged image separated by less than a second predetermined pixel being one second coincident bright spot; and moving all bright spots on the image to be registered after the first registration based on the second offset to complete the registration of the image to be registered.
  • the reference image is obtained through construction.
  • the registration unit 108 further includes a reference image construction module, which is used to obtain a fifth image and a sixth image, and the fifth image and the sixth image and the image to be registered.
  • rough registration of the sixth image based on the fifth image includes determining an offset between the sixth image and the fifth image, and moving the sixth image based on the offset to obtain a sixth image after the coarse registration ; Combine the fifth image and the sixth image after coarse registration to obtain a reference image.
  • when the reference image construction module constructs the reference image, a seventh image and an eighth image are also used; the seventh image, the eighth image and the image to be registered come from the same field of view of the sequencing reaction, and the fifth, sixth, seventh and eighth images correspond respectively to the field of view during the base extension reactions of the four base types A/U, T, G and C.
  • constructing the reference image further includes coarse registration of the seventh image based on the fifth image, which includes determining the offset between the seventh image and the fifth image and moving the seventh image based on that offset to obtain the coarsely registered seventh image.
  • it further includes coarse registration of the eighth image based on the fifth image, which includes determining the offset between the eighth image and the fifth image and moving the eighth image based on that offset to obtain the coarsely registered eighth image; the fifth image and the coarsely registered sixth, seventh and eighth images are then merged to obtain the reference image.
  • the reference image and the image to be registered are binarized images.
  • a two-dimensional discrete Fourier transform is used to determine the first offset, the offset between the sixth and fifth images, the offset between the seventh and fifth images, and/or the offset between the eighth and fifth images.
  • the device 100 further includes a bright spot detection unit 106.
  • the bright spot detection unit 106 is configured to: preprocess an image to obtain a preprocessed image; determine a critical value to simplify the preprocessed image, which includes assigning a first preset value to pixels of the preprocessed image whose pixel value is less than the critical value and a second preset value to pixels whose pixel value is not less than the critical value, to obtain a simplified image; determine a first bright spot detection threshold c1 based on the preprocessed image; and identify candidate bright spots on the image based on the preprocessed image and the simplified image, which includes determining that a pixel matrix meeting at least two of the following conditions a) to c) is a candidate bright spot.
  • a) in the preprocessed image, the pixel value of the central pixel of the pixel matrix is the largest; the pixel matrix can be expressed as k1*k2, where k1 and k2 are both odd numbers greater than 1 and the k1*k2 pixel matrix contains k1*k2 pixels.
  • b) in the simplified image, the pixel value of the central pixel of the pixel matrix is the second preset value and the number of connected pixels of the pixel matrix is greater than 2/3*k1*k2.
  • c) in the preprocessed image, the pixel value of the central pixel of the pixel matrix is greater than a third preset value and g1*g2 > c1 is satisfied.
  • g1 is the correlation coefficient of the two-dimensional Gaussian distribution over the m1*m2 range centered on the central pixel of the pixel matrix, and g2 is the pixels in that m1*m2 range; m1 and m2 are both odd numbers greater than 1, and the m1*m2 range contains m1*m2 pixels.
  • the bright spot detection unit 106 is further configured to determine whether a candidate bright spot is a bright spot, which includes determining a second bright spot detection threshold based on the preprocessed image and determining that a candidate bright spot whose pixel value is not less than the second bright spot detection threshold is a bright spot.
  • the pixel value of the candidate bright spot is the pixel value of the pixel point where the coordinates of the candidate bright spot are located.
  • determining whether a candidate bright spot is a bright spot in the bright spot detection unit 106 includes: dividing the preprocessed image into a set of regions of a predetermined size; sorting the pixel values of the pixels in a region to determine the second bright spot detection threshold corresponding to that region; and, for the candidate bright spots located in the region, determining that a candidate bright spot whose pixel value is not less than the second bright spot detection threshold of the region is a bright spot.
  • preprocessing the image in the bright spot detection unit 106 includes: determining the background of the image using an opening operation; converting the image into a first image based on the background using a top-hat operation; applying Gaussian blur to the first image to obtain a second image; and sharpening the second image to obtain the preprocessed image.
  • determining the critical value in the bright spot detection unit 106 to simplify the preprocessed image and obtain the simplified image includes: determining the critical value based on the background and the preprocessed image, and comparing the pixel values of the pixels of the preprocessed image with the critical value to obtain the simplified image.
  • g2 is a pixel in the corrected m1 * m2 range, and the correction is performed according to the proportion of pixels with a second preset value in the corresponding m1 * m2 range of the simplified image.
  • An embodiment of the present invention further provides a base recognition device 1000 for implementing the base recognition method in any one of the above specific embodiments of the present invention.
  • the device 1000 is configured to match the bright spots on an image obtained from a base extension reaction to the bright spot set corresponding to the sequencing template, and to perform base recognition based on the matched bright spots.
  • the field of view corresponding to the image obtained from the base extension reaction contains a plurality of nucleic acid molecules carrying an optically detectable label, at least some of which appear as bright spots on that image; the bright spot set corresponding to the sequencing template is constructed by the image-based sequencing template construction method and/or device of any of the above embodiments.
  • in the base recognition device 1000, if the bright spot set corresponding to the sequencing template contains a bright spot whose distance from a given bright spot on the image obtained from the base extension reaction is less than a third predetermined pixel, that bright spot on the image is determined to match the bright spot set corresponding to the sequencing template.
  • a computer program product is also provided, which includes instructions for constructing a sequencing template based on images; when a computer executes the program, the instructions cause the computer to carry out the image-based sequencing template construction method of any of the embodiments of the present invention.
  • another computer program product is also provided, which includes instructions for base recognition; when a computer executes the program, the instructions cause the computer to carry out the base recognition method of any of the embodiments of the present invention.
  • those skilled in the art know that, in addition to implementing the controller/processor purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a controller/processor can therefore be regarded as a hardware component, and the means it contains for implementing various functions can be regarded as structures within the hardware component. The means for implementing various functions can even be regarded both as software modules implementing the method and as structures within the hardware component.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Chemical & Material Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Organic Chemistry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • Multimedia (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Analytical Chemistry (AREA)
  • Immunology (AREA)
  • Biomedical Technology (AREA)
  • Genetics & Genomics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Biochemistry (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Microbiology (AREA)
  • Software Systems (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Investigating, Analyzing Materials By Fluorescence Or Luminescence (AREA)

Abstract

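A method, device and system for constructing a sequencing template based on images. The images include a first, second, third and fourth image of one and the same field of view, corresponding respectively to the base extension reactions of the four bases A/U, T, G and C. The first, second, third and fourth images comprise images M1 and M2, images N1 and N2, images P1 and P2, and images Q1 and Q2, respectively. The method of constructing the sequencing template includes: combining any two of images M1, M2, N1, N2, P1, P2, Q1 and Q2 to perform bright spot matching, such that images M1, N1, N2, P1, P2, Q1 and Q2 each take part in the combination at least once, to obtain a plurality of combined images containing first coincident bright spots (S10), where two or more bright spots less than a first predetermined pixel apart on a combined image constitute one first coincident bright spot; and merging the first coincident bright spots on the plurality of combined images to obtain a bright spot set corresponding to the sequencing template (S20). The method can effectively obtain the bright spot set corresponding to the nucleic acid template.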

Description

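Method for constructing a sequencing template based on images, base recognition method and device
Technical Field
The present invention relates to the field of image processing and information recognition, and in particular to a method for constructing a sequencing template based on images, a base recognition method, a device for constructing a sequencing template based on images, a base recognition device, and a computer program product.
Background
In the related art, including sequencing platforms that use an imaging system to acquire images of nucleic acid molecules (templates) in a biochemical reaction multiple times in order to determine the nucleotide order of those molecules, how to process and correlate the images acquired at different time points, including the information on them, so as to obtain the nucleotide composition and order of at least part of the nucleic acid template effectively and accurately is a problem worth attention.
Summary of the Invention
Embodiments of the present invention aim to solve at least one of the technical problems in the related art, or at least to provide a practical alternative.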
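According to one embodiment of the present invention, a method for constructing a sequencing template based on images is provided. The images include a first image, a second image, a third image and a fourth image of one and the same field of view, corresponding respectively to the base extension reactions of the four bases A/U, T, G and C. During a base extension reaction, the field of view contains a plurality of nucleic acid molecules carrying an optically detectable label, and at least some of the nucleic acid molecules appear as bright spots on the images. One round of sequencing reaction is defined as carrying out the four types of base extension reaction once, sequentially and/or simultaneously. The first image includes images M1 and M2, the second image includes images N1 and N2, the third image includes images P1 and P2, and the fourth image includes images Q1 and Q2; images M1 and M2 come from two different rounds of sequencing reaction, as do images N1 and N2, images P1 and P2, and images Q1 and Q2. The method includes: combining any two of images M1, M2, N1, N2, P1, P2, Q1 and Q2 to perform bright spot matching, such that images M1, N1, N2, P1, P2, Q1 and Q2 each take part in a combination at least once, to obtain a plurality of combined images containing first coincident bright spots, where two or more bright spots less than a first predetermined pixel apart on a combined image constitute one first coincident bright spot; and merging the first coincident bright spots on the plurality of combined images to obtain a bright spot set corresponding to the sequencing template.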
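According to one embodiment of the present invention, a device for constructing a sequencing template based on images is provided, which is used to carry out all or some of the steps of the image-based sequencing template construction method of the above embodiment. The images include a first image, a second image, a third image and a fourth image of one and the same field of view, corresponding respectively to the base extension reactions of the four bases A/U, T, G and C. During a base extension reaction, the field of view contains a plurality of nucleic acid molecules carrying an optically detectable label, and at least some of the nucleic acid molecules appear as bright spots on the images. One round of sequencing reaction is defined as carrying out the four types of base extension reaction once, sequentially and/or simultaneously. The first image includes images M1 and M2, the second image includes images N1 and N2, the third image includes images P1 and P2, and the fourth image includes images Q1 and Q2; images M1 and M2 come from two different rounds of sequencing reaction, as do images N1 and N2, images P1 and P2, and images Q1 and Q2. The device includes: a combining unit for combining any two of images M1, M2, N1, N2, P1, P2, Q1 and Q2 to perform bright spot matching, such that images M1, N1, N2, P1, P2, Q1 and Q2 each take part in the combination at least once, to obtain a plurality of combined images containing first coincident bright spots, where two or more bright spots less than a first predetermined pixel apart on a combined image constitute one first coincident bright spot; and a merging unit for merging the first coincident bright spots on the plurality of combined images from the combining unit, to obtain a bright spot set corresponding to the sequencing template.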
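According to one embodiment of the present invention, a computer-readable storage medium is provided for storing a program to be executed by a computer; executing the program includes carrying out the image-based sequencing template construction method of any of the above embodiments. The computer-readable storage medium includes, but is not limited to, read-only memory, random access memory, magnetic disks, optical discs and the like.
According to one embodiment of the present invention, a terminal and a computer program product are also provided. The product includes instructions which, when a computer executes the program, cause the computer to carry out the image-based sequencing template construction method of the above embodiments of the present invention.
The sequencing template constructed with the above method, device, computer-readable storage medium and/or computer program product is a bright spot set corresponding to the sequencing template. This set reflects the information of the sequencing template effectively, accurately and comprehensively, and facilitates subsequent accurate base recognition (base calling), that is, accurately identifying and obtaining the nucleotide sequence of at least part of the template nucleic acid.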
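According to another embodiment of the present invention, a base recognition method is provided. The method includes matching the bright spots on an image obtained from a base extension reaction to the bright spot set corresponding to the sequencing template, and performing base recognition based on the matched bright spots. The field of view corresponding to the image obtained from the base extension reaction contains a plurality of nucleic acid molecules carrying an optically detectable label, and at least some of the nucleic acid molecules appear as bright spots on that image. The bright spot set corresponding to the sequencing template is constructed by the image-based sequencing template construction method, device, computer-readable storage medium and/or computer program product of the above embodiments of the present invention.
According to one embodiment of the present invention, a base recognition device is provided for carrying out the base recognition method of the above embodiments. The device is used to match the bright spots on an image obtained from a base extension reaction to the bright spot set corresponding to the sequencing template, and to perform base recognition based on the matched bright spots. The field of view corresponding to the image obtained from the base extension reaction contains a plurality of nucleic acid molecules carrying an optically detectable label, and at least some of the nucleic acid molecules appear as bright spots on that image. The bright spot set corresponding to the sequencing template is constructed by the image-based sequencing template construction method and/or device of the above embodiments of the present invention.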
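According to one embodiment of the present invention, a computer-readable storage medium is provided for storing a program to be executed by a computer; executing the program includes carrying out the base recognition method of any of the above embodiments. The computer-readable storage medium includes, but is not limited to, read-only memory, random access memory, magnetic disks, optical discs and the like.
According to one embodiment of the present invention, a computer program product is also provided. The product includes instructions for base recognition which, when a computer executes the program, cause the computer to carry out the base recognition method of the above embodiments of the present invention.
With this base recognition method, device, computer-readable storage medium and/or computer program product, and based on the constructed bright spot set corresponding to the sequencing template, the type of base bound to the template nucleic acid during a base extension reaction can be identified, which can be used to determine the template nucleic acid sequence accurately.
Additional aspects and advantages of the embodiments of the present invention will be given in part in the following description, will in part become apparent from it, or will be learned through practice of the embodiments.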
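Brief Description of the Drawings
Figure 1 is a schematic flowchart of the method for constructing a sequencing template based on images in a specific embodiment of the present invention.
Figure 2 is a schematic diagram of combining and merging images Repeat1, Repeat5, Repeat6 and Repeat7 based on bright spots to construct the sequencing template in a specific embodiment of the present invention.
Figure 3 is a schematic diagram of the deviation correction process and result in a specific embodiment of the present invention.
Figure 4 is a schematic diagram of the matrix corresponding to a candidate bright spot and its connected pixels in a specific embodiment of the present invention.
Figure 5 is a schematic diagram of the pixel values in the m1*m2 range centered on the central pixel of the pixel matrix in a specific embodiment of the present invention.
Figure 6 is a schematic comparison of the bright spot detection results before and after the determination based on the second bright spot detection threshold in a specific embodiment of the present invention.
Figure 7 is a schematic structural diagram of the device for constructing a sequencing template based on images in a specific embodiment of the present invention.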
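Detailed Description
Embodiments of the present invention are described in detail below. Examples of the embodiments are shown in the drawings, in which identical or similar reference numerals throughout denote identical or similar elements or elements with identical or similar functions. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the present invention and are not to be construed as limiting it.
In the description of the present invention, the terms "first", "second", "third", "fourth" and so on are used for descriptive purposes only and are not to be understood as indicating or implying relative importance, or as implicitly indicating the number or order of the technical features referred to. In the description of the present invention, "a plurality of" means two or more, unless specifically defined otherwise.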
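Referring to Figure 1, an embodiment of the present invention provides a method for constructing a sequencing template based on images. The images are acquired from one and the same field of view and include a first image, a second image, a third image and a fourth image acquired respectively during the base extension reactions of the four bases A/U, T, G and C. During a base extension reaction, the field of view contains a plurality of nucleic acid molecules carrying an optically detectable label, and at least some of the nucleic acid molecules appear as bright spots on the images. One round of sequencing reaction is defined as carrying out the four types of base extension reaction once, sequentially and/or simultaneously. The first image includes images M1 and M2, the second image includes images N1 and N2, the third image includes images P1 and P2, and the fourth image includes images Q1 and Q2; images M1 and M2 come from two different rounds of sequencing reaction, as do images N1 and N2, images P1 and P2, and images Q1 and Q2. The method includes: S10, combining any two of images M1, M2, N1, N2, P1, P2, Q1 and Q2 to perform bright spot matching, such that images M1, N1, N2, P1, P2, Q1 and Q2 each take part in a combination at least once, to obtain a plurality of combined images containing first coincident bright spots, where two or more bright spots less than a first predetermined pixel apart on a combined image constitute one first coincident bright spot; and S20, merging the first coincident bright spots on the plurality of combined images to obtain a bright spot set corresponding to the sequencing template. A "bright spot" (spot or peak) is a luminous point on an image; one luminous point occupies at least one pixel. "Pixel point" is used synonymously with "pixel".
By first taking the intersection and then the union of the bright spots on multiple images, the method obtains the bright spot set corresponding to the template nucleic acid molecules. The sequencing template obtained with this method is a bright spot set corresponding to the sequencing template. This set reflects the information of the sequencing template effectively, accurately and comprehensively, and can further be used for accurate base recognition (base calling), that is, for accurately obtaining the nucleotide sequence of at least part of the template nucleic acid.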
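The one round of sequencing reaction referred to, in which the four types of base extension reaction are carried out once sequentially and/or simultaneously, may be realized with the four types of base reaction substrates (for example nucleotide analogues/base analogues) in one base extension reaction system at the same time; with two types of base analogues in one base extension reaction system and the other two types of substrates in the next; or with one type of base analogue per base extension reaction system, the four types being added in turn to four consecutive base extension reaction systems. It follows that the first, second, third and fourth images may be acquired from two or more base extension reactions. In addition, one base extension reaction may involve one image acquisition or several.
In one example, one round of sequencing reaction includes several base extension reactions, as in single-colour sequencing: the reaction substrates (nucleotide analogues) for the four types of bases all carry the same fluorescent dye, and one round of sequencing reaction includes four base extension reactions (4 repeats). For one field of view, one base extension reaction involves one image acquisition, and images M1, N1, P1 and Q1 are the same field of view in the four base extension reactions of one round of sequencing reaction.
In another example, such as a single-molecule two-colour sequencing reaction, two of the reaction substrates (nucleotide analogues) for the four types of bases carry one fluorescent dye and the other two carry another fluorescent dye with a different excitation wavelength. One round of sequencing reaction includes two base extension reactions, and two types of base reaction substrates carrying different dyes take part in the binding reaction in one base extension reaction. For one field of view, one base extension reaction includes two image acquisitions at different excitation wavelengths, and images M1, N1, P1 and Q1 come from the same field of view at the two excitation wavelengths in the two base extension reactions of one round of sequencing reaction.
In yet another example, one round of sequencing reaction includes one base extension reaction, as in the two-colour sequencing reaction of a second-generation sequencing platform: the four types of base reaction substrates (for example nucleotide analogues) carry dye a, dye b, both dye a and dye b, and no dye, respectively, and dyes a and b have different excitation wavelengths. The four types of substrates realize one round of sequencing reaction in the same base extension reaction, and one base extension reaction includes two image acquisitions at different excitation wavelengths. The first image is the same as the third image and the second image is the same as the fourth image; images M1 and N1 come from the same field of view in different rounds of sequencing reaction, or at different excitation wavelengths in the same round of sequencing reaction.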
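In some specific embodiments, S20, merging the first coincident bright spots on the plurality of combined images, includes matching the first coincident bright spots in different combined images one or more times to obtain the bright spot set corresponding to the sequencing template. This helps to obtain an accurate set of bright spots in one-to-one correspondence with the template nucleic acid molecules and to construct an accurate template from the image information.
In some specific embodiments, images M1, N1, P1 and Q1 are obtained sequentially and images M2, N2, P2 and Q2 are obtained sequentially, that is, images M1, N1, P1 and Q1 are obtained in one round of sequencing reaction and images M2, N2, P2 and Q2 in another round. S10 includes: combining images M1, M2, N1, N2, P1, P2, Q1 and Q2 in pairs at an interval of S images to obtain K combined images, matching the bright spots on the combined images, and discarding the non-coincident bright spots on the combined images, where S is an integer, 0≤S≤S max, and S max = total number of images taking part in the combination - 4. Writing B for the total number of images taking part in the combination, it can be calculated that K=[(B-S-1)+1]*(B-S-1)/2, that is,
$K = \frac{[(B-S-1)+1]\,(B-S-1)}{2} = \frac{(B-S)(B-S-1)}{2}$ (Figure PCTCN2018101819-appb-000001).
For example, with B=8 and S=2, K=15. In this way a complete sequencing template can be constructed while making full use of as little image information as possible.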
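The count K can be checked by enumerating the pairs directly. The sketch below is illustrative only: the function names are hypothetical, images are numbered 1 to B in acquisition order, and "an interval of S images" is read as "the two image indices differ by more than S", which is the reading that reproduces K=15 for B=8 and S=2.

```python
from itertools import combinations


def combination_pairs(total_images, s):
    """List the 1-based image index pairs (i, j) whose indices differ by more than s."""
    if not 0 <= s <= total_images - 4:
        raise ValueError("S must satisfy 0 <= S <= total_images - 4")
    return [(i, j)
            for i, j in combinations(range(1, total_images + 1), 2)
            if j - i > s]


def expected_k(total_images, s):
    """Closed form from the text: K = [(B-S-1)+1] * (B-S-1) / 2."""
    n = total_images - s - 1
    return (n + 1) * n // 2


pairs = combination_pairs(8, 2)
print(len(pairs), expected_k(8, 2))
```

Under this reading Repeat1 is paired with Repeat4 through Repeat8, Repeat2 with Repeat5 through Repeat8, and so on.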
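Where one round of sequencing reaction contains four base extension reactions, that is, each base extension reaction contains only one type of nucleotide analogue, S is preferably greater than 1 and more preferably greater than 2. This helps to avoid or reduce the interference of noise from biochemical experimental factors with the image-based construction of the sequencing template, and helps to determine the template effectively and accurately.
In one embodiment, the total number of images taking part in the combination is 12 and S=2; this yields a fairly complete sequencing template and reduces the loss of output data (reads).
In another embodiment, the total number of images taking part in the combination is 8 and S=2. When Repeat=5, coincident bright spot matching is performed on images Repeat1-5 (the images of Repeat1 and Repeat5) and Repeat2-5 respectively, and the matching results are merged into the template container (Template; initially empty). In one example, because the Repeat4 image is used to construct the reference image, template construction starts from image Repeat5 to reduce computation. When Repeat=6, coincident bright spot matching is performed on images Repeat1-6, Repeat2-6 and Repeat3-6 respectively, and the results are merged into Template. When Repeat=7, the same is done for images Repeat1-7, Repeat2-7, Repeat3-7 and Repeat4-7; when Repeat=8, for images Repeat1-8, Repeat2-8, Repeat3-8, Repeat4-8 and Repeat5-8. Finally, all bright spots in the Template container are counted and output; each bright spot coordinate represents one strand, that is, one read. Once the template has been constructed, the total number of reads, TotalRead, is known. Figure 2 illustrates this process: the four images at the top are Repeat1, Repeat5, Repeat6 and Repeat7 in turn, the images in the middle show the process and result of coincident bright spot matching between Repeat1 and Repeat5, and the image at the bottom shows the result of coincident bright spot matching across Repeat1, Repeat5, Repeat6 and Repeat7.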
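In one example, in the imaging system, the size of the electronic sensor is 6.5 μm and the microscope magnification is 60x, so the smallest size that can be seen is about 0.1 μm. The bright spot corresponding to a nucleic acid molecule is generally smaller than 10*10 pixels.
In one example, the first predetermined pixel is 1.05 pixels.
In one example, two first coincident bright spots more than 1.85 pixels apart are treated as two separate first coincident bright spots.
In one example, a coincident bright spot that is more than 1.05 pixels from one coincident bright spot but less than 1.85 pixels from another coincident bright spot is discarded. This helps to construct an accurate sequencing template.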
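The intersect-then-merge idea can be sketched as follows. This is a minimal sketch, not the patented implementation: bright spots are assumed to be (x, y) tuples, the function names are hypothetical, a coincident spot is represented by the midpoint of the matched pair (an assumption), and the 1.05/1.85 pixel rules are interpreted as "already in the template", "new spot" or "ambiguous, discard".

```python
import math

FIRST_PREDETERMINED_PIXEL = 1.05  # from the text: closer spots coincide
DISTINCT_SPOT_DISTANCE = 1.85     # from the text: farther spots are distinct


def coincident_spots(spots_a, spots_b, max_dist=FIRST_PREDETERMINED_PIXEL):
    """Intersection step: keep spots of image A that have a partner in image B."""
    matched = []
    for ax, ay in spots_a:
        partners = [(bx, by) for bx, by in spots_b
                    if math.hypot(ax - bx, ay - by) < max_dist]
        if partners:
            # Assumption: represent the coincident spot by the midpoint
            # between A's spot and its nearest partner in B.
            bx, by = min(partners,
                         key=lambda p: math.hypot(ax - p[0], ay - p[1]))
            matched.append(((ax + bx) / 2, (ay + by) / 2))
    return matched


def merge_into_template(template, new_spots):
    """Union step: add coincident spots that are not yet in the template."""
    for x, y in new_spots:
        nearest = min((math.hypot(x - tx, y - ty) for tx, ty in template),
                      default=math.inf)
        if nearest > DISTINCT_SPOT_DISTANCE:
            template.append((x, y))  # a new template spot
        # nearest < 1.05: already represented; in between: ambiguous, dropped
    return template


repeat1 = [(10.0, 10.0), (30.0, 30.0), (50.0, 50.0)]
repeat5 = [(10.4, 10.0), (50.0, 50.6), (70.0, 70.0)]
template = merge_into_template([], coincident_spots(repeat1, repeat5))
print(template)
```

The quadratic nearest-neighbour search keeps the sketch short; a spatial index would be the natural replacement for real image sizes.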
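In some specific embodiments, the images are registered images. This helps to obtain the bright spot set corresponding to the sequencing template accurately.
The embodiments of the present invention do not limit the way in which image registration, that is, deviation correction, is achieved. In some examples, image registration is performed as follows. A first registration of the image to be registered is performed based on a reference image, where the reference image and the image to be registered correspond to the same object and both contain a plurality of bright spots; this includes determining a first offset between a predetermined region on the image to be registered and the corresponding predetermined region on the reference image, and moving all bright spots on the image to be registered based on the first offset to obtain the image to be registered after the first registration. A second registration of the image to be registered after the first registration is then performed based on the reference image; this includes merging the image to be registered after the first registration with the reference image to obtain a merged image, calculating the offsets of all coincident bright spots in a predetermined region of the merged image to determine a second offset, where two or more bright spots less than a predetermined pixel apart constitute one coincident bright spot, and moving all bright spots on the image to be registered after the first registration based on the second offset, which completes the registration of the image to be registered. This image registration method uses two linked registrations, which may be called coarse registration and fine registration, the fine registration using the bright spots on the images. It achieves high-precision deviation correction quickly from a small amount of data and is particularly suitable for scenarios that demand high-precision image correction, for example single-molecule-level image detection such as images from the sequencing reactions of a third-generation sequencing platform. Single-molecule level means a resolution of the size of a single molecule or a few molecules, for example no more than 10, 8, 5, 4 or 3 molecules.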
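In some specific embodiments, the image to be registered, that is, an image used to construct the sequencing template, comes from a sequencing platform that determines sequences using the principle of optical imaging. Sequencing, also called sequence determination, refers to nucleic acid sequence determination and includes DNA sequencing and/or RNA sequencing, and long-fragment and/or short-fragment sequencing; the sequencing biochemical reaction includes base extension. Sequencing can be carried out on a sequencing platform, which may be, but is not limited to, Illumina's HiSeq/MiSeq/NextSeq sequencing platforms, Thermo Fisher/Life Technologies' Ion Torrent platform, BGI's BGISEQ platform, or a single-molecule sequencing platform. The sequencing mode may be single-end or paired-end sequencing. The sequencing results/data obtained, that is, the fragments read out, are called reads, and the length of a read is called the read length. A "bright spot" corresponds to the optical signal of an extended base or base cluster.
The predetermined region on an image may be the whole image or part of it. In one example, the predetermined region is part of the image, for example the 512*512 region at the image center. The image center is the center of the field of view; the intersection of the optical axis of the imaging system with the imaging plane may be called the image center point, and the region centered on that point may be regarded as the image center region.
In some specific embodiments, the image to be registered comes from a nucleic acid sequencing platform that includes an imaging system and a nucleic acid sample carrying system. The nucleic acid molecules to be measured, carrying optically detectable labels, are immobilized in a reactor, also called a chip; the chip is mounted on a movable stage, and the stage moves the chip so that images can be acquired of the nucleic acid molecules at different positions (different fields of view) on the chip. In general, the movement of the optical system and/or the stage has limited precision; for example, there is a deviation between the position commanded and the position the mechanical structure actually reaches, which matters especially in applications that demand high precision. As a result, when the hardware is moved on command to acquire several images of the same position (field of view) at different time points, the images of the same field of view acquired at different time points are hard to align completely. Correcting and aligning these images helps to determine the nucleotide order of the nucleic acid molecules accurately from the changes in the information across the images acquired at the different time points.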
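In some specific embodiments, the reference image is obtained by construction; it may be constructed when the image to be registered is registered, or constructed and stored in advance and retrieved when needed.
In some examples, constructing the reference image includes: obtaining a fifth image and a sixth image, which correspond to the same object as the image to be registered; performing coarse registration of the sixth image based on the fifth image, which includes determining the offset between the sixth image and the fifth image and moving the sixth image based on that offset to obtain the coarsely registered sixth image; and merging the fifth image and the coarsely registered sixth image to obtain the reference image, where the fifth image and the sixth image both contain a plurality of bright spots. Constructing an image that contains more, or relatively more complete, information and using it as the benchmark for correction helps to achieve more accurate image registration. For images obtained from nucleic acid sequence determination, constructing the reference image from several images helps the reference image to capture the complete bright spot information of the corresponding nucleic acid molecules, which benefits bright-spot-based image correction.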
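In some embodiments, the fifth image and the sixth image come from the same field of view at different moments of the nucleic acid sequence determination reaction (sequencing reaction). In one example, one round of sequencing reaction includes several base extension reactions, as in single-colour sequencing: the reaction substrates (nucleotide analogues) for the four types of bases all carry the same fluorescent dye, and one round of sequencing reaction includes four base extension reactions (4 repeats). For one field of view, one base extension reaction involves one image acquisition, and the fifth and sixth images come from the same field of view in different base extension reactions. Using the reference image obtained by processing and pooling the information of the fifth and sixth images as the benchmark for correction helps to achieve more accurate image correction.
In another example, a single-molecule two-colour sequencing reaction, two of the reaction substrates (nucleotide analogues) for the four types of bases carry one fluorescent dye and the other two carry another fluorescent dye with a different excitation wavelength. One round of sequencing reaction includes two base extension reactions, and two types of base reaction substrates carrying different dyes take part in the binding reaction in one base extension reaction. For one field of view, one base extension reaction includes two image acquisitions at different excitation wavelengths, and the fifth and sixth images come from the same field of view in different base extension reactions, or at different excitation wavelengths in the same base extension reaction. Using the reference image obtained by processing and pooling the information of the fifth and sixth images as the benchmark for correction helps to achieve more accurate image correction.
In yet another example, one round of sequencing reaction includes one base extension reaction, as in the two-colour sequencing reaction of a second-generation sequencing platform: the four types of base reaction substrates (for example nucleotide analogues) carry dye a, dye b, both dye a and dye b, and no dye, respectively, and dyes a and b have different excitation wavelengths. The four types of substrates realize one round of sequencing reaction in the same base extension reaction, and the fifth and sixth images come from the same field of view in different rounds of sequencing reaction, or at different excitation wavelengths in the same round. Using the reference image obtained by processing and pooling the information of the fifth and sixth images as the benchmark for correction helps to achieve more accurate image correction.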
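The fifth image and/or the sixth image may each be one image or several images. In one example, the fifth image is the first image and the sixth image is the second image. Further, in some specific embodiments, a seventh image and an eighth image are also used to construct the reference image. The image to be registered and the fifth, sixth, seventh and eighth images come from the same field of view of the sequencing reaction; the fifth, sixth, seventh and eighth images correspond respectively to the field of view during the base extension reactions of the four base types A/U, T, G and C. During a base extension reaction, the field of view contains a plurality of nucleic acid molecules carrying an optically detectable label, and at least some of the nucleic acid molecules appear as bright spots on the images. Constructing the reference image further includes: coarse registration of the seventh image based on the fifth image, which includes determining the offset between the seventh image and the fifth image and moving the seventh image based on that offset to obtain the coarsely registered seventh image; coarse registration of the eighth image based on the fifth image, which includes determining the offset between the eighth image and the fifth image and moving the eighth image based on that offset to obtain the coarsely registered eighth image; and merging the fifth image with the coarsely registered sixth, seventh and eighth images to obtain the reference image.
The embodiments of the present invention do not limit how the first registration is implemented; for example, a Fourier transform with frequency-domain registration can be used to determine the first offset. Specifically, the two-dimensional discrete Fourier transform in the Phase-Only Correlation Function described in Kenji TAKITA et al, IEICE TRANS. FUNDAMENTALS, VOL.E86-A, NO.8 AUGUST 2003 can be used to determine the first offset, the offset between the sixth and fifth images, the offset between the seventh and fifth images, and/or the offset between the eighth and fifth images. The first registration/coarse registration can reach a precision of 1 pixel. In this way the first offset can be determined quickly and accurately, and/or a reference image that supports precise correction can be constructed.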
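In some specific embodiments, the reference image and the image to be registered are binarized images. This reduces the amount of computation and speeds up correction.
In one example, the image to be corrected and the reference image are both binarized images, that is, every pixel in the image is either a or b, for example a is 1 and b is 0, and a pixel marked 1 is brighter, or has a higher intensity, than a pixel marked 0. The reference image is constructed from the images repeat1, repeat2, repeat3 and repeat4 of the four base extension reactions of one round of sequencing reaction, and the fifth image and the sixth image are selected from any one, two or three of images repeat1-4.
In one example, the fifth image is image repeat1 and images repeat2, repeat3 and repeat4 are the sixth image. Images repeat2-4 are coarsely registered in turn based on image repeat1 to obtain the coarsely registered images repeat2-4, and image repeat1 and the coarsely registered images repeat2-4 are merged to obtain the reference image. Merging images here means merging the coincident bright spots in the images. Based mainly on the size of the bright spot corresponding to a nucleic acid molecule and the resolution of the imaging system, in one example two bright spots on two images that are no more than 1.5 pixels apart are set as a coincident bright spot. Here, the central region of the image synthesized from the 4 repeats is used as the reference image: first, this gives the reference image a sufficient number of bright spots, which benefits the subsequent registration; second, the bright spots detected and located in the image center region carry relatively more accurate information, which benefits accurate registration.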
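In one example, the image is corrected in the following steps. 1) Coarse correction is performed on image repeat5, acquired from a certain field of view in one base extension reaction of another round of sequencing reaction. repeat5 is a binarized image; its central region, for example 512*512, and the central image synthesized from repeat1-4 (the central 512*512 region of the corresponding reference image) undergo a two-dimensional discrete Fourier transform, and frequency-domain registration yields the offset offset(x0,y0), which accomplishes the coarse registration; x0 and y0 can reach a precision of 1 pixel. 2) The coarsely registered image and the reference image are merged based on the bright spots on the images, which includes calculating, for the coincident bright spots in the central region of image repeat5 and the corresponding region of the reference image, the offset offset(x1,y1) = coordinates of the bright spot in the image to be corrected - coordinates of the corresponding bright spot on the reference image, which can be written as offset(x1,y1)=curRepeatPoints-basePoints; the average offset over all coincident bright spots is taken, giving a fine offset in the range [0,0] to [1,1]. In one example, two bright spots on the two images that are no more than 1.5 pixels apart are set as a coincident bright spot. 3) In summary, the offset of one field-of-view image (fov) in different cycles is (x0,y0)-(x1,y1); for one bright spot (peak) this can be written as curRepeatPoints+(x0,y0)-(x1,y1), where curRepeatPoints is the original coordinate of the bright spot, that is, its coordinate in the image before correction. The correction result obtained in this way is highly accurate, with a correction precision of 0.1 pixel or better. Figure 3 illustrates the correction process and result: image C is corrected based on image A; the circles in images A and C represent bright spots, bright spots marked with the same number are coincident bright spots, and image C->A shows the correction result, that is, image C aligned to image A.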
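Steps 2) and 3) can be sketched as below, assuming the coarse offset (x0, y0) has already been obtained (for instance by phase correlation, which is not shown here). The function name and the nearest-neighbour pairing are assumptions made for illustration; bright spots are (x, y) tuples.

```python
import math


def fine_register(cur_points, base_points, coarse_offset, max_dist=1.5):
    """Apply the coarse offset, then subtract the mean residual offset
    measured on coincident spots (spots no more than max_dist pixels apart)."""
    x0, y0 = coarse_offset
    coarse = [(x + x0, y + y0) for x, y in cur_points]

    residuals = []
    for cx, cy in coarse:
        # Pair each coarsely registered spot with its nearest reference spot.
        bx, by = min(base_points,
                     key=lambda b: math.hypot(cx - b[0], cy - b[1]))
        if math.hypot(cx - bx, cy - by) <= max_dist:
            residuals.append((cx - bx, cy - by))  # cur - base, as in the text

    if not residuals:
        return coarse, (0.0, 0.0)

    x1 = sum(r[0] for r in residuals) / len(residuals)
    y1 = sum(r[1] for r in residuals) / len(residuals)
    # Final coordinate: curRepeatPoints + (x0, y0) - (x1, y1)
    corrected = [(cx - x1, cy - y1) for cx, cy in coarse]
    return corrected, (x1, y1)


base = [(10.0, 10.0), (20.0, 25.0), (40.0, 5.0)]
current = [(x - 3.3, y + 2.4) for x, y in base]  # same spots, shifted
corrected, fine_offset = fine_register(current, base, coarse_offset=(3, -2))
print(fine_offset)
```

In this toy case the integer coarse offset leaves a sub-pixel residual, which the averaged fine offset removes.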
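The embodiments of the present invention do not limit how bright spots on an image are identified and detected. In some specific embodiments, image registration further includes identifying bright spots, which includes detecting bright spots on the image with a k1*k2 matrix, determining that a matrix whose central pixel value is not less than any non-central pixel value of the matrix corresponds to one candidate bright spot, and determining whether the candidate bright spot is a bright spot; k1 and k2 are both odd numbers greater than 1, and the k1*k2 matrix contains k1*k2 pixels. The image is at least one of the image to be registered and the images used to construct the reference image. Detecting bright spots in this way allows the bright spots (spots or peaks) on an image to be detected quickly and effectively, especially for images acquired from nucleic acid sequence determination reactions. The method places no particular restriction on the image to be detected, that is, the original input data; it is suitable for processing and analysing images produced by any platform that determines nucleic acid sequences using the principle of optical detection, including but not limited to second- and third-generation sequencing. It is accurate and efficient and can extract more sequence-representing information from the images. It is particularly advantageous for random images and for signal recognition with high accuracy requirements.
In some embodiments, the image comes from a nucleic acid sequence determination reaction. The nucleic acid molecules carry an optically detectable label, for example a fluorescent label; fluorescent molecules can be excited to emit fluorescence under laser irradiation of a specific wavelength, and the image is acquired by the imaging system. The acquired image includes light spots/bright spots that may correspond to the positions of the fluorescent molecules. Understandably, when the sample is in the focal plane, the bright spots corresponding to the positions of the fluorescent molecules are smaller and brighter in the acquired image; when it is out of the focal plane, they are larger and dimmer. In addition, the field of view may contain other substances/information that are not targets or are hard to use later, such as impurities; furthermore, when photographing a single-molecule field of view, aggregation of large numbers of molecules (clusters) also interferes with the acquisition of the target single-molecule information. A single molecule here means one or a few molecules, for example no more than 10 molecules, such as one, two, three, four, five, six, eight or ten.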
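In some examples, the central pixel value of the matrix is greater than a first preset value and every non-central pixel value of the matrix is greater than a second preset value, the first and second preset values being related to the average pixel value of the image.
In some embodiments, the image can be traversed with the k1*k2 matrix, and the first preset value and/or the second preset value are set in relation to the average pixel value of the image. For a grayscale image, the pixel value is the gray value. In the k1*k2 matrix, k1 and k2 may or may not be equal. In one example, the relevant imaging system parameters are: a 60x objective and an electronic sensor size of 6.5 μm; the image formed by the microscope and then captured by the electronic sensor has a smallest visible size of about 0.1 μm. The acquired or input image may be a 16-bit grayscale or colour image of 512*512, 1024*1024 or 2048*2048, and k1 and k2 both take values greater than 1 and less than 10. In one example, k1=k2=3; in another, k1=k2=5. If the image is a colour image, each pixel of which has three pixel values, the colour image can be converted to a grayscale image before bright spot detection, to reduce the computation and complexity of the detection. A non-grayscale image can be converted to grayscale using, without limitation, the floating-point method, the integer method, the shift method or the averaging method.
In one example, based on statistics from processing a large number of images, the inventors take the first preset value as 1.4 times the average pixel value of the image and the second preset value as 1.1 times the average pixel value of the image; this excludes interference and yields bright spot detection results originating from the optically detectable labels.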
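A minimal sketch of this traversal is given below, assuming a grayscale image stored as a list of rows, a square k*k window, and border pixels that are simply skipped (the handling of borders is not specified in the text). The function name is hypothetical; the 1.4 and 1.1 factors are the example values quoted above.

```python
def detect_candidates(image, k=3, center_factor=1.4, edge_factor=1.1):
    """Return (row, col) of every k*k window whose centre is a local maximum,
    exceeds center_factor * mean, and whose neighbours all exceed
    edge_factor * mean."""
    rows, cols = len(image), len(image[0])
    mean = sum(map(sum, image)) / (rows * cols)
    half = k // 2
    candidates = []
    for r in range(half, rows - half):
        for c in range(half, cols - half):
            center = image[r][c]
            neighbours = [image[r + dr][c + dc]
                          for dr in range(-half, half + 1)
                          for dc in range(-half, half + 1)
                          if (dr, dc) != (0, 0)]
            if (center >= max(neighbours)
                    and center > center_factor * mean
                    and min(neighbours) > edge_factor * mean):
                candidates.append((r, c))
    return candidates


# Toy 7*7 image: flat background, one 3*3 blob with a bright centre.
image = [[10] * 7 for _ in range(7)]
for r in range(2, 5):
    for c in range(2, 5):
        image[r][c] = 40
image[3][3] = 100
print(detect_candidates(image))
```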
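Candidate bright spots can be screened further by size, by similarity to an ideal bright spot and/or by intensity. In some specific embodiments, the size of the connected domain corresponding to a candidate bright spot is used to quantify and compare the size of candidate bright spots on the image, and on that basis to decide whether a candidate bright spot is a wanted bright spot.
In one example, determining whether a candidate bright spot is a bright spot includes: calculating the size of the connected domain corresponding to the candidate bright spot, Area=A*B, and determining that a candidate bright spot whose connected domain is larger than a third preset value is a bright spot. A is the size of the connected pixels in the row through the center of the matrix corresponding to the candidate bright spot, and B is the size of the connected pixels in the column through that center; the connected pixels greater than the average pixel value in a k1*k2 matrix are defined as the connected domain corresponding to the candidate bright spot. In this way, bright spots that correspond to labelled molecules and are suitable for subsequent sequence recognition can be obtained effectively, and nucleic acid sequence information obtained.
In one example, taking the average pixel value of the image as the benchmark, two or more adjacent pixels that are not less than the average pixel value are called connected pixels (pixel connectivity). As shown in Figure 4, the bold, enlarged entry is the center of the matrix corresponding to the candidate bright spot, and the thick frame is the 3*3 matrix corresponding to the candidate bright spot; pixels marked 1 are not less than the average pixel value of the image and pixels marked 0 are less than it. It can be seen that A=3 and B=6, so the size of the connected domain corresponding to this candidate bright spot is A*B=3*6.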
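The third preset value can be determined from the sizes of the connected domains of all candidate bright spots on the image. For example, the sizes of the connected domains of the candidate bright spots on the image can be calculated and their average, taken as a characteristic of the image, used as the third preset value. Alternatively, the connected domain sizes of the candidate bright spots on the image can be sorted in ascending order and the 50th, 60th, 70th, 80th or 90th percentile connected domain size used as the third preset value. In this way bright spot information is obtained effectively, which benefits the subsequent recognition of the nucleic acid sequence.
In some examples, candidate bright spots are screened by a statistically defined parameter that quantifies and compares their intensity characteristics. In one example, determining whether a candidate bright spot is a bright spot includes: calculating the score of the candidate bright spot, Score=((k1*k2-1)CV-EV)/((CV+EV)/(k1*k2)), and determining that a candidate bright spot whose score is greater than a fourth preset value is a bright spot, where CV is the central pixel value of the matrix corresponding to the candidate bright spot and EV is the sum of the non-central pixel values of that matrix. In this way, bright spots that correspond to labelled molecules and are suitable for subsequent sequence recognition can be obtained effectively, and nucleic acid sequence information obtained.
The fourth preset value can be determined from the scores of all candidate bright spots on the image. For example, when the number of candidate bright spots on the image is large enough to satisfy statistical requirements, for example more than 30, the Score values of all candidate bright spots on the image can be calculated and sorted in ascending order, and the fourth preset value set to the 50th, 60th, 70th, 80th or 90th percentile Score value. Candidate bright spots below the 50th, 60th, 70th, 80th or 90th percentile Score value are thereby excluded, which helps to obtain the target bright spots effectively and benefits subsequent accurate base sequence recognition. The rationale for this screening is that, in general, a bright spot whose center and edge differ strongly in intensity/pixel value and which is concentrated is considered to correspond to the position of a molecule to be detected. In general, the number of candidate bright spots on an image is greater than 50, greater than 100 or greater than 1000.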
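The Score formula and the percentile-based fourth preset value can be illustrated as follows. This is a sketch under assumptions: the function names are hypothetical, each candidate is passed as its k1*k2 matrix of pixel values, and the percentile uses the nearest-rank definition, which the text does not specify.

```python
def spot_score(matrix):
    """Score = ((k1*k2 - 1)*CV - EV) / ((CV + EV) / (k1*k2))."""
    k1, k2 = len(matrix), len(matrix[0])
    cv = matrix[k1 // 2][k2 // 2]         # central pixel value
    ev = sum(map(sum, matrix)) - cv       # sum of the non-central pixel values
    return ((k1 * k2 - 1) * cv - ev) / ((cv + ev) / (k1 * k2))


def percentile_value(values, pct):
    """Nearest-rank percentile (assumed definition); pct is an integer 1..100."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[rank - 1]


def select_spots(matrices, pct=50):
    """Keep the candidates whose score exceeds the chosen percentile score."""
    scores = [spot_score(m) for m in matrices]
    threshold = percentile_value(scores, pct)
    return [m for m, s in zip(matrices, scores) if s > threshold]


sharp = [[1, 1, 1], [1, 10, 1], [1, 1, 1]]
flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
print(spot_score(sharp), spot_score(flat))
```

A flat window scores zero, while a window with a pronounced centre scores high, which is the contrast the screening relies on.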
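In some examples, candidate bright spots are screened by combining shape and intensity/brightness. In one example, determining whether a candidate bright spot is a bright spot includes: calculating the size of the connected domain corresponding to the candidate bright spot, Area=A*B, and calculating the score of the candidate bright spot, Score=((k1*k2-1)CV-EV)/((CV+EV)/(k1*k2)). A is the size of the connected pixels in the row through the center of the matrix corresponding to the candidate bright spot, and B is the size of the connected pixels in the column through that center; the connected pixels greater than the average pixel value in a k1*k2 matrix are defined as the connected domain corresponding to the candidate bright spot; CV is the central pixel value of the matrix corresponding to the candidate bright spot and EV is the sum of the non-central pixel values of that matrix. A candidate bright spot whose connected domain is larger than the third preset value and whose score is greater than the fourth preset value is determined to be a bright spot. In this way, bright spot information that corresponds to nucleic acid molecules and benefits subsequent sequence recognition is obtained effectively. The third preset value and/or the fourth preset value can be considered and set with reference to the preceding specific embodiments.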
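In some specific embodiments, the image registration method further includes bright spot identification and detection, which includes: preprocessing an image to obtain a preprocessed image, the image being at least one of the first, second, third, fourth, fifth, sixth, seventh and eighth images; determining a critical value to simplify the preprocessed image, which includes assigning a first preset value to pixels of the preprocessed image whose pixel value is less than the critical value and a second preset value to pixels whose pixel value is not less than the critical value, to obtain a simplified image; determining a first bright spot detection threshold c1 based on the preprocessed image; identifying candidate bright spots on the image based on the preprocessed image and the simplified image, which includes determining that a pixel matrix meeting at least two of the following conditions a) to c) is a candidate bright spot: a) in the preprocessed image, the pixel value of the central pixel of the pixel matrix is the largest, where the pixel matrix can be expressed as r1*r2, r1 and r2 are both odd numbers greater than 1, and the r1*r2 pixel matrix contains r1*r2 pixels; b) in the simplified image, the pixel value of the central pixel of the pixel matrix is the second preset value and the number of connected pixels of the pixel matrix is greater than 2/3*r1*r2; and c) in the preprocessed image, the pixel value of the central pixel of the pixel matrix is greater than a third preset value and g1*g2>c1 is satisfied, where g1 is the correlation coefficient of the two-dimensional Gaussian distribution over the m1*m2 range centered on the central pixel of the pixel matrix, g2 is the pixels in that m1*m2 range, m1 and m2 are both odd numbers greater than 1, and the m1*m2 range contains m1*m2 pixels; and determining whether the candidate bright spot is a bright spot. Detecting bright spots in this way, using judgment conditions or combinations of conditions that the inventors determined by training on a large amount of data, allows the bright spots on an image to be detected quickly and effectively, especially for images acquired from nucleic acid sequence determination reactions. The method places no particular restriction on the image to be detected, that is, the original input data; it is suitable for processing and analysing images produced by any platform that determines nucleic acid sequences using the principle of optical detection, including but not limited to second- and third-generation sequencing. It is accurate and efficient and can extract more sequence-representing information from the images. It is particularly advantageous for random images and for signal recognition with high accuracy requirements.
For a grayscale image, the pixel value is the gray value. If the image is a colour image, each pixel of which has three pixel values, the colour image can be converted to a grayscale image before bright spot detection, to reduce the computation and complexity of the detection. A non-grayscale image can be converted to grayscale using, without limitation, the floating-point method, the integer method, the shift method or the averaging method.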
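In some embodiments, preprocessing the image includes: determining the background of the image using an opening operation; converting the image into a first image based on the background using a top-hat operation; applying Gaussian blur to the first image to obtain a second image; and sharpening the second image to obtain the preprocessed image. This denoises the image effectively, or in other words improves its signal-to-noise ratio, which benefits accurate bright spot detection. The opening operation is a morphological process, namely erosion followed by dilation: erosion shrinks the foreground (the part of interest) and dilation enlarges it. Opening can be used to remove small objects, to separate objects at thin connections, and to smooth the boundaries of larger objects without noticeably changing their area. This embodiment places no particular limit on the size of the structuring element p1*p2 (the basic template used to process the image) for the opening operation; p1 and p2 are odd numbers. In one example, the structuring element p1*p2 may be 15*15, 31*31 and so on; in each case a preprocessed image suitable for subsequent processing and analysis is obtained.
The top-hat operation is often used to separate patches (bright points/bright spots) that are somewhat brighter than their neighbours; when an image has a large background and the small objects are fairly regular, the top-hat operation can be used for background extraction. In one example, the top-hat transform of an image consists of first applying the opening operation to the image and then subtracting the opening result from the original image, giving the first image, that is, the top-hat transformed image. The mathematical expression of the top-hat transform is dst=tophat(src,element)=src-open(src,element). The inventors consider that the result of the opening operation enlarges cracks or regions of locally low brightness, so subtracting the opened image from the original highlights regions that are brighter than the surroundings of the contours in the original image. This operation depends on the size of the chosen kernel, which can be regarded as related to the expected size of the bright points/bright spots; if the bright points are not of the expected size, the processed image will show many small bumps over the whole picture, as can be seen in out-of-focus pictures where the bright points/bright spots smear into blobs. In one example, the expected size of the bright points, that is, the chosen kernel size, is 3*3, and the resulting top-hat transformed image is suitable for further denoising. Gaussian blur, also called Gaussian filtering, is a linear smoothing filter that is suitable for removing Gaussian noise and is widely used for noise reduction in image processing. Put simply, Gaussian filtering is a weighted averaging of the whole image: the value of each pixel is obtained as a weighted average of itself and the other pixel values in its neighbourhood. In practice, a template (also called a convolution kernel or mask) scans every pixel of the image, and the weighted average gray value of the pixels in the neighbourhood defined by the template replaces the value of the pixel at the template center. In one example, the first image is blurred with the Gaussian filter function GaussianBlur in OpenCV, with the Gaussian distribution parameter Sigma set to 0.9 and a 3*3 two-dimensional filter matrix (convolution kernel); visually, after this Gaussian blur the small bumps on the first image are smoothed out and the image edges are smooth. Further, the second image, that is, the Gaussian-filtered image, is sharpened, for example with two-dimensional Laplacian sharpening; visually, the edges are sharpened after this step and the Gaussian-blurred image is restored.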
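The relation dst = src - open(src, element) can be illustrated with a dependency-free sketch. It assumes a flat square structuring element and clips the window at the image border; the helper names are hypothetical. In practice a library routine would normally be used instead, for example OpenCV's `cv2.morphologyEx` with `cv2.MORPH_TOPHAT`, followed by `cv2.GaussianBlur` and a Laplacian-based sharpening step, none of which are shown here.

```python
def _window_filter(image, size, reducer):
    """Apply min (erosion) or max (dilation) over a size*size window,
    clipping the window at the image border."""
    rows, cols, half = len(image), len(image[0]), size // 2
    return [[reducer(image[rr][cc]
                     for rr in range(max(0, r - half), min(rows, r + half + 1))
                     for cc in range(max(0, c - half), min(cols, c + half + 1)))
             for c in range(cols)]
            for r in range(rows)]


def opening(image, size):
    """Grayscale opening: erosion followed by dilation (background estimate)."""
    return _window_filter(_window_filter(image, size, min), size, max)


def top_hat(image, size):
    """Top-hat transform: original image minus its opening."""
    background = opening(image, size)
    return [[p - b for p, b in zip(row, bg_row)]
            for row, bg_row in zip(image, background)]


# Flat background of 10 with a single bright pixel of 50.
img = [[10] * 5 for _ in range(5)]
img[2][2] = 50
first_image = top_hat(img, 3)
print(first_image[2][2])
```

The opening removes the isolated bright pixel, so the top-hat result keeps only the part of the signal that rises above the estimated background.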
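In some embodiments, simplifying the preprocessed image includes: determining the critical value based on the background and the preprocessed image; and comparing the pixel values of the pixels of the preprocessed image with the critical value, assigning the first preset value to pixels whose pixel value is less than the critical value and the second preset value to pixels whose pixel value is not less than the critical value, to obtain the simplified image. Simplifying, for example binarizing, the preprocessed image using the way of determining the critical value, and the critical value itself, that the inventors arrived at from a large amount of test data benefits subsequent accurate bright spot detection, accurate base recognition, high-quality data and so on.
Specifically, in some examples, obtaining the simplified image includes: dividing the sharpened result obtained from preprocessing by the result of the opening operation to obtain a set of values corresponding to the image pixels; and determining from this set of values the critical value for binarizing the preprocessed image. For example, the values can be sorted in ascending order and the value at the 20th, 30th or 40th percentile taken as the binarization critical value/threshold. The binarized image obtained in this way benefits the subsequent accurate detection and identification of bright spots.
In one example, the structuring element of the opening operation used in preprocessing is p1*p2. Dividing the preprocessed image (the sharpened result) by the opening result yields a set of arrays/matrices of the same size as the structuring element, p1*p2. In each array, the p1*p2 values are sorted in ascending order and the value at the 30th percentile is taken as the binarization critical value/threshold of that region (value matrix). Thresholds are thus determined separately to binarize each region of the image, and the final binarization result denoises the image while making the required information stand out, which benefits the subsequent accurate detection of bright spots.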
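In some examples, the first bright spot detection threshold is determined with Otsu's method. Otsu's method (the OTSU algorithm), also called the maximum between-class variance method, segments the image by maximizing the between-class variance, which implies a low probability of misclassification and high accuracy. Suppose the threshold separating the foreground and background of the preprocessed image is T(c1); the proportion of pixels belonging to the foreground is $\omega_0$, with average gray level $\mu_0$; the proportion of pixels belonging to the background is $\omega_1$, with average gray level $\mu_1$. Denoting the overall average gray level of the image by $\mu$ and the between-class variance by var, we have:
$\mu = \omega_0\mu_0 + \omega_1\mu_1$ and $\mathrm{var} = \omega_0(\mu_0-\mu)^2 + \omega_1(\mu_1-\mu)^2$. Substituting the first expression into the second gives the equivalent formula $\mathrm{var} = \omega_0\omega_1(\mu_1-\mu_0)^2$. The segmentation threshold T that maximizes the between-class variance is found by traversal and is the required first bright spot detection threshold c1.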
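The traversal described above can be sketched in a few lines. This is an illustrative, unoptimized version that assumes the image has been flattened into a list of integer gray values and treats pixels greater than the threshold as foreground; the function name is hypothetical.

```python
def otsu_threshold(pixels):
    """Return the threshold T maximising var = w0 * w1 * (mu1 - mu0)**2,
    where pixels <= T are background and pixels > T are foreground."""
    n, total = len(pixels), sum(pixels)
    best_t, best_var = None, -1.0
    for t in sorted(set(pixels)):
        background = [p for p in pixels if p <= t]
        n1 = len(background)
        if n1 == n:            # no foreground left: not a valid split
            continue
        w1 = n1 / n            # background proportion
        w0 = 1.0 - w1          # foreground proportion
        mu1 = sum(background) / n1
        mu0 = (total - sum(background)) / (n - n1)
        var = w0 * w1 * (mu1 - mu0) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t


pixels = [10] * 60 + [12] * 20 + [200] * 20
c1 = otsu_threshold(pixels)
print(c1)
```

For real 16-bit images a histogram-based implementation would be preferable; the loop above recomputes the class sums for every candidate threshold purely for readability.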
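In some embodiments, identifying candidate bright spots on the image based on the preprocessed image and the simplified image includes determining that a pixel matrix meeting all three conditions a) to c) is a candidate bright spot. This effectively improves the accuracy of the nucleic acid sequence subsequently determined from the bright spot information and the quality of the output data.
Specifically, in one example, the conditions that must be met for a candidate bright spot include a), where k1 and k2 (the dimensions of the pixel matrix, denoted r1 and r2 above) may or may not be equal. In one example, the relevant imaging system parameters are: a 60x objective and an electronic sensor size of 6.5 μm; the image formed by the microscope and then captured by the electronic sensor has a smallest visible size of about 0.1 μm. The acquired or input image may be a 16-bit grayscale or colour image of 512*512, 1024*1024 or 2048*2048, and k1 and k2 both take values greater than 1 and less than 10. In one example, in a preprocessed image, k1=k2=3 is set according to the expected size of the bright spots; in another example, k1=k2=5.
In one example, the conditions that must be met for a candidate bright spot include b): in the simplified image, the pixel value of the central pixel of the pixel matrix is the second preset value and the number of connected pixels of the pixel matrix is greater than 2/3*k1*k2, that is, the pixel value of the central pixel is not less than the critical value and the connected pixels make up more than two thirds of the matrix. Here, two or more adjacent pixels whose pixel values are all the second preset value are called connected pixels (pixel connectivity). For example, the simplified image is a binarized image, the first preset value is 0 and the second preset value is 1. As shown in Figure 4, the bold, enlarged entry is the center of the pixel matrix and the thick frame is the 3*3 pixel matrix, that is, k1=k2=3; the pixel value of the central pixel of the matrix is 1 and the number of connected pixels is 4, which is less than 2/3*k1*k2=6, so this pixel matrix does not meet condition b) and is not a candidate bright spot.
In one example, the conditions that must be met for a candidate bright spot include c): in the preprocessed image, g2 is the pixels of the corrected m1*m2 range, that is, the corrected sum of the pixels in the m1*m2 range. In one example, the correction is made according to the proportion of pixels whose value is the second preset value in the corresponding m1*m2 range of the simplified image. For example, as shown in Figure 5, m1=m2=5 and the proportion of pixels whose value is the second preset value in the corresponding m1*m2 range of the simplified image is 13/25 (13 entries of "1"), so the corrected g2 is 13/25 of the original. This helps to detect and identify bright spots more accurately and benefits the subsequent analysis and reading of the bright spot information.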
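In some examples, determining whether a candidate bright spot is a bright spot further includes: determining a second bright spot detection threshold based on the preprocessed image, and determining that a candidate bright spot whose pixel value is not less than the second bright spot detection threshold is a bright spot. In a specific example, the pixel value of the pixel at the coordinates of the candidate bright spot is taken as the pixel value of that candidate bright spot. Screening the candidate bright spots further with the second bright spot detection threshold determined from the preprocessed image excludes at least some of the spots that are more likely to be image background but whose brightness (intensity) and/or shape make them look like bright spots. This benefits the subsequent accurate recognition of the sequence from the bright spots and improves the quality of the output data.
In one example, the coordinates of a candidate bright spot, including sub-pixel coordinates, can be obtained with the centroid method. The gray value at the coordinate position of the candidate bright spot is calculated by bilinear interpolation.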
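Both steps are standard and can be sketched briefly. The window size used for the centroid and the function names are assumptions for illustration; coordinates are (row, column), and the centroid window is assumed to lie inside the image and to contain non-zero intensity.

```python
import math


def centroid(image, r, c, half=1):
    """Intensity-weighted centroid of the (2*half+1)^2 window centred on (r, c)."""
    total = weighted_r = weighted_c = 0.0
    for rr in range(r - half, r + half + 1):
        for cc in range(c - half, c + half + 1):
            v = image[rr][cc]
            total += v
            weighted_r += v * rr
            weighted_c += v * cc
    return weighted_r / total, weighted_c / total


def bilinear(image, y, x):
    """Gray value at the sub-pixel position (row y, column x)."""
    rows, cols = len(image), len(image[0])
    r0, c0 = int(math.floor(y)), int(math.floor(x))
    r1, c1 = min(r0 + 1, rows - 1), min(c0 + 1, cols - 1)
    fy, fx = y - r0, x - c0
    top = image[r0][c0] * (1 - fx) + image[r0][c1] * fx
    bottom = image[r1][c0] * (1 - fx) + image[r1][c1] * fx
    return top * (1 - fy) + bottom * fy


image = [[0, 0, 0, 0],
         [0, 10, 30, 0],
         [0, 10, 30, 0],
         [0, 0, 0, 0]]
y, x = centroid(image, 1, 1)
print((y, x), bilinear(image, y, x))
```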
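In some specific examples, determining whether a candidate bright spot is a bright spot includes: dividing the preprocessed image into a set of regions (blocks) of a predetermined size; sorting the pixel values of the pixels in a region to determine the second bright spot detection threshold corresponding to that region; and, for the candidate bright spots located in the region, determining that a candidate bright spot whose pixel value is not less than the second bright spot detection threshold of the region is a bright spot. Distinguishing the differences between regions of the image, such as an overall drop in light intensity, and carrying out the further detection and identification of bright spots region by region helps to identify bright spots accurately and to obtain more of them.
When the preprocessed image is divided into a set of regions (blocks) of a predetermined size, the blocks may or may not overlap. In one example, the blocks do not overlap. In some embodiments, the preprocessed image is no smaller than 512*512, for example 512*512, 1024*1024, 1800*1800 or 2056*2056, and the region of predetermined size can be set to 200*200. This allows bright spots to be computed, judged and identified quickly.
In some embodiments, when determining the second bright spot detection threshold of a region, the pixel values of the pixels in each block are sorted in ascending order and p10+(p10-p1)*4.1 is taken as the second bright spot detection threshold of that block, that is, the background of the block, where p1 is the 1st percentile pixel value and p10 is the 10th percentile pixel value. This threshold is a fairly stable one that the inventors arrived at by training and testing on a large amount of data, and it eliminates a large number of bright spots on the background. Understandably, when the optical system is adjusted and the overall pixel distribution of the image changes, this threshold may need to be adjusted accordingly. Figure 6 compares the bright spot detection results before and after this processing, that is, before and after excluding the regional background: the upper half of Figure 6 shows the result with this processing and the lower half the result without it; crosses mark candidate bright spots or bright spots.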
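The block-wise threshold p10+(p10-p1)*4.1 can be sketched as follows. The function names are hypothetical, blocks are assumed not to overlap, candidates are (row, column) pixel positions, and the percentiles use the nearest-rank definition, which is an assumption since the text does not define them.

```python
def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile (assumed definition); pct is an integer 1..100."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[rank - 1]


def block_threshold(pixels, factor=4.1):
    """Second bright spot detection threshold of one block: p10 + (p10 - p1) * 4.1."""
    p1 = nearest_rank_percentile(pixels, 1)
    p10 = nearest_rank_percentile(pixels, 10)
    return p10 + (p10 - p1) * factor


def filter_by_block(image, candidates, block=200):
    """Keep candidates whose pixel value reaches the threshold of their block."""
    rows, cols = len(image), len(image[0])
    thresholds = {}
    spots = []
    for r, c in candidates:
        key = (r // block, c // block)
        if key not in thresholds:
            r0, c0 = key[0] * block, key[1] * block
            pixels = [image[rr][cc]
                      for rr in range(r0, min(rows, r0 + block))
                      for cc in range(c0, min(cols, c0 + block))]
            thresholds[key] = block_threshold(pixels)
        if image[r][c] >= thresholds[key]:
            spots.append((r, c))
    return spots


# Toy 10*10 image with pixel values 1..100, treated as a single block.
image = [[r * 10 + c + 1 for c in range(10)] for r in range(10)]
print(filter_by_block(image, [(2, 0), (9, 9)], block=10))
```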
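An embodiment of the present invention also provides a base recognition method, which includes matching the bright spots on an image obtained from a base extension reaction to the bright spot set corresponding to the sequencing template, and performing base recognition based on the matched bright spots. The field of view corresponding to the image obtained from the base extension reaction contains a plurality of nucleic acid molecules carrying an optically detectable label, and at least some of the nucleic acid molecules appear as bright spots on that image. The bright spot set corresponding to the sequencing template is obtained by the image-based sequencing template construction method of any of the above embodiments.
The above description of the technical features and advantages of the image-based sequencing template construction method of any embodiment also applies to the base recognition method of this embodiment and is not repeated here.
Specifically, the bright spots on the image obtained from the base extension reaction can be matched against the constructed bright spot set by traversal. In some specific embodiments, if the bright spot set corresponding to the sequencing template contains a bright spot whose distance from a given bright spot on the image obtained from the base extension reaction is less than a third predetermined pixel, that bright spot on the image is determined to match the bright spot set corresponding to the sequencing template. In one example, the third predetermined pixel is 2. In this way bases can be recognized accurately and part of the base sequence of the template (a read) obtained.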
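The traversal matching can be sketched as below. The data layout is an assumption made for illustration: the template is a list of (x, y) coordinates, and the spots detected in one round are grouped by the base whose extension reaction produced the image, as in single-colour sequencing. The function name is hypothetical, and how ambiguous or empty matches are resolved is left open.

```python
import math


def call_bases(template, spots_by_base, max_dist=2.0):
    """For each template spot, list the bases whose image contains a bright
    spot less than max_dist pixels away (third predetermined pixel = 2)."""
    calls = []
    for tx, ty in template:
        matched = [base for base, spots in spots_by_base.items()
                   if any(math.hypot(tx - x, ty - y) < max_dist
                          for x, y in spots)]
        calls.append(matched)
    return calls


template = [(10.0, 10.0), (50.0, 50.0)]
round_spots = {
    "A": [(10.5, 10.5)],
    "T": [],
    "G": [(49.0, 51.0)],
    "C": [(80.0, 80.0)],
}
print(call_bases(template, round_spots))
```

Repeating this for every round and concatenating the calls per template spot would give one read per template coordinate.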
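The logic and/or steps represented in the flowchart above or otherwise described herein may, for example, be regarded as an ordered list of executable instructions for implementing logical functions, and may be embodied in any computer-readable storage medium for use by, or in connection with, an instruction execution system, apparatus or device (such as a computer-based system, a system including a processor, or another system that can fetch and execute instructions from an instruction execution system, apparatus or device). For the purposes of this specification, a "computer-readable storage medium" may be any means that can contain, store, communicate, propagate or transport a program for use by, or in connection with, an instruction execution system, apparatus or device. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection (electronic device) with one or more wires, a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fibre device, and a portable compact disc read-only memory (CDROM). In addition, the computer-readable storage medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting or otherwise processing it in a suitable way if necessary, and then stored in a computer memory.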
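An embodiment of the present invention also provides a device 100 for constructing a sequencing template based on images, shown in Figure 7, for carrying out the image-based sequencing template construction method of any of the above embodiments. The images include a first image, a second image, a third image and a fourth image of one and the same field of view, corresponding respectively to the base extension reactions of the four bases A/U, T, G and C. During a base extension reaction, the field of view contains a plurality of nucleic acid molecules carrying an optically detectable label, and at least some of the nucleic acid molecules appear as bright spots on the images. One round of sequencing reaction is defined as carrying out the four types of base extension reaction once, sequentially and/or simultaneously. The first image includes images M1 and M2, the second image includes images N1 and N2, the third image includes images P1 and P2, and the fourth image includes images Q1 and Q2; images M1 and M2 come from two different rounds of sequencing reaction, as do images N1 and N2, images P1 and P2, and images Q1 and Q2. The device includes: a combining unit 110 for combining any two of images M1, M2, N1, N2, P1, P2, Q1 and Q2 to perform bright spot matching, such that images M1, N1, N2, P1, P2, Q1 and Q2 each take part in the combination at least once, to obtain a plurality of combined images containing first coincident bright spots, where two or more bright spots less than a first predetermined pixel apart on a combined image constitute one first coincident bright spot; and a merging unit 130 for merging the first coincident bright spots on the plurality of combined images from the combining unit, to obtain a bright spot set corresponding to the sequencing template.
The above description of the technical features and advantages of the image-based sequencing template construction method of any embodiment of the present invention also applies to the device 100 of this embodiment and is not repeated here.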
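For example, in the merging unit 130, merging the first coincident bright spots on the plurality of combined images includes matching the first coincident bright spots in different combined images one or more times to obtain the bright spot set corresponding to the sequencing template.
In some examples, images M1, N1, P1 and Q1 are obtained sequentially and images M2, N2, P2 and Q2 are obtained sequentially, and the combining unit 110 is used to: combine images M1, M2, N1, N2, P1, P2, Q1 and Q2 in pairs at an interval of S images to obtain K combined images, match the bright spots on the combined images, and discard the non-coincident bright spots on the combined images, where S is an integer, 0≤S≤Smax, Smax = total number of images taking part in the combination - 4, and K=[(total number of images taking part in the combination-S-1)+1]*(total number of images taking part in the combination-S-1)/2.
In some examples, the images are registered images.
Specifically, the device 100 further includes a registration unit 108 for image registration. The registration unit includes a first registration module and a second registration module. The first registration module performs a first registration of the image to be registered based on a reference image, the reference image and the image to be registered corresponding to the same field of view; this includes determining a first offset between a predetermined region on the image to be registered and the corresponding predetermined region on the reference image, and moving all bright spots on the image to be registered based on the first offset to obtain the image to be registered after the first registration. The second registration module performs a second registration of the image to be registered after the first registration, based on the reference image; this includes merging the image to be registered after the first registration with the reference image to obtain a merged image, calculating the offsets of all second coincident bright spots in a predetermined region of the merged image to determine a second offset, where two or more bright spots less than a second predetermined pixel apart on the merged image constitute one second coincident bright spot, and moving all bright spots on the image to be registered after the first registration based on the second offset, which completes the registration of the image to be registered.
In some examples, the reference image is obtained by construction, and the registration unit 108 further includes a reference image construction module used to: obtain a fifth image and a sixth image, which correspond to the same field of view as the image to be registered; perform coarse registration of the sixth image based on the fifth image, which includes determining the offset between the sixth image and the fifth image and moving the sixth image based on that offset to obtain the coarsely registered sixth image; and merge the fifth image and the coarsely registered sixth image to obtain the reference image.
In some examples, when the reference image construction module constructs the reference image, a seventh image and an eighth image are also used. The seventh image, the eighth image and the image to be registered come from the same field of view of the sequencing reaction, and the fifth, sixth, seventh and eighth images correspond respectively to the field of view during the base extension reactions of the four base types A/U, T, G and C. Constructing the reference image further includes: coarse registration of the seventh image based on the fifth image, which includes determining the offset between the seventh image and the fifth image and moving the seventh image based on that offset to obtain the coarsely registered seventh image; coarse registration of the eighth image based on the fifth image, which includes determining the offset between the eighth image and the fifth image and moving the eighth image based on that offset to obtain the coarsely registered eighth image; and merging the fifth image with the coarsely registered sixth, seventh and eighth images to obtain the reference image.
In some examples, the reference image and the image to be registered are binarized images.
In some examples, a two-dimensional discrete Fourier transform is used to determine the first offset, the offset between the sixth and fifth images, the offset between the seventh and fifth images, and/or the offset between the eighth and fifth images.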
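In some examples, the device 100 further includes a bright spot detection unit 106, which is used to: preprocess an image to obtain a preprocessed image; determine a critical value to simplify the preprocessed image, which includes assigning a first preset value to pixels of the preprocessed image whose pixel value is less than the critical value and a second preset value to pixels whose pixel value is not less than the critical value, to obtain a simplified image; determine a first bright spot detection threshold c1 based on the preprocessed image; and identify candidate bright spots on the image based on the preprocessed image and the simplified image, which includes determining that a pixel matrix meeting at least two of the following conditions a) to c) is a candidate bright spot: a) in the preprocessed image, the pixel value of the central pixel of the pixel matrix is the largest, where the pixel matrix can be expressed as k1*k2, k1 and k2 are both odd numbers greater than 1, and the k1*k2 pixel matrix contains k1*k2 pixels; b) in the simplified image, the pixel value of the central pixel of the pixel matrix is the second preset value and the number of connected pixels of the pixel matrix is greater than 2/3*k1*k2; and c) in the preprocessed image, the pixel value of the central pixel of the pixel matrix is greater than a third preset value and g1*g2>c1 is satisfied, where g1 is the correlation coefficient of the two-dimensional Gaussian distribution over the m1*m2 range centered on the central pixel of the pixel matrix, g2 is the pixels in that m1*m2 range, m1 and m2 are both odd numbers greater than 1, and the m1*m2 range contains m1*m2 pixels.
In some examples, the bright spot detection unit 106 is also used to determine whether a candidate bright spot is a bright spot, which includes determining a second bright spot detection threshold based on the preprocessed image and determining that a candidate bright spot whose pixel value is not less than the second bright spot detection threshold is a bright spot.
In some examples, the pixel value of a candidate bright spot is the pixel value of the pixel at the coordinates of that candidate bright spot.
In some examples, determining whether a candidate bright spot is a bright spot in the bright spot detection unit 106 includes: dividing the preprocessed image into a set of regions of a predetermined size; sorting the pixel values of the pixels in a region to determine the second bright spot detection threshold corresponding to that region; and, for the candidate bright spots located in the region, determining that a candidate bright spot whose pixel value is not less than the second bright spot detection threshold of the region is a bright spot.
In some examples, preprocessing the image in the bright spot detection unit 106 includes: determining the background of the image using an opening operation; converting the image into a first image based on the background using a top-hat operation; applying Gaussian blur to the first image to obtain a second image; and sharpening the second image to obtain the preprocessed image.
In some examples, determining the critical value in the bright spot detection unit 106 to simplify the preprocessed image and obtain the simplified image includes: determining the critical value based on the background and the preprocessed image, and comparing the pixel values of the pixels of the preprocessed image with the critical value to obtain the simplified image.
In some examples, g2 is the pixels of the corrected m1*m2 range, the correction being made according to the proportion of pixels whose value is the second preset value in the corresponding m1*m2 range of the simplified image.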
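An embodiment of the present invention also provides a base recognition device 1000 for carrying out the base recognition method of any of the above specific embodiments of the present invention. The device 1000 is used to match the bright spots on an image obtained from a base extension reaction to the bright spot set corresponding to the sequencing template, and to perform base recognition based on the matched bright spots. The field of view corresponding to the image obtained from the base extension reaction contains a plurality of nucleic acid molecules carrying an optically detectable label, and at least some of the nucleic acid molecules appear as bright spots on that image. The bright spot set corresponding to the sequencing template is constructed by the image-based sequencing template construction method and/or the image-based sequencing template construction device of any of the above embodiments.
Specifically, in the base recognition device 1000, if the bright spot set corresponding to the sequencing template contains a bright spot whose distance from a given bright spot on the image obtained from the base extension reaction is less than a third predetermined pixel, that bright spot on the image is determined to match the bright spot set corresponding to the sequencing template.
According to an embodiment of the present invention, a computer program product is also provided. The product includes instructions for constructing a sequencing template based on images which, when a computer executes the program, cause the computer to carry out the image-based sequencing template construction method of any of the above specific embodiments of the present invention.
According to an embodiment of the present invention, another computer program product is also provided. The product includes instructions for base recognition which, when a computer executes the program, cause the computer to carry out the base recognition method of any of the above specific embodiments of the present invention.
Those skilled in the art know that, in addition to implementing the controller/processor purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a controller/processor can therefore be regarded as a hardware component, and the means it contains for implementing various functions can be regarded as structures within the hardware component. The means for implementing various functions can even be regarded both as software modules implementing the method and as structures within the hardware component.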
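In this specification, a description referring to one embodiment, some embodiments, one or some specific embodiments, one or some examples, an example, or the like means that a specific feature, structure or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, such schematic expressions do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions and variations can be made to these embodiments without departing from the principles and spirit of the present invention, the scope of which is defined by the claims and their equivalents.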

Claims (38)

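  1. A method for constructing a sequencing template based on images, characterized in that the images include a first image, a second image, a third image and a fourth image of one same field of view, respectively corresponding to the base extension reactions of the four types of bases A/U, T, G and C; a plurality of nucleic acid molecules carrying optically detectable labels are present in the field of view during the base extension reactions, and at least some of the nucleic acid molecules appear as bright spots on the images; one round of sequencing reaction is defined as performing the base extension reactions of the four types of bases once, sequentially and/or simultaneously,
    the first image includes an image M1 and an image M2, the second image includes an image N1 and an image N2, the third image includes an image P1 and an image P2, and the fourth image includes an image Q1 and an image Q2,
    the image M1 and the image M2 come from two rounds of sequencing reaction respectively, the image N1 and the image N2 come from two rounds of sequencing reaction respectively, the image P1 and the image P2 come from two rounds of sequencing reaction respectively, and the image Q1 and the image Q2 come from two rounds of sequencing reaction respectively, the method comprising:
    combining any two of the images M1, M2, N1, N2, P1, P2, Q1 and Q2 for bright spot matching, such that each of the images M1, M2, N1, N2, P1, P2, Q1 and Q2 takes part in a combination at least once, to obtain a plurality of combined images containing first coincident bright spots, two or more bright spots whose distance on a combined image is less than a first predetermined pixel being one first coincident bright spot;
    merging the first coincident bright spots on the plurality of combined images to obtain a bright spot set corresponding to the sequencing template.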
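  2. The method of claim 1, characterized in that merging the first coincident bright spots on the plurality of combined images includes matching the first coincident bright spots in different combined images one or more times, to obtain the bright spot set corresponding to the sequencing template.
  3. The method of claim 1, characterized in that the images M1, N1, P1 and Q1 are obtained sequentially, and the images M2, N2, P2 and Q2 are obtained sequentially,
    combining any two of the images M1, M2, N1, N2, P1, P2, Q1 and Q2 for bright spot matching, such that each of the images M1, M2, N1, N2, P1, P2, Q1 and Q2 takes part in a combination at least once, to obtain a plurality of combined images containing first coincident bright spots, includes:
    combining the images M1, M2, N1, N2, P1, P2, Q1 and Q2 in pairs at an interval of S images to obtain K combined images, matching the bright spots on the combined images, and discarding the non-coincident bright spots on the combined images, where S is an integer, 0≤S≤S_max, and S_max = the total number of images taking part in the combination - 4.
  4. The method of any one of claims 1-3, characterized in that the images are registered images.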
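  5. The method of claim 4, characterized in that registering an image includes:
    performing a first registration on an image to be registered based on a reference image, the reference image and the image to be registered corresponding to the same field of view, including,
    determining a first offset between a predetermined region on the image to be registered and the corresponding predetermined region on the reference image, and moving all bright spots on the image to be registered based on the first offset, to obtain a first-registered image to be registered;
    performing a second registration on the first-registered image to be registered based on the reference image, including,
    merging the first-registered image to be registered with the reference image to obtain a merged image,
    calculating the offsets of all second coincident bright spots in a predetermined region on the merged image to determine a second offset, two or more bright spots whose distance on the merged image is less than a second predetermined pixel being one second coincident bright spot,
    moving all bright spots on the first-registered image to be registered based on the second offset, to achieve registration of the image to be registered.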
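  6. The method of claim 5, characterized in that the reference image is obtained by construction, and constructing the reference image includes:
    acquiring a fifth image and a sixth image, the fifth image and the sixth image corresponding to the same field of view as the image to be registered;
    performing coarse registration on the sixth image based on the fifth image, including determining the offset between the sixth image and the fifth image and moving the sixth image based on that offset, to obtain a coarsely registered sixth image;
    merging the fifth image and the coarsely registered sixth image to obtain the reference image.
  7. The method of claim 6, characterized in that constructing the reference image further uses a seventh image and an eighth image, the seventh image and the eighth image come from the same field of view of the sequencing reaction as the image to be registered, the fifth image, the sixth image, the seventh image and the eighth image respectively correspond to the field of view during the base extension reactions of the four types of bases A/U, T, G and C, and constructing the reference image further includes:
    performing coarse registration on the seventh image based on the fifth image, including determining the offset between the seventh image and the fifth image and moving the seventh image based on that offset, to obtain a coarsely registered seventh image;
    performing coarse registration on the eighth image based on the fifth image, including determining the offset between the eighth image and the fifth image and moving the eighth image based on that offset, to obtain a coarsely registered eighth image;
    merging the fifth image with the coarsely registered sixth image, the coarsely registered seventh image and the coarsely registered eighth image, to obtain the reference image.
  8. The method of any one of claims 5-7, characterized in that the reference image and the image to be registered are binarized images.
  9. The method of any one of claims 5-8, characterized in that a two-dimensional discrete Fourier transform is used to determine the first offset, the offset between the sixth image and the fifth image, the offset between the seventh image and the fifth image, and/or the offset between the eighth image and the fifth image.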
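  10. The method of any one of claims 1-9, characterized by further including detecting the bright spots on an image, including:
    preprocessing the image to obtain a preprocessed image;
    determining a critical value to simplify the preprocessed image, including assigning a first preset value to the pixels of the preprocessed image whose pixel values are less than the critical value and a second preset value to the pixels of the preprocessed image whose pixel values are not less than the critical value, to obtain a simplified image;
    determining a first bright spot detection threshold c1 based on the preprocessed image;
    identifying candidate bright spots on the image based on the preprocessed image and the simplified image, including determining that a pixel matrix satisfying at least two of the following conditions a)-c) is a candidate bright spot,
    a) in the preprocessed image, the pixel value of the center pixel of the pixel matrix is the largest, where the pixel matrix can be expressed as r1*r2, r1 and r2 are both odd numbers greater than 1, and the r1*r2 pixel matrix contains r1*r2 pixels,
    b) in the simplified image, the pixel value of the center pixel of the pixel matrix is the second preset value and the number of connected pixels of the pixel matrix is greater than 2/3*r1*r2, and
    c) in the preprocessed image, the pixel value of the center pixel of the pixel matrix is greater than a third preset value and g1*g2>c1 is satisfied, where g1 is the correlation coefficient of a two-dimensional Gaussian distribution over an m1*m2 range centered on the center pixel of the pixel matrix, g2 is the pixels of that m1*m2 range, m1 and m2 are both odd numbers greater than 1, and the m1*m2 range contains m1*m2 pixels.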
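  11. The method of claim 10, characterized by further including determining whether a candidate bright spot is a bright spot, including:
    determining a second bright spot detection threshold based on the preprocessed image, and
    determining that a candidate bright spot whose pixel value is not less than the second bright spot detection threshold is a bright spot.
  12. The method of claim 11, characterized in that the pixel value of a candidate bright spot is the pixel value of the pixel at the coordinates of that candidate bright spot.
  13. The method of claim 11 or 12, characterized in that determining whether a candidate bright spot is a bright spot includes:
    dividing the preprocessed image into a set of regions of a predetermined size,
    sorting the pixel values of the pixels in each region to determine the second bright spot detection threshold corresponding to that region,
    for the candidate bright spots located in a region, determining that a candidate bright spot whose pixel value is not less than the second bright spot detection threshold corresponding to that region is a bright spot.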
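  14. The method of any one of claims 10-13, characterized in that preprocessing the image includes:
    determining the background of the image using an opening operation,
    converting the image into a first image using a top-hat operation based on the background,
    applying Gaussian blur to the first image to obtain a second image,
    sharpening the second image to obtain the preprocessed image.
  15. The method of claim 14, characterized in that determining the critical value to simplify the preprocessed image and obtain the simplified image includes:
    determining the critical value based on the background and the preprocessed image,
    comparing the pixel values of the pixels on the preprocessed image with the critical value, to obtain the simplified image.
  16. The method of any one of claims 10-15, characterized in that g2 is the corrected pixels of the m1*m2 range, the correction being made according to the proportion of pixels whose pixel value is the second preset value in the corresponding m1*m2 range of the simplified image.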
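  17. A base recognition method, characterized by including matching the bright spots on an image obtained from a base extension reaction to a bright spot set corresponding to a sequencing template, and performing base recognition according to the matched bright spots, wherein a plurality of nucleic acid molecules carrying optically detectable labels are present in the field of view corresponding to the image obtained from the base extension reaction, at least some of the nucleic acid molecules appear as bright spots on the image obtained from the base extension reaction, and the bright spot set corresponding to the sequencing template is constructed by the method of any one of claims 1-16.
  18. The method of claim 17, characterized in that, if the bright spot set corresponding to the sequencing template contains a bright spot whose distance from a given bright spot on the image obtained from the base extension reaction is less than a third predetermined pixel, that bright spot on the image obtained from the base extension reaction is determined to match the bright spot set corresponding to the sequencing template.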
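  19. A device for constructing a sequencing template based on images, characterized in that the images include a first image, a second image, a third image and a fourth image of one same field of view, respectively corresponding to the base extension reactions of the four types of bases A/U, T, G and C; a plurality of nucleic acid molecules carrying optically detectable labels are present in the field of view during the base extension reactions, and at least some of the nucleic acid molecules appear as bright spots on the images; one round of sequencing reaction is defined as performing the base extension reactions of the four types of bases once, sequentially and/or simultaneously,
    the first image includes an image M1 and an image M2, the second image includes an image N1 and an image N2, the third image includes an image P1 and an image P2, and the fourth image includes an image Q1 and an image Q2,
    the image M1 and the image M2 come from two rounds of sequencing reaction respectively, the image N1 and the image N2 come from two rounds of sequencing reaction respectively, the image P1 and the image P2 come from two rounds of sequencing reaction respectively, and the image Q1 and the image Q2 come from two rounds of sequencing reaction respectively, the device comprising:
    a combination unit, configured to combine any two of the images M1, M2, N1, N2, P1, P2, Q1 and Q2 for bright spot matching, such that each of the images M1, M2, N1, N2, P1, P2, Q1 and Q2 takes part in the combination at least once, to obtain a plurality of combined images containing first coincident bright spots, two or more bright spots whose distance on a combined image is less than a first predetermined pixel being one first coincident bright spot;
    a merging unit, configured to merge the first coincident bright spots on the plurality of combined images from the combination unit, to obtain a bright spot set corresponding to the sequencing template.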
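  20. The device of claim 19, characterized in that, in the merging unit, merging the first coincident bright spots on the plurality of combined images includes matching the first coincident bright spots in different combined images one or more times, to obtain the bright spot set corresponding to the sequencing template.
  21. The device of claim 19, characterized in that the images M1, N1, P1 and Q1 are obtained sequentially, the images M2, N2, P2 and Q2 are obtained sequentially, and the combination unit is configured to:
    combine the images M1, M2, N1, N2, P1, P2, Q1 and Q2 in pairs at an interval of S images to obtain K combined images, match the bright spots on the combined images, and discard the non-coincident bright spots on the combined images, where S is an integer, 0≤S≤S_max, and S_max = the total number of images taking part in the combination - 4.
  22. The device of any one of claims 19-21, characterized in that the images are registered images.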
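  23. The device of claim 22, characterized by further including a registration unit for image registration, the registration unit including a first registration module and a second registration module,
    the first registration module being configured to perform a first registration on an image to be registered based on a reference image, the reference image and the image to be registered corresponding to the same field of view, including determining a first offset between a predetermined region on the image to be registered and the corresponding predetermined region on the reference image, and moving all bright spots on the image to be registered based on the first offset, to obtain a first-registered image to be registered;
    the second registration module being configured to perform a second registration on the first-registered image to be registered based on the reference image, including merging the first-registered image to be registered with the reference image to obtain a merged image, calculating the offsets of all second coincident bright spots in a predetermined region on the merged image to determine a second offset, two or more bright spots whose distance on the merged image is less than a second predetermined pixel being one second coincident bright spot, and moving all bright spots on the first-registered image to be registered based on the second offset, to achieve registration of the image to be registered.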
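  24. The device of claim 22, characterized in that the reference image is obtained by construction, the registration unit further includes a reference image construction module, and the reference image construction module is configured to:
    acquire a fifth image and a sixth image, the fifth image and the sixth image corresponding to the same field of view as the image to be registered;
    perform coarse registration on the sixth image based on the fifth image, including determining the offset between the sixth image and the fifth image and moving the sixth image based on that offset, to obtain a coarsely registered sixth image;
    merge the fifth image and the coarsely registered sixth image to obtain the reference image.
  25. The device of claim 24, characterized in that constructing the reference image with the reference image construction module further uses a seventh image and an eighth image, the seventh image and the eighth image come from the same field of view of the sequencing reaction as the image to be registered, the fifth image, the sixth image, the seventh image and the eighth image respectively correspond to the field of view during the base extension reactions of the four types of bases A/U, T, G and C, and constructing the reference image further includes:
    performing coarse registration on the seventh image based on the fifth image, including determining the offset between the seventh image and the fifth image and moving the seventh image based on that offset, to obtain a coarsely registered seventh image;
    performing coarse registration on the eighth image based on the fifth image, including determining the offset between the eighth image and the fifth image and moving the eighth image based on that offset, to obtain a coarsely registered eighth image;
    merging the fifth image with the coarsely registered sixth image, the coarsely registered seventh image and the coarsely registered eighth image, to obtain the reference image.
  26. The device of any one of claims 23-25, characterized in that the reference image and the image to be registered are binarized images.
  27. The device of any one of claims 23-26, characterized in that a two-dimensional discrete Fourier transform is used to determine the first offset, the offset between the sixth image and the fifth image, the offset between the seventh image and the fifth image, and/or the offset between the eighth image and the fifth image.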
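  28. The device of any one of claims 19-27, characterized by further including a bright spot detection unit, the bright spot detection unit being configured to:
    preprocess an image to obtain a preprocessed image;
    determine a critical value to simplify the preprocessed image, including assigning a first preset value to the pixels of the preprocessed image whose pixel values are less than the critical value and a second preset value to the pixels of the preprocessed image whose pixel values are not less than the critical value, to obtain a simplified image;
    determine a first bright spot detection threshold c1 based on the preprocessed image;
    identify candidate bright spots on the image based on the preprocessed image and the simplified image, including determining that a pixel matrix satisfying at least two of the following conditions a)-c) is a candidate bright spot,
    a) in the preprocessed image, the pixel value of the center pixel of the pixel matrix is the largest, where the pixel matrix can be expressed as r1*r2, r1 and r2 are both odd numbers greater than 1, and the r1*r2 pixel matrix contains r1*r2 pixels,
    b) in the simplified image, the pixel value of the center pixel of the pixel matrix is the second preset value and the number of connected pixels of the pixel matrix is greater than 2/3*r1*r2, and
    c) in the preprocessed image, the pixel value of the center pixel of the pixel matrix is greater than a third preset value and g1*g2>c1 is satisfied, where g1 is the correlation coefficient of a two-dimensional Gaussian distribution over an m1*m2 range centered on the center pixel of the pixel matrix, g2 is the pixels of that m1*m2 range, m1 and m2 are both odd numbers greater than 1, and the m1*m2 range contains m1*m2 pixels.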
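  29. The device of claim 28, characterized in that the bright spot detection unit is further configured to determine whether a candidate bright spot is a bright spot, including:
    determining a second bright spot detection threshold based on the preprocessed image, and
    determining that a candidate bright spot whose pixel value is not less than the second bright spot detection threshold is a bright spot.
  30. The device of claim 29, characterized in that the pixel value of a candidate bright spot is the pixel value of the pixel at the coordinates of that candidate bright spot.
  31. The device of claim 29 or 30, characterized in that determining in the bright spot detection unit whether a candidate bright spot is a bright spot includes:
    dividing the preprocessed image into a set of regions of a predetermined size,
    sorting the pixel values of the pixels in each region to determine the second bright spot detection threshold corresponding to that region,
    for the candidate bright spots located in a region, determining that a candidate bright spot whose pixel value is not less than the second bright spot detection threshold corresponding to that region is a bright spot.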
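  32. The device of any one of claims 28-31, characterized in that preprocessing the image in the bright spot detection unit includes:
    determining the background of the image using an opening operation,
    converting the image into a first image using a top-hat operation based on the background,
    applying Gaussian blur to the first image to obtain a second image,
    sharpening the second image to obtain the preprocessed image.
  33. The device of claim 28, characterized in that determining the critical value in the bright spot detection unit to simplify the preprocessed image and obtain the simplified image includes:
    determining the critical value based on the background and the preprocessed image,
    comparing the pixel values of the pixels on the preprocessed image with the critical value, to obtain the simplified image.
  34. The device of any one of claims 28-33, characterized in that g2 is the corrected pixels of the m1*m2 range, the correction being made according to the proportion of pixels whose pixel value is the second preset value in the corresponding m1*m2 range of the simplified image.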
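  35. A base recognition device, characterized in that the device is configured to match the bright spots on an image obtained from a base extension reaction to a bright spot set corresponding to a sequencing template, and to perform base recognition according to the matched bright spots, wherein a plurality of nucleic acid molecules carrying optically detectable labels are present in the field of view corresponding to the image obtained from the base extension reaction, at least some of the nucleic acid molecules appear as bright spots on the image obtained from the base extension reaction, and the bright spot set corresponding to the sequencing template is constructed by the device of any one of claims 19-34.
  36. The device of claim 35, characterized in that, if the bright spot set corresponding to the sequencing template contains a bright spot whose distance from a given bright spot on the image obtained from the base extension reaction is less than a third predetermined pixel, that bright spot on the image obtained from the base extension reaction is determined to match the bright spot set corresponding to the sequencing template.
  37. A computer program product including instructions which, when the program is executed by a computer, cause the computer to perform the method of any one of claims 1-16.
  38. A computer program product including instructions which, when the program is executed by a computer, cause the computer to perform the method of claim 17 or 18.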
PCT/CN2018/101819 2018-08-22 2018-08-22 Method for constructing sequencing template based on image, and base recognition method and device WO2020037574A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP18930701.0A EP3843033A4 (en) 2018-08-22 2018-08-22 METHOD FOR CONSTRUCTION OF A SEQUENCING TEMPLATE BASED ON AN IMAGE AND BASIC DETECTION METHOD AND DEVICE
US17/270,418 US11170506B2 (en) 2018-08-22 2018-08-22 Method for constructing sequencing template based on image, and base recognition method and device
PCT/CN2018/101819 WO2020037574A1 (zh) 2018-08-22 2018-08-22 Method for constructing sequencing template based on image, and base recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/101819 WO2020037574A1 (zh) 2018-08-22 2018-08-22 Method for constructing sequencing template based on image, and base recognition method and device

Publications (1)

Publication Number Publication Date
WO2020037574A1 (zh) 2020-02-27

Family

ID=69592157

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/101819 WO2020037574A1 (zh) 2018-08-22 2018-08-22 Method for constructing sequencing template based on image, and base recognition method and device

Country Status (3)

Country Link
US (1) US11170506B2 (zh)
EP (1) EP3843033A4 (zh)
WO (1) WO2020037574A1 (zh)

Also Published As

Publication number Publication date
US20210217171A1 (en) 2021-07-15
EP3843033A4 (en) 2021-11-24
US11170506B2 (en) 2021-11-09
EP3843033A1 (en) 2021-06-30

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18930701

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2018930701

Country of ref document: EP

Effective date: 20210322