WO2022188696A1 - Method for automatically identifying and analyzing gel images - Google Patents

Method for automatically identifying and analyzing gel images

Publication number
WO2022188696A1
Authority
WO
WIPO (PCT)
Prior art keywords
mean
band
image
lane
indel
Prior art date
Application number
PCT/CN2022/079171
Other languages
English (en)
French (fr)
Inventor
李英连
盛夏
Original Assignee
南京金斯瑞生物科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 南京金斯瑞生物科技有限公司
Publication of WO2022188696A1

Classifications

    • G: PHYSICS
    • G01: MEASURING; TESTING
    • G01N: INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N27/00: Investigating or analysing materials by the use of electric, electrochemical, or magnetic means
    • G01N27/26: Investigating or analysing materials by the use of electric, electrochemical, or magnetic means by investigating electrochemical variables; by using electrolysis or electrophoresis
    • G01N27/416: Systems
    • G01N27/447: Systems using electrophoresis

Definitions

  • The present invention relates to electrophoresis band technology in the field of biochemistry, and more particularly to a method for automatically identifying and analyzing gel images.
  • Electrophoresis, or electrophoretic banding, is now widely used in biology and medicine.
  • Electrophoretic bands can be visualized as a gel image by optical instruments. Researchers can then identify and analyze the gel image, specifically its lanes and bands. Today, both identifying and analyzing gel images involve a great deal of manual work.
  • Manual analysis of a gel image first requires manually confirming the sample information, determining the expected result of each sample on the gel image, and checking whether the two match. The image is then cropped and saved according to that judgment, and marked if there is a problem.
  • This manual process not only costs considerable time, effort and money, but is also error-prone, and judgments differ between operators: for the same sample the criteria may be applied inconsistently, so the interference of human factors is relatively large.
  • Because a gel image is a picture, image processing methods can be used to process it.
  • The present invention proposes a technique for automatically identifying and analyzing gel images, which analyzes the gel image automatically and crops out the gel image of each sample.
  • The image of the obtained electropherogram of the samples is identified and corrected; the plasmid, product and marker lanes corresponding to each sample are then identified, and the position of each lane in the image is determined.
  • The marker lane mentioned here is the ruler lane.
  • The ruler is generally called a "Marker" in English. It refers to the marker used to indicate the molecular weight of DNA in electrophoresis in molecular biology experiments.
  • Different Marker types form marker lanes with different numbers and sizes of bands. For example, the DL3000 DNA Marker forms a lane with 8 bands from 100 bp to 3000 bp, and the 10K DNA Marker forms a lane with 9 bands from 500 bp to 10000 bp.
  • By determining the position of the gel wells in each picture, half of each gel well is retained when the picture is cropped.
  • In the sample information analysis stage, it is first determined whether the marker used for the sample follows the KB standard or the DL3000 standard; then the size of each band and the number of bands in the plasmid and product lanes are determined. Because the judgment logic and criteria are unified, the technical solution of the present invention can automatically judge whether the gel run is correct. Moreover, for an incorrect gel run, the type of problem can be specifically distinguished, which can guide the adjustment of subsequent experimental parameters.
  • A first aspect of the present invention relates to a method of automatically identifying and analyzing a gel image.
  • The method may include the following steps: obtaining an original gel image of the electropherogram; performing image preprocessing on the original gel image; identifying lanes and bands on the preprocessed gel image; determining the plasmid, product and marker lanes of each sample according to the identified lanes and bands; analyzing the bands in the plasmid, product and marker lanes of each sample to judge whether the sample ran correctly on the gel; and, for samples that ran correctly, cropping and saving the gel image of the sample.
  • The method further includes: for samples that ran incorrectly, determining and saving the reason for the incorrect run.
  • The image preprocessing includes: correcting the orientation of the image according to the direction of the bands in the original gel image; normalizing the brightness and contrast of the image; and removing the background color.
  • The method further comprises: converting the preprocessed gel image into a grayscale image; filtering the grayscale image; and converting the grayscale image into a binary image using an edge detection algorithm.
  • Identifying the lanes and bands on the preprocessed gel image includes: extracting the pixel matrix of the preprocessed gel image; identifying the start and end positions of the lanes on the gel image with a column-sum algorithm; and determining the maximum and minimum band positions on the gel image with a row-sum algorithm.
  • Determining the plasmid, product and marker lanes of each sample includes: (6-1) extracting the image pixel matrices of the plasmid, product and marker lanes of each sample; (6-2) determining the positions of the bands in each lane; (6-3) calculating the number of bands in each lane; (6-4) determining the positions of the lanes corresponding to the plasmid, product and marker of each sample.
  • Step (6-2) includes: summing the pixel matrix by row to obtain the sum of the pixels in each row, i.e. the row-sum data; determining the row-sum curve, smoothing it, and calculating the peak points of the smoothed row-sum data; and recording the position of each peak point as a band position.
  • Step (6-3) includes: summing the pixel matrix by column to obtain the sum of the pixels in each column, i.e. the column-sum data; determining the column-sum curve, smoothing it, and calculating the peak start point and peak end point of the smoothed column-sum data; when the number of pixels between the peak start point and the peak end point accounts for more than 80% of the total pixels across the lane, judging the band to be a formed band; and calculating the number of bands in each lane; wherein the peak start point is the position where the second derivative of the column-sum data reaches its maximum, and the peak end point is the position where the second derivative of the column-sum data reaches its minimum;
  • where x_1 is the pixel column sum of the first column,
  • x_i is the pixel column sum of the i-th column,
  • x_n is the pixel column sum of the last column,
  • 2 ≤ i ≤ n, and i and n are natural numbers.
  • Analyzing the bands in the plasmid, product and marker lanes of each sample and judging whether the sample ran correctly includes: determining the position and size of each band in the marker lane according to the type of marker used to generate the marker lane;
  • and calculating the size of each band in the plasmid and product lanes to judge whether the sample ran correctly.
  • The above method includes: (10-1) according to the marker type, determining the size of each marker band, calculating the spacing ratio between the marker bands and the magnitude of the traction force between the bands, and setting these as the spacing parameters between the marker bands; (10-2) extracting the image pixel matrix of the marker lane and calculating the size of each band in the marker lane; (10-3) extracting the image pixel matrix of the plasmid lane, calculating the number of bands in the plasmid lane and the size of each band, and judging whether the sample ran correctly; (10-4) extracting the image pixel matrix of the product lane, calculating the number of bands in the product lane and the size of each band, and judging whether the sample ran correctly.
  • Step (10-2) includes: determining the size of each band in the marker lane according to the marker type, the number of bands detected in the marker lane and the spacing ratio; when the marker lane contains more bands than the marker type should have and the ratio of the spacing above and below a band is incorrect, determining that band to be an erroneous band and deleting it; when the marker lane contains fewer bands than the marker type should have, reducing the smoothing and recalculating the number and size of the bands.
  • Step (10-3) includes: calculating the size of each band in the plasmid lane; determining the main band according to the actual sample size;
  • Step (10-4) includes: calculating the size of each band in the product lane; presetting the number of actual sample bands as the standard quantity, and presetting the sizes of the actual sample bands as the standard parameters;
  • when the band sizes include the standard parameters and the sum of the standard parameters, it is judged that the enzyme digestion is incomplete;
  • when the band sizes include both standard parameters and non-standard parameters, it is judged that stray bands are present;
  • when a band in the lane is a formed band
  • and the band satisfies: (the mean pixel row sum of rows 10 to 100 after the peak point - the mean pixel row sum of the 30 rows before the peak start point) / (the pixel row sum at the peak point - the pixel row sum at the peak start point) > 30%, it is judged as smeared digestion of the plasmid;
  • The reasons for an incorrect gel run include at least one of the following: supercoiling, plasmid degradation, plasmid loss, genomic DNA, size error, incomplete enzyme digestion, stray bands, smeared digestion.
  • The method may further include: locating the gel wells; and, for a sample that ran correctly, cropping and saving the gel image of the single sample according to the positions of the gel wells, such that half of each gel well is retained.
  • A second aspect of the present invention relates to an apparatus for automatically identifying and analyzing a gel image, the apparatus comprising a processor and a memory.
  • A computer program is stored in the memory. The computer program, when executed by the processor, implements the method according to the first aspect of the present invention.
  • A third aspect of the present invention relates to a computer-readable medium.
  • The computer-readable medium may be used to record instructions executable by a processor. The instructions, when executed by the processor, cause the processor to implement the method according to the first aspect of the present invention.
  • The gel image identification and analysis of the present invention can automatically identify and analyze band information with high accuracy and, in particular, applies a uniform standard to samples that are handled inconsistently when processed manually.
  • FIG. 1 is a flowchart of a method of automatically identifying and analyzing a gel image according to an embodiment of the present invention.
  • FIG. 2 shows an example of a raw gel image.
  • FIG. 3 shows the example gel image of FIG. 2 after it has been analyzed and cropped.
  • FIG. 4 is a schematic diagram of determining the type of problem behind an incorrect gel run in the gel image analysis stage.
  • The present invention is implemented by a computer, thereby excluding human factors such as human judgment, manual operation and human intervention. It also avoids problems that often occur in the manual identification and analysis of gel images, such as inconsistent judgment criteria and incomplete or non-standard records of problem types or causes.
  • Referring to FIG. 1, a method for automatically identifying and analyzing a gel image according to an embodiment of the present invention is illustrated by the flowchart of a method 100.
  • The whole scheme can be divided into two parts: image recognition and band analysis.
  • The entire image is preprocessed first, and then the lanes and bands of each sample are identified precisely.
  • In step S110, an original gel image of the electropherogram is obtained.
  • FIG. 2 shows an example of a raw gel image.
  • Image preprocessing is performed on the original gel image.
  • The image preprocessing may at least include: correcting the orientation of the image according to the direction of the bands in the original gel image; normalizing the brightness and contrast of the image; and removing the background color.
  • the image recognition process can be divided into the following parts:
  • Image rotation: the image may be skewed when it is photographed. The orientation of the picture is corrected automatically according to the direction of the bands. For example, the original gel image shown in FIG. 2 needs to be rotated so that the bands are all horizontal.
  • Brightness and contrast adjustment: in actual production, pictures are taken at different times. As the camera's lamp tube and other parts wear, the brightness and sharpness of the captured pictures become inconsistent. Therefore, before image recognition, the contrast and brightness of every image are adjusted to the same level for analysis. This is what image processing calls normalization.
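The normalization idea can be illustrated with a minimal sketch. The source does not specify the normalization formula at this point, so the mean and standard deviation rescaling below, the function name `normalize_image` and the target values are all assumptions made for illustration.

```python
from statistics import mean, pstdev

def normalize_image(gray, target_mean=128.0, target_std=40.0):
    """Rescale a grayscale image (list of rows) to a common mean and spread.

    Assumption: mean/std rescaling stands in for the unspecified
    normalization used in the source.
    """
    pixels = [p for row in gray for p in row]
    mu = mean(pixels)
    sigma = pstdev(pixels) or 1.0  # avoid division by zero on flat images
    return [
        [min(255.0, max(0.0, (p - mu) / sigma * target_std + target_mean))
         for p in row]
        for row in gray
    ]

# Two hypothetical shots of the same gel, one dim and one bright
dim = [[10, 20], [30, 40]]
bright = [[110, 120], [130, 140]]
norm_dim = normalize_image(dim)
norm_bright = normalize_image(bright)
```

After this step the dim and the bright picture carry the same pixel values, so later thresholds can be shared across pictures.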
  • Background removal: a background color is present when the picture is taken; the color of the background area is removed with an edge detection algorithm.
  • The preprocessed gel image can be converted into a grayscale image, and the grayscale image is then filtered; finally, the grayscale image can be converted into a binary image using an edge detection algorithm.
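The grayscale, filtering and binarization chain could look roughly like the sketch below. The source names neither the filter nor the edge detector, so the 3x3 mean filter, the forward-difference gradient and the threshold value are assumptions, and all function names are hypothetical.

```python
def to_gray(rgb):
    """Luminance-weighted grayscale conversion; rgb is rows of (r, g, b)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb]

def mean_filter(gray):
    """3x3 mean filter; border pixels average the neighbours that exist."""
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [gray[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(vals) / len(vals)
    return out

def edge_binary(gray, threshold=30.0):
    """Set a pixel to 1 where the forward-difference gradient is steep."""
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = gray[y][min(x + 1, w - 1)] - gray[y][x]
            gy = gray[min(y + 1, h - 1)][x] - gray[y][x]
            if (gx * gx + gy * gy) ** 0.5 > threshold:
                out[y][x] = 1
    return out

# Hypothetical 6x6 picture: black background, white band in rows 2-3
rgb = [[(255, 255, 255) if 2 <= y <= 3 else (0, 0, 0) for _ in range(6)]
       for y in range(6)]
binary = edge_binary(mean_filter(to_gray(rgb)))
```

In the binary result the band edges are marked with 1 while the flat interior of the band and the flat background stay 0.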
  • A multi-template matching method can be used to find the positions of the gel wells.
  • Lanes and bands can be identified by image processing.
  • In step S130, lanes and bands are identified on the preprocessed gel image.
  • The start and end positions of the lanes can be identified on the preprocessed gel image. Then the maximum and minimum band positions on the gel image are determined.
  • Identifying lanes and bands on the preprocessed gel image may include: extracting the pixel matrix of the preprocessed gel image; identifying the start and end positions of the lanes on the gel image with a column-sum algorithm; and determining the maximum and minimum band positions on the gel image with a row-sum algorithm.
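A minimal sketch of the column-sum and row-sum idea follows. The fixed threshold and the function names `find_lanes` and `band_extent` are assumptions; the source does not state how the sums are turned into start and end positions.

```python
def find_lanes(gray, threshold):
    """Return (start, end) column pairs where the column sum exceeds threshold."""
    col_sums = [sum(col) for col in zip(*gray)]
    lanes, start = [], None
    for x, s in enumerate(col_sums):
        if s > threshold and start is None:
            start = x                      # a lane begins
        elif s <= threshold and start is not None:
            lanes.append((start, x - 1))   # the lane ended one column earlier
            start = None
    if start is not None:                  # lane touching the right border
        lanes.append((start, len(col_sums) - 1))
    return lanes

def band_extent(gray, threshold):
    """Return (min_row, max_row) of rows whose row sum exceeds threshold."""
    rows = [y for y, row in enumerate(gray) if sum(row) > threshold]
    return (rows[0], rows[-1]) if rows else None

# Hypothetical pixel matrix with two lanes
image = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 9, 9, 0, 0, 9, 9, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 9, 9, 0, 0, 0, 0, 0],
]
lanes = find_lanes(image, threshold=5)
extent = band_extent(image, threshold=5)
```

Here `lanes` holds the column ranges of the two lanes and `extent` the first and last rows that contain band signal.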
  • In step S140, the plasmid, product and marker lanes of each sample are determined according to the identified lanes and bands. For example, in one embodiment, the number of bands in each identified lane is calculated. The positions of the lanes corresponding to the plasmid, product and marker of each sample are then determined. Once these lanes are distinguished, the bands in each lane can be analyzed.
  • Determining the plasmid, product and marker lanes of each sample may include the following sub-steps: (1) extracting the image pixel matrices of the plasmid, product and marker lanes of each sample; (2) determining the positions of the bands in each lane; (3) calculating the number of bands in each lane; (4) determining the positions of the lanes corresponding to the plasmid, product and marker of each sample.
  • To determine the band positions, the pixel matrix can be summed by row to obtain the sum of the pixels in each row, i.e. the row-sum data; the row-sum curve is determined and smoothed, and the peak points of the smoothed row-sum data are calculated; the position of each peak point is recorded as a band position.
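The row-sum, smoothing and peak-finding procedure can be sketched as follows. The moving-average smoother, its window size and the function names are assumptions; the source only says that the curve is smoothed.

```python
def moving_average(values, window=3):
    """Smooth a 1-D curve; the window shrinks at both ends."""
    half = window // 2
    out = []
    for i in range(len(values)):
        seg = values[max(0, i - half): i + half + 1]
        out.append(sum(seg) / len(seg))
    return out

def band_positions(lane, window=3, min_height=0.0):
    """Return row indices of local maxima of the smoothed row-sum curve."""
    row_sums = [sum(row) for row in lane]
    smooth = moving_average(row_sums, window)
    peaks = []
    for y in range(1, len(smooth) - 1):
        if (smooth[y] > smooth[y - 1] and smooth[y] >= smooth[y + 1]
                and smooth[y] > min_height):
            peaks.append(y)
    return peaks

# Hypothetical lane, 2 pixels wide, with bands centred on rows 2 and 7
lane = [[0, 0], [1, 1], [8, 8], [1, 1], [0, 0],
        [0, 0], [2, 2], [9, 9], [2, 2], [0, 0]]
positions = band_positions(lane)
```

A larger `window` suppresses noise but can merge close bands, which is consistent with the source reducing the smoothing when too few marker bands are found.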
  • To count the bands, the pixel matrix can be summed by column to obtain the sum of the pixels in each column, i.e. the column-sum data; the column-sum curve is determined and smoothed, and the peak start point and peak end point of the smoothed column-sum data are calculated; when the number of pixels between the peak start point and the peak end point accounts for more than 80% of the total pixels across the lane, the band is judged to be a formed band; the number of bands in each lane is then counted.
  • The peak start point is the position where the second derivative of the column-sum data reaches its maximum.
  • The peak end point is the position where the second derivative of the column-sum data reaches its minimum.
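The 80% width criterion can be illustrated with a simplified sketch. Note the substitution: instead of locating the two points through the second derivative as the text describes, the sketch measures the span of columns whose column sum exceeds half of the maximum, and it skips smoothing. The function name and the half-maximum rule are assumptions; only the 80% threshold comes from the text.

```python
def is_formed_band(band_region, min_fraction=0.8):
    """Check whether a candidate band spans enough of the lane width.

    band_region: rows of pixels covering one candidate band across the lane.
    Simplification: the band extent is the span of columns above half of the
    maximum column sum, not the second-derivative points of the source.
    """
    col_sums = [sum(col) for col in zip(*band_region)]
    half_max = max(col_sums) / 2
    above = [x for x, s in enumerate(col_sums) if s > half_max]
    if not above:
        return False
    width = above[-1] - above[0] + 1
    return width / len(col_sums) > min_fraction

# Hypothetical candidates in a lane 10 pixels wide
wide = [[9, 9, 9, 9, 9, 9, 9, 9, 9, 1]] * 2     # spans 9 of 10 columns
narrow = [[1, 1, 1, 9, 9, 1, 1, 1, 1, 1]] * 2   # spans 2 of 10 columns
wide_ok = is_formed_band(wide)
narrow_ok = is_formed_band(narrow)
```

Only candidates that pass this check would be counted towards the number of bands in the lane.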
  • The content of the picture may be analyzed first to identify the lanes and bands.
  • Opening and closing operations can be performed to remove small impurities, and the image is then converted to grayscale.
  • The start and end positions of each lane are found by summing the grayscale image by column.
  • In step S150, the bands in the plasmid, product and marker lanes of each sample are analyzed. Specifically, the number and positions of the bands in the plasmid, product and marker lanes of each sample are determined. Band sizes are derived from the positions of the bands in the marker lane, which gives the size of each band in the plasmid and product lanes.
  • In step S160, it is judged whether the gel run is correct. That is, whether the sample ran correctly can be judged from the known product size.
  • Steps S150 and S160 may include the following sub-steps: (10-1) according to the marker type, determining the size of each marker band, calculating the spacing ratio between the marker bands and the magnitude of the traction force between the bands, and setting these as the spacing parameters between the marker bands; (10-2) extracting the image pixel matrix of the marker lane and calculating the size of each band in the marker lane; (10-3) extracting the image pixel matrix of the plasmid lane, calculating the number of bands in the plasmid lane and the size of each band, and judging whether the sample ran correctly; (10-4) extracting the image pixel matrix of the product lane, calculating the number of bands in the product lane and the size of each band, and judging whether the sample ran correctly.
  • Sub-step (10-2) may include: determining the size of each band in the marker lane according to the marker type, the number of bands detected in the marker lane and the spacing ratio; when the marker lane contains more bands than the marker type should have and the ratio of the spacing above and below a band is incorrect, determining that band to be an erroneous band and deleting it; when the marker lane contains fewer bands than the marker type should have, reducing the smoothing and recalculating the number and size of the bands.
  • Sub-step (10-3) may include: calculating the size of each band in the plasmid lane; determining the main band according to the actual sample size;
  • mean_indel_up_15 is the mean of the pixel row sums of the 15th to 19th rows above the band,
  • mean_gene_indel_max is the mean of the pixel row sums at the peak position of the band and the two rows above and below it,
  • mean_background is the mean of the background between the two lanes,
  • mean_gene_after is the pixel row sum of the middle row between the band and the first band below it,
  • a = (mean_indel_up_15 / (mean_gene_indel_max - mean_background)) - (mean_gene_indel_max / mean_background).
  • The background between the two lanes is the background pixel matrix between the plasmid lane and the first lane behind it.
  • Sub-step (10-4) may include: calculating the size of each band in the product lane; presetting the number of actual sample bands as the standard quantity, and presetting the sizes of the actual sample bands as the standard parameters;
  • when the band sizes include the standard parameters and the sum of the standard parameters, it is judged that the enzyme digestion is incomplete;
  • when the band sizes include both standard parameters and non-standard parameters, it is judged that stray bands are present;
  • when a band in the lane is a formed band
  • and the band satisfies: (the mean pixel row sum of rows 10 to 100 after the peak point - the mean pixel row sum of the 30 rows before the peak start point) / (the pixel row sum at the peak point - the pixel row sum at the peak start point) > 30%, it is judged as smeared digestion of the plasmid;
  • mean_indel_up_15 is the mean of the pixel row sums of the 15th to 19th rows above the band,
  • mean_gene_indel_max is the mean of the pixel row sums at the peak position of the band and the two rows above and below it,
  • mean_background is the mean of the background between the two lanes,
  • mean_gene_after is the pixel row sum of the middle row between the band and the first band below it,
  • a = (mean_indel_up_15 / (mean_gene_indel_max - mean_background)) - (mean_gene_indel_max / mean_background).
  • The background between the two lanes is the background pixel matrix between the product lane and the first lane behind it.
  • If the judgment result of step S160 is "Yes", that is, the gel run is judged to be correct, the method 100 proceeds to step S170: for a sample that ran correctly, the gel image of the single sample is cropped and saved. Specifically, because the positions of the gel wells have already been found, the gel image of the single sample is cropped and saved according to those positions such that half of each gel well is retained.
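The cropping rule can be sketched as below. The source only says that half of the gel well is retained; that the retained half is the lower half, adjacent to the lane, and the parameter layout of the hypothetical `crop_sample` function are assumptions.

```python
def crop_sample(image, lane_start, lane_end, well_top, well_height):
    """Crop one sample's columns, starting halfway down the gel well.

    Assumption: the half of the well next to the lane is the part kept.
    """
    top = well_top + well_height // 2
    return [row[lane_start:lane_end + 1] for row in image[top:]]

# Hypothetical 6x8 picture whose pixel value encodes its (row, column)
image = [[y * 10 + x for x in range(8)] for y in range(6)]
crop = crop_sample(image, lane_start=2, lane_end=4, well_top=0, well_height=2)
```

For a sample occupying three adjacent lanes, `lane_start` and `lane_end` would be the first column of its plasmid lane and the last column of its marker lane.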
  • If the judgment result of step S160 is "No", that is, the gel run is judged to be incorrect, the method 100 proceeds to step S180: for samples that ran incorrectly, the reason for the incorrect run is determined and saved.
  • The reasons for an incorrect gel run can include at least one of the following: supercoiling, plasmid degradation, plasmid loss, genomic DNA, size error, incomplete enzyme digestion, stray bands, smeared digestion, and the like.
  • Band analysis includes at least the following: determining the number and positions of the bands in the plasmid, product and marker lanes of each sample; determining the size of the plasmid and of each product from the positions of the marker bands; judging from the known product size whether the sample ran correctly; and, for a problematic sample, determining the type of problem (supercoiling, plasmid degradation, plasmid loss, genomic DNA, size error, incomplete enzyme digestion, stray bands, smeared digestion, etc.).
  • The gel image of a single sample is cropped and saved such that half of each gel well is retained, and for a problematic sample the reason for the problem is determined and saved.
  • The band positions can be confirmed first, that is, it is judged whether bands have formed. Specifically, the image pixel matrices of the lanes corresponding to the plasmid, product and marker of each sample can be extracted. By judging the degree of brightness concentration in the lane, areas where brightness is relatively concentrated are extracted and recorded as possible band positions. Whether a band has formed is then judged from the shape of the area. More specifically, the image pixel matrix of the lane to be calculated is extracted and summed row by row to obtain the row-sum data. The row-sum curve is determined and smoothed, and the peak points of the smoothed row-sum data are calculated. The position of each peak point is recorded as a band position.
  • The sizes of the plasmid and the product are determined from the positions of the bands in the marker lane. Specifically, the pixel matrix of the marker lane is extracted in advance. According to the size of each marker band, the spacing ratio between the corresponding bands in each marker and the traction force between the marker bands are analyzed and set as the spacing parameters between the marker bands. From these, the size of each band in the plasmid and product lanes is calculated.
  • Judgment criteria can be obtained as follows. For example, data on plasmid degradation and on the presence of genomic DNA and supercoiling in plasmids can be extracted from a large body of existing production samples. The brightness at the band position is compared with the brightness at different positions above and below the band to determine the criterion for plasmid degradation. The brightness at the band position is compared with the maximum brightness of the main band and the total brightness of the whole region to determine the criteria for genomic DNA on the plasmid and for supercoiling.
  • The brightness at the band position is compared with the brightness of the sample band, and with the brightness and contrast of all pixels of all bands in the sample, to determine the criterion for the presence of genomic DNA. The brightness at the band position is compared with the brightness at different positions above and below the band to determine the criterion for smeared digestion.
  • According to the known product size of the sample, it is judged whether the sample ran correctly, and for a problematic sample the type of problem is determined (supercoiling, plasmid degradation, plasmid loss, genomic DNA, size error, incomplete enzyme digestion, stray bands, smeared digestion, etc.).
  • The method of the present invention can not only judge whether the sample ran correctly, but can also determine the type of problem for a problematic sample (supercoiling, plasmid degradation, plasmid loss, genomic DNA, size error, incomplete enzyme digestion, stray bands, smeared digestion, etc.).
  • Example 1 explains the process of gel image identification.
  • Since the KB standard with 9 bands from 500 bp to 10000 bp and the DL3000 standard with 8 bands from 100 bp to 3000 bp may coexist in the same gel image, it is first determined which standard the marker of each sample in the picture follows.
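One simple way to tell the two standards apart is by the number of bands found in the marker lane. The source does not say that the decision is made this way, so the rule below and the names `MARKER_BAND_COUNTS` and `guess_marker_type` are assumptions; only the band counts themselves come from the text.

```python
# Band counts stated in the text: DL3000 has 8 bands, the 10K (KB) marker has 9
MARKER_BAND_COUNTS = {"DL3000": 8, "10K": 9}

def guess_marker_type(band_count):
    """Return the marker name whose expected band count matches, else None."""
    for name, count in MARKER_BAND_COUNTS.items():
        if count == band_count:
            return name
    return None  # unexpected count: the marker lane needs re-analysis

marker_a = guess_marker_type(8)
marker_b = guess_marker_type(9)
```

A `None` result would correspond to the case in which erroneous bands must be deleted or the smoothing reduced before the marker lane is analyzed again.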
  • Band sizes, or positions in the gel image, are expressed in bp (base pairs), that is, the number of base pairs.
  • The corrected image is adjusted for brightness and contrast following the idea of the SSIM (Structural SIMilarity) algorithm, which unifies images whose brightness and contrast differ because of environmental factors such as different batches and equipment wear.
  • The background of the adjusted gel image is cut away: a dynamic pixel percentile of the picture is used to remove the background, and a salt-and-pepper denoising algorithm (a morphological image processing method) and a Gaussian filter denoising algorithm are used to remove the noise in the picture.
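Percentile-based background removal might look like the following sketch. The source does not give the percentile or say how it is chosen dynamically, so the fixed `percentile` argument and the function name `remove_background` are assumptions.

```python
def remove_background(gray, percentile=50):
    """Subtract the chosen percentile of all pixels and clip at zero."""
    pixels = sorted(p for row in gray for p in row)
    idx = min(len(pixels) - 1, int(len(pixels) * percentile / 100))
    cutoff = pixels[idx]
    return [[p - cutoff if p > cutoff else 0 for p in row] for row in gray]

# Hypothetical picture: faint background around 10-12, one bright band
gray = [
    [10, 12, 11, 10],
    [10, 200, 210, 11],
    [12, 11, 10, 10],
]
cleaned = remove_background(gray, percentile=75)
```

Because most pixels of a gel picture are background, a high percentile lands on the background level and only the band pixels survive the subtraction.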
  • A template matching method is used to crop out the gel image area to be processed, and a multi-template matching method is used to find the positions of the gel wells, so that only half of each gel well is cropped when the gel image is saved, for more accurate positioning.
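A minimal sketch of multi-template matching for the gel wells is shown below. It scores each position by the sum of squared differences (SSD); the source does not name the similarity measure, so SSD, the `max_ssd` tolerance and the function names are assumptions.

```python
def match_template(image, template, max_ssd=0):
    """Return (row, col) positions where the template matches within max_ssd."""
    th, tw = len(template), len(template[0])
    h, w = len(image), len(image[0])
    hits = []
    for y in range(h - th + 1):
        for x in range(w - tw + 1):
            ssd = sum((image[y + j][x + i] - template[j][i]) ** 2
                      for j in range(th) for i in range(tw))
            if ssd <= max_ssd:
                hits.append((y, x))
    return hits

def find_wells(image, templates, max_ssd=0):
    """Try several well templates and merge their hits."""
    hits = set()
    for template in templates:
        hits.update(match_template(image, template, max_ssd))
    return sorted(hits)

# Hypothetical picture with two wells and one hypothetical well template
image = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 9, 9, 0, 9, 9, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
well_template = [[0, 0], [9, 9]]
wells = find_wells(image, [well_template])
```

On real pictures `max_ssd` would need to be well above zero, and several templates would cover wells of different shapes or brightness.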
  • The band positions are analyzed on the binary image, and the original image is cropped at the corresponding positions.
  • The salt-and-pepper denoising algorithm removes salt-and-pepper noise, also known as impulse noise, which consists of random white or black points. There may be black pixels in bright areas, white pixels in dark areas, or both. Salt-and-pepper noise may be caused by sudden strong disturbances to the image signal, by analog-to-digital converters, or by bit transmission errors.
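A common way to remove salt-and-pepper noise is a median filter. The source does not name the filter it uses, so the 3x3 median filter below is an assumption, offered only as an illustration of the idea.

```python
from statistics import median

def median_filter(gray):
    """Replace each pixel by the median of its 3x3 neighbourhood."""
    h, w = len(gray), len(gray[0])
    out = [row[:] for row in gray]
    for y in range(h):
        for x in range(w):
            window = [gray[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = median(window)
    return out

# Hypothetical flat picture with one white and one black impulse
noisy = [[50] * 5 for _ in range(5)]
noisy[2][2] = 255   # "salt"
noisy[0][4] = 0     # "pepper"
denoised = median_filter(noisy)
```

Because an isolated impulse is always a minority within its window, the median ignores it, whereas an averaging filter would smear it into the neighbours.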
  • Gaussian filtering is a linear smoothing filter, suitable for removing Gaussian noise, and is widely used in the noise reduction process of image processing.
  • Gaussian filtering is a process of weighted averaging of the entire image. The value of each pixel is obtained by the weighted average of itself and other pixel values in the neighborhood.
  • The specific operation of Gaussian filtering is: a template (also called a convolution kernel or mask) is used to scan each pixel in the image, and the value of the pixel at the center of the template is replaced with the weighted average gray value of the pixels in the neighborhood determined by the template.
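The template scan described above can be written out directly. This is a minimal sketch; the 3x3 kernel size, `sigma = 1.0` and the border replication are assumptions, since the text gives no parameters.

```python
import math

def gaussian_kernel(size=3, sigma=1.0):
    """Build a normalized size x size Gaussian template."""
    half = size // 2
    k = [[math.exp(-(i * i + j * j) / (2 * sigma * sigma))
          for i in range(-half, half + 1)]
         for j in range(-half, half + 1)]
    total = sum(sum(row) for row in k)
    return [[v / total for v in row] for row in k]

def gaussian_filter(gray, size=3, sigma=1.0):
    """Replace each pixel by the Gaussian-weighted mean of its neighbourhood."""
    kernel = gaussian_kernel(size, sigma)
    half = size // 2
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-half, half + 1):
                for dx in range(-half, half + 1):
                    yy = min(max(y + dy, 0), h - 1)  # replicate the border
                    xx = min(max(x + dx, 0), w - 1)
                    acc += kernel[dy + half][dx + half] * gray[yy][xx]
            out[y][x] = acc
    return out

# Hypothetical inputs: a flat picture and a single bright spike
flat = [[100] * 4 for _ in range(4)]
spike = [[0] * 5 for _ in range(5)]
spike[2][2] = 90
smooth_flat = gaussian_filter(flat)
smooth_spike = gaussian_filter(spike)
```

A flat picture passes through unchanged, while the spike is spread over its neighbours with its total intensity preserved.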
  • FIG. 2 shows an example of a raw gel image.
  • Every 3 lanes belong to the same sample. That is, lanes 1-3 in FIG. 2 are the plasmid lane, product lane and marker lane of one sample; similarly, lanes 4-6 are the plasmid lane, product lane and marker lane of another sample; for the other samples, see lanes 7-9, 10-12, and so on.
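The lane layout described above can be expressed as a small grouping step. This sketch assumes the lanes have already been detected and are given left to right as (start, end) column pairs; the function name `group_lanes` and the sample coordinates are hypothetical.

```python
def group_lanes(lanes):
    """Group consecutive lanes in threes: plasmid, product, marker."""
    roles = ("plasmid", "product", "marker")
    samples = []
    # Ignore a trailing incomplete group of one or two lanes
    for i in range(0, len(lanes) - len(lanes) % 3, 3):
        samples.append(dict(zip(roles, lanes[i:i + 3])))
    return samples

# Hypothetical column ranges for six detected lanes
lanes = [(0, 9), (12, 21), (24, 33), (36, 45), (48, 57), (60, 69)]
samples = group_lanes(lanes)
```

Each entry of `samples` then carries the three lanes that the band analysis compares against each other.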
  • FIG. 3 shows the example gel image of FIG. 2 after it has been analyzed and cropped.
  • Example 2 illustrates the process of sample analysis.
  • The image pixel matrices of the lanes corresponding to the plasmid, product and marker (Marker) of each sample are extracted.
  • By judging the degree of brightness concentration in the lane, areas where brightness is relatively concentrated are extracted and recorded as possible band positions. Whether a band has formed is judged from the shape of the area, and the position of a formed band is recorded as a band position.
  • The pixel matrix of the lane to be calculated is extracted and summed by row to obtain the row-sum data; the row-sum curve is determined and smoothed, and the peak points of the smoothed row-sum data are calculated.
  • The position of each peak point is recorded as a band position.
  • The peak start point is the position where the second derivative of the column-sum data reaches its maximum.
  • The peak end point is the position where the second derivative of the column-sum data reaches its minimum.
  • The so-called band size is the position of the band on the gel image.
  • A large number of pixel matrices of marker lanes are extracted in advance. According to the size of each marker band (for example, for the DL3000 Marker the band sizes, that is, the positions from top to bottom, are 3000, 2000, 1500, 1000 respectively.
  • For the 10 kb Marker, the band sizes, that is, the positions from top to bottom, are 10000, 8000, 6000, 5000, 4000, 3000, 2000, 1000 and 500 bp), the spacing ratio between the corresponding bands in each marker and the magnitude of the traction force between the marker bands are analyzed and set as the spacing parameters between the marker bands, so as to calculate the size of each band.
  • the data of plasmid degradation and the presence of genome and supercoiling on the plasmid in a large number of samples were extracted.
  • the brightness value of the strip position is compared with the brightness values of different positions above and below the strip to determine the judgment standard of plasmid degradation; Determine the criteria for determining the genome on the plasmid and the criteria for supercoiling.
  • FIG. 3 shows the gel map of the example gel map of FIG. 2 after it has been analyzed and cut.
  • Figure 4 is a schematic diagram of how the type of problem behind an incorrect gel run is determined in the gel image analysis stage.
  • Lane 1 is the original plasmid.
  • Lane 2 is the digested plasmid and the target fragment (product).
  • Lane 3 is the marker.
  • The actual product band sizes of the first sample in Figure 4 (in the lower left box) are 2090 bp and 2671 bp, but the product lane clearly contains three bands: two match the actual sizes, and the size of the third is the sum of the other two.
  • Band analysis of the plasmid lane: record the position of each band and how well it is formed, calculate the size of each band in the plasmid lane from the marker type and sizes and the traction force between marker bands, and determine the main band from the actual sample size.
  • The main band is the second band in the plasmid lane; there is a band above the main band that meets the criterion for supercoiling, so the result is judged to be supercoiling.
  • Band analysis of the product lane: record the positions of the well-formed bands in the lane, and calculate the size of each band in the product lane from the marker band sizes and the traction force between marker bands.
  • The band sizes of the actual sample are 2090 bp and 2671 bp.
  • The preset standard number of bands for the actual sample is 2, and the standard band-size parameters are 2090 bp and 2671 bp.
  • The invention can be applied in any field that requires gel image analysis, such as QC gel images and protein gel images.
  • The method according to the embodiments of the present invention automatically identifies, analyzes and crops gel images, calculates the information in the gel image, and compares it with the actual sample information to judge whether the gel run of the sample is correct.
  • A unified standard is established, reducing the inconsistency in judgment criteria caused by human factors.
  • The invention is implemented by computer; judging and cropping are fully automatic and free of human interference, which makes the process efficient and consistent.
  • Non-transitory computer-readable media include various types of tangible storage media.
  • Examples of non-transitory computer-readable media include magnetic recording media (such as floppy disks, magnetic tapes and hard disk drives), magneto-optical recording media (such as magneto-optical disks), CD-ROMs (Compact Disc Read-Only Memory), CD-Rs, CD-R/Ws, and semiconductor memories (such as ROM, PROM (Programmable ROM), EPROM (Erasable PROM), flash ROM and RAM (Random Access Memory)).
  • These programs can also be provided to a computer using various types of transitory computer-readable media. Examples of transitory computer-readable media include electrical signals, optical signals and electromagnetic waves. Transitory computer-readable media can provide the program to a computer over wired communication paths, such as electric wires and optical fibers, or over wireless communication paths.
  • An apparatus for automatically identifying and analyzing gel images includes a processor and a memory.
  • A computer program is stored in the memory.
  • When executed by the processor, the computer program implements the aforementioned method for automatically identifying and analyzing gel images.

Abstract

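A method (100) for automatically identifying and analyzing gel images. First, the electrophoresis image obtained for the samples is recognized and the image is deskewed; the plasmid, product and marker lanes corresponding to each sample are then identified, and the positions of those lanes in the picture are determined. By locating the wells in each picture, half a well can be retained when the image is cropped. In the sample analysis stage, the marker type used for the sample is determined first, then the size of each band, and then the number of bands in the plasmid and product lanes. The method (100) can automatically analyze a gel image and crop the per-sample gel images, can automatically judge whether the gel run is correct, and, when it is not, can identify the specific type of problem.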

Description

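Method for automatically identifying and analyzing gel images
Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. 202110251652.2, filed on March 8, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to electrophoresis band technology in the field of biochemistry, and more particularly to a method for automatically identifying and analyzing gel images.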
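Background
Electrophoresis, or electrophoretic banding, is now widely used in biology, medicine and related fields.
Many molecules, such as amino acids, peptides and nucleotides, carry ionizable groups. Under an electric field these charged molecules migrate toward the electrode whose polarity is opposite to their charge, which is the phenomenon of electrophoresis. Because different ions migrate at different rates, the substances separate into rows that appear under an optical instrument as bands of light: the electrophoresis bands. From the width and position of the bands, and with the help of existing reference data or software, the composition and content of the substances can be analyzed.
Electrophoresis bands can be presented by optical instruments in the form of a gel image. Researchers can then identify and analyze the gel image, specifically the lanes and bands on it. Both the identification and the analysis of gel images still involve a great deal of manual work today.
Manual analysis of a gel image first requires an operator to confirm the sample information, determine the expected result of the gel run for each sample on the image, and check whether the two match. The image is then cropped and saved according to the result, and annotated as to whether there is a problem. This manual process costs considerable time, effort and money, carries a relatively high chance of error, and yields judgments that differ between operators, because the criteria applied to the same sample may be inconsistent and human factors interfere strongly.
Automatic recognition of gel images has been studied in the prior art, but the existing approaches require parameters to be adjusted manually and offer no automatic analysis, or their recognition and analysis functions fall well short of practical needs.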
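Summary of the Invention
As noted above, automatically identifying and analyzing gel images, and then judging problems and cropping according to the analysis results, offers a new way to solve the problems of manual operation.
Moreover, since a gel image is a picture, image processing techniques can be brought into the workflow; that is, the gel image can be processed with image processing methods.
In view of this, the present invention proposes a technique for automatically identifying and analyzing gel images, which can automatically analyze a gel image and crop the per-sample gel images. First, the electrophoresis image obtained for the samples is recognized and the picture is deskewed; the plasmid, product and marker lanes corresponding to each sample are then identified, and the positions of those lanes in the picture are determined. The marker lane referred to here is the lane formed by running a DNA marker on the gel. In molecular biology experiments, a marker is the standard used in electrophoresis to indicate DNA molecular size. Different marker types produce marker lanes with different numbers and sizes of bands: for example, the DL3000 DNA Marker produces a lane of 8 bands from 100 bp to 3000 bp, and the 10K DNA Marker produces a lane of 9 bands from 500 bp to 10000 bp.
By locating the wells in each picture, half a well can be retained when the image is cropped. In the sample analysis stage, it is first determined whether the marker used for the sample follows the KB standard or the DL3000 standard; the size of each band is then determined, along with the number of bands in the plasmid and product lanes. Because the problem-judging logic and criteria are unified, the technical solution of the invention can automatically judge whether a gel run is correct. When it is not, the solution can also distinguish the specific type of problem, which can guide the adjustment of subsequent experimental parameters.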
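According to embodiments of the present invention, a first aspect of the invention relates to a method for automatically identifying and analyzing gel images. The method may include the following steps: acquiring the original gel picture of an electrophoresis image; performing image preprocessing on the original gel picture; identifying lanes and bands on the preprocessed gel image; determining the plasmid, product and marker lanes of each sample from the identified lanes and bands; analyzing the bands in the plasmid, product and marker lanes of each sample to judge whether the gel run of the sample is correct; and, for samples whose gel run is correct, cropping out and saving the gel image of the individual sample.
In a preferred embodiment, the method further includes: for samples whose gel run is incorrect, determining and saving the cause.
In a preferred embodiment, the image preprocessing includes: correcting the orientation of the picture according to the direction of the bands in the original gel picture; normalizing the brightness and contrast of the picture; and removing the background color.
In a preferred embodiment, the method further includes: converting the preprocessed gel image into a grayscale image; filtering the grayscale image; and converting the grayscale image into a binary image using an edge detection algorithm.
In a preferred embodiment, identifying lanes and bands on the preprocessed gel image includes: extracting the pixel matrix of the preprocessed gel image; identifying the start and end positions of the lanes on the gel image with a column-sum algorithm; and determining the maximum and minimum band positions on the gel image with a row-sum algorithm.
In a preferred embodiment, determining the plasmid, product and marker lanes of each sample from the identified lanes and bands includes: (6-1) extracting the image pixel matrices of the plasmid, product and marker lanes of each sample; (6-2) determining the positions of the bands in each lane; (6-3) counting the well-formed bands in each lane; and (6-4) determining the positions of the lanes corresponding to the plasmid, product and marker of each sample.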
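Specifically, step (6-2) includes: summing the pixel matrix by row to obtain the sum of each row of pixels, giving the row-sum data; building the row-sum curve, smoothing it, and computing the peak points of the smoothed row-sum data; and recording the positions of the peak points as the band positions.
Specifically, step (6-3) includes: summing the pixel matrix by column to obtain the sum of each column of pixels, giving the column-sum data; building the column-sum curve, smoothing it, and computing the rise point and the fall point of the smoothed column-sum data; judging a band to be well formed when the pixels lying between the rise point and the fall point account for 80% or more of the total pixels of the lane; and counting the well-formed bands in each lane. Here the rise point is the position at which the second derivative of the column-sum data reaches its maximum, and the fall point is the position at which it reaches its minimum.
The second derivative of the column-sum data is $S = y_{i+1} - y_i$, which represents the rate of change of the slope of the tangent to the column-sum curve, where $y_i$ denotes the first derivative, $y_i = x_{i+1} - x_i$.
In embodiments of the present invention, extracting the pixel matrix of a lane yields the pixel column sums $x_1, \dots, x_i, \dots, x_n$ of all columns of that lane; typically, n ranges from 60 to 180 for one lane. Here $x_1$ is the pixel column sum of the first column, $x_i$ is that of the i-th column, and $x_n$ is that of the last column; 2≤i≤n, and i and n are natural numbers.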
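As a minimal sketch of the rise-point and fall-point rule in step (6-3), the Python below computes the second difference of a column-sum profile and applies the 80% criterion. The function names are hypothetical. Two choices are assumptions that the text does not state: the rise point is searched in the left half of the profile and the fall point in the right half (the text does not say how the two edges of a band are told apart), and each index is shifted by one so that it refers to the middle sample of the three-sample difference.

```python
def second_difference(x):
    """Return S_i = y_{i+1} - y_i with y_i = x_{i+1} - x_i."""
    y = [b - a for a, b in zip(x, x[1:])]
    return [b - a for a, b in zip(y, y[1:])]


def rise_and_fall_points(x):
    """Locate the rise point (largest S) and fall point (smallest S).

    Assumption: rise is searched in the left half, fall in the right half.
    """
    if len(x) < 4:
        raise ValueError("need at least four column sums")
    s = second_difference(x)
    mid = len(s) // 2
    rise = max(range(0, mid), key=lambda i: s[i])
    fall = min(range(mid, len(s)), key=lambda i: s[i])
    # Shift by one so the index names the middle of the 3-sample stencil.
    return rise + 1, fall + 1


def is_well_formed(x, ratio=0.8):
    """Apply the 80% rule: columns between rise and fall vs. lane width."""
    rise, fall = rise_and_fall_points(x)
    return (fall - rise + 1) / len(x) >= ratio


wide_band = [0, 0] + [60] * 16 + [0, 0]    # band spans most of the lane
narrow_band = [0] * 8 + [60] * 4 + [0] * 8  # band spans a small part
print(is_well_formed(wide_band), is_well_formed(narrow_band))
```

In practice the profile would be smoothed first, as the text describes; the unsmoothed toy profiles here only keep the arithmetic easy to follow.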
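In a preferred embodiment, analyzing the bands in the plasmid, product and marker lanes of each sample to judge whether the gel run of the sample is correct includes: determining the position and size of each band in the marker lane according to the type of marker used to generate the marker lane; and calculating the size of each band in the plasmid and product lanes to judge whether the gel run of the sample is correct.
Specifically, the method includes: (10-1) according to the marker type, determining the size of each marker band, calculating the spacing ratios between the marker bands and the traction force between bands, and setting these as the spacing parameters between the marker bands; (10-2) extracting the image pixel matrix of the marker lane and calculating the size of each band in the marker lane; (10-3) extracting the image pixel matrix of the plasmid lane, calculating the number of bands and the size of each band in the plasmid lane, and judging whether the gel run of the sample is correct; and (10-4) extracting the image pixel matrix of the product lane, calculating the number of bands and the size of each band in the product lane, and judging whether the gel run of the sample is correct.
More specifically, step (10-2) includes: determining the size of each band in the marker lane from the marker type, the number of bands found in the marker lane, and the spacing ratios; when the marker lane contains more bands than the marker type should have and the spacing ratio of a band to its upper and lower neighbors is incorrect, judging that band to be erroneous and deleting it; and when the marker lane contains fewer bands than the marker type should have, reducing the smoothing and recalculating the number and sizes of the bands.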
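More specifically, step (10-3) includes: calculating the size of each band in the plasmid lane; and determining the main band according to the actual sample size;
when the main band is not well formed and the row-sum curve of the pixel matrix shows no peak, the result is judged to be plasmid degradation;
when there is a band below the main band, the result is judged to be a dropped plasmid band;
when there is a band above the main band that satisfies:
(1) pixel sum of the band / pixel sum of all bands in the lane > 0.15; and
(2) mean pixel value of the row containing the maximum pixel of the band > mean pixel value of the same row of the background between the two lanes + 15, and/or mean pixel value of the row containing the maximum pixel of the band > 1.35 * mean pixel value of the same row of the background between the two lanes, the result is judged to be supercoiling;
when the lane contains a band larger than 10000 bp that satisfies:
(a) mean_background/(mean_gene_indel_max-mean_background)<3.3 and 0<a<0.5;
and/or (b) 3.2≤mean_background/(mean_gene_indel_max-mean_background)<3.5 and 0.35<a<0.5;
and/or (c) mean_background/(mean_gene_indel_max-mean_background)<2, and mean_gene_indel_max≥(mean_gene_after+2.2), and/or mean_gene_indel_max≥(mean_indel_up_15+3), the result is judged to be genomic DNA;
where mean_indel_up_15 is the mean row sum of the four rows at rows 15~19 above the band; mean_gene_indel_max is the mean row sum of the two rows above and the two rows below the position of the pixel row-sum peak of the band; mean_background is the mean pixel row sum of the same four rows as mean_gene_indel_max in the background between the two lanes; mean_gene_after is the pixel row sum of the middle row between the band and the first band below it; a=(mean_indel_up_15/(mean_gene_indel_max-mean_background))-(mean_gene_indel_max/mean_background); and the background between the two lanes is the background pixel matrix between the plasmid lane and the first lane after it.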
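More specifically, step (10-4) includes: calculating the size of each band in the product lane; presetting the number of bands of the actual sample as the standard number, and presetting the band sizes of the actual sample as the standard parameters;
when the number of bands in the lane equals the standard number but the band sizes do not match the standard parameters, the result is judged to be a size error;
when the number of bands in the lane exceeds the standard number and the band sizes comprise the standard parameters and the sum of the standard parameters, the result is judged to be incomplete digestion;
when the number of bands in the lane exceeds the standard number and the band sizes comprise the standard parameters and a size that is not the sum of the standard parameters, the result is judged to be extra bands;
when a band in the lane is well formed and satisfies: (mean pixel row sum of rows 10-100 after the fall point - mean pixel row sum of the rows from the start to 30 rows before the rise point)/(pixel row sum at the peak point - pixel row sum at the rise point)>30%, the result is judged to be a smeared plasmid digest;
when the lane contains a band larger than 10000 bp that satisfies:
(a) mean_background/(mean_gene_indel_max-mean_background)<3.3 and 0<a<0.5;
and/or (b) 3.2≤mean_background/(mean_gene_indel_max-mean_background)<3.5 and 0.35<a<0.5;
and/or (c) mean_background/(mean_gene_indel_max-mean_background)<2, and mean_gene_indel_max≥(mean_gene_after+2.2), and/or mean_gene_indel_max≥(mean_indel_up_15+3), the result is judged to be genomic DNA;
where mean_indel_up_15 is the mean row sum of the four rows at rows 15~19 above the band; mean_gene_indel_max is the mean row sum of the two rows above and the two rows below the position of the pixel row-sum peak of the band; mean_background is the mean pixel row sum of the same four rows as mean_gene_indel_max in the background between the two lanes; mean_gene_after is the pixel row sum of the middle row between the band and the first band below it; a=(mean_indel_up_15/(mean_gene_indel_max-mean_background))-(mean_gene_indel_max/mean_background); and the background between the two lanes is the background pixel matrix between the product lane and the first lane after it.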
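In a preferred embodiment, the causes of an incorrect gel run include at least one of the following: supercoiling, plasmid degradation, dropped plasmid band, genomic DNA, size error, incomplete digestion, extra bands, and smearing.
In a preferred embodiment, the method may further include: locating the wells; and, for samples whose gel run is correct, cropping out and saving the gel image of the individual sample according to the well positions so that half a well is retained.
According to embodiments of the present invention, a second aspect of the invention relates to an apparatus for automatically identifying and analyzing gel images, the apparatus including a processor and a memory. A computer program is stored in the memory. When executed by the processor, the computer program implements the method of the first aspect of the invention.
According to embodiments of the present invention, a third aspect of the invention relates to a computer-readable medium. The medium may be used to record instructions executable by a processor. When executed by the processor, the instructions cause the processor to implement the method of the first aspect of the invention.
As described in the embodiments of the invention, gel image identification and analysis, guided by automatic recognition and analysis in computer software, can identify and analyze band information automatically and with relatively high accuracy, and achieves a unified standard, especially for samples that are handled inconsistently by hand.
Using computer programming, and building on manual judgment criteria together with the layout of gel images and the basic principles and properties of gel runs, suitable recognition algorithms and analysis techniques are designed so that the identification, analysis and cropping of a whole gel image can be completed accurately and efficiently.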
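Brief Description of the Drawings
The present invention includes the accompanying drawings, which are to be regarded as incorporated in and forming part of the specification. Together with the specification they illustrate various exemplary embodiments, features and aspects of the invention and serve to explain its principles. In the drawings:
FIG. 1 is a flowchart of a method for automatically identifying and analyzing gel images according to an embodiment of the present invention.
FIG. 2 shows an example of an original gel image.
FIG. 3 shows the example gel image of FIG. 2 after it has been analyzed and cropped.
FIG. 4 is a schematic diagram of how the type of problem behind an incorrect gel run is determined in the gel image analysis stage.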
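Detailed Description
Various exemplary embodiments, features and aspects of the present invention are described in detail below with reference to the drawings. Identical reference numerals in the drawings denote elements with identical or similar functions. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used here to mean "serving as an example, embodiment or illustration". Any embodiment described here as "exemplary" is not necessarily to be construed as preferred over or better than other embodiments.
In addition, numerous specific details are given in the detailed description below in order to better explain the invention. Those skilled in the art will understand that the invention can be practiced without some of these details. In some instances, methods, means, elements and circuits well known to those skilled in the art are not described in detail, so as to highlight the gist of the invention.
Those skilled in the art will understand and appreciate that the invention is implemented by computer, which excludes human factors of all kinds, including human judgment, manual operation and human intervention. It also avoids problems that frequently arise in the manual identification and analysis of gel images, such as inconsistent judgment criteria and incomplete or non-standard recording of problem types or causes.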
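FIG. 1 is a flowchart of a method for automatically identifying and analyzing gel images according to an embodiment of the present invention.
As shown in FIG. 1, the method according to an embodiment of the invention is explained by way of the flowchart of method 100.
The overall scheme can be divided into two parts: image recognition and band analysis.
Image recognition is considered first.
In the image recognition stage, the whole image is preprocessed first, and the lanes and bands of each sample are then identified in detail.
In step S110, the original gel picture of the electrophoresis image is acquired. FIG. 2 shows an example of an original gel image.
In step S120, image preprocessing is performed on the original gel picture. The preprocessing may include at least: correcting the orientation of the picture according to the direction of the bands in the original gel picture; normalizing the brightness and contrast of the picture; and removing the background color.
More specifically, in one example, the image recognition processing can be divided into the following parts:
Image rotation: the gel may be placed askew when it is photographed. The orientation of the picture is corrected automatically according to the direction of the bands. For example, the original gel image shown in FIG. 2 must be adjusted so that all bands run horizontally, so the original picture has to be rotated.
Brightness and contrast adjustment: in production, pictures are taken at different times, and as the lamp of the imaging instrument wears, the brightness and sharpness of the pictures vary. Before image recognition, therefore, the contrast and brightness of the images should be adjusted so that all pictures are analyzed at the same level. This is what image processing calls normalization or standardization.
Background color removal: pictures have a background color when photographed; the color of the background region is removed by an edge detection algorithm.
More specifically, the preprocessed gel image can be converted into a grayscale image, which is then filtered; finally, an edge detection algorithm converts the grayscale image into a binary image.
On the picture in which only the gel remains, multi-template matching can be used to locate the wells. Because gels are often cut with excess margin in production, the region to be analyzed must be cropped out while the wells are being located.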
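Once image preprocessing is complete, the lanes and bands can be identified by image processing.
In the flowchart of FIG. 1, in step S130, lanes and bands are identified on the preprocessed gel image. For example, in one embodiment, the start and end positions of the lanes can be identified on the preprocessed gel image. The maximum and minimum band positions on the gel image to be analyzed are then determined.
More specifically, identifying lanes and bands on the preprocessed gel image may include: extracting the pixel matrix of the preprocessed gel image; identifying the start and end positions of the lanes on the gel image with a column-sum algorithm; and determining the maximum and minimum band positions on the gel image with a row-sum algorithm.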
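The column-sum and row-sum steps can be sketched as follows, assuming the preprocessed image is a binary matrix (a list of rows) in which band pixels are 1 and background pixels are 0. `find_lanes` and `band_row_extent` are hypothetical names, and treating any sum above `threshold` as signal is an assumption; a real image would likely need a noise-dependent threshold.

```python
def column_sums(matrix):
    """Sum each column of a row-major pixel matrix."""
    return [sum(col) for col in zip(*matrix)]


def find_lanes(matrix, threshold=0):
    """Return (start, end) column indices of each run of non-empty columns."""
    sums = column_sums(matrix)
    lanes, start = [], None
    for i, value in enumerate(sums):
        if value > threshold and start is None:
            start = i                      # a lane begins here
        elif value <= threshold and start is not None:
            lanes.append((start, i - 1))   # the lane ended on the previous column
            start = None
    if start is not None:                  # lane touching the right edge
        lanes.append((start, len(sums) - 1))
    return lanes


def band_row_extent(matrix, threshold=0):
    """Return (first, last) row index whose row sum exceeds the threshold."""
    rows = [i for i, row in enumerate(matrix) if sum(row) > threshold]
    return (rows[0], rows[-1]) if rows else None


binary = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
print(find_lanes(binary))       # [(1, 2), (5, 5)]
print(band_row_extent(binary))  # (1, 2)
```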
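In step S140, the plasmid, product and marker lanes of each sample are determined from the identified lanes and bands. For example, in one embodiment, the number of bands in each identified lane is counted. The positions of the lanes corresponding to the plasmid, product and marker of each sample are then determined. Once the plasmid, product and marker lanes of each sample have been distinguished in this way, the bands in each lane can be analyzed.
More specifically, determining the plasmid, product and marker lanes of each sample from the identified lanes and bands may include the following sub-steps: (1) extracting the image pixel matrices of the plasmid, product and marker lanes of each sample; (2) determining the positions of the bands in each lane; (3) counting the well-formed bands in each lane; and (4) determining the positions of the lanes corresponding to the plasmid, product and marker of each sample.
In sub-step (2), the pixel matrix can be summed by row to obtain the sum of each row of pixels, giving the row-sum data; the row-sum curve is built and smoothed, and the peak points of the smoothed row-sum data are computed; the positions of the peak points are recorded as the band positions.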
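Sub-step (2) can be illustrated with a short Python sketch. The text does not name a smoothing method or a peak definition, so the moving-average smoother and the simple local-maximum test below are assumptions, and all function names are hypothetical.

```python
def row_sums(matrix):
    """Sum each row of a lane's pixel matrix."""
    return [sum(row) for row in matrix]


def smooth(values, window=3):
    """Moving average; the window shrinks at the two ends of the curve."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def peak_positions(values, min_height=0.0):
    """Indices that rise above the previous sample and do not fall below the next."""
    return [
        i
        for i in range(1, len(values) - 1)
        if values[i] > values[i - 1]
        and values[i] >= values[i + 1]
        and values[i] > min_height
    ]


# Toy lane, three pixels wide, with two bands centered on rows 3 and 8.
lane = [[v] * 3 for v in [0, 0, 3, 10, 3, 0, 0, 2, 8, 2, 0]]
bands = peak_positions(smooth(row_sums(lane)))
print(bands)  # [3, 8]
```

A larger `window` corresponds to heavier smoothing; lowering it is one way to read the "reduce the smoothing" fallback that the text applies when too few marker bands are found.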
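Similarly, in sub-step (3), the pixel matrix can be summed by column to obtain the sum of each column of pixels, giving the column-sum data; the column-sum curve is built and smoothed, and the rise point and fall point of the smoothed column-sum data are computed; a band is judged to be well formed when the pixels lying between the rise point and the fall point account for 80% or more of the total pixels of the lane; and the well-formed bands in each lane are counted. Here the rise point is the position at which the second derivative of the column-sum data reaches its maximum, and the fall point is the position at which it reaches its minimum.
From the above description, those skilled in the art will understand that in steps S130 and S140 the content of the picture can first be analyzed to identify the lanes and bands. In one embodiment, morphological opening and closing operations can be run to remove small impurities, and the image is then converted to grayscale. The lane start and end positions are found by summing the grayscale image by column. Edge detection is performed again to find the frame of the gel image and to determine the maximum and minimum band positions of the gel image to be analyzed, which removes the influence of well residue on band-size judgment, and the start and end position of each lane is calculated. The number of bands in each lane is counted, and the plasmid, product and marker lane positions of each sample are found.
Next, the bands are analyzed.
In step S150, the bands in the plasmid, product and marker lanes of each sample are analyzed. Specifically, the number and positions of the bands in the plasmid, product and marker lanes of each sample are determined. Band sizes are determined from the band positions in the marker lane, which gives the size of each band in the plasmid and product lanes.
In step S160, it is judged whether the gel run is correct. That is, the correctness of the gel run of the sample is judged from the known product size.
Analyzing the bands in the plasmid, product and marker lanes of each sample (step S150) and judging whether the gel run of the sample is correct (step S160) form one continuous process, which may specifically include: determining the position and size of each band in the marker lane according to the type of marker used to generate the marker lane; and calculating the size of each band in the plasmid and product lanes to judge whether the gel run of the sample is correct.
More specifically, steps S150 and S160 may include the following sub-steps: (10-1) according to the marker type, determining the size of each marker band, calculating the spacing ratios between the marker bands and the traction force between bands, and setting these as the spacing parameters between the marker bands; (10-2) extracting the image pixel matrix of the marker lane and calculating the size of each band in the marker lane; (10-3) extracting the image pixel matrix of the plasmid lane, calculating the number of bands and the size of each band in the plasmid lane, and judging whether the gel run of the sample is correct; and (10-4) extracting the image pixel matrix of the product lane, calculating the number of bands and the size of each band in the product lane, and judging whether the gel run of the sample is correct.
Further, sub-step (10-2) may include: determining the size of each band in the marker lane from the marker type, the number of bands found in the marker lane, and the spacing ratios; when the marker lane contains more bands than the marker type should have and the spacing ratio of a band to its upper and lower neighbors is incorrect, judging that band to be erroneous and deleting it; and when the marker lane contains fewer bands than the marker type should have, reducing the smoothing and recalculating the number and sizes of the bands.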
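Further, sub-step (10-3) may include: calculating the size of each band in the plasmid lane; and determining the main band according to the actual sample size;
when the main band is not well formed and the row-sum curve of the pixel matrix shows no peak, the result is judged to be plasmid degradation;
when there is a band below the main band, the result is judged to be a dropped plasmid band;
when there is a band above the main band that satisfies: (1) pixel sum of the band / pixel sum of all bands in the lane > 0.15; and
(2) mean pixel value of the row containing the maximum pixel of the band > mean pixel value of the same row of the background between the two lanes + 15, and/or mean pixel value of the row containing the maximum pixel of the band > 1.35 * mean pixel value of the same row of the background between the two lanes, the result is judged to be supercoiling;
when the lane contains a band larger than 10000 bp that satisfies:
(a) mean_background/(mean_gene_indel_max-mean_background)<3.3 and 0<a<0.5;
and/or (b) 3.2≤mean_background/(mean_gene_indel_max-mean_background)<3.5 and 0.35<a<0.5;
and/or (c) mean_background/(mean_gene_indel_max-mean_background)<2, and mean_gene_indel_max≥(mean_gene_after+2.2), and/or mean_gene_indel_max≥(mean_indel_up_15+3), the result is judged to be genomic DNA;
Here, mean_indel_up_15 is the mean row sum of the four rows at rows 15~19 above the band, mean_gene_indel_max is the mean row sum of the two rows above and the two rows below the position of the pixel row-sum peak of the band, mean_background is the mean pixel row sum of the same four rows as mean_gene_indel_max in the background between the two lanes, mean_gene_after is the pixel row sum of the middle row between the band and the first band below it, and a=(mean_indel_up_15/(mean_gene_indel_max-mean_background))-(mean_gene_indel_max/mean_background). The background between the two lanes is the background pixel matrix between the plasmid lane and the first lane after it.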
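The supercoiling and genomic-DNA thresholds above can be written out as two predicates. This is a sketch under stated assumptions rather than the implementation behind the text: the function names are hypothetical, the inputs are taken as already measured, the grouping of "and/or" in condition (c) is ambiguous in the text and is read here as "ratio < 2 and (either inequality)", and a band that is not brighter than the background is treated as not genomic so that the division is safe.

```python
def is_supercoiled(band_sum, all_bands_sum, peak_row_mean, background_row_mean):
    """Criteria (1) and (2) for a band located above the main band."""
    share_ok = band_sum / all_bands_sum > 0.15
    brightness_ok = (
        peak_row_mean > background_row_mean + 15
        or peak_row_mean > 1.35 * background_row_mean
    )
    return share_ok and brightness_ok


def is_genomic(mean_indel_up_15, mean_gene_indel_max, mean_background, mean_gene_after):
    """Criteria (a), (b), (c) for a band larger than 10000 bp.

    The caller is expected to have checked the 10000 bp precondition.
    """
    contrast = mean_gene_indel_max - mean_background
    if contrast <= 0 or mean_background <= 0:
        return False  # assumption: no usable contrast means "not genomic"
    ratio = mean_background / contrast
    a = mean_indel_up_15 / contrast - mean_gene_indel_max / mean_background
    cond_a = ratio < 3.3 and 0 < a < 0.5
    cond_b = 3.2 <= ratio < 3.5 and 0.35 < a < 0.5
    cond_c = ratio < 2 and (
        mean_gene_indel_max >= mean_gene_after + 2.2
        or mean_gene_indel_max >= mean_indel_up_15 + 3
    )
    return cond_a or cond_b or cond_c


# Illustrative numbers only: ratio = 100 / 50 = 2.0 and a = 90/50 - 150/100 = 0.3.
print(is_genomic(90, 150, 100, 120))
print(is_supercoiled(200, 1000, 80, 60))
```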
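Further, sub-step (10-4) may include: calculating the size of each band in the product lane; presetting the number of bands of the actual sample as the standard number, and presetting the band sizes of the actual sample as the standard parameters;
when the number of bands in the lane equals the standard number but the band sizes do not match the standard parameters, the result is judged to be a size error;
when the number of bands in the lane exceeds the standard number and the band sizes comprise the standard parameters and the sum of the standard parameters, the result is judged to be incomplete digestion;
when the number of bands in the lane exceeds the standard number and the band sizes comprise the standard parameters and a size that is not the sum of the standard parameters, the result is judged to be extra bands;
when a band in the lane is well formed and satisfies: (mean pixel row sum of rows 10-100 after the fall point - mean pixel row sum of the rows from the start to 30 rows before the rise point)/(pixel row sum at the peak point - pixel row sum at the rise point)>30%, the result is judged to be a smeared plasmid digest;
when the lane contains a band larger than 10000 bp that satisfies:
(a) mean_background/(mean_gene_indel_max-mean_background)<3.3 and 0<a<0.5;
and/or (b) 3.2≤mean_background/(mean_gene_indel_max-mean_background)<3.5 and 0.35<a<0.5;
and/or (c) mean_background/(mean_gene_indel_max-mean_background)<2, and mean_gene_indel_max≥(mean_gene_after+2.2), and/or mean_gene_indel_max≥(mean_indel_up_15+3), the result is judged to be genomic DNA;
Here, mean_indel_up_15 is the mean row sum of the four rows at rows 15~19 above the band, mean_gene_indel_max is the mean row sum of the two rows above and the two rows below the position of the pixel row-sum peak of the band, mean_background is the mean pixel row sum of the same four rows as mean_gene_indel_max in the background between the two lanes, mean_gene_after is the pixel row sum of the middle row between the band and the first band below it, and a=(mean_indel_up_15/(mean_gene_indel_max-mean_background))-(mean_gene_indel_max/mean_background). The background between the two lanes is the background pixel matrix between the product lane and the first lane after it.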
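The count-and-size rules for the product lane (size error, incomplete digestion, extra bands) can be sketched as a small classifier. The names are hypothetical. The text gives no matching tolerance, so the 10% relative tolerance is an assumption, as is counting the sum of any two or more expected fragments as a "sum of the standard parameters". The smear and genomic-DNA checks are left out, and cases the text does not cover return "unclassified".

```python
from itertools import combinations


def close(measured, expected, tol=0.10):
    """True when a measured size is within a relative tolerance of the expected one."""
    return abs(measured - expected) <= tol * expected


def classify_product_lane(measured, expected, tol=0.10):
    """Classify a product lane from measured and expected band sizes in bp."""
    unmatched = list(measured)
    missing = 0
    for size in expected:
        hit = next((m for m in unmatched if close(m, size, tol)), None)
        if hit is None:
            missing += 1
        else:
            unmatched.remove(hit)
    if missing == 0 and not unmatched:
        return "ok"
    if len(measured) == len(expected):
        return "size error"
    if len(measured) > len(expected) and missing == 0:
        sums = [
            sum(combo)
            for k in range(2, len(expected) + 1)
            for combo in combinations(expected, k)
        ]
        if all(any(close(m, s, tol) for s in sums) for m in unmatched):
            return "incomplete digestion"
        return "extra bands"
    return "unclassified"


expected = [2090, 2671]
print(classify_product_lane([4761, 2671, 2090], expected))  # incomplete digestion
print(classify_product_lane([2671, 2090, 800], expected))   # extra bands
```

With expected fragments of 2090 bp and 2671 bp, a third band near 4761 bp (their sum) is the undigested fragment, which is why the first call reports incomplete digestion.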
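If the result of step S160 is "yes", that is, the gel run is judged to be correct, method 100 proceeds to step S170, where the gel image of each individual sample with a correct gel run is cropped out and saved. Specifically, because the wells have already been located, the gel image of an individual sample with a correct gel run is cropped and saved according to the well positions so that half a well is retained.
Otherwise, if the result of step S160 is "no", that is, the gel run is judged to be incorrect, method 100 proceeds to step S180, where the cause is determined and saved for each sample with an incorrect gel run. The causes of an incorrect gel run may include at least one of the following: supercoiling, plasmid degradation, dropped plasmid band, genomic DNA, size error, incomplete digestion, extra bands, smearing, and so on.
In other words, band analysis includes at least the following: determining the number and positions of the bands in the plasmid, product and marker lanes of each sample; determining the sizes of the plasmid and of each product from the positions of the marker bands; judging from the known product size whether the gel run of the sample is correct; and, for problematic samples, determining the type of problem (supercoiling, plasmid degradation, dropped plasmid band, genomic DNA, size error, incomplete digestion, extra bands, smearing, and the like).
According to the result, the gel image of the individual sample is cropped and saved so that half a well is retained, and the cause of the problem is saved for problematic samples.
In band analysis, the band positions can be confirmed first, that is, it is judged whether each band is well formed. Specifically, the image pixel matrices of the lanes corresponding to the plasmid, product and marker of each sample can be extracted. By assessing how concentrated the brightness is within a lane, regions of relatively concentrated brightness are extracted and recorded as candidate band positions. Whether a region forms a band is then judged from its shape; if it does, its position is confirmed and recorded as a band position. More specifically, the image pixel matrix of the lane to be evaluated can be extracted and its pixels summed by row to obtain the row-sum data. The row-sum curve is built and smoothed, and the peak points of the smoothed row-sum data are computed. The positions of the peak points are recorded as the band positions.
The sizes of the plasmid and the product are then determined from the band positions in the marker lane. Specifically, the pixel matrices of marker lanes are extracted in advance. From the size of each marker band, the range of spacing ratios between corresponding bands and the traction force between marker bands are analyzed for each marker type and set as the spacing parameter interval between marker bands. The size of each band in the plasmid and product lanes is calculated from these.
Judging whether a gel run is correct requires a set of criteria, which can be obtained as follows. For example, from the large body of sample information already available in production, data on plasmid degradation and on the presence of genomic DNA and supercoiling in the plasmid can be extracted from a large number of samples. The brightness at the band position is compared with the brightness at different positions above and below the band to establish the criterion for plasmid degradation. The brightness at the band position is compared with the maximum brightness of the main band and with the summed brightness of the whole region to establish the criteria for genomic DNA in the plasmid and for supercoiling.
In addition, data on genomic DNA and smearing can be extracted from a large number of samples. The brightness at the band position is compared with the brightness of the sample bands and with the summed brightness of all pixels of all bands in the sample to establish the criterion for genomic DNA. The brightness at the band position is compared with the brightness at different positions above and below the band to establish the criterion for smearing.
From the known product size of the sample, it is judged whether the gel run of the sample is correct, and for problematic samples the type of problem is determined (supercoiling, plasmid degradation, dropped plasmid band, genomic DNA, size error, incomplete digestion, extra bands, smearing, and the like).
According to the result, the gel image of the individual sample is cropped and saved so that half a well is retained, and the cause of the problem is saved for problematic samples.
Thus the method of the invention can not only judge whether the gel run of a sample is correct, but can also determine the type of problem for problematic samples (supercoiling, plasmid degradation, dropped plasmid band, genomic DNA, size error, incomplete digestion, extra bands, smearing, and the like).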
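Two specific examples follow.
Example 1:
Example 1 illustrates the gel image recognition process.
Because one gel image may contain both the KB standard, with 9 bands from 500 bp to 10000 bp, and the DL3000 standard, with 8 bands from 100 bp to 3000 bp, it must first be determined which standard the marker of each sample in the picture follows. The size of a band, or its position in the gel image, is expressed in bp (base pairs), that is, the number of base pairs.
First, the raw gel image file from the instrument is read, and the Hough transform is used to determine whether the original picture was photographed askew. If it is tilted, the tilt angle is calculated and the picture is automatically corrected by an image rotation.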
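To make the Hough-transform step concrete, here is a deliberately small sketch in plain Python that estimates the tilt of a set of edge points. It is a toy accumulator, not the routine behind the text: `estimate_skew` is a hypothetical name, the input is assumed to be a list of (x, y) edge-point coordinates already extracted from the image, the angular resolution is fixed at one degree, and only the single strongest line is used.

```python
import math
from collections import Counter


def estimate_skew(points, step_deg=1):
    """Estimate the angle (degrees) of the dominant line relative to the x axis.

    Each point votes for every (theta, rho) line through it, using the normal
    form rho = x*cos(theta) + y*sin(theta). A horizontal line has theta = 90,
    so the skew is theta - 90.
    """
    votes = Counter()
    for theta in range(0, 180, step_deg):
        t = math.radians(theta)
        cos_t, sin_t = math.cos(t), math.sin(t)
        for x, y in points:
            votes[(theta, round(x * cos_t + y * sin_t))] += 1
    (theta, _rho), _count = votes.most_common(1)[0]
    return theta - 90


# Synthetic band edges: one level, one tilted by 10 degrees.
level = [(x, 5.0) for x in range(200)]
tilted = [(x, x * math.tan(math.radians(10)) + 5.0) for x in range(200)]
print(estimate_skew(level), estimate_skew(tilted))
```

Rotating the picture by the negative of the returned angle would bring the bands back to horizontal; the sign depends on whether the y axis of the image points up or down, which this sketch leaves to the caller.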
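The brightness and contrast of the corrected picture are adjusted using the idea of the structural similarity (SSIM) algorithm, which unifies the standard of images whose brightness and contrast differ because of environmental factors such as batch differences and equipment wear.
The adjusted gel image undergoes background segmentation: the background is removed using dynamic pixel percentiles of the picture, while noise is removed using salt-and-pepper denoising and Gaussian filter denoising.
Because manual gel cutting can leave excess gel in the image, template matching is used to crop out the gel region to be processed. Multi-template matching is used to find the wells, so that only half a well is cut when the gel image is saved, which allows more accurate positioning.
The image region to be processed (the working image) is converted to grayscale, and the start and end positions of the lanes are found by summing the grayscale image by column. An edge detection algorithm is applied to the working image, and a threshold method then converts the grayscale image into a binary image; further Gaussian denoising of the binary image leaves only band information in it (the noise in the binary image is removed). A row-sum algorithm on the binary image removes the influence of well residue at the bottom, yielding the band-position analysis binary image. A connected-component algorithm gives the start and end position of each lane in this binary image. The number of binary bands in each lane is counted; the marker lanes are identified from the band count of each lane and the lane spacing, and the plasmid and product lanes of each sample are located from the marker lanes. Following these operations, the original image is cropped to the region that the band-position analysis binary image occupies in it.
Salt-and-pepper denoising and Gaussian filter denoising were mentioned above. Salt-and-pepper denoising removes salt-and-pepper noise, also called impulse noise, which consists of randomly occurring white or black dots: black pixels in bright regions, white pixels in dark regions, or both. Such noise may arise when the image signal is hit by sudden strong interference, or from analog-to-digital converter errors or bit transmission errors.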
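The text does not say which algorithm is used against salt-and-pepper noise. A median filter is the usual choice for impulse noise, so the sketch below uses one as an assumption; `median_filter` is a hypothetical name, and border pixels are handled by repeating the nearest edge pixel.

```python
from statistics import median


def median_filter(image, size=3):
    """Replace each pixel with the median of its size x size neighborhood."""
    half = size // 2
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for r in range(height):
        for c in range(width):
            window = [
                image[min(max(r + dy, 0), height - 1)][min(max(c + dx, 0), width - 1)]
                for dy in range(-half, half + 1)
                for dx in range(-half, half + 1)
            ]
            out[r][c] = median(window)
    return out


# A flat gray patch with one "salt" pixel in the middle.
noisy = [[10] * 5 for _ in range(5)]
noisy[2][2] = 255
clean = median_filter(noisy)
print(clean[2][2])  # 10
```

Because an isolated outlier never reaches the middle of the sorted window, the filter removes it without blurring the surrounding values, which is why it suits impulse noise better than averaging.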
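Gaussian filtering is a linear smoothing filter suited to removing Gaussian noise, and it is widely used for noise reduction in image processing. It amounts to a weighted average over the whole image: the value of each pixel is obtained as a weighted average of itself and the other pixel values in its neighborhood. Concretely, a template (also called a convolution kernel or mask) scans every pixel in the image, and the value of the pixel at the center of the template is replaced with the weighted average gray value of the pixels in the neighborhood defined by the template.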
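The template scan described above can be written out directly. The sketch below is a plain-Python illustration with hypothetical function names; the 3x3 kernel size, sigma of 1.0 and edge-replicating border handling are assumptions, since the text does not give them.

```python
import math


def gaussian_kernel(size=3, sigma=1.0):
    """Build a normalized size x size Gaussian template."""
    half = size // 2
    raw = [
        [math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))
         for dx in range(-half, half + 1)]
        for dy in range(-half, half + 1)
    ]
    total = sum(map(sum, raw))
    return [[v / total for v in row] for row in raw]


def gaussian_filter(image, size=3, sigma=1.0):
    """Replace each pixel with the weighted average of its neighborhood."""
    kernel = gaussian_kernel(size, sigma)
    half = size // 2
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            acc = 0.0
            for dy in range(-half, half + 1):
                for dx in range(-half, half + 1):
                    rr = min(max(r + dy, 0), height - 1)  # replicate the border
                    cc = min(max(c + dx, 0), width - 1)
                    acc += kernel[dy + half][dx + half] * image[rr][cc]
            out[r][c] = acc
    return out


# A single bright pixel is spread over its neighbors; total brightness is kept.
impulse = [[0.0] * 5 for _ in range(5)]
impulse[2][2] = 100.0
blurred = gaussian_filter(impulse)
print(round(sum(map(sum, blurred)), 6))  # 100.0
```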
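As noted earlier, FIG. 2 shows an example of an original gel image. As shown in FIG. 2, every three lanes belong to the same sample. That is, lanes 1-3 in FIG. 2 are the plasmid lane, product lane and marker lane of one sample; likewise, lanes 4-6 are the plasmid lane, product lane and marker lane of another sample; the remaining samples occupy lanes 7-9, 10-12, and so on.
FIG. 3 shows the example gel image of FIG. 2 after it has been analyzed and cropped.
Example 2: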
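Example 2 illustrates the sample analysis process.
(1) Band position and band formation
First, the image pixel matrices of the lanes corresponding to the plasmid, product and marker of each sample are extracted. By assessing how concentrated the brightness is within a lane, regions of relatively concentrated brightness are extracted and recorded as candidate band positions. Whether a region forms a band is judged from its shape; if it does, its position is recorded as a band position.
More specifically, the pixel matrix of the lane to be evaluated is extracted; the matrix pixels are summed by row to obtain the row-sum data; the row-sum curve is built and smoothed, and the peak points of the smoothed row-sum data are computed; the positions of the peak points are recorded as the band positions.
Once the band positions have been computed, the pixel matrix is summed by column to obtain the column-sum data; the column-sum curve is built and smoothed, and the rise point and fall point of the smoothed column sums are computed; when the pixels lying between the rise point and the fall point account for 80% or more of the total pixels of the lane, the band is judged to be well formed; otherwise it is judged not to be.
Here the rise point is the position at which the second derivative of the column-sum data reaches its maximum, and the fall point is the position at which it reaches its minimum.
(2) Band size determination
As noted earlier, the band size referred to here is the position of the band on the gel image. For each marker type, the pixel matrices of a large number of marker lanes are extracted in advance. Based on the size of each marker band (for the DL3000 Marker the band sizes, from top to bottom, are 3000, 2000, 1500, 1000, 750, 500, 250 and 100 bp; for the 10 kb Marker they are 10000, 8000, 6000, 5000, 4000, 3000, 2000, 1000 and 500 bp), the range of spacing ratios between corresponding bands and the traction force between marker bands are analyzed for each marker type and set as the spacing parameter interval between marker bands, from which the size of each band is calculated.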
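The text derives band sizes from per-marker spacing ratios and a traction-force parameter whose exact form is not given. As a stand-in, the sketch below uses log-linear interpolation between the two neighboring marker bands, a common approximation for gel electrophoresis; it is not the calibration described here. The function name and the marker row positions are hypothetical.

```python
import math

# DL3000 band sizes from top to bottom, as listed in the text.
DL3000 = [3000, 2000, 1500, 1000, 750, 500, 250, 100]


def estimate_size(row, marker_rows, marker_sizes):
    """Interpolate a band size (bp) from its row position.

    marker_rows must increase from top to bottom and pair up with marker_sizes.
    The interpolation is linear in log(size) between adjacent marker bands.
    """
    pairs = list(zip(marker_rows, marker_sizes))
    for (r0, s0), (r1, s1) in zip(pairs, pairs[1:]):
        if r0 <= row <= r1:
            t = (row - r0) / (r1 - r0)
            return math.exp(math.log(s0) + t * (math.log(s1) - math.log(s0)))
    raise ValueError("row lies outside the marker range")


# Hypothetical marker band rows, evenly spaced for the sake of the example.
rows = [10, 20, 30, 40, 50, 60, 70, 80]
print(round(estimate_size(15, rows, DL3000), 1))  # 2449.5
```

A band halfway between the 3000 bp and 2000 bp marker bands comes out at the geometric mean of the two sizes, about 2449.5 bp, which reflects the roughly logarithmic relation between migration distance and fragment length.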
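(3) Establishing the judgment criteria
From the large body of sample information already available in production, data on plasmid degradation and on the presence of genomic DNA and supercoiling in the plasmid are extracted from a large number of samples. Local contrast analysis, that is, comparing the brightness at the band position with the brightness at different positions above and below the band, establishes the criterion for plasmid degradation; comparison with the maximum brightness of the main band and with the summed brightness of the whole region establishes the criteria for genomic DNA in the plasmid and for supercoiling.
Data on genomic DNA and smearing are extracted from a large number of samples. Local contrast analysis, that is, comparing the brightness at the band position with the brightness of the sample bands and with the summed brightness of all pixels of all bands in the sample, establishes the criterion for genomic DNA; comparing the brightness at the band position with the brightness at different positions above and below the band establishes the criterion for smearing.
FIGS. 3 and 4 serve as concrete examples.
FIG. 3 shows the example gel image of FIG. 2 after it has been analyzed and cropped.
FIG. 4 is a schematic diagram of how the type of problem behind an incorrect gel run is determined in the gel image analysis stage.
In FIGS. 3 and 4, every three lanes from left to right belong to the same sample: lane 1 is the original plasmid, lane 2 is the digested plasmid and the target fragment (product), and lane 3 is the marker.
The analysis result shown in FIG. 3 is that all four samples are correct and none of the problems described above is present; the images are simply cropped and saved.
The actual product band sizes of the first sample in FIG. 4 (in the lower left box) are 2090 bp and 2671 bp, but the product lane clearly contains three bands: two match the actual sizes, and the size of the third is the sum of the other two (at the arrow). The third band is therefore an incomplete-digestion band. The basis for this judgment is as follows.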
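1) Analysis of the marker lane: according to the preset marker type, the recorded band positions are retrieved and checked against the number of marker bands and the spacing ratios between bands. If there are more bands than the standard and a spacing ratio is wrong, the erroneous band position is deleted; if there are fewer bands than the standard, the original extracted data are revisited with reduced smoothing (the adjusted smoothing parameter is then also used when the positions in the plasmid and product lanes are calculated), the marker band positions are recalculated, and the size of each marker band is determined.
2) Band analysis of the plasmid lane: the position of each band and how well it is formed are recorded; the size of each band in the plasmid lane is calculated from the marker type and sizes and the traction force between marker bands; the main band is determined from the actual sample size and is the second band in the plasmid lane; there is a band above the main band that meets the criterion for supercoiling, so the result is judged to be supercoiling.
3) Band analysis of the product lane: the positions of the well-formed bands in the lane are recorded, and the size of each band in the product lane is calculated from the marker band sizes and the traction force between marker bands;
since the band sizes of the actual sample are 2090 bp and 2671 bp, the standard number of bands for the actual sample is preset to 2, and the standard band-size parameters to 2090 bp and 2671 bp;
the result shows 3 bands in the lane, more than the standard number; two of them match the standard parameters, and the size of the additional band is the sum of the standard parameters, so the result is judged to be incomplete digestion (see the boxed sample in FIG. 4).
4) Output of results
If the bands are judged to be correct and none of the above rejection reasons applies, the picture is cropped and saved, as in FIG. 3.
If the gel run of the sample is judged to be problematic, the bands are analyzed and the cause of the problem is saved, as in FIG. 4.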
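The invention can be applied in any field that requires gel image analysis, such as QC gel images and protein gel images.
As described above, the method according to the embodiments of the present invention automatically identifies, analyzes and crops gel images, calculates the information in the gel image, and compares it with the actual sample information to judge whether the gel run of the sample is correct. A unified standard is established, reducing the inconsistency in judgment criteria caused by human factors.
The invention is implemented by computer; judging and cropping are fully automatic and free of human interference, which makes the process efficient and consistent.
In addition, those of ordinary skill in the art will recognize that the method of the invention can be implemented as a computer program. As described above with reference to FIG. 1, the method of the above embodiments is carried out by one or more programs that include instructions causing a computer or processor to execute the algorithms described with reference to the drawings. These programs can be stored on various types of non-transitory computer-readable media and provided to a computer or processor. Non-transitory computer-readable media include various types of tangible storage media. Examples include magnetic recording media (such as floppy disks, magnetic tapes and hard disk drives), magneto-optical recording media (such as magneto-optical disks), CD-ROMs (Compact Disc Read-Only Memory), CD-Rs, CD-R/Ws, and semiconductor memories (such as ROM, PROM (Programmable ROM), EPROM (Erasable PROM), flash ROM and RAM (Random Access Memory)). Further, these programs can be provided to a computer using various types of transitory computer-readable media. Examples of transitory computer-readable media include electrical signals, optical signals and electromagnetic waves. Transitory computer-readable media can provide the program to a computer over wired communication paths, such as electric wires and optical fibers, or over wireless communication paths.
For example, according to one embodiment of the invention, an apparatus for automatically identifying and analyzing gel images can be provided, the apparatus including a processor and a memory. A computer program is stored in the memory. When executed by the processor, the computer program implements the method for automatically identifying and analyzing gel images described above.
Accordingly, the invention may also provide a computer program or a computer-readable medium for recording instructions executable by a processor, the instructions, when executed by the processor, causing the processor to implement the method for automatically identifying and analyzing gel images described above.
The embodiments of the present invention have been described above. The description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used here was chosen to best explain the principles of the embodiments, their practical application or their improvement over technology found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed here.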

Claims (17)

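  1. A method for automatically identifying and analyzing gel images, comprising:
    acquiring the original gel picture of an electrophoresis image;
    performing image preprocessing on the original gel picture;
    identifying lanes and bands on the preprocessed gel image;
    determining the plasmid, product and marker lanes of each sample from the identified lanes and bands;
    analyzing the bands in the plasmid, product and marker lanes of each sample to judge whether the gel run of the sample is correct; and
    for samples whose gel run is correct, cropping out and saving the gel image of the individual sample.
  2. The method according to claim 1, characterized in that the method further comprises:
    for samples whose gel run is incorrect, determining and saving the cause.
  3. The method according to claim 1, characterized in that the image preprocessing comprises:
    correcting the orientation of the picture according to the direction of the bands in the original gel picture;
    normalizing the brightness and contrast of the picture; and
    removing the background color.
  4. The method according to claim 3, characterized in that the method further comprises:
    converting the preprocessed gel image into a grayscale image;
    filtering the grayscale image; and
    converting the grayscale image into a binary image using an edge detection algorithm.
  5. The method according to claim 1, characterized in that identifying lanes and bands on the preprocessed gel image comprises:
    extracting the pixel matrix of the preprocessed gel image;
    identifying the start and end positions of the lanes on the gel image with a column-sum algorithm; and
    determining the maximum and minimum band positions on the gel image with a row-sum algorithm.
  6. The method according to claim 1, characterized in that determining the plasmid, product and marker lanes of each sample from the identified lanes and bands comprises:
    (6-1) extracting the image pixel matrices of the plasmid, product and marker lanes of each sample;
    (6-2) determining the positions of the bands in each lane;
    (6-3) counting the well-formed bands in each lane; and
    (6-4) determining the positions of the lanes corresponding to the plasmid, product and marker of each sample.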
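  7. The method according to claim 6, characterized in that step (6-2) comprises:
    summing the pixel matrix by row to obtain the sum of each row of pixels, giving the row-sum data;
    building the row-sum curve, smoothing it, and computing the peak points of the smoothed row-sum data; and
    recording the positions of the peak points as the band positions.
  8. The method according to claim 6, characterized in that step (6-3) comprises:
    summing the pixel matrix by column to obtain the sum of each column of pixels, giving the column-sum data;
    building the column-sum curve, smoothing it, and computing the rise point and the fall point of the smoothed column-sum data;
    judging a band to be well formed when the pixels lying between the rise point and the fall point account for 80% or more of the total pixels of the lane; and
    counting the well-formed bands in each lane;
    wherein the rise point is the position at which the second derivative of the column-sum data reaches its maximum, and the fall point is the position at which the second derivative of the column-sum data reaches its minimum.
  9. The method according to claim 1, characterized in that analyzing the bands in the plasmid, product and marker lanes of each sample to judge whether the gel run of the sample is correct comprises:
    determining the position and size of each band in the marker lane according to the type of marker used in the marker lane; and
    calculating the size of each band in the plasmid and product lanes to judge whether the gel run of the sample is correct.
  10. The method according to claim 9, characterized in that the method comprises:
    (10-1) according to the marker type, determining the size of each marker band, calculating the spacing ratios between the marker bands and the traction force between bands, and setting these as the spacing parameters between the marker bands;
    (10-2) extracting the image pixel matrix of the marker lane and calculating the size of each band in the marker lane;
    (10-3) extracting the image pixel matrix of the plasmid lane, calculating the number of bands and the size of each band in the plasmid lane, and judging whether the gel run of the sample is correct; and
    (10-4) extracting the image pixel matrix of the product lane, calculating the number of bands and the size of each band in the product lane, and judging whether the gel run of the sample is correct.
  11. The method according to claim 10, characterized in that step (10-2) comprises:
    determining the size of each band in the marker lane from the marker type, the number of bands found in the marker lane, and the spacing ratios;
    when the marker lane contains more bands than the marker type should have and the spacing ratio of a band to its upper and lower neighbors is incorrect, judging that band to be erroneous and deleting it; and
    when the marker lane contains fewer bands than the marker type should have, reducing the smoothing and recalculating the number and sizes of the bands.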
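  12. The method according to claim 10, characterized in that step (10-3) comprises:
    calculating the size of each band in the plasmid lane;
    determining the main band according to the actual sample size;
    when the main band is not well formed and the row-sum curve of the pixel matrix shows no peak, judging the result to be plasmid degradation;
    when there is a band below the main band, judging the result to be a dropped plasmid band;
    when there is a band above the main band that satisfies:
    (1) pixel sum of the band / pixel sum of all bands in the lane > 0.15; and
    (2) mean pixel value of the row containing the maximum pixel of the band > mean pixel value of the same row of the background between the two lanes + 15, and/or mean pixel value of the row containing the maximum pixel of the band > 1.35 * mean pixel value of the same row of the background between the two lanes,
    judging the result to be supercoiling;
    when the lane contains a band larger than 10000 bp that satisfies:
    (a) mean_background/(mean_gene_indel_max-mean_background)<3.3 and 0<a<0.5;
    and/or,
    (b) 3.2≤mean_background/(mean_gene_indel_max-mean_background)<3.5 and 0.35<a<0.5;
    and/or,
    (c) mean_background/(mean_gene_indel_max-mean_background)<2, and mean_gene_indel_max≥(mean_gene_after+2.2), and/or mean_gene_indel_max≥(mean_indel_up_15+3),
    judging the result to be genomic DNA;
    wherein mean_indel_up_15 is the mean row sum of the four rows at rows 15~19 above the band;
    mean_gene_indel_max is the mean row sum of the two rows above and the two rows below the position of the pixel row-sum peak of the band;
    mean_background is the mean pixel row sum of the same four rows as mean_gene_indel_max in the background between the two lanes;
    mean_gene_after is the pixel row sum of the middle row between the band and the first band below it;
    a=(mean_indel_up_15/(mean_gene_indel_max-mean_background))-(mean_gene_indel_max/mean_background);
    and the background between the two lanes is the background pixel matrix between the plasmid lane and the first lane after it.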
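  13. The method according to claim 10, characterized in that step (10-4) comprises:
    calculating the size of each band in the product lane;
    presetting the number of bands of the actual sample as the standard number, and presetting the band sizes of the actual sample as the standard parameters;
    when the number of bands in the lane equals the standard number but the band sizes do not match the standard parameters, judging the result to be a size error;
    when the number of bands in the lane exceeds the standard number and the band sizes comprise the standard parameters and the sum of the standard parameters, judging the result to be incomplete digestion;
    when the number of bands in the lane exceeds the standard number and the band sizes comprise the standard parameters and a size that is not the sum of the standard parameters, judging the result to be extra bands;
    when a band in the lane is well formed and satisfies: (mean pixel row sum of rows 10-100 after the fall point - mean pixel row sum of the rows from the start to 30 rows before the rise point)/(pixel row sum at the peak point - pixel row sum at the rise point)>30%, judging the result to be a smeared plasmid digest;
    when the lane contains a band larger than 10000 bp that satisfies:
    (a) mean_background/(mean_gene_indel_max-mean_background)<3.3 and 0<a<0.5;
    and/or,
    (b) 3.2≤mean_background/(mean_gene_indel_max-mean_background)<3.5 and 0.35<a<0.5;
    and/or,
    (c) mean_background/(mean_gene_indel_max-mean_background)<2, and mean_gene_indel_max≥(mean_gene_after+2.2), and/or mean_gene_indel_max≥(mean_indel_up_15+3),
    judging the result to be genomic DNA;
    wherein mean_indel_up_15 is the mean row sum of the four rows at rows 15~19 above the band;
    mean_gene_indel_max is the mean row sum of the two rows above and the two rows below the position of the pixel row-sum peak of the band;
    mean_background is the mean pixel row sum of the same four rows as mean_gene_indel_max in the background between the two lanes;
    mean_gene_after is the pixel row sum of the middle row between the band and the first band below it;
    a=(mean_indel_up_15/(mean_gene_indel_max-mean_background))-(mean_gene_indel_max/mean_background);
    and the background between the two lanes is the background pixel matrix between the product lane and the first lane after it.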
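  14. The method according to claim 2, characterized in that the causes of an incorrect gel run include at least one of the following:
    supercoiling, plasmid degradation, dropped plasmid band, genomic DNA, size error, incomplete digestion, extra bands, and smearing.
  15. The method according to claim 1, characterized in that the method further comprises:
    locating the wells; and
    for samples whose gel run is correct, cropping out and saving the gel image of the individual sample according to the well positions so that half a well is retained.
  16. An apparatus for automatically identifying and analyzing gel images, comprising:
    a processor; and
    a memory storing a computer program which, when executed by the processor, implements the method according to any one of claims 1-15.
  17. A computer-readable medium for recording instructions executable by a processor, the instructions, when executed by the processor, causing the processor to implement the method according to any one of claims 1-15.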
PCT/CN2022/079171 2021-03-08 2022-03-04 Method for automatically identifying and analyzing gel images WO2022188696A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110251652 2021-03-08
CN202110251652.2 2021-03-08

Publications (1)

Publication Number Publication Date
WO2022188696A1 true WO2022188696A1 (zh) 2022-09-15

Family

ID=83226264

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/079171 WO2022188696A1 (zh) 2021-03-08 2022-03-04 Method for automatically identifying and analyzing gel images

Country Status (1)

Country Link
WO (1) WO2022188696A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115937052A (zh) * 2023-03-14 2023-04-07 四川福莱宝生物科技有限公司 一种凝胶电泳图像的处理方法、装置、设备及介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22766222

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22766222

Country of ref document: EP

Kind code of ref document: A1