WO2024055149A1 - Sequencing method, processing system, and sequencing system - Google Patents
- Publication number: WO2024055149A1 (PCT/CN2022/118410)
- Authority: WO, WIPO (PCT)
- Prior art keywords
- core
- reagent
- row
- data
- sequencing
Classifications
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
Definitions
- The present invention relates to the technical field of gene sequencing, and specifically to a sequencing method, a data processing system, and a gene sequencing system.
- The sequencing substrates (sequencing chips) used in existing high-throughput gene sequencers include surface chips and semiconductor sequencing chips with integrated circuits.
- Surface chips usually capture fluorescence signals through a microscope optical system. Semiconductor sequencing chips, by contrast, collect electrical or optical signals and perform analog-to-digital conversion through their internal integrated circuits.
- The present invention aims to solve at least one of the technical problems in the prior art. To this end, it proposes a sequencing method, a data processing system, and a gene sequencing system.
- The sequencing method is simple and can greatly reduce the amount of data processing. This in turn greatly reduces the volume of data transmission and back-end server processing.
- An embodiment of the present invention provides a sequencing method for use in a gene sequencing system.
- The semiconductor sequencing chip includes multiple cores arranged to form multiple core rows.
- The sequencing method includes:
- Each time N units of core rows (N > 0) come into contact with the reagent, reading the data output by the core rows in contact with the reagent at least once, until all core rows have been exposed to the reagent;
- Determining a target template based on the data output by the first core row, the target template including the signal ranges that indicate whether a base emits light;
- Determining the base type based on the optical signal data.
- In this sequencing method, the chip contacts the reagent with the core row as the contact unit. The target template determined from the data output by the first core row is used to simplify the data output by the remaining core rows, which yields optical signal data for the different bases.
- This greatly reduces the data-processing volume of the remaining core rows, which in turn greatly reduces the amount of data transmitted and processed.
- An embodiment of the present invention provides a data processing system for use in a gene sequencing system.
- The data processing system includes a semiconductor sequencing chip, a control device, and a robot.
- The semiconductor sequencing chip includes a processing module and a plurality of cores.
- The plurality of cores are distributed in an array to form multiple core rows. The processing module is connected to the core rows and the control device, and the control device is connected to the manipulator.
- The control device is used to:
- The processing module is used to:
- Read, at least once, the data output by the core rows that are in contact with the reagent, until all core rows have been exposed to the reagent.
- An embodiment of the present invention provides a gene sequencing system, including the above-mentioned data processing system.
- In the above data processing system and gene sequencing system, the chip contacts the reagent with the core row as the contact unit. The target template determined from the data output by the first core row is used to simplify the data output by the remaining core rows.
- This yields the optical signal data of the different bases while greatly reducing the data-processing volume of the remaining core rows, which in turn greatly reduces the amount of data transmitted and processed.
- Figure 1 is a flow chart of a sequencing method according to an embodiment of the present invention.
- Figure 2 is a schematic module diagram of the data processing system of the semiconductor sequencing chip according to an embodiment of the present invention.
- Figure 3 is a schematic module diagram of the image control module of the data processing system of the semiconductor sequencing chip according to an embodiment of the present invention.
- Figure 4 is a schematic diagram of the target template of the sequencing method according to an embodiment of the present invention.
- Figure 5 is a schematic diagram of the correspondence between optical signal data and base types according to an embodiment of the present invention.
- Figure 6 is a schematic structural diagram of a semiconductor sequencing chip according to an embodiment of the present invention.
- Figure 7 is a schematic diagram of the regional division of a semiconductor sequencing chip according to an embodiment of the present invention.
- Figure 8 is a schematic structural diagram of the core of a semiconductor sequencing chip according to an embodiment of the present invention.
- Figure 9 is another structural schematic diagram of the core of a semiconductor sequencing chip according to an embodiment of the present invention.
- Figure 10 is another structural schematic diagram of the core of a semiconductor sequencing chip according to an embodiment of the present invention.
- Figure 11 is a timing diagram of data reading according to the embodiment of the present invention.
- Second signal range 15
- Mechanical control module 16
- Third signal range 17
- Fluid control module 18
- Fourth signal range 19
- Temperature control module 20
- Environment control module 22
- Image processing module 24
- Control device 26
- Display screen 28
- Control module 30
- Power board 32
- Row switching and common reading unit 34
- Main control board 36
- Driver board 40
- Core 42
- Core row 44
- Area 48
- Sub-pixel array 50
- Pixel array 51
- Phase-locked loop
- Decoder controller
- An embodiment of the present invention provides a sequencing method for a semiconductor sequencing chip 14, which is used in a gene sequencing system.
- The semiconductor sequencing chip 14 includes a plurality of cores 42 arranged to form multiple core rows 44.
- The semiconductor sequencing chip 14 may include one sequencing surface and a backplane, or two sequencing surfaces.
- The sequencing surface contains the cores 42, and the backplane is the side that contains no cores 42.
- The sequencing surface is brought into contact with the reagent by a "contact" action.
- The contact action can take several forms, including but not limited to: the reagent reaching a fixed sequencing surface by flowing, spreading, or similar means;
- the sequencing surface being immersed in the reagent by moving, rotating, or similar means; or the sequencing surface and the reagent coming into contact through relative motion.
- Each of these methods brings the sequencing surface into contact with the reagent. They can be substituted for one another according to the actual usage scenario.
- In the following, "immersing the sequencing surface in the reagent by moving it" is used as the example.
- The sequencing method includes:
- Step 101: Control the semiconductor sequencing chip 14 to be immersed in the reagents used in each round of the sequencing reaction, with the core row 44 as the immersion unit.
- The reagents include at least two substrate reagents.
- In each substrate reagent, the corresponding base either emits light or does not;
- Step 103: Each time N units of core rows 44 (N > 0) are immersed in the substrate reagent, read the data output by the immersed core rows 44 at least once, until all core rows 44 have been immersed in the substrate reagent.
- The target template 11 is determined using the data output by the first core row 44.
- The target template 11 includes the signal ranges that indicate whether a base emits light;
- Step 105: Use the target template 11 to simplify the data output by the remaining core rows 44 to obtain optical signal data for the different bases;
- Step 107: Determine the base type based on the optical signal data.
- In this method, the semiconductor sequencing chip 14 is immersed with the core row 44 as the immersion unit. The target template 11 determined from the data output by the first core row 44 is used to simplify the data output by the remaining core rows 44 into optical signal data for the different bases.
- This greatly reduces the amount of data transmitted and processed.
- The semiconductor sequencing chip 14 includes a plurality of cores 42 arranged to form a plurality of core rows 44.
- When sequencing begins, base sequence clusters/balls are loaded on the core row 44 that is first immersed in the reagents. These clusters/balls are obtained from different amplification starting fragments that have undergone the same amplification process.
- In one example, the first core row 44 contains only one core 42. This core carries base sequence clusters/balls obtained from different amplification starting fragments (for example, with insert sizes of 50 bp, 100 bp, 200 bp, and 300 bp) that have undergone the same amplification process. Through this step, a target template 11 can be obtained from the first core row 44 of the semiconductor sequencing chip 14 after the biochemical reaction.
- The semiconductor sequencing chip 14 is immersed in the reagents used in each round of the sequencing reaction, with the core row 44 as the immersion unit.
- The reagents include at least two substrate reagents.
- The substrate reagents cause different base types to show different luminescence intensities. Each time N (N > 0) units of core rows 44 are immersed in the substrate reagent, the optical signal data output by those core rows 44 is read. The target template 11 is then used to simplify the data output by the remaining core rows 44 into optical signal data for the different bases. The base type can be determined from the optical signal data output by the semiconductor sequencing chip 14.
- The immersion unit for each immersion in the reagent is N, where N is the number of core rows 44 that enter the reagent each time.
- N is configurable and may be greater than 1.
- For example, one core row 44 may be immersed at a time and its data read twice. The number of core rows 44 immersed each time and the number of signal readings per immersion are set according to the actual situation and are not specifically limited here.
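The immersion-and-read schedule described above can be sketched in Python. This is a minimal illustration under stated assumptions. `immersion_schedule`, `reads_per_step`, and the zero-based row indices are hypothetical. Each step reads only the rows that were just immersed, which is one reading of the row-by-row scheme. Nine rows is an arbitrary choice.

```python
from typing import List, Tuple


def immersion_schedule(num_rows: int, n: int, reads_per_step: int = 1) -> List[Tuple[List[int], int]]:
    """Split core rows 0..num_rows-1 into immersion steps of n rows each (hypothetical helper).

    Each entry is (rows newly immersed in this step, number of readings of those rows).
    The last step holds fewer rows when num_rows is not a multiple of n.
    """
    if num_rows <= 0 or n <= 0 or reads_per_step <= 0:
        raise ValueError("num_rows, n and reads_per_step must all be positive")
    schedule = []
    for start in range(0, num_rows, n):
        new_rows = list(range(start, min(start + n, num_rows)))
        schedule.append((new_rows, reads_per_step))
    return schedule


# Example mirroring the text: one core row per immersion, data read twice.
for rows, reads in immersion_schedule(9, 1, reads_per_step=2):
    print(f"immerse core rows {rows}, read them {reads} time(s)")
```

Setting `n` larger than 1 trades fewer robot moves for more rows read per step.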
- The reagents include two substrate reagents.
- A target template 11 can then be obtained. As shown in Figure 4, the target template 11 contains the signal ranges that indicate whether a base emits light (the ranges enclosed by the dotted boxes in Figure 4). It is used to simplify the data output by the remaining core rows 44 into optical signal data for the different bases.
- The base type can be determined from the output optical signal data.
- A data processing system 100 includes a semiconductor sequencing chip 14, a control device 26, and a robot 10.
- The reagents may be placed in a reagent tank 12.
- The manipulator 10, the reagent tank 12, and the semiconductor sequencing chip 14 can be placed in a closed space. The series of biochemical reactions for gene sequencing is carried out in this space.
- The biochemical reactions in the reagent tank 12 require a certain temperature, time, and environment. These conditions are controlled by the control device 26.
- The semiconductor sequencing chip 14 includes a processing module (not shown) and a plurality of cores 42.
- The plurality of cores 42 are distributed in an array to form a plurality of core rows 44.
- The processing module is connected to the core rows 44 and the control device 26.
- After each unit of core rows 44 is immersed in the substrate reagent, the processing module reads the data output by those core rows 44. It uses the data output by the first core row 44 to determine a target template 11, which defines the signal ranges that indicate whether a base emits light.
- The target template 11 is used to simplify the data output by the remaining core rows 44, so as to obtain and output optical signal data for the different bases.
- The control device 26 may include a display screen 28 and a control module 30.
- The display screen 28 may be a touch screen through which the operator controls the entire gene sequencing process.
- The control module 30 may include a mechanical control module 16, a fluid control module 18, a temperature control module 20, an environment control module 22, and an image processing module 24.
- The mechanical control module 16 controls the movement of the robot 10. The robot grips the semiconductor sequencing chip 14 and immerses it in the reagents in the reagent tank 12, with the core row 44 as the immersion unit.
- The mechanical control module 16 can control the reaction time of the semiconductor sequencing chip 14 in the reagent tank 12.
- It can also keep the movement speed of the manipulator 10 within an appropriate range. This reduces the amount of reagent carried out by the semiconductor sequencing chip 14 as it enters and leaves the reagent tank.
- An appropriate movement speed of the robot 10 also reduces the number of bubbles produced in the reagent.
- The fluid control module 18 monitors the reagent content in the reagent tank 12. By measuring reagent quality, it tracks changes in the reagent content of the tank. It then controls the associated water pumps and valves to replenish and circulate the reagents, keeping the content of the key reactants required for the biochemical reactions at a certain level.
- The temperature control module 20 monitors and controls the temperature in the closed space through a temperature sensor, maintaining the temperature required for the biochemical reactions.
- The environment control module 22 monitors and controls the content of the main gases in the closed space. It ensures that the biochemical reactions take place in a low-oxygen environment, for example by filling the space with nitrogen.
- The image processing module 24 can be connected to the semiconductor sequencing chip 14 through an interface board 38.
- The image processing module 24 includes a row switching and common reading unit 34, a main control board 36, and a driver board 40. The row switching and common reading unit 34 reads out the image signals collected by the semiconductor sequencing chip 14 row by row and transmits them to a back-end hard disk for storage.
- The row switching and common reading unit 34 can also exchange data bidirectionally with the main control board 36.
- The power board 32 supplies power to the image processing module 24. The main control board 36 performs data collection, calculation, and instruction output, and drives the display output through the driver board 40.
- The reagents include a first substrate reagent and a second substrate reagent. When the semiconductor sequencing chip 14 is immersed in either substrate reagent, two of the bases emit light and the other two do not. In this way, the four bases can be distinguished by their luminescence in the two channels.
- When the semiconductor sequencing chip 14 is immersed in the first or second substrate reagent, a biochemical reaction takes place, and different bases show different luminescence intensities during the reaction. The target template 11 defines the luminescence signal ranges. These ranges reveal how each base luminesces in the first and second substrate reagents, so that different bases can be distinguished.
- The target template 11 includes a first signal range 13, a second signal range 15, a third signal range 17, and a fourth signal range 19.
- The base types include a first type, a second type, a third type, and a fourth type;
- The first signal range 13 indicates that a base of the first type emits light in neither the first nor the second substrate reagent;
- The second signal range 15 indicates that a base of the second type does not emit light in the first substrate reagent but emits light in the second substrate reagent;
- The third signal range 17 indicates that a base of the third type emits light in the first substrate reagent but does not emit light in the second substrate reagent;
- The fourth signal range 19 indicates that a base of the fourth type emits light in both the first and the second substrate reagent. In this way, the specific base type can be determined from the biochemical reactions.
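The four signal ranges form a two-bit truth table. The sketch below is a hypothetical Python lookup; `SIGNAL_RANGES` and `base_type` are illustrative names. The base types stay ordinal because the assignment of A, G, C, and T to each type is given only in Figure 5.

```python
from typing import Dict, Tuple

# Key: (emits light in first substrate reagent, emits light in second substrate reagent)
SIGNAL_RANGES: Dict[Tuple[bool, bool], Tuple[str, str]] = {
    (False, False): ("first signal range 13", "first type"),
    (False, True): ("second signal range 15", "second type"),
    (True, False): ("third signal range 17", "third type"),
    (True, True): ("fourth signal range 19", "fourth type"),
}


def base_type(lit_in_first: bool, lit_in_second: bool) -> str:
    """Return the base type whose luminescence pattern matches the two observations."""
    return SIGNAL_RANGES[(lit_in_first, lit_in_second)][1]


print(base_type(True, False))  # third type
```

Because every combination maps to exactly one type, two binary observations are enough to tell four bases apart.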
- The target template 11 determined from the data output by the first core row 44 includes a first signal range 13, a second signal range 15, a third signal range 17, and a fourth signal range 19.
- The four bases show different luminescence intensities when reacting in the first or second substrate reagent.
- The signal ranges defined by the target template 11 reduce each reading to one of two signals: luminescence or non-luminescence. The combination of reaction results in the two substrate reagents is unique for each base, so the type of the current base can be determined accurately.
- The sequencing method includes:
- Classifying the data output by the remaining core rows 44 into the signal ranges of the target template 11 through an intercept classification algorithm. This reduces sequencing errors and yields more accurate sequencing results.
- The readings can be plotted as a signal scatter diagram on two coordinate axes.
- The two axes represent the luminescence intensity of the bases in the two substrate reagents.
- The scattered points are mostly concentrated in four areas, according to how the bases luminesce.
- Circular (or elliptical) areas representing the four bases are drawn around these concentrations. These areas are the signal ranges defined by the target template 11, as shown in Figure 4: the first signal range 13, the second signal range 15, the third signal range 17, and the fourth signal range 19.
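The text draws the four circles from the first core row's scatter plot but does not say how. The Python sketch below is a stand-in. It splits each intensity axis at the midpoint of its observed range, groups the points into four quadrants, and records each group's centroid and covering radius. The midpoint split, `build_template`, and the data layout are assumptions for illustration, not the patented procedure.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (intensity in first substrate reagent, intensity in second)
Template = Dict[Tuple[int, int], Tuple[float, float, float]]  # (bit1, bit2) -> (cx, cy, radius)


def build_template(points: List[Point]) -> Template:
    """Hypothetical target-template builder from the first core row's scatter points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    # Assumption: split each axis halfway between its lowest and highest intensity.
    tx = (min(xs) + max(xs)) / 2
    ty = (min(ys) + max(ys)) / 2
    groups: Dict[Tuple[int, int], List[Point]] = {}
    for x, y in points:
        groups.setdefault((int(x > tx), int(y > ty)), []).append((x, y))
    if len(groups) != 4:
        raise ValueError("first core row does not populate all four signal ranges")
    template: Template = {}
    for key, pts in groups.items():
        cx = sum(x for x, _ in pts) / len(pts)
        cy = sum(y for _, y in pts) / len(pts)
        # Circle radius: distance from the centroid to the farthest point of the group.
        radius = max(math.hypot(x - cx, y - cy) for x, y in pts)
        template[key] = (cx, cy, radius)
    return template


first_row = [(10, 12), (12, 10), (11, 11), (10, 95), (12, 100),
             (98, 11), (100, 9), (95, 97), (100, 100)]
for key, (cx, cy, r) in sorted(build_template(first_row).items()):
    print(key, round(cx, 1), round(cy, 1), round(r, 2))
```

A clustering method such as k-means could replace the midpoint split when the four groups are less cleanly separated.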
- The target template 11 obtained from the first core row 44 serves as the reference. The intercept classification algorithm classifies the data output by the remaining core rows 44 into the signal ranges of the target template 11. For each read data point, the distance (the intercept) from its position on the coordinate axes to the center of each circle (or ellipse) is compared, and the point is assigned to the circle (or ellipse) with the shortest intercept. This reduces sequencing errors and yields more accurate sequencing results.
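Read this way, intercept classification is a nearest-center rule. The Python sketch below assumes Euclidean distance and hypothetical center coordinates keyed by the two luminescence bits. Elliptical ranges would need a scaled distance instead, which this sketch does not cover.

```python
import math
from typing import Dict, Tuple

Bits = Tuple[int, int]      # (luminescence in first reagent, luminescence in second)
Center = Tuple[float, float]


def classify(point: Center, centers: Dict[Bits, Center]) -> Bits:
    """Assign a point to the signal range whose center is nearest (shortest 'intercept')."""
    x, y = point
    return min(centers, key=lambda k: math.hypot(x - centers[k][0], y - centers[k][1]))


# Hypothetical centers taken from the first core row's template.
centers = {(0, 0): (11.0, 11.0), (0, 1): (11.0, 97.5), (1, 0): (99.0, 10.0), (1, 1): (97.5, 98.5)}
print(classify((20.0, 90.0), centers))  # (0, 1)
```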
- The simplification process includes binarization. This reduces the amount of data calculation while still allowing the sequencing results to be output accurately.
- The different luminescence intensities of the read bases are simplified into two output signals, luminescence or non-luminescence, with "0" representing non-luminescence and "1" representing luminescence.
- A base reacting in the two substrate reagents therefore produces two luminescence/non-luminescence signals, recorded as "00", "01", "10", or "11".
- Different optical signal data correspond to different base types, as shown in Figure 5.
- Each pixel finally outputs 1 bit of data.
- The resulting 2-bit data corresponds to one of the four bases A, G, C, and T, which greatly reduces the burden of data transmission and processing.
- The simplification processing may also take other forms and is not limited to binarization.
- Simplification processing can be understood as processing the raw data to reduce the amount of output data. Binarization is not limited to the values 0 and 1; other numerical values or symbols may also be used.
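The binarization and 2-bit encoding can be sketched in Python. Several details are assumptions: the threshold of 50, the bit order (first substrate reagent as the high bit), and packing four codes per byte. The `CODE_TO_BASE` table is hypothetical; the actual code-to-base assignment is the one in Figure 5.

```python
from typing import Sequence

# Hypothetical code-to-base table; the actual assignment is given in Figure 5.
CODE_TO_BASE = {0b00: "A", 0b01: "G", 0b10: "C", 0b11: "T"}


def binarize(intensity: float, threshold: float) -> int:
    """1 = luminescence, 0 = no luminescence."""
    return 1 if intensity >= threshold else 0


def two_bit_code(bit_first: int, bit_second: int) -> int:
    """Combine the 1-bit reads from the first and second substrate reagents."""
    return (bit_first << 1) | bit_second


def pack_codes(codes: Sequence[int]) -> bytes:
    """Pack four 2-bit codes per byte, first code in the high bits."""
    out = bytearray()
    for i in range(0, len(codes), 4):
        chunk = codes[i:i + 4]
        byte = 0
        for code in chunk:
            byte = (byte << 2) | code
        byte <<= 2 * (4 - len(chunk))  # pad a short final chunk with zero bits
        out.append(byte)
    return bytes(out)


readings = [(10, 90), (95, 12), (97, 99), (8, 11)]  # (first reagent, second reagent)
codes = [two_bit_code(binarize(a, 50), binarize(b, 50)) for a, b in readings]
print([CODE_TO_BASE[c] for c in codes])  # ['G', 'C', 'T', 'A']
print(pack_codes(codes).hex())           # 6c
```

Packed this way, four pixels' base calls fit in a single byte.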
- The semiconductor sequencing chip 14 is loaded row by row with base sequence clusters/balls of starting fragments that have undergone the same amplification process. Under the control of the robot 10, it is then immersed in the sequencing reagents in turn, with the core row 44 as the immersion unit.
- As the semiconductor sequencing chip 14 is immersed in the first substrate reagent core row by core row, the 1-bit optical signal data of all core rows is read out.
- When the first core row of the semiconductor sequencing chip 14 is immersed in the second substrate reagent, all 2-bit data of that row is read, yielding the target template 11.
- The data of the second core row then falls into the signal ranges of the target template 11, and 1-bit optical signal data is output for the second core row of the semiconductor sequencing chip 14.
- This 1-bit optical signal data is combined with the 1-bit data read from the same core row in the first substrate reagent to form 2-bit optical signal data ("00", "01", "10", or "11"). Each code corresponds to one of the four bases A, G, C, and T. In this way, the optical signal data of the bases on the second core row of the semiconductor sequencing chip 14 is obtained.
- The semiconductor sequencing chip 14 continues to be immersed in the second substrate reagent, one core row 44 at a time. Biochemical reactions take place and data is read until all core rows 44 of the chip have been immersed in the second substrate reagent and read. At that point, the optical signal data of the first base of every base sequence cluster/ball loaded on the chip has been read out, and one sequencing cycle ends. The process is repeated for multiple sequencing cycles until the optical signal data of every base in the base sequence clusters/balls has been read out, completing one sequencing run.
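Repeating the cycle turns per-cycle codes into per-cluster sequences. The Python sketch below is minimal. `assemble_reads`, the `cycles[c][k]` layout, and the `CODE_TO_BASE` table are hypothetical; the real code-to-base assignment is in Figure 5.

```python
from typing import List

# Hypothetical code-to-base table (the real assignment is in Figure 5).
CODE_TO_BASE = {0: "A", 1: "G", 2: "C", 3: "T"}


def assemble_reads(cycles: List[List[int]]) -> List[str]:
    """cycles[c][k] is the 2-bit code of cluster k in sequencing cycle c."""
    if not cycles:
        return []
    num_clusters = len(cycles[0])
    if any(len(cycle) != num_clusters for cycle in cycles):
        raise ValueError("every cycle must report one code per cluster")
    # Read each cluster's codes across cycles, in cycle order.
    return ["".join(CODE_TO_BASE[cycle[k]] for cycle in cycles) for k in range(num_clusters)]


print(assemble_reads([[0, 3], [1, 2], [2, 2]]))  # ['AGC', 'TCC']
```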
- The gene sequencing system includes a row switching and common reading unit 34 connected to all core rows 44. The sequencing method includes:
- Controlling, through the row switching and common reading unit 34, the data channel switching and data reading of the cores 42 immersed in the reagent.
- In this way, the data of the semiconductor sequencing chip 14 can be read out row by row. This makes it easier for the computer system to read, wire, transmit, cache, and process the data.
- In one example, 69 cores 42 are distributed in an array on a semiconductor sequencing chip 14. The circuitry in the cores 42 divides them into 9 core rows 44, and the one or more cores 42 in each core-row unit read data simultaneously.
- The row switching and common reading unit 34 may include a row switching and common reading circuit. It is connected to all core rows 44 and reads out the image signals row by row. Under channel switching control, data is read at least once each time at least one unit of core row 44 is immersed.
- Sequential reading reduces the system load to 16% of that of reading the full wafer (the semiconductor sequencing chip 14) in parallel.
- The 10-bit digital value of each pixel is compressed to 2 bits, which is enough to represent the four bases A, T, C, and G.
- The data of the first immersed core row is processed to establish the template. The data of all cores on the full wafer (semiconductor sequencing chip 14) is then normalized, and base sequence results are obtained with a simplified algorithm. As a result, the transmission, calculation, and storage of the system are extremely simple and the cost is very low.
- The sequencing method includes:
- The cores 42 of the semiconductor sequencing chip 14 carry an enzyme from the pre-sequencing biochemical reaction. This enzyme begins to generate a signal after contacting the substrate, and its signal curve depends strongly on factors such as temperature and time.
- The robot 10 grips the semiconductor sequencing chip 14 and immerses it in the reagent with the core row 44 as the immersion unit. It keeps the difference between the immersion times of the core rows 44 within a preset range, so that the biochemical reaction time is consistent across all core-row units.
- The sequencing method includes:
- Keeping the difference between the data reading times of the core-row units within a preset range. This ensures that the signals obtained by all core rows 44 are relatively uniform.
- The image processing module 24 is set to the same data reading time for every unit. The difference between the biochemical reaction times of the core-row units in the reagent stays within the preset range, and each unit's core row 44 is sampled at a fixed time. Together, these ensure that the signals obtained by all core rows 44 are relatively uniform.
- The data reading time set for the image processing module 24 can also be kept within a preset time range.
- The biochemical reaction time of each immersed core row 44 and the preset range together determine when the image processing module 24 collects signals from the immersed part. Collecting within this preset time range ensures that data is collected from all core rows 44.
- For example, suppose the preset range is 0-1 s and the immersion time of a core row is 10 s. The data reading window is then 10-11 s, so the difference between the core row's immersion time and its data reading time does not exceed the preset range.
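The worked example above can be checked numerically with the hypothetical Python helpers below. `read_window` returns the reading window for a given immersion time and preset range. `within_preset` tests whether a set of times spreads over no more than the preset range; the times could be immersion or reading times of the core-row units.

```python
from typing import Sequence, Tuple


def read_window(immersion_s: float, preset_s: float) -> Tuple[float, float]:
    """Earliest and latest allowed reading time, measured from the start of immersion."""
    if preset_s < 0:
        raise ValueError("preset range must be non-negative")
    return (immersion_s, immersion_s + preset_s)


def within_preset(times_s: Sequence[float], preset_s: float) -> bool:
    """True if the latest and earliest times differ by no more than the preset range."""
    return max(times_s) - min(times_s) <= preset_s


print(read_window(10.0, 1.0))                  # (10.0, 11.0)
print(within_preset([10.0, 10.4, 10.9], 1.0))  # True
print(within_preset([10.0, 11.5], 1.0))        # False
```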
- Before the semiconductor sequencing chip 14 is immersed in the reagent, the sequencing method further includes:
- Dividing the entire semiconductor sequencing chip 14 into multiple regions 48 parallel to the reagent tank 12.
- Each region 48 includes a core row 44, and the correspondence between each region 48 and each core-row unit is stored. This determines the distance and number of movements of the robot 10 that grips the semiconductor sequencing chip 14.
- The entire semiconductor sequencing chip 14 is divided into multiple areas 48 parallel to the reagent tank 12.
- Each area 48 includes a core row 44. Each time the robot 10 moves, one area 48 is immersed, reacts with the reagent, and has its data read.
- This determines the distance and number of movements of the robot 10 holding the semiconductor sequencing chip 14 while the entire chip is processed.
- In one example, the entire semiconductor sequencing chip 14 is divided into seven regions.
- The entire semiconductor sequencing chip 14 is divided into multiple areas 48.
- Each area 48 may contain multiple core rows 44. By changing the circuit logic, multiple core rows 44 are treated as one area 48, so that the biochemical reaction and data reading for that area are controlled in a single immersion action of the robot 10.
- the robot 10 immerses the semiconductor sequencing chip 14 in the reagent; a first timing sequence controls the continuous exposure time of each unit of core row 44, and a second timing sequence controls the movement time of the robot 10.
- in the first timing sequence, the continuous exposure time of each unit of core row 44 is separated, before and after, from each movement time of the robot 10 in the second timing sequence by a waiting time. This keeps the robot 10 stationary during exposure, which makes the image signal clearer and more accurate.
- Figure 11 is a timing diagram captured over a period of time during testing.
- the exposure time is determined by the luminescence intensity of the biochemical reaction and the signal-to-noise ratio.
- the exposure time is obtained through preliminary experimental testing and calculation, and is controlled by the first timing sequence.
- a waiting time is set before and after each exposure time (the high-level portion of the first timing sequence), and the movement time of the robot 10, controlled by the second timing sequence, is placed between two waiting times.
- the three timing sequences control the movement of the robot 10 so that it remains stationary during the exposure time, which yields a clear and accurate image signal.
- a third timing sequence controls the data transmission of each unit of core row 44; in it, each core row's data transmission time follows the continuous exposure time of that core row 44. Data are thus read after each core row 44 is immersed, realizing joint immersion-and-read control and reducing the system's transmission, caching, and processing loads.
- the robot 10 holds the semiconductor sequencing chip 14 and moves it row by row in the direction perpendicular to the reagent tank 12.
- as each unit of core row 44 is progressively immersed and reacts, exposure control is switched to the immersed region 48.
- the image signal is read out through the row switching and common reading unit 34, and the data pass through the row switching circuit logic and are sent up to the data channel step by step, reducing the system's transmission, buffering, and processing loads.
- each core 42 includes a pixel array 51, which is either a single pixel array 51 or is stitched together from at least two sub-pixel arrays 50 using stitching techniques.
- this allows very large semiconductor sequencing chips 14 with very large arrays; FIG. 10 shows a multi-core stitched semiconductor sequencing chip 14.
- the pixel array 51 occupies the center of the core 42, while the peripheral circuits, namely the phase-locked loop, decoder controller and digital buffer 52, the correlated double sampling circuit and comparator 54, the readout circuit 56 (including digital and analog processors, decoders, and charge-mode readout circuits), and the decoder and driver 58, are located around the pixel array 51 or on another wafer to be bonded to it, as shown in Figure 9.
- in other embodiments, the pixel array 51 may be stitched together from several sub-pixel arrays 50.
- in one example, the pixel array 51 is formed by stitching four sub-pixel arrays 50, which yields a larger chip area.
- one pixel array 51 includes r*c pixels.
- each time one immersion unit of core row 44 is immersed, the data output by that core row 44 are read at least once, which reduces readout from all cores in parallel to the core rows 44 in turn.
- because the bases luminesce differently in the two substrate reagents, the base type can be output as a simple light signal. Compressing each pixel's 10-bit digital value to 2 bits is enough to represent the four bases A, T, C, and G, and the base sequence can be obtained with a simplified algorithm. The system's transmission, computation, and storage are therefore very simple and very low in cost.
- the embodiment of the present invention also provides a data processing system 100 for use in a gene sequencing system.
- the data processing system 100 includes a semiconductor sequencing chip 14, a control device 26 and a robot 10.
- the semiconductor sequencing chip 14 includes a processing module and a plurality of cores 42.
- the plurality of cores 42 are distributed in an array to form a plurality of core rows 44.
- the processing module connects the core rows 44 and the control device 26.
- the control device 26 is connected to the robot 10 and is used to control the robot 10 to immerse the semiconductor sequencing chip 14 in the reagents used in the sequencing reaction with the core row 44 as the immersion unit.
- the reagents include at least two substrate reagents.
- the processing module is used to read, each time N units of core rows are immersed in the substrate reagent (N>0), the data output by the core rows immersed in the reagent at least once, until all core rows have been immersed in the substrate reagent.
- the above-mentioned data processing system 100 immerses the semiconductor sequencing chip 14 in the reagents used in the sequencing reaction through the joint control of immersion and reading, with the core row 44 as the immersion unit.
- the bases carried on the semiconductor sequencing chip 14 show different light intensities through the biochemical reactions that occur in the reagents. For example, in one substrate reagent that makes bases emit light, only two of the four bases A, T, C, and G emit light or show stronger intensity; in another such substrate reagent, these two bases do not emit light or show weak intensity, while the other two bases behave the opposite way. Therefore, with two light-inducing substrate reagents, the base type can be determined from the light intensities of the four bases.
- the target template 11 is determined from the data output by the first core row 44 so that the data output by the remaining core rows 44 can be simplified to obtain optical signal data for the different bases. This reduces the system load of sequential reading, keeps transmission, computation, and storage relatively simple, and lowers cost.
- the data output by the first core row 44 is used to determine the target template 11.
- the target template 11 includes the signal ranges indicating whether a base emits light. The target template 11 is used to simplify the data output by the remaining core rows 44 to obtain optical signal data for the different bases, and the control device 26 is used to determine the base type from the optical signal data. The target template 11 thus simplifies data processing and yields the optical signal data from which the base type is determined.
- all base sequence clusters/balls loaded on the semiconductor sequencing chip 14 undergo a biochemical reaction with the substrate reagents.
- after one sequencing cycle, the 2-bit optical signal data of the first base of every base sequence cluster/ball are read out: the data output by each immersion unit's core row 44 are read, and a target template 11 is determined from the data output by the first core row 44.
- the target template 11 limits the signal range of whether the base emits light.
- the target template 11 is used to simplify the data output by the remaining core rows 44 to obtain and output the optical signal data of different bases.
- this yields 2-bit optical signal data "00", "01", "10", or "11", each corresponding to one of the four bases A, G, C, and T; the control device 26 reads out and displays the specific base type or base sequence from the output 2-bit optical signal data.
- after multiple sequencing cycles, the optical signal data of every base of all base sequence clusters/balls loaded on the semiconductor sequencing chip 14 are obtained in turn, giving the base sequence of the gene being sequenced and completing the sequencing process.
- the reagents include a first substrate reagent and a second substrate reagent; when the semiconductor sequencing chip 14 is immersed in the first or the second substrate reagent, two of the bases emit light and the other two do not. The four bases can thus be distinguished from the luminescence in the two channels.
- the target template 11 includes a first signal range 13, a second signal range 15, a third signal range 17, and a fourth signal range 19, and the base types include a first type, a second type, a third type, and a fourth type,
- the first signal range 13 indicates that the first type of base does not emit light in both the first substrate reagent and the second substrate reagent;
- the second signal range 15 indicates that the second type of base does not emit light in the first substrate reagent but emits light in the second substrate reagent;
- the third signal range 17 indicates that the third type of base emits light in the first substrate reagent but does not emit light in the second substrate reagent;
- the fourth signal range 19 indicates that the fourth type of base emits light in both the first substrate reagent and the second substrate reagent. In this way, specific base types can be determined in biochemical reactions.
- the processing module is also used to:
- the data output by the remaining core rows 44 are classified into the signal range of the target template 11 through the intercept classification algorithm. In this way, sequencing errors can be reduced and more accurate sequencing results can be obtained.
- the reduction process includes binarization. In this way, the amount of data calculation can be reduced while sequencing results can be accurately output.
- control device 26 includes an image processing module 24 connected to the semiconductor sequencing chip 14.
- the image processing module 24 includes a row switching and common reading unit 34.
- the row switching and common reading unit 34 connects all core rows 44.
- the control device 26 is used to control, through the row switching and common reading unit 34, the data channel switching and data reading of the core rows 44 immersed in the reagent. This realizes joint immersion-and-read control of the semiconductor sequencing chip 14 and reduces the difficulty for the computer system of reading, routing, transmitting, caching, and processing data.
- the control device 26 includes a mechanical control module 16 connected to the robot 10.
- the mechanical control module 16 is used to control the robot 10 so that the immersion times of the units of core rows 44 in the reagent differ by no more than a preset range. The biochemical reactions on every unit of core row 44 thus proceed for uniform times.
- the mechanical control module 16 controls the robot 10 to move while holding the semiconductor sequencing chip 14, immersing the chip in the reagent with the core row 44 as the immersion unit.
- the movement time of the robot 10 can be set.
- the robot 10 holds the semiconductor sequencing chip 14 and immerses it in the reagent one core row 44 at a time, keeping the immersion times of all core rows 44 within the preset range of one another, so that the biochemical reactions on every unit of core row 44 proceed for uniform times.
- control device 26 includes an image processing module 24 connected to the semiconductor sequencing chip 14.
- the image processing module 24 includes a row switching and common reading unit 34.
- the row switching and common reading unit 34 is used to keep the data reading times of the units of core rows 44 within a preset range of one another. This ensures that the signals obtained from all core rows 44 are relatively uniform.
- the image processing module 24 is controlled to use the same data reading time.
- the reaction times in the reagent of the successively immersed core rows 44 differ by no more than the preset range.
- each row of chip cores 42 collects its signal a fixed time after immersion, which ensures that the signals obtained from all core rows 44 are relatively uniform.
- the control device 26 is further used to divide the entire semiconductor sequencing chip 14 into multiple regions 48 parallel to the reagent tank 12, each region 48 containing one unit of core row 44, and to store the correspondence between each region 48 and each unit of core row 44. By dividing the chip into regions 48, the gene sequencing system can determine the displacement of each move, and the number of moves, when the robot 10 moves the semiconductor sequencing chip 14.
- the entire semiconductor sequencing chip 14 is divided into multiple regions 48 parallel to the reagent tank 12, each region 48 being the portion immersed in the reagent each time the robot 10 moves once. Dividing the chip into regions 48 determines the displacement of each move, and the number of moves, of the robot 10 holding the semiconductor sequencing chip 14 during one pass over the whole chip.
- control device 26 includes an image processing module 24 connected to the semiconductor sequencing chip 14.
- the image processing module 24 includes a row switching and common reading unit 34.
- the row switching and common reading unit 34 is used to control the continuous exposure time of each unit of core row 44 through a first timing sequence and the movement time of the robot 10 through a second timing sequence.
- before and after the continuous exposure time of each unit of core row 44 in the first timing sequence, a waiting time separates it from each movement time of the robot 10 in the second timing sequence. The robot 10 therefore remains stationary during exposure, which makes the image signal clearer and more accurate.
- the exposure time is determined by the biochemical reaction light intensity and signal-to-noise ratio.
- the exposure time is obtained through preliminary experimental testing and calculation.
- the continuous exposure time of the core row 44 of each unit is controlled by the first timing.
- a waiting time is set before and after each exposure time, and the movement time of the robot 10, controlled by the second timing sequence, is placed between two waiting times.
- the manipulator 10 is controlled to remain stationary during the exposure time, thereby obtaining a clear and accurate image signal.
- the row switching and common reading unit 34 is also used to control the data transmission of the core row 44 of each unit through a third timing sequence.
- the data transmission time of each unit of core row 44 follows the continuous exposure time of the corresponding core row 44. Data are thus read after each core row 44 is immersed, realizing joint immersion-and-read control and reducing the system's transmission, caching, and processing loads.
- the robot 10 holds the semiconductor sequencing chip 14 and moves it row by row in the direction perpendicular to the reagent tank 12.
- as each unit of core row 44 is progressively immersed and reacts, exposure control is switched to the immersed region 48.
- the image signal is read out through the row switching and common reading unit 34, and the data pass through the row switching circuit logic and are sent up to the data channel step by step, reducing the system's transmission, buffering, and processing loads.
- the sequencing method of the semiconductor sequencing chip 14 is used in a gene sequencing system.
- the control of the sequencing system immerses the semiconductor sequencing chip 14 in the reagents one core row 44 at a time, lets it react, and reads out data synchronously; the read data are then processed, and the target template 11 obtained from the first core row 44 defines the signal ranges of subsequently read data, so that a 2-bit signal is output to determine the base type. This reduces the system load of sequential reading, keeps the system's transmission, computation, and storage relatively simple, and lowers cost.
- the embodiment of the present invention also provides a gene sequencing system.
- the above description of the embodiments and beneficial effects of the data processing system of the semiconductor sequencing chip 14 also applies to the gene sequencing system of the embodiments of the present invention; to avoid redundancy, it is not repeated here.
- the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, or by hardware, though in many cases the former is the better implementation.
- the technical solution of the present application, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product.
- the computer software product is stored in one of the above storage media (such as ROM/RAM, magnetic disc, optical disk), including several instructions to cause a terminal device (which can be a mobile phone, computer, server, air conditioner, or network device, etc.) to execute the methods of various embodiments of the present application.
Landscapes
- Chemical & Material Sciences (AREA)
- Organic Chemistry (AREA)
- Life Sciences & Earth Sciences (AREA)
- Zoology (AREA)
- Wood Science & Technology (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Health & Medical Sciences (AREA)
- Engineering & Computer Science (AREA)
- Microbiology (AREA)
- Biochemistry (AREA)
- Physics & Mathematics (AREA)
- Molecular Biology (AREA)
- Biotechnology (AREA)
- Biophysics (AREA)
- Analytical Chemistry (AREA)
- Immunology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Genetics & Genomics (AREA)
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
- Apparatus Associated With Microorganisms And Enzymes (AREA)
Abstract
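A sequencing method for a semiconductor sequencing chip, a data processing system, and a gene sequencing system. The semiconductor sequencing chip includes multiple core rows formed by arranging multiple cores. The sequencing method includes: controlling the semiconductor sequencing chip to be immersed, with the core row as the immersion unit, in the reagents used in each round of the sequencing reaction, so that when the chip is immersed in a substrate reagent the corresponding bases emit light or do not emit light; each time at least one unit of core row is immersed in the substrate reagent, reading at least once the data output by the core rows immersed in the reagent, until all core rows have been immersed in the substrate reagent; determining from the data output by the first core row a target template that defines the signal ranges indicating whether a base emits light; simplifying the data output by the remaining core rows with the target template to obtain optical signal data; and determining the base type from the optical signal data.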
Description
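The present invention relates to the technical field of gene sequencing, and in particular to a sequencing method, a data processing system, and a gene sequencing system.
The sequencing substrates (also called semiconductor sequencing chips) used in existing high-throughput gene sequencers are commonly of two kinds: surface-type chips and integrated-circuit semiconductor sequencing chips. The former usually capture fluorescent signals through a microscope optical system, while the semiconductor sequencing chip uses its internal integrated circuit to acquire electrical or optical signals and perform analog-to-digital conversion.
When sequencing with a semiconductor sequencing chip, the signals output by all pixels are read and processed into an image, and complex image processing algorithms cluster the resulting scatter plot to identify the four bases A, T, C, and G. However, image processing algorithms consume large amounts of computing resources, so as high-throughput and even ultra-high-throughput data are generated, the computational load gradually becomes unacceptable.
Summary of the Invention
The present invention aims to solve at least one of the technical problems in the prior art. To this end, the present invention proposes a sequencing method, a data processing system, and a gene sequencing system. The sequencing method is simple and can greatly reduce the amount of data processing, which in turn greatly reduces the amount of data transmitted and processed by back-end servers.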
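An embodiment of the present invention provides a sequencing method for a gene sequencing system, in which the semiconductor sequencing chip includes multiple cores arranged to form multiple core rows. The sequencing method includes:
controlling the semiconductor sequencing chip to contact the reagents used in each round of the sequencing reaction with the core row as the contact unit, such that, upon contact of the semiconductor sequencing chip with the reagent, the corresponding bases emit light or do not emit light;
each time N units of core rows contact the reagent (N>0), reading at least once the data output by the core rows in contact with the reagent, until all core rows have contacted the reagent,
wherein, after the data output by the first core row are read, a target template is determined from the data output by the first core row, the target template including signal ranges indicating whether a base emits light;
simplifying the data output by the remaining core rows according to the target template to obtain optical signal data for the different bases; and
determining the base type according to the optical signal data.
In the above sequencing method, the core rows contact the reagent one contact unit at a time, and a target template is determined from the data output by the first core row so that the data output by the remaining core rows can be simplified to obtain optical signal data for the different bases. This greatly reduces the amount of data processing for the remaining core rows and, in turn, the amount of data transmitted and processed.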
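An embodiment of the present invention provides a data processing system for a gene sequencing system. The data processing system includes a semiconductor sequencing chip, a control device, and a robot. The semiconductor sequencing chip includes a processing module and multiple cores distributed in an array to form multiple core rows; the processing module is connected to the core rows and the control device, and the control device is connected to the robot.
The control device is used to:
control the robot to bring the semiconductor sequencing chip into contact with the reagents used in the sequencing reaction with the core row as the contact unit, such that, upon contact of the chip with the reagent, the corresponding bases emit light or do not emit light;
The processing module is used to:
each time N units of core rows contact the reagent (N>0), read at least once the data output by the core rows in contact with the reagent, until all core rows have contacted the reagent.
An embodiment of the present invention provides a gene sequencing system including the above data processing system.
In the above data processing system and gene sequencing system, the core rows contact the reagent one contact unit at a time, and a target template is determined from the data output by the first core row so that the data output by the remaining core rows can be simplified to obtain optical signal data for the different bases. This greatly reduces the amount of data processing for the remaining core rows and, in turn, the amount of data transmitted and processed.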
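Additional aspects and advantages of the present invention will be given in part in the following description, will in part become apparent from it, or will be learned through practice of the invention.
The above and/or additional aspects and advantages of the present invention will become apparent and easy to understand from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
Figure 1 is a flowchart of the sequencing method of an embodiment of the present invention;
Figure 2 is a block diagram of the data processing system of the semiconductor sequencing chip of an embodiment of the present invention;
Figure 3 is a block diagram of the image control module of the data processing system of the semiconductor sequencing chip of an embodiment of the present invention;
Figure 4 is a schematic diagram of the target template of the sequencing method of an embodiment of the present invention;
Figure 5 is a schematic diagram of the correspondence between optical signal data and base types in an embodiment of the present invention;
Figure 6 is a schematic structural diagram of the semiconductor sequencing chip of an embodiment of the present invention;
Figure 7 is a schematic diagram of the region division of the semiconductor sequencing chip of an embodiment of the present invention;
Figure 8 is a schematic structural diagram of a core of the semiconductor sequencing chip of an embodiment of the present invention;
Figure 9 is another schematic structural diagram of a core of the semiconductor sequencing chip of an embodiment of the present invention;
Figure 10 is yet another schematic structural diagram of a core of the semiconductor sequencing chip of an embodiment of the present invention;
Figure 11 is a timing diagram of data reading in an embodiment of the present invention.
Reference numerals:
robot 10, target template 11, reagent tank 12, first signal range 13, semiconductor sequencing chip 14,
second signal range 15, mechanical control module 16, third signal range 17, fluid control module 18,
fourth signal range 19, temperature control module 20, environmental control module 22, image processing module 24,
control device 26, display screen 28, control module 30, power board 32, row switching and common reading unit 34,
main control board 36, drive board 40, core 42, core row 44, region 48, sub-pixel array 50,
pixel array 51, phase-locked loop, decoder controller and digital buffer 52,
correlated double sampling circuit and comparator 54, readout circuit 56, decoder and driver 58,
data processing system 100.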
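Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, are intended to explain the present invention, and should not be construed as limiting it.
Referring to Figures 1, 2, and 6, an embodiment of the present invention provides a sequencing method for a semiconductor sequencing chip 14, used in a gene sequencing system. The semiconductor sequencing chip 14 includes multiple cores 42 arranged to form multiple core rows 44.
The semiconductor sequencing chip 14 may include a sequencing surface and a backplane, or two sequencing surfaces; a sequencing surface contains the cores 42, while the backplane is the surface without cores 42. Contact between the sequencing surface and the reagent can be achieved in several ways, including but not limited to: the reagent flows or spreads over a fixed sequencing surface; the sequencing surface is moved or rotated so that it is immersed in the reagent; or the sequencing surface and the reagent move relative to each other. Any of these achieves contact between the sequencing surface and the reagent and may be substituted according to the actual use scenario. To explain the substance of the present disclosure, the case in which contact is achieved by moving the sequencing surface into the reagent is used as the example.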
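The sequencing method includes:
Step 101: controlling the semiconductor sequencing chip 14 to be immersed, with the core row 44 as the immersion unit, in the reagents used in each round of the sequencing reaction, the reagents including at least two substrate reagents, such that when the semiconductor sequencing chip 14 is immersed in a substrate reagent, the corresponding bases emit light or do not emit light;
Step 103: each time N units of core rows 44 (N>0) are immersed in the substrate reagent, reading at least once the data output by the core rows 44 immersed in the reagent, until all core rows 44 have been immersed in the substrate reagent,
wherein, after the data output by the first core row 44 are read, a target template 11 is determined from those data, the target template 11 including signal ranges indicating whether a base emits light;
Step 105: simplifying the data output by the remaining core rows 44 with the target template 11 to obtain optical signal data for the different bases;
Step 107: determining the base type according to the optical signal data.
In the above sequencing method, the semiconductor sequencing chip 14 is immersed with the core row 44 as the immersion unit, and a target template 11 is determined from the data output by the first core row 44 so that the data output by the remaining core rows 44 can be simplified to obtain optical signal data for the different bases, which greatly reduces the amount of data transmitted and processed.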
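The semiconductor sequencing chip 14 includes multiple cores 42 arranged to form multiple core rows 44. At the start of sequencing, base sequence clusters/balls obtained by subjecting different amplification starting fragments to the same amplification process are loaded on the core row 44 that enters the reagent first. In one example, the first core row 44 has only one core 42, on which base sequence clusters/balls obtained from different amplification starting fragments (e.g., insert sizes of 50 bp, 100 bp, 200 bp, and 300 bp) after the same amplification process are loaded. Through this step, a target template 11 can be obtained from the first core row 44 of the semiconductor sequencing chip 14 after the biochemical reaction.
During sequencing, the semiconductor sequencing chip 14 is immersed, with the core row 44 as the immersion unit, in the reagents used in each round of the sequencing reaction. The reagents include at least two substrate reagents, which cause different base types to show different luminescence intensities when the chip is immersed in them. Each time N (N>0) units of core rows 44 are immersed in the substrate reagent, the optical signal data output by those core rows 44 are read; the data output by the remaining core rows 44 are simplified with the target template 11 to obtain optical signal data for the different bases, and the base type can then be determined from the optical signal data output by the semiconductor sequencing chip 14.
N is the immersion unit: its value is the number of core rows 44 that enter the reagent each time. When N=1, one core row 44 is immersed at a time and the data output by the semiconductor sequencing chip 14 are read once. N can be greater than 1, in which case the data are read once each time N core rows 44 are immersed, which increases the reading speed. Alternatively, to accommodate devices with slower reading speeds, N can be 0.5, in which case two data reads are performed for each core row 44 immersed. The number of core rows 44 immersed each time and the number of signal reads are set according to actual conditions and are not specifically limited here.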
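The role of N can be illustrated with a short scheduling sketch. It is a hypothetical helper, not part of the patent: `read_schedule` lists how many core rows have entered the reagent at each read, assuming the robot advances by N rows between reads and that one final read is taken when fewer than N rows remain.

```python
from fractions import Fraction
from typing import List


def read_schedule(num_rows: int, n: float) -> List[Fraction]:
    """Rows immersed so far at each read, for immersion unit N.

    n = 1 reads once per core row, n > 1 reads once per n rows, and
    n = 0.5 reads twice per core row (once per half row).
    Exact fractions avoid float drift when n is not an integer.
    """
    step = Fraction(n).limit_denominator()
    if step <= 0:
        raise ValueError("N must be greater than 0")
    total = Fraction(num_rows)
    immersed = Fraction(0)
    reads = []
    while immersed < total:
        # Advance by N rows, but never past the last row (assumed final read).
        immersed = min(immersed + step, total)
        reads.append(immersed)
    return reads
```

For example, `read_schedule(9, 0.5)` gives 18 reads for nine core rows, which is the "two reads per core row" case, while `read_schedule(5, 2)` reads after rows 2, 4 and 5.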
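In one embodiment, the reagents include two substrate reagents. After the first core row 44 of the semiconductor sequencing chip 14 outputs data for both substrate reagents (i.e., two channels of data), a target template 11 is obtained, as shown in Figure 4. The target template 11 includes the signal ranges indicating whether a base emits light (the ranges bounded by the dashed boxes in Figure 4) and is used to simplify the data output by the remaining core rows 44 to obtain optical signal data for the different bases. The base type can then be determined from the output optical signal data.
Referring to Figure 2, the data processing system 100 of an embodiment of the present invention includes a semiconductor sequencing chip 14, a control device 26, and a robot 10. The reagents may be placed in reagent tanks 12. The robot 10, the reagent tanks 12, and the semiconductor sequencing chip 14 may be placed in an enclosed space in which the series of biochemical reactions for gene sequencing take place. There may be one or more reagent tanks 12, each holding a reagent required for gene sequencing. The biochemical reactions in the reagent tanks 12 require specific temperature, time, and environmental conditions, which are controlled by the control device 26.
The semiconductor sequencing chip 14 includes a processing module (not shown) and multiple cores 42 distributed in an array to form multiple core rows 44. The processing module is connected to the core rows 44 and the control device 26. After each unit of core row 44 is immersed in a substrate reagent, the processing module reads the data output by that core row 44, determines a target template 11 from the data output by the first core row 44, uses the target template 11 to define the signal ranges indicating whether a base emits light, and uses the target template 11 to simplify the data output by the remaining core rows 44 so as to obtain and output optical signal data for the different bases.
The control device 26 may include a display screen 28 and a control module 30. The display screen 28 may be a touch screen through which the operator controls the entire gene sequencing workflow. The control module 30 may include a mechanical control module 16, a fluid control module 18, a temperature control module 20, an environmental control module 22, and an image processing module 24.
The mechanical control module 16 controls the robot 10 to move while holding the semiconductor sequencing chip 14, immersing the chip in the reagent in the reagent tank 12 with the core row 44 as the immersion unit. The mechanical control module 16 can control the reaction time of the semiconductor sequencing chip 14 in the reagent tank 12 and can keep the movement speed of the robot 10 within a suitable range, which reduces the amount of reagent carried out when the chip enters and leaves the reagent tank; a suitable movement speed also reduces bubble formation in the reagent.
The fluid control module 18 monitors the reagent content in the reagent tank 12. By measuring the reagent mass, it detects changes in the reagent content in the tank and then controls the associated pumps and valves to replenish and circulate the reagent, keeping the key reactants required by the biochemical reaction at a given level.
The temperature control module 20 uses temperature sensors to monitor the temperature in the enclosed space and keep it suitable for the biochemical reaction.
The environmental control module 22 monitors and controls the levels of the main gases in the enclosed space, for example by filling it with nitrogen, so that the biochemical reaction takes place in a low-oxygen environment.
Referring to Figure 3, the image processing module 24 may be connected to the semiconductor sequencing chip 14 through an interface board 38. The image processing module 24 includes the row switching and common reading unit 34, a main control board 36, and a drive board 40. The row switching and common reading unit 34 reads out the image signals collected by the semiconductor sequencing chip 14 row by row and passes them to a back-end hard disk for storage; it can also communicate with the main control board 36 to transmit data in both directions. The power board 32 supplies power to the image processing module 24, the main control board 36 handles data collection, computation, and command output, and the drive board 40 connects to a display for output.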
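In some embodiments, the reagents include a first substrate reagent and a second substrate reagent; when the semiconductor sequencing chip 14 is immersed in the first or the second substrate reagent, two of the bases emit light and the other two do not. The four bases can thus be distinguished from the luminescence in the two channels.
Specifically, when the semiconductor sequencing chip 14 is immersed in the first or second substrate reagent, a biochemical reaction occurs in that reagent in which the bases show different luminescence intensities. Defining the luminescence signal ranges with the target template 11 gives the behavior of each base in the first and second substrate reagents, from which the bases are distinguished.
In some embodiments, referring to Figure 4, the target template 11 includes a first signal range 13, a second signal range 15, a third signal range 17, and a fourth signal range 19, and the base types include a first type, a second type, a third type, and a fourth type,
wherein the first signal range 13 indicates that a base of the first type emits light in neither the first substrate reagent nor the second substrate reagent;
the second signal range 15 indicates that a base of the second type does not emit light in the first substrate reagent but emits light in the second substrate reagent;
the third signal range 17 indicates that a base of the third type emits light in the first substrate reagent but not in the second substrate reagent;
the fourth signal range 19 indicates that a base of the fourth type emits light in both the first substrate reagent and the second substrate reagent. In this way, the specific base type can be determined from the biochemical reaction.
Specifically, the target template 11 determined from the data output by the first core row 44 includes the first signal range 13, the second signal range 15, the third signal range 17, and the fourth signal range 19. The four bases react in the first and second substrate reagents with different luminescence intensities; the signal ranges defined by the target template 11 reduce each output to one of two signals, light or no light, and since each base's combination of results in the two substrate reagents is unique, the current base type can be determined accurately.
In some embodiments, the sequencing method includes:
classifying the data output by the remaining core rows 44 into the signal ranges of the target template 11 using an intercept classification algorithm. This reduces sequencing errors and yields more accurate sequencing results.
Specifically, the data read out after the first core row 44 completes its reaction in the reagents give a signal scatter plot on a pair of axes, the two axes representing the luminescence intensity of the bases in the two substrate reagents. Depending on how the bases luminesce, most points concentrate in four regions. By analyzing these four regions and tracing the outermost points of each, four circular (or elliptical) regions representing the four bases can be distinguished on the plot; these are the signal ranges defined by the target template 11, namely the first signal range 13, the second signal range 15, the third signal range 17, and the fourth signal range 19 shown in Figure 4. When the remaining rows are read, the target template 11 obtained from the first core row 44 serves as the reference, and the intercept classification algorithm assigns the data output by the remaining core rows 44 to the signal ranges of the target template 11: the distance (the intercept) between each read point's position on the plot and the center of each circle (or ellipse) is compared, and the point is assigned to the circle (or ellipse) with the shortest intercept. This reduces sequencing errors and yields more accurate sequencing results.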
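A minimal sketch of the template and the intercept classification just described. The patent outlines each cluster by tracing its outermost points; here, as a stand-in assumption, the first row's points are split into four groups at the midpoint of each axis, each signal range is stored as a centroid plus the radius to its outermost point, and later points go to the nearest centroid. `build_template` and `classify` are hypothetical names.

```python
import math
from statistics import mean
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (intensity in reagent 1, intensity in reagent 2)
Key = Tuple[int, int]        # (lit in reagent 1, lit in reagent 2)


def build_template(first_row: List[Point]) -> Dict[Key, Tuple[Point, float]]:
    """Derive four circular signal ranges from the first core row's scatter plot.

    Assumption: clusters are separated by splitting each axis at the midpoint
    of its observed range (a simplification of tracing the outer points).
    """
    xs = [p[0] for p in first_row]
    ys = [p[1] for p in first_row]
    mid_x = (min(xs) + max(xs)) / 2
    mid_y = (min(ys) + max(ys)) / 2
    clusters: Dict[Key, List[Point]] = {}
    for x, y in first_row:
        clusters.setdefault((int(x > mid_x), int(y > mid_y)), []).append((x, y))
    template = {}
    for key, pts in clusters.items():
        centre = (mean(p[0] for p in pts), mean(p[1] for p in pts))
        radius = max(math.dist(centre, p) for p in pts)  # extent of the range
        template[key] = (centre, radius)
    return template


def classify(point: Point, template: Dict[Key, Tuple[Point, float]]) -> Key:
    """Intercept classification: pick the range whose centre is nearest."""
    return min(template, key=lambda k: math.dist(point, template[k][0]))
```

The radius is kept only to describe each range's extent; the assignment itself uses the shortest centre distance, as in the text.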
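In some embodiments, the simplification includes binarization. This reduces the amount of computation while still outputting accurate sequencing results.
Specifically, using the signal ranges defined by the target template 11, the different luminescence intensities read from a base are reduced to one of two output signals, light or no light, with "0" representing no light and "1" representing light. A base reacting in the two substrate reagents therefore yields two light/no-light readings, labeled "00", "01", "10", or "11", and different optical signal data correspond to different base types, as shown in Figure 5. Each pixel thus outputs 1 bit per reaction, and after the two substrate reactions a 2-bit value is obtained that corresponds to one of the four bases A, G, C, and T, which greatly reduces the pressure of data transmission and processing. It will be appreciated that, in other embodiments, the simplification may be some processing other than binarization; simplification can be understood as any processing of the raw data that reduces the amount of data output. Nor is binarization limited to 0 and 1; other values or symbols may be used.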
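The binarization step can be sketched as follows. Assumptions: `t1` and `t2` are per-channel thresholds taken from the target template, and `BASE_FOR_CODE` is a hypothetical placeholder, because the actual code-to-base assignment appears only in Figure 5 and is not given in the text. The bit order (reagent 1 first, reagent 2 second) follows the first to fourth signal ranges above.

```python
from typing import Tuple

# Hypothetical code-to-base table: the real assignment is shown in Figure 5
# of the patent and is not given in the text.
BASE_FOR_CODE = {"00": "A", "01": "G", "10": "C", "11": "T"}


def binarize(intensity: float, threshold: float) -> int:
    """1 = emits light, 0 = does not emit light."""
    return 1 if intensity > threshold else 0


def call_base(i1: float, i2: float, t1: float, t2: float) -> Tuple[str, str]:
    """Return the 2-bit code and its (placeholder) base.

    First bit: reagent 1, second bit: reagent 2, so "00" is the first signal
    range, "01" the second, "10" the third and "11" the fourth.
    """
    code = f"{binarize(i1, t1)}{binarize(i2, t2)}"
    return code, BASE_FOR_CODE[code]
```

For instance, a pixel bright only in the first substrate reagent yields code "10", the third signal range.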
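In one example, under the control of the robot 10, base sequence clusters/balls obtained from starting fragments after the same amplification process are loaded onto the semiconductor sequencing chip 14 row by row, and the chip is then immersed in the sequencing reagents one core row 44 at a time. As the chip is immersed in the first substrate reagent row by row, the 1-bit optical signal data of all core rows are read out; when the first core row of the chip is immersed in the second substrate reagent, all 2-bit data of that first core row are read out, yielding the target template 11.
When the second core row of the semiconductor sequencing chip 14 is immersed in the second substrate reagent, its data are classified into the signal ranges of the target template 11, and the resulting 1-bit optical signal data are combined with the 1-bit data output when the second core row was immersed in the first substrate reagent to form 2-bit optical signal data "00", "01", "10", or "11", corresponding to one of the four bases A, G, C, and T. This gives the optical signal data of the bases on the second core row of the semiconductor sequencing chip 14.
The semiconductor sequencing chip 14 continues to be immersed in the second substrate reagent one core row 44 at a time, reacting and being read, until all core rows 44 of the whole chip have been immersed in the second substrate reagent and read. At that point the optical signal data of the first base of every base sequence cluster/ball loaded on the chip have been read out, and one sequencing cycle ends. Multiple sequencing cycles are performed with this workflow until the optical signal data of every base of every base sequence cluster/ball have been read out, completing the sequencing run.
In some embodiments, the gene sequencing system includes a row switching and common reading unit 34 connected to all core rows 44, and the sequencing method includes:
controlling, through the row switching and common reading unit 34, the data channel switching and data reading of the cores 42 immersed in the reagent. The data of the semiconductor sequencing chip 14 can thus be read out row by row, reducing the difficulty for the computer system of reading, routing, transmitting, caching, and processing data.
Specifically, the logical relationship among the multiple chip cores 42 fabricated on one wafer through the photomask is first defined. In one example, referring to Figure 6, 69 cores 42 are distributed in an array on one semiconductor sequencing chip 14; by processing the circuits in the cores 42, the 69 cores 42 are divided into 9 core rows 44, and one or more cores 42 of each unit of core row 44 read data simultaneously. The row switching and common reading unit 34 may include a row switching and common reading circuit; it is connected to all core rows 44 and reads out the image signals row by row, and, through interaction over the control channel, data are read at least once each time at least one unit of core row 44 is immersed.
Through this joint immersion-and-read control, the system load of sequential reading is reduced to 16% of that of parallel reading of the full wafer (the semiconductor sequencing chip 14). With the system logic and algorithm above, compressing each pixel's 10-bit digital value to 2 bits is enough to represent the four bases A, T, C, and G; after the data of all cores on the full wafer are normalized using the processed data of the cores immersed first, a simplified algorithm yields the base sequence. The system's transmission, computation, and storage are therefore very simple to implement and very low in cost.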
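The row switching and common reading idea can be modelled in software as a rough illustration. This is a toy model under stated assumptions, not the circuit: `group_into_rows` maps each core to its core row, and the hypothetical `RowSwitchReader` reads only the selected rows over one shared output, with `read_core` standing in for whatever returns a core's data. The 69-core, 9-row layout of Figure 6 is not reproduced because the per-row core counts are not given.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def group_into_rows(cores: Iterable[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Map (core_id, row_index) pairs to {row_index: [core_id, ...]}."""
    rows: Dict[int, List[int]] = defaultdict(list)
    for core_id, row in cores:
        rows[row].append(core_id)
    return dict(rows)


class RowSwitchReader:
    """Toy model: one shared (common) output switched between core rows."""

    def __init__(self, rows: Dict[int, List[int]], read_core: Callable[[int], int]):
        self.rows = rows
        self.read_core = read_core  # hypothetical stand-in for reading one core

    def read_row(self, row: int) -> List[int]:
        # Select one row; its cores are read together through the common output.
        return [self.read_core(core) for core in self.rows[row]]

    def read_immersed(self, immersed_rows: Iterable[int]) -> Dict[int, List[int]]:
        # Only rows already in the reagent are switched onto the data channel.
        return {row: self.read_row(row) for row in sorted(immersed_rows)}
```

The design point is that the readout path is sized for one core row rather than the whole chip, which is where the reduced transmission and caching load comes from.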
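In some embodiments, the sequencing method includes:
controlling each unit of core row 44 to spend the same time immersed in the reagent. This ensures that the reaction times of the successively immersed core rows 44 differ by no more than a preset range.
Specifically, in signal-generating systems based on bioluminescence or other enzyme-catalyzed luminescence, the enzymes carried on the cores 42 of the semiconductor sequencing chip 14 from preceding biochemical reactions begin to generate a signal once they contact the substrate, and the signal curve depends strongly on factors such as temperature and time. By setting the movement time of the robot 10 in the mechanical control module 16, the robot 10 holds the semiconductor sequencing chip 14 and immerses it in the reagent one core row 44 at a time, keeping the differences among the immersion times of the core rows 44 within the preset range so that the biochemical reactions on every unit of core row 44 proceed for uniform times.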
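The uniformity condition can be checked numerically. This sketch assumes that a core row's reaction time is measured from its immersion to its read, and that `preset_range` is the allowed spread in seconds; `within_preset_range` is a hypothetical helper, not a function named in the patent.

```python
from typing import Sequence


def within_preset_range(immersed_at: Sequence[float], read_at: Sequence[float],
                        preset_range: float) -> bool:
    """True if every core row spends nearly the same time reacting.

    Reaction time of a row is taken (assumption) as read time minus
    immersion time; the spread across rows must not exceed preset_range.
    """
    if len(immersed_at) != len(read_at) or not immersed_at:
        raise ValueError("need one immersion time and one read time per core row")
    durations = [r - i for i, r in zip(immersed_at, read_at)]
    return max(durations) - min(durations) <= preset_range
```

Rows immersed at 0, 10 and 20 s and read at 5, 15.5 and 25 s react for 5, 5.5 and 5 s, a spread of 0.5 s, which passes a 1 s preset range.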
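In some embodiments, the sequencing method includes:
controlling the data reading times of the units of core rows 44 so that they differ by no more than a preset range. This ensures that the signals obtained from all core rows 44 are relatively uniform.
Specifically, the image processing module 24 is controlled to use the same data reading time: the reaction times in the reagent of the successively immersed units of core rows 44 differ by no more than the preset range, and each unit of core row 44 collects its signal a fixed time after immersion, which ensures that the signals obtained from all core rows 44 are relatively uniform.
In other embodiments, the data reading time set by the image processing module 24 may instead be constrained to a preset time window, determined by the time at which each immersed core row 44 reacts in the reagent and by the preset range; the image processing module 24 collects signals from the immersed part within this window, which ensures that data are collected from every core row 44. For example, if the preset range is 0-1 s and a core row is immersed at 10 s, the preset window for the data reading time is 10-11 s, so the difference between the row's immersion time and its data reading time does not exceed the preset range.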
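The worked example (preset range 0-1 s, immersion at 10 s, window 10-11 s) can be expressed as a tiny helper. `read_window` and `read_in_window` are hypothetical names, and the preset range is assumed to be given as an offset pair in seconds.

```python
from typing import Tuple


def read_window(immersed_at: float, preset_range: Tuple[float, float]) -> Tuple[float, float]:
    """Allowed data reading window for a core row immersed at `immersed_at`."""
    lo, hi = preset_range
    return immersed_at + lo, immersed_at + hi


def read_in_window(immersed_at: float, read_at: float,
                   preset_range: Tuple[float, float]) -> bool:
    """True if the read falls inside the row's window."""
    start, end = read_window(immersed_at, preset_range)
    return start <= read_at <= end
```

With `read_window(10, (0, 1))` the window is `(10, 11)`, matching the example in the text; a read at 11.5 s would fall outside it.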
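In some embodiments, before the semiconductor sequencing chip 14 is immersed in the reagent, the sequencing method further includes:
dividing the entire semiconductor sequencing chip 14 into multiple regions 48 parallel to the reagent tank 12, each region 48 containing one core row 44, and storing the correspondence between each region 48 and each unit of core row 44. This fixes the displacement of each move, and the number of moves, of the robot 10 holding the semiconductor sequencing chip 14.
Specifically, referring to Figure 7, the entire semiconductor sequencing chip 14 is divided into multiple regions 48 parallel to the reagent tank 12; each region 48 contains one core row 44 and is the portion that is immersed in the reagent, reacts, and is read out each time the robot 10 moves once. Dividing the chip into regions 48 determines the displacement of each move, and the number of moves, of the robot 10 holding the semiconductor sequencing chip 14 during one pass over the whole chip. In the embodiment shown in Figure 7, the entire semiconductor sequencing chip 14 is divided into 7 regions.
In other embodiments, the entire semiconductor sequencing chip 14 is divided into multiple regions 48, each of which may contain multiple core rows 44; by changing the circuit logic, several core rows 44 act as one region 48 that reacts and is read out during a single immersion motion of the robot 10.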
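The region division reduces to simple arithmetic once a row spacing is assumed. In this sketch the core-row pitch `row_pitch_mm` is a hypothetical parameter (the patent gives no dimensions), each robot move is assumed to immerse exactly one region, and a final partial region still needs its own move. `plan_moves` is an illustrative name.

```python
import math
from typing import Dict, Tuple


def plan_moves(num_core_rows: int, row_pitch_mm: float,
               rows_per_region: int = 1) -> Tuple[int, float, Dict[int, int]]:
    """Return (number of moves, displacement per move, core-row -> region map)."""
    if rows_per_region < 1:
        raise ValueError("a region must contain at least one core row")
    num_regions = math.ceil(num_core_rows / rows_per_region)
    step_mm = row_pitch_mm * rows_per_region
    # Stored correspondence between each core row and its region 48.
    region_of_row = {row: row // rows_per_region for row in range(num_core_rows)}
    return num_regions, step_mm, region_of_row
```

With one core row per region, seven rows give seven moves, as in Figure 7; grouping two rows per region (the multi-row variant) halves the number of moves and doubles the step.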
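In some embodiments, the robot 10 immerses the semiconductor sequencing chip 14 in the reagent; a first timing sequence controls the continuous exposure time of each unit of core row 44, a second timing sequence controls the movement time of the robot 10, and each core row's continuous exposure time in the first timing sequence is separated, before and after, from every movement time of the robot 10 in the second timing sequence by a waiting time. This ensures that the robot 10 is stationary during exposure, which makes the image signal clearer and more accurate.
Specifically, Figure 11 is a timing diagram captured over a period of time during testing. The exposure time is determined by the luminescence intensity of the biochemical reaction and the signal-to-noise ratio, and is obtained through preliminary experimental testing and calculation. The first timing sequence controls the continuous exposure time of each unit of core row 44, a waiting time is set before and after each exposure time (the high-level portion of the first timing sequence), and the movement time of the robot 10, controlled by the second timing sequence, is placed between two waiting times. The three timing sequences thus control the motion of the robot 10 so that it remains stationary during exposure, yielding a clear and accurate image signal.
In some embodiments, a third timing sequence controls the data transmission of each unit of core row 44, in which each core row's data transmission time follows the continuous exposure time of that core row 44. Data are thus read after each core row 44 is immersed, realizing joint immersion-and-read control and reducing the system's transmission, caching, and processing loads.
Specifically, the robot 10 holds the semiconductor sequencing chip 14 and moves it row by row in the direction perpendicular to the reagent tank 12. As each unit of core row 44 is progressively immersed in the reagent by the robot 10 and reacts, exposure control is switched to the immersed region 48, the image signal is read out through the row switching and common reading unit 34, and the data pass through the row switching circuit logic and are sent up to the data channel step by step, reducing the system's transmission, caching, and processing loads.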
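The relationship between the three timing sequences can be simulated to check the guard condition. This is a sketch under stated assumptions: all durations are hypothetical inputs (the patent derives the exposure time experimentally), each row follows move, wait, exposure, wait, and each row's transfer starts when its exposure ends. `build_schedule` and `robot_still_during_exposure` are illustrative names, not a controller API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Interval:
    kind: str    # "move" (2nd timing), "expose" (1st timing) or "transfer" (3rd timing)
    row: int
    start: float
    end: float


def build_schedule(num_rows: int, move_s: float, wait_s: float,
                   expose_s: float, transfer_s: float) -> List[Interval]:
    """Lay out move / wait / expose / wait for each core row in turn."""
    t = 0.0
    out: List[Interval] = []
    for row in range(num_rows):
        out.append(Interval("move", row, t, t + move_s))
        t += move_s + wait_s                          # waiting time before exposure
        out.append(Interval("expose", row, t, t + expose_s))
        # Third timing: this row's data transfer starts after its exposure ends.
        out.append(Interval("transfer", row, t + expose_s, t + expose_s + transfer_s))
        t += expose_s + wait_s                        # waiting time after exposure
    return out


def robot_still_during_exposure(schedule: List[Interval], wait_s: float) -> bool:
    """True if no move overlaps any exposure widened by the waiting time."""
    moves = [i for i in schedule if i.kind == "move"]
    for e in (i for i in schedule if i.kind == "expose"):
        guard_start, guard_end = e.start - wait_s, e.end + wait_s
        if any(m.start < guard_end and m.end > guard_start for m in moves):
            return False
    return True
```

Note that a row's transfer may overlap the next row's move; only exposure needs the robot to be still.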
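In some embodiments, each core 42 includes a pixel array 51, which is either a single pixel array 51 or is stitched together from at least two sub-pixel arrays 50 using stitching techniques. This makes it possible to obtain semiconductor sequencing chips 14 of very large size with very large arrays; Figure 10 shows a multi-core stitched semiconductor sequencing chip 14.
Specifically, referring to Figure 8, the pixel array 51 occupies the center of the core 42, while the peripheral circuits, namely the phase-locked loop, decoder controller and digital buffer 52, the correlated double sampling circuit and comparator 54, the readout circuit 56 (including digital and analog processors, decoders, and charge-mode readout circuits), and the decoder and driver 58, are located around the pixel array 51 or on another wafer that will be bonded to it, as shown in Figure 9. In other embodiments, the pixel array 51 may be stitched together from several sub-pixel arrays 50. In one example, shown in Figure 10, the pixel array 51 is obtained by stitching four sub-pixel arrays 50 using stitching techniques, which yields a larger chip area. In the illustrated embodiment, one pixel array 51 includes r*c pixels.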
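Stitching itself is a lithography step; the sketch below only models the resulting pixel addressing, assuming the four sub-pixel arrays 50 sit in a 2x2 grid and are given as lists of pixel rows. `stitch` is a hypothetical helper; the stitched array has r rows (sum of tile heights) and c columns (sum of tile widths).

```python
from typing import List

Tile = List[List[int]]  # one sub-pixel array: a list of pixel rows


def stitch(tiles: List[List[Tile]]) -> List[List[int]]:
    """Join a grid of sub-pixel arrays into one r x c pixel array."""
    stitched: List[List[int]] = []
    for tile_row in tiles:
        height = len(tile_row[0])
        if any(len(t) != height for t in tile_row):
            raise ValueError("tiles in one row of the grid must have equal height")
        for y in range(height):
            line: List[int] = []
            for tile in tile_row:
                line.extend(tile[y])  # place tiles side by side
            stitched.append(line)
    return stitched
```

Four 2x2 tiles therefore produce a 4x4 array, i.e. r = c = 4 in this toy case.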
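In summary, each time one immersion unit of core row 44 is immersed, the data output by that core row 44 are read at least once, reducing readout from all cores in parallel to the core rows 44 in turn. Because the bases luminesce differently in the two substrate reagents, the base type can be output as a simple light signal; compressing each pixel's 10-bit digital value to 2 bits is enough to represent the four bases A, T, C, and G, and a simplified algorithm yields the base sequence. The system's transmission, computation, and storage are therefore very simple to implement and very low in cost.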
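The 10-bit to 2-bit reduction can be made concrete with a packing sketch. Assumptions: each base call is an integer code 0-3 (the 2-bit values "00" to "11"), four calls are packed per byte with the first in the high bits, and the unreduced baseline is one 10-bit value per pixel as stated in the text. `pack_codes` and `raw_bits` are hypothetical helpers.

```python
from typing import Sequence


def pack_codes(codes: Sequence[int]) -> bytes:
    """Pack 2-bit base codes (0-3) four per byte, first code in the high bits."""
    out = bytearray()
    for i in range(0, len(codes), 4):
        byte = 0
        for j, code in enumerate(codes[i:i + 4]):
            if not 0 <= code <= 3:
                raise ValueError("codes must fit in 2 bits")
            byte |= code << (6 - 2 * j)
        out.append(byte)
    return bytes(out)


def raw_bits(num_pixels: int, bits_per_pixel: int = 10) -> int:
    """Size of the unreduced readout: one 10-bit value per pixel."""
    return num_pixels * bits_per_pixel
```

For 1,000 pixels this gives 10,000 raw bits versus 2,000 packed bits, the 2-of-10 ratio stated above.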
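An embodiment of the present invention also provides a data processing system 100 for a gene sequencing system. The data processing system 100 includes a semiconductor sequencing chip 14, a control device 26, and a robot 10. The semiconductor sequencing chip 14 includes a processing module and multiple cores 42 distributed in an array to form multiple core rows 44; the processing module is connected to the core rows 44 and the control device 26, and the control device 26 is connected to the robot 10. The control device 26 is used to control the robot 10 to immerse the semiconductor sequencing chip 14, with the core row 44 as the immersion unit, in the reagents used in the sequencing reaction, the reagents including at least two substrate reagents, such that when the chip is immersed in a substrate reagent the corresponding bases emit light or do not emit light. The processing module is used to read, each time N units of core rows (N>0) are immersed in the substrate reagent, the data output by the core rows immersed in the reagent at least once, until all core rows have been immersed in the substrate reagent.
In the above data processing system 100, through joint immersion-and-read control, the semiconductor sequencing chip 14 is immersed in the reagents used in the sequencing reaction with the core row 44 as the immersion unit. The bases carried on the chip show different light intensities through the biochemical reactions in the reagents. For example, in one substrate reagent that makes bases emit light, only two of the four bases A, T, C, and G emit light or show strong intensity, while in another such substrate reagent these two bases do not emit light or show weak intensity, and the other two bases behave the opposite way. With two light-inducing substrate reagents, the base type can therefore be determined from the light intensities of the four bases.
The target template 11 is determined from the data output by the first core row 44 so that the data output by the remaining core rows 44 can be simplified to obtain optical signal data for the different bases. This reduces the system load of sequential reading, keeps the system's transmission, computation, and storage relatively simple, and lowers cost.
It should be noted that the above explanation of the embodiments and beneficial effects of the sequencing method also applies to the data processing system of this embodiment; to avoid redundancy, it is not repeated here.
In some embodiments, after the data output by the first core row 44 are read, a target template 11 is determined from those data, the target template 11 including signal ranges indicating whether a base emits light; the data output by the remaining core rows 44 are simplified with the target template 11 to obtain optical signal data for the different bases; and the control device 26 is used to determine the base type from the optical signal data. The target template 11 thus simplifies data processing and yields optical signal data from which the base type is determined.
Specifically, after each immersion unit of core row 44 has been immersed in both substrate reagents, all base sequence clusters/balls loaded on the semiconductor sequencing chip 14 have reacted with the substrate reagents, and after one sequencing cycle the 2-bit optical signal data of the first base of every cluster/ball have been read out. The data output by each immersion unit's core row 44 are read, a target template 11 is determined from the data output by the first core row 44 to define the signal ranges indicating whether a base emits light, and the data output by the remaining core rows 44 are simplified with the target template 11 to obtain and output optical signal data for the different bases. This yields 2-bit optical signal data "00", "01", "10", or "11", each corresponding to one of the four bases A, G, C, and T, from which the control device 26 reads out and displays the specific base type or base sequence.
After multiple sequencing cycles, the optical signal data of every base of every base sequence cluster/ball loaded on the semiconductor sequencing chip 14 are obtained in turn, giving the base sequence of the gene being sequenced and completing the sequencing process.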
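In some embodiments, the reagents include a first substrate reagent and a second substrate reagent; when the semiconductor sequencing chip 14 is immersed in the first or the second substrate reagent, two of the bases emit light and the other two do not. The four bases can thus be distinguished from the luminescence in the two channels.
In some embodiments, the target template 11 includes a first signal range 13, a second signal range 15, a third signal range 17, and a fourth signal range 19, and the base types include a first type, a second type, a third type, and a fourth type,
wherein the first signal range 13 indicates that a base of the first type emits light in neither the first substrate reagent nor the second substrate reagent;
the second signal range 15 indicates that a base of the second type does not emit light in the first substrate reagent but emits light in the second substrate reagent;
the third signal range 17 indicates that a base of the third type emits light in the first substrate reagent but not in the second substrate reagent;
the fourth signal range 19 indicates that a base of the fourth type emits light in both the first substrate reagent and the second substrate reagent. In this way, the specific base type can be determined from the biochemical reaction.
In some embodiments, the processing module is further used to:
classify the data output by the remaining core rows 44 into the signal ranges of the target template 11 using an intercept classification algorithm. This reduces sequencing errors and yields more accurate sequencing results.
In some embodiments, the simplification includes binarization. This reduces the amount of computation while still outputting accurate sequencing results.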
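In some embodiments, the control device 26 includes an image processing module 24 connected to the semiconductor sequencing chip 14; the image processing module 24 includes a row switching and common reading unit 34 connected to all core rows 44; and the control device 26 is used to control, through the row switching and common reading unit 34, the data channel switching and data reading of the core rows 44 immersed in the reagent. This realizes joint immersion-and-read control of the semiconductor sequencing chip 14 and reduces the difficulty for the computer system of reading, routing, transmitting, caching, and processing data.
In some embodiments, the control device 26 includes a mechanical control module 16 connected to the robot 10, which controls the robot 10 so that the immersion times of the units of core rows 44 in the reagent differ by no more than a preset range. The biochemical reactions on every unit of core row 44 thus proceed for uniform times.
Specifically, the mechanical control module 16 controls the robot 10 to move while holding the semiconductor sequencing chip 14, immersing the chip in the reagent one core row 44 at a time. The mechanical control module 16 can set the movement time of the robot 10 so that, as the robot 10 immerses the chip one core row 44 at a time, the immersion times of all core rows 44 differ by no more than the preset range and the biochemical reactions on every unit of core row 44 proceed for uniform times.
In some embodiments, the control device 26 includes an image processing module 24 connected to the semiconductor sequencing chip 14; the image processing module 24 includes a row switching and common reading unit 34, which is used to keep the data reading times of the units of core rows 44 within a preset range of one another. This ensures that the signals obtained from all core rows 44 are relatively uniform.
Specifically, the image processing module 24 is controlled to use the same data reading time: the reaction times in the reagent of the successively immersed core rows 44 differ by no more than the preset range, and each row of chip cores 42 collects its signal a fixed time after immersion, which ensures that the signals obtained from all core rows 44 are relatively uniform.
In some embodiments, before the semiconductor sequencing chip 14 is immersed in the reagent, the control device 26 is further used to divide the entire semiconductor sequencing chip 14 into multiple regions 48 parallel to the reagent tank 12, each region 48 containing one unit of core row 44, and to store the correspondence between each region 48 and each unit of core row 44. By dividing the chip into regions 48, the gene sequencing system can determine the displacement of each move, and the number of moves, when the robot 10 moves the semiconductor sequencing chip 14.
Specifically, the entire semiconductor sequencing chip 14 is divided into multiple regions 48 parallel to the reagent tank 12, each region 48 being the portion immersed in the reagent each time the robot 10 moves once. Dividing the chip into regions 48 determines the displacement of each move, and the number of moves, of the robot 10 holding the semiconductor sequencing chip 14 during one pass over the whole chip.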
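In some embodiments, the control device 26 includes an image processing module 24 connected to the semiconductor sequencing chip 14; the image processing module 24 includes a row switching and common reading unit 34, which uses a first timing sequence to control the continuous exposure time of each unit of core row 44 and a second timing sequence to control the movement time of the robot 10, each core row's continuous exposure time in the first timing sequence being separated, before and after, from every movement time of the robot 10 in the second timing sequence by a waiting time. The robot 10 thus remains stationary during exposure, which makes the image signal clearer and more accurate.
Specifically, the exposure time is determined by the luminescence intensity of the biochemical reaction and the signal-to-noise ratio and is obtained through preliminary experimental testing and calculation. The first timing sequence controls the continuous exposure time of each unit of core row 44, a waiting time is set before and after each exposure time, and the movement time of the robot 10, controlled by the second timing sequence, is placed between two waiting times. Through the three timing sequences, the robot 10 is kept stationary during exposure, yielding a clear and accurate image signal.
In some embodiments, the row switching and common reading unit 34 is further used to control the data transmission of each unit of core row 44 through a third timing sequence, in which each core row's data transmission time follows the continuous exposure time of that core row 44. Data are thus read after each core row 44 is immersed, realizing joint immersion-and-read control and reducing the system's transmission, caching, and processing loads.
Specifically, the robot 10 holds the semiconductor sequencing chip 14 and moves it row by row in the direction perpendicular to the reagent tank 12. As each unit of core row 44 is progressively immersed in the reagent by the robot 10 and reacts, exposure control is switched to the immersed region 48, the image signal is read out through the row switching and common reading unit 34, and the data pass through the row switching circuit logic and are sent up to the data channel step by step, reducing the system's transmission, caching, and processing loads.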
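In summary, the sequencing method for the semiconductor sequencing chip 14 provided by the embodiments of the present invention is used in a gene sequencing system. Under the control of the sequencing system, the semiconductor sequencing chip 14 is immersed in the reagent for reaction one unit of core rows 44 at a time while data is read out synchronously, and the read-out data is then processed. The target template 11 obtained from the data of the first core row 44 constrains the signal ranges of subsequently read data, so that a 2-bit signal is output to determine the base type. This reduces the load of sequential readout on the system, keeps the system's transmission, computation and storage comparatively simple, and lowers cost.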
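Embodiments of the present invention further provide a gene sequencing system. The above description of the embodiments and beneficial effects of the data processing system for the semiconductor sequencing chip 14 also applies to the gene sequencing system of the embodiments of the present invention; to avoid redundancy, it is not repeated here.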
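In the description of this specification, references to the terms "one embodiment", "some embodiments", "exemplary embodiment", "example", "specific example" or "some examples" mean that the specific features, structures, materials or characteristics described in connection with the embodiment or example are included in at least one embodiment or example of the present application. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
From the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the preferred implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, may be embodied as a software product. The computer software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk or an optical disc) and includes instructions that cause a terminal device (which may be a mobile phone, computer, server, air conditioner, network device, etc.) to perform the methods of the embodiments of the present application.
Although embodiments of the present application have been shown and described above, it should be understood that the above embodiments are exemplary and are not to be construed as limiting the present application; those of ordinary skill in the art may make changes, modifications, substitutions and variations to the above embodiments within the scope of the present application.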
Claims (20)
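- A sequencing method, characterized in that it is used to perform sequencing with a semiconductor sequencing chip, the semiconductor sequencing chip comprising a plurality of cores arranged to form a plurality of core rows, the sequencing method comprising: controlling the semiconductor sequencing chip to contact the reagent used in each round of the sequencing reaction with a core row as the contact unit, such that, upon contact of the semiconductor sequencing chip with the reagent, the corresponding bases emit or do not emit light; each time N units of the core rows (N > 0) contact the reagent, reading at least once the data output by the core rows in contact with the reagent, until all core rows have contacted the reagent, wherein, after the data output by the first core row is read, a target template is determined from the data output by the first core row, the target template comprising signal ranges indicating whether a base emits light; performing simplification processing on the data output by the remaining core rows according to the target template to obtain optical signal data for the different bases; and determining the base types according to the optical signal data.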
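- The sequencing method according to claim 1, characterized in that the reagent comprises a first substrate reagent and a second substrate reagent, and upon contact of the semiconductor sequencing chip with the first substrate reagent or the second substrate reagent, two of the bases emit light and the other two bases do not.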
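- The sequencing method according to claim 2, characterized in that the target template comprises a first signal range, a second signal range, a third signal range and a fourth signal range, and the base types comprise a first type, a second type, a third type and a fourth type, wherein the first signal range indicates that bases of the first type emit no light in either the first substrate reagent or the second substrate reagent; the second signal range indicates that bases of the second type emit no light in the first substrate reagent but emit light in the second substrate reagent; the third signal range indicates that bases of the third type emit light in the first substrate reagent but not in the second substrate reagent; and the fourth signal range indicates that bases of the fourth type emit light in both the first substrate reagent and the second substrate reagent.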
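- The sequencing method according to claim 1, characterized in that the sequencing method comprises: controlling, through a row switching and common readout unit, data-channel switching and data readout of the core rows in contact with the reagent, the row switching and common readout unit being connected to all of the core rows.
- The sequencing method according to claim 1, characterized in that the sequencing method comprises: controlling the times at which the units of core rows contact the reagent to differ by no more than a preset range, or controlling the data readout times of the units of core rows to differ by no more than a preset range.
- The sequencing method according to claim 1, characterized in that, before the semiconductor sequencing chip contacts the reagent, the sequencing method further comprises: dividing the entire semiconductor sequencing chip into a plurality of regions parallel to a reagent trough, each region containing one of the core rows, and storing the correspondence between each region and each unit of core rows.
- The sequencing method according to claim 1, characterized in that a robotic arm brings the semiconductor sequencing chip into contact with the reagent, a continuous exposure time of each unit of core rows is controlled by a first timing sequence, and a motion time of the robotic arm is controlled by a second timing sequence, wherein each continuous exposure time of a unit of core rows in the first timing sequence is separated, both before and after, from each motion time of the robotic arm in the second timing sequence by a wait time.
- The sequencing method according to claim 7, characterized in that data transfer of each unit of core rows is controlled by a third timing sequence, in which the data transfer time of each core row follows the continuous exposure time of the corresponding core row.
- The sequencing method according to claim 1, characterized in that each core comprises a pixel array, the pixel array being a single pixel array or being formed by tiling at least two sub-pixel arrays.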
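- A data processing system for a gene sequencing system, characterized in that the data processing system comprises a semiconductor sequencing chip, a control device and a robotic arm; the semiconductor sequencing chip comprises a processing module and a plurality of cores distributed in an array to form a plurality of core rows; the processing module is connected to the core rows and to the control device, and the control device is connected to the robotic arm; the control device is configured to: control the robotic arm to bring the semiconductor sequencing chip into contact with the reagent used for the sequencing reaction with a core row as the contact unit, such that, upon contact of the semiconductor sequencing chip with the reagent, the bases corresponding to the core rows emit or do not emit light; and the processing module is configured to: each time N units of the core rows (N > 0) contact the reagent, read at least once the data output by the core rows in contact with the reagent, until all core rows have contacted the reagent.
- The data processing system according to claim 10, characterized in that, after the data output by the first core row is read, a target template is determined from the data output by the first core row, the target template comprising signal ranges indicating whether a base emits light; simplification processing is performed on the data output by the remaining core rows according to the target template to obtain optical signal data for the different bases; and the control device is further configured to: once all of the core rows have contacted the reagent, determine the types of the bases on the semiconductor sequencing chip; and determine the base types according to the optical signal data.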
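- The data processing system according to claim 10, characterized in that the reagent comprises a first substrate reagent and a second substrate reagent, and upon contact of the semiconductor sequencing chip with the first substrate reagent or the second substrate reagent, two of the bases emit light and the other two bases do not.
- The data processing system according to claim 10, characterized in that the data output by the remaining core rows is classified into the signal ranges of the target template by an intercept classification algorithm.
- The data processing system according to claim 10, characterized in that the control device comprises an image processing module connected to the semiconductor sequencing chip, the image processing module comprises a row switching and common readout unit connected to all core rows, and the control device is configured to control, through the row switching and common readout unit, data-channel switching and data readout of the core rows in contact with the reagent.
- The data processing system according to claim 10, characterized in that the control device comprises a mechanical control module connected to the robotic arm, the mechanical control module being configured to control the robotic arm so that the times at which the units of core rows contact the reagent differ by no more than a preset range.
- The data processing system according to claim 10, characterized in that the control device comprises an image processing module connected to the semiconductor sequencing chip, the image processing module comprises a row switching and common readout unit, and the row switching and common readout unit is configured to control the data readout times of the units of core rows to differ by no more than a preset range.
- The data processing system according to claim 10, characterized in that, before the semiconductor sequencing chip contacts the reagent, the control device is further configured to divide the entire semiconductor sequencing chip into a plurality of regions parallel to a reagent trough, each region containing one of the core rows, and to store the correspondence between each region and each unit of core rows.
- The data processing system according to claim 10, characterized in that the control device comprises an image processing module connected to the semiconductor sequencing chip, the image processing module comprises a row switching and common readout unit, and the row switching and common readout unit is configured to control a continuous exposure time of each unit of core rows through a first timing sequence and a motion time of the robotic arm through a second timing sequence, wherein each continuous exposure time of a unit of core rows in the first timing sequence is separated, both before and after, from each motion time of the robotic arm in the second timing sequence by a wait time.
- The data processing system according to claim 18, characterized in that the row switching and common readout unit is further configured to control data transfer of each unit of core rows through a third timing sequence, in which the data transfer time of each core row follows the continuous exposure time of the corresponding core row.
- A gene sequencing system, characterized in that it comprises the data processing system according to any one of claims 10-19.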
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2022/118410 WO2024055149A1 (zh) | 2022-09-13 | 2022-09-13 | Sequencing method, processing system, and sequencing system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2022/118410 WO2024055149A1 (zh) | 2022-09-13 | 2022-09-13 | Sequencing method, processing system, and sequencing system |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024055149A1 (zh) | 2024-03-21 |
Family
ID=90274036
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/118410 WO2024055149A1 (zh) | 2022-09-13 | 2022-09-13 | Sequencing method, processing system, and sequencing system |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2024055149A1 (zh) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120173159A1 (en) * | 2010-12-30 | 2012-07-05 | Life Technologies Corporation | Methods, systems, and computer readable media for nucleic acid sequencing |
CN102703314A (zh) * | 2012-05-24 | 2012-10-03 | 中国科学院北京基因组研究所 | Control system for a DNA sequencer |
US20140178862A1 (en) * | 2012-12-20 | 2014-06-26 | Xing Su | Photoinduced redox current (pirc) detection for dna sequencing using integrated transducer array |
CN105624020A (zh) * | 2014-11-07 | 2016-06-01 | 深圳华大基因研究院 | Microfluidic chip for detecting the base sequence of DNA fragments |
CN106029897A (zh) * | 2014-03-25 | 2016-10-12 | 吉尼亚科技公司 | Nanopore-based sequencing chip using stacked wafer technology |
CN107118955A (zh) * | 2017-05-12 | 2017-09-01 | 京东方科技集团股份有限公司 | Gene sequencing chip and gene sequencing method |
CN113066534A (zh) * | 2021-03-08 | 2021-07-02 | 山东骥图生物科技有限公司 | Method for writing and reading information using DNA sequences |
CN114237911A (zh) * | 2021-12-23 | 2022-03-25 | 深圳华大医学检验实验室 | CUDA-based gene data processing method and device, and CUDA architecture |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22958344 Country of ref document: EP Kind code of ref document: A1 |