WO2022032885A1 - Sample processing method, and device - Google Patents

Publication number
WO2022032885A1
Authority
WO
WIPO (PCT)
Prior art keywords
sample
sets
permutation
index sequence
fitness
Prior art date
Application number
PCT/CN2020/125165
Other languages
French (fr)
Chinese (zh)
Inventor
赵文妍
段广有
闵文波
方其
张艳
葛毅
廖国娟
Original Assignee
苏州金唯智生物科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN202010807364.6A external-priority patent/CN111961710B/en
Application filed by 苏州金唯智生物科技有限公司 filed Critical 苏州金唯智生物科技有限公司
Publication of WO2022032885A1 publication Critical patent/WO2022032885A1/en

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869: Methods for sequencing
    • C12Q1/6874: Methods for sequencing involving nucleic acid arrays, e.g. sequencing by hybridisation

Definitions

  • Fig. 1a is a flowchart of a sample processing method provided by an embodiment of the present application.
  • the method may be executed by a sample processing apparatus, and the apparatus may be implemented by software and/or hardware, and the apparatus may be configured with a computer, a server, etc.
  • the method can be applied to the scene where the index sequence of the oligonucleotide linker and the corresponding sample for which the library has been completed are arranged in a test lane.
  • the ratio of bases in each position of the Index sequence matched by the sample in each lane is greater than or equal to the preset ratio at the same time. Specifically, the ratio of A, G, C, and T bases in each position of the Index sequence in each lane (the length of the Index sequence is temporarily not limited) must be greater than or equal to 12.5% at the same time.
  • this step is exemplified.
  • there are five samples A, B, C, D, and E and the lanes assigned to the five samples are 1, 2, 1, 2, and 1, respectively.
  • the additional lanes assigned to each sample are 2, 1, 2, 2, 1, respectively.
  • the data size of the samples in each lane is within the first preset data range, and the difference between the data sizes of the samples among the multiple lanes is within the second preset data range.
  • the data volume range of each lane: 130G ≥ total data volume ≥ 90G; if there are multiple lanes, the data volume of each lane should not vary too much.
  • S320 Determine whether the Index sequence matched by the samples in the test channel meets the set condition.
  • the technical solution provided in the embodiment of the present application matches the samples in each test channel of the sequencing chip to Index sequences; if the Index sequences matched by the samples in the test channel are judged to meet the set condition, the samples and the Index sequences are determined to be truly matched, and the samples are sequenced based on the truly matched Index sequences, which can quickly and accurately match Index sequences and improve efficiency.
  • the determining the fitness of each sample arrangement set in the multiple sample arrangement sets includes:
  • fitness is the fitness of the sample arrangement set;
  • A is the normalized value of the sample data volume;
  • B is the normalized value of the base ratio of the Index sequence of the sample in the test channel;
  • FIG. 5 is a structural block diagram of a sample processing apparatus provided by an embodiment of the present application. As shown in FIG. 5 , the apparatus includes: a matching module 510 , a judgment module 520 , and a determination/sequencing module 530 .

Abstract

Provided are a sample processing method and a device. The method comprises: allocating at least one test lane to each sample, and all samples forming a plurality of sample arrangement sets on the basis of allocated test lanes; selecting, from among the plurality of sample arrangement sets, at least two sample arrangement sets that meet a first set condition; performing cross interchange on the test lanes in every two selected sample arrangement sets, and performing test lane variation on the sample arrangement sets after being subjected to cross interchange, so as to obtain a plurality of new sample arrangement sets and take the new sample arrangement sets as sample arrangement sets; and returning to the operation of selecting, from among the plurality of sample arrangement sets, at least two sample arrangement sets that meet a first set condition, until a termination condition is met, and choosing a sample arrangement set that meets a second set condition as a final sample arrangement set.

Description

A sample processing method and device
Technical field
The embodiments of the present application relate to data processing technology, and in particular to a sample processing method and device.
Background
With the advancement of technology, sequencing with a sequencer (MGI) plays a key role in determining cell function, genetic research, disease diagnosis, and the like.
Before sequencing with a sequencer, samples need to be prepared. The main steps of sample preparation include: fragmenting and/or size-selecting target sequences of a specified length; converting the target fragments into double-stranded DNA; ligating oligonucleotide adapter (linker) sequences to the ends of the target fragments; and quantifying the final sequencing library.
At present, the arrangement of each sample of a sequencing library in the test channels of a sequencing chip can only be calculated manually, which is tedious and inefficient.
Summary of the invention
The embodiments of the present application provide a sample processing method and device, which can quickly and accurately give the arrangement of samples in the test channels of a sequencing chip and improve efficiency.
In a first aspect, an embodiment of the present application provides a sample processing method, including:
allocating at least one test channel to each sample, all samples forming multiple sample arrangement sets based on the allocated test channels, wherein the samples are DNA sequences or RNA sequences to be tested;
selecting, from the multiple sample arrangement sets, at least two sample arrangement sets that meet a first set condition;
cross-exchanging the test channels in every two of the selected sample arrangement sets, and performing test channel mutation on the cross-exchanged sample arrangement sets to obtain multiple new sample arrangement sets, each new sample arrangement set being taken as a sample arrangement set;
returning to the operation of selecting, from the multiple sample arrangement sets, at least two sample arrangement sets that meet the first set condition until a termination condition is reached, and then selecting a sample arrangement set that meets a second set condition as the final sample arrangement set.
In a second aspect, an embodiment of the present application further provides a sample processing method, including:
matching the samples in each test channel of a sequencing chip to Index sequences, wherein the samples are DNA sequences or RNA sequences to be tested;
judging whether the Index sequences matched by the samples in the test channel meet a set condition;
if so, determining that the samples and the Index sequences are truly matched, and sequencing the samples based on the truly matched Index sequences.
In a third aspect, an embodiment of the present application further provides a device, including:
one or more processors;
a storage apparatus for storing one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method provided by the embodiments of the present application.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the method provided by the embodiments of the present application is implemented.
In the technical solution provided by the embodiments of the present application, at least one test channel (lane) is allocated to each sample, and all samples form multiple sample arrangement sets based on the allocated lanes. At least two sample arrangement sets that meet the first set condition are selected; the test channels in every two of the selected sample arrangement sets are cross-exchanged, and the cross-exchanged sample arrangement sets undergo test channel mutation to obtain multiple new sample arrangement sets. Each new sample arrangement set is taken as a sample arrangement set, and the selection operation is repeated until the termination condition is reached, at which point a sample arrangement set that meets the second set condition is selected as the final sample arrangement set. By iterating selection, cross-exchange and mutation in this way, a suitable lane arrangement of the samples in the sequencing chip can be given quickly and accurately, which improves efficiency.
Description of the drawings
FIG. 1a is a flowchart of a sample processing method provided by an embodiment of the present application;
FIG. 1b is a schematic diagram of Index sequences;
FIG. 2 is a flowchart of a sample processing method provided by an embodiment of the present application;
FIG. 3a is a flowchart of a sample processing method provided by an embodiment of the present application;
FIG. 3b is a flowchart of a sample processing method provided by an embodiment of the present application;
FIG. 4 is a structural block diagram of a sample processing apparatus provided by an embodiment of the present application;
FIG. 5 is a structural block diagram of a sample processing apparatus provided by an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a device provided by an embodiment of the present application.
Detailed description
The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only explain the present application and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present application rather than the entire structure.
FIG. 1a is a flowchart of a sample processing method provided by an embodiment of the present application. The method may be executed by a sample processing apparatus, which may be implemented in software and/or hardware and may be configured in a device such as a computer or a server. The method can be applied to scenarios in which oligonucleotide adapter Index sequences and the corresponding samples, for which library construction has been completed, are arranged into test channels (lanes).
As shown in FIG. 1a, the technical solution provided by the embodiment of the present application includes:
S110: Allocate multiple test channels to each sample, all samples forming multiple sample arrangement sets based on the allocated test channels, wherein the samples are DNA sequences or RNA sequences to be tested.
In the embodiment of the present application, a sample is a DNA sequence or RNA sequence to be tested. A sample from a single source is a single sample, and a sample from multiple sources is a multi-sample. The samples used for sequencing may be single samples, or several samples may be mixed together (multi-sample) for sequencing.
In the embodiment of the present application, each sample belongs to an already constructed sequencing library: each sample has been matched to an Index sequence, but the lane arrangement of the samples in the sequencing chip has not been provided. Here, a lane is a flow channel on the sequencing chip; the sequencing library and the reagents are both in the lane, and the sequencing signal is scanned tile by tile along a lane. A sequencer (MGI) may be used for sequencing; the MGI identifies the sequences of the samples in a lane through fluorescent signals.
Table 1 shows the input sample information, where I5 is the Index sequence matched to the sample, Data is the data volume of the sample (G), and Name is the name of the sample. The content of Table 1 can be input into the device so that the input content is processed.
Table 1
Name      ID   I5        Data(G)
Sample49  162  CGTTGAGT  2
Sample50  161  TTCCTGTG  2
Sample51  160  TCGTCTCA  2
Sample52  138  GTGCTTAC  2
Sample53  29   GCACAACT  2
Sample54  277  AACGGTCA  2
Sample55  63   GATGTGTG  2
Sample56  262  TCTCCGAT  2
The sequencing library needs to satisfy the following conditions:
A: The data volume of the samples in each lane is within a first preset data range, and the difference between the sample data volumes of different lanes is within a second preset data range. Specifically, the data volume range of each lane is: 130G ≥ total data volume ≥ 90G; if there are multiple lanes, the data volumes of the lanes should not differ too much.
B: The Index sequences matched by the samples in each lane contain no duplicates.
C: The base ratios at every position of the Index sequences matched by the samples in each lane are all greater than or equal to a preset ratio. Specifically, the ratios of the A, G, C and T bases at every position of the Index sequences in each lane (the length of the Index sequence is not limited for now) must all be ≥ 12.5%.
The base ratio at each position may be calculated as follows, taking the data volume of each Index sequence into account: ratio of base x at a position = data volume of the samples that have base x at that position / total data volume, where x may be A, G, C or T. For example, as shown in FIG. 1b, the ratio of base C at the first position = (S1+S3) data volume / (S1+S2+S3+S4) total data volume.
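The data-weighted base ratio above can be sketched in Python as follows. This is a minimal illustration, not the applicant's implementation: the function names, the (Index sequence, data volume) tuple layout, and the four-sample lane are assumptions chosen so that S1 and S3 carry C at the first position, as in the FIG. 1b example.

```python
def base_ratios(lane_samples):
    """Return one dict per Index position mapping base -> data-weighted ratio.

    lane_samples: list of (index_sequence, data_volume) pairs for one lane
    (a hypothetical layout used only for this sketch).
    """
    total = sum(volume for _, volume in lane_samples)
    length = len(lane_samples[0][0])
    ratios = []
    for pos in range(length):
        weight = {"A": 0.0, "C": 0.0, "G": 0.0, "T": 0.0}
        for seq, volume in lane_samples:
            weight[seq[pos]] += volume
        ratios.append({base: w / total for base, w in weight.items()})
    return ratios


def meets_base_condition(lane_samples, preset=0.125):
    """Condition C: every base at every position reaches the preset ratio."""
    return all(r >= preset
               for position in base_ratios(lane_samples)
               for r in position.values())


# Hypothetical lane with samples S1..S4; S1 and S3 have C at the first position.
lane = [("CA", 2), ("GT", 3), ("CG", 1), ("TC", 2)]
first_position = base_ratios(lane)[0]
# Ratio of C at the first position = (S1 + S3) / (S1 + S2 + S3 + S4) = 3 / 8
```

In this made-up lane no sample has A at the first position, so `meets_base_condition(lane)` is false even though C, G and T are well represented.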
In the embodiment of the present application, a genetic algorithm may be used to initialize a lane arrangement for each sample in the sequencing library, forming sample arrangement sets. Multiple lanes need to be initialized for each sample: taking one initialized lane for each of the samples forms one sample arrangement set, so the multiple initialized lanes per sample form multiple sample arrangement sets. The basic steps of the genetic algorithm may include initialization, fitness function calculation, selection, cross-exchange (crossover) and mutation. The elements of a sample arrangement set are the samples on their allocated test channels; within each sample arrangement set, the samples are the genes and the lane assignments are the alleles. The number of sample arrangement sets formed may be 100, or another number.
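One way to picture the initialization step is the sketch below. The text does not say how the initial lanes are chosen, so the uniformly random assignment, the dict representation (sample name to lane number) and the function name are all assumptions; only the population size of 100 comes from the description.

```python
import random


def init_population(sample_names, n_lanes, pop_size=100, seed=None):
    """Build pop_size sample arrangement sets.

    Each arrangement maps a sample name (gene) to a lane number (allele).
    Random assignment is an assumption; the source does not specify it.
    """
    rng = random.Random(seed)
    return [
        {name: rng.randint(1, n_lanes) for name in sample_names}
        for _ in range(pop_size)
    ]


population = init_population(["A", "B", "C", "D", "E"], n_lanes=2, seed=0)
```

Representing an arrangement as a mapping keeps every sample in exactly one lane per arrangement set, which matches the S1 = {A1, B2, C1, D2, E1} notation used in the description.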
S120: Select, from the multiple sample arrangement sets, at least two sample arrangement sets that meet the first set condition.
In the embodiment of the present application, a natural selection method (for example, the tournament method) is used to select n sample arrangement sets that meet the first set condition from the multiple sample arrangement sets. The first set condition may be that the fitness is greater than a first set value, or may be another condition. The calculation of the fitness is described in the embodiments below.
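A tournament selection step could look roughly like the following sketch. The tournament size `k`, the function name and the toy population are assumptions for illustration; the description only names the tournament method as one possible selection scheme and leaves the fitness function to later embodiments, so it is passed in as a parameter here.

```python
import random


def tournament_select(population, fitness_fn, n_select, k=3, rng=None):
    """Pick n_select arrangements; each pick is the fittest of k random contenders."""
    rng = rng or random.Random()
    selected = []
    for _ in range(n_select):
        contenders = rng.sample(population, k)
        selected.append(max(contenders, key=fitness_fn))
    return selected


# Toy usage: integers stand in for arrangement sets, fitness is the value itself.
winners = tournament_select([1, 5, 3], lambda x: x, n_select=2, k=3,
                            rng=random.Random(0))
```

With `k` equal to the population size every tournament contains all candidates, so the fittest one always wins; smaller values of `k` give weaker arrangements a chance to survive.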
S130: Cross-exchange the test channels in every two of the selected sample arrangement sets, and perform test channel mutation on the cross-exchanged sample arrangement sets to obtain multiple new sample arrangement sets.
In the embodiment of the present application, among the selected sample arrangement sets, the lanes of the samples in every two sample arrangement sets are cross-exchanged, and the cross-exchanged sample arrangement sets undergo lane mutation according to a set rule to obtain multiple new sample arrangement sets; the new sample arrangement sets are then taken as the sample arrangement sets. The set rule may be defined as needed; the lane mutation rate in a cross-exchanged sample arrangement set is less than or equal to a preset mutation rate, which may be 10%.
This step is illustrated with an example. Suppose there are five samples A, B, C, D and E, that the lanes allocated to them are 1, 2, 1, 2 and 1 respectively, and that the other lanes allocated to them are 2, 1, 2, 2 and 1 respectively. The two sample arrangement sets are then S1 = {A1, B2, C1, D2, E1} and S2 = {A2, B1, C2, D2, E1}, where A1 means that sample A is in the first lane, and the other elements are read in the same way. The lanes allocated to the samples in S1 and S2 are cross-exchanged; for example, exchanging the lanes of samples A and B gives the cross-exchanged sample arrangement sets S3 = {A2, B1, C1, D2, E1} and S4 = {A1, B2, C2, D2, E1}. After the cross-exchange, lane mutation can be applied to S3 and S4. For example, if the lane of sample A in S3 is mutated to the first lane, the new sample arrangement set obtained from S3 is {A1, B1, C1, D2, E1}.
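The worked example above can be reproduced with a short sketch. The dict representation and function names are assumptions, and `mutate` interprets the preset mutation rate as a per-sample probability, which is one possible reading of "mutation rate less than or equal to 10%" rather than a rule stated in the text.

```python
import random


def crossover(parent_a, parent_b, swap_samples):
    """Exchange the lanes of the named samples between two arrangement sets."""
    child_a, child_b = dict(parent_a), dict(parent_b)
    for name in swap_samples:
        child_a[name], child_b[name] = parent_b[name], parent_a[name]
    return child_a, child_b


def mutate(arrangement, n_lanes, rate=0.10, rng=None):
    """Reassign each sample to a random lane with probability `rate` (assumed rule)."""
    rng = rng or random.Random()
    mutated = dict(arrangement)
    for name in list(mutated):
        if rng.random() < rate:
            mutated[name] = rng.randint(1, n_lanes)
    return mutated


s1 = {"A": 1, "B": 2, "C": 1, "D": 2, "E": 1}
s2 = {"A": 2, "B": 1, "C": 2, "D": 2, "E": 1}
s3, s4 = crossover(s1, s2, ["A", "B"])
# s3 corresponds to {A2, B1, C1, D2, E1}; s4 corresponds to {A1, B2, C2, D2, E1}
```

Both functions copy their inputs, so the parent arrangement sets remain available for pairing with other selected sets in the same generation.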
It should be noted that the form of the sample arrangement set is not limited to the representation above and may take other forms. For example, a sample arrangement set may be a set formed by grouping the samples by lane. Specifically, if there are five samples A, B, C, D and E and the lanes allocated to them are 1, 2, 1, 2 and 1 respectively, the sample arrangement set S1 may also be
Figure PCTCN2020125165-appb-000001
If the other lanes allocated to the five samples are 2, 1, 2, 2 and 1 respectively, the sample arrangement set S2 may also be
Figure PCTCN2020125165-appb-000002
The lanes of the samples in S1 and S2 are then cross-exchanged, that is, the lanes of samples A and B are exchanged, to obtain the cross-exchanged sample arrangement sets.
S140: Take each new sample arrangement set as a sample arrangement set.
In the embodiment of the present application, the test channels in every two of the selected sample arrangement sets are cross-exchanged, the cross-exchanged sample arrangement sets undergo test channel mutation, and the resulting new sample arrangement sets are taken as the sample arrangement sets.
S150: Judge whether the termination condition is reached.
If so, execute S160. If not, return to S120.
In the embodiment of the present application, the termination condition may be that the number of returns reaches a set number, or that the difference between the average fitness of the currently obtained sample arrangement sets and the average fitness of the previously obtained sample arrangement sets is within a preset difference range, or another termination condition. The set number may be 100 and may be adjusted according to the actual situation.
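The two termination conditions named above might be checked as in this sketch. The function name, the history list and the tolerance value are assumptions; only the default of 100 iterations is taken from the description.

```python
def should_stop(generation, mean_fitness_history,
                max_generations=100, tolerance=1e-3):
    """Return True when either termination condition from S150 holds.

    generation: number of times the loop has returned to S120 so far.
    mean_fitness_history: average fitness of each generation, oldest first.
    tolerance: assumed width of the preset difference range.
    """
    if generation >= max_generations:
        return True
    if len(mean_fitness_history) >= 2:
        change = abs(mean_fitness_history[-1] - mean_fitness_history[-2])
        return change <= tolerance
    return False
```

Comparing only the last two generations follows the wording of the text; a practical implementation might require the average fitness to stay flat over several generations before stopping.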
S160: Select a sample arrangement set that meets the second set condition as the final sample arrangement set.
In the embodiment of the present application, the second set condition may be that the fitness is the largest, or that the fitness is greater than a second set value, or another condition.
In the embodiment of the present application, after multiple iterations, once the termination condition is reached, a sample arrangement set that meets the second set condition is selected as the final sample arrangement set; the final sample arrangement set is a preferred lane arrangement of the samples.
In the technical solution provided by the embodiment of the present application, at least one test channel is allocated to each sample, and all samples form multiple sample arrangement sets based on the allocated test channels. At least two sample arrangement sets that meet the first set condition are selected; the test channels in every two of the selected sample arrangement sets are cross-exchanged, and the cross-exchanged sample arrangement sets undergo test channel mutation to obtain multiple new sample arrangement sets. Each new sample arrangement set is taken as a sample arrangement set, and the selection operation is repeated until the termination condition is reached, at which point a sample arrangement set that meets the second set condition is selected as the final sample arrangement set. By iterating selection, cross-exchange and mutation in this way, a suitable lane arrangement of the samples in the sequencing chip can be given quickly and accurately, which improves efficiency.
FIG. 2 is a flowchart of a sample processing method provided by an embodiment of the present application. In this embodiment, optionally, selecting from the multiple sample arrangement sets at least two sample arrangement sets that meet the first set condition includes:
determining the fitness of each of the multiple sample arrangement sets, and selecting at least two sample arrangement sets whose fitness is greater than the first set value.
Optionally, selecting a sample arrangement set that meets the second set condition as the final sample arrangement set once the termination condition is reached includes:
when the number of returns reaches the set number, or when the difference between the average fitness of the currently obtained sample arrangement sets and the average fitness of the previously obtained sample arrangement sets is within the preset difference range, selecting the sample arrangement set with the largest fitness from the sample arrangement sets of one or more generations as the final sample arrangement set;
wherein, among the sample arrangement sets of multiple generations, the difference between the average fitness of the sample arrangement sets of every two generations is within a preset range.
As shown in FIG. 2, the technical solution provided by the embodiment of the present application includes:
S210: Allocate at least one test channel to each sample, all samples forming multiple sample arrangement sets based on the allocated test channels, wherein the samples are DNA sequences or RNA sequences to be tested.
S220: Determine the fitness of each of the multiple sample arrangement sets, and select at least two sample arrangement sets whose fitness is greater than the first set value.
In one implementation of the embodiment of the present application, optionally, determining the fitness of each of the multiple sample arrangement sets includes: determining the fitness of a sample arrangement set based on the normalized sample data volume in the sample arrangement set, the normalized base ratio of the oligonucleotide adapter Index sequences matched by the samples in each test channel, and the result of whether the Index sequences matched by the samples are duplicated.
In one implementation of the embodiment of the present application, optionally, determining the fitness of the sample arrangement set based on the normalized sample data volume, the normalized base ratio of the Index sequences of the samples in each test channel, and the result of whether the Index sequences of the samples are duplicated includes: determining the fitness of the sample arrangement set based on the following formula:
fitness = A + B + C
其中,fitness为所述样本排列集合的适应度;A为所述样本数据量归一化值;B为lane中样本的Index序列的碱基比例归一化值;其中,若样本的Index序列存在重复,则C为-1,若样本的Index序列不存在重复,则C为0。Among them, fitness is the fitness of the sample arrangement set; A is the normalized value of the sample data volume; B is the normalized value of the base ratio of the Index sequence of the sample in the lane; wherein, if the Index sequence of the sample exists If it is repeated, C is -1, and if there is no repetition in the Index sequence of the sample, C is 0.
其中,上述公式为适应度函数。其中,A=所有lane中样本的最小数据量/数据量平均值,数据量平均值可以为:所有样本的数据量/lane的数量。Among them, the above formula is the fitness function. Among them, A=the minimum data amount of the samples in all lanes/the average value of the data amount, and the average value of the data amount may be: the data amount of all the samples/the number of lanes.
在本申请实施例中,lane中样本的Index序列的碱基比例归一化值的确定方法:在每条lane中,确定Index序列所有位置碱基比例的最小值,将各个lane中Index序列所有位置碱基比例的最小值相加,将最小值之和与lane的数量进行除法处理,再与预设比例进行除法处理,得到lane中样本的Index序列的碱基比例归一化值。即lane中样本的Index序列的碱基比例归一化值为:各条lane中Index序列所有位置碱基比例的最小值之和/lane的数量/预设比例。其中预设比例为0.125。In the embodiment of this application, the method for determining the normalized value of the base ratio of the Index sequence of the sample in the lane: in each lane, determine the minimum value of the base ratio at all positions of the Index sequence, and set all the Index sequences in each lane. The minimum value of the position base ratio is added, and the sum of the minimum value is divided by the number of lanes, and then divided by the preset ratio to obtain the normalized value of the base ratio of the Index sequence of the sample in the lane. That is, the normalized value of the base ratio of the Index sequence of the sample in the lane is: the sum of the minimum base ratios of all positions of the Index sequence in each lane/the number of lanes/the preset ratio. The preset ratio is 0.125.
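The fitness function described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the applicant's implementation: the function names and the data layout (a dictionary of lanes, each a list of `(index_sequence, data_amount)` pairs) are hypothetical, all Index sequences are assumed to have the same length, A is taken from the smallest per-lane total, and repeats are checked within each lane.

```python
from collections import defaultdict

PRESET_RATIO = 0.125  # preset base ratio stated in the text
BASES = "ACGT"


def min_base_ratio(lane_samples):
    """Smallest data-weighted base ratio over all Index positions of one lane.

    lane_samples: list of (index_sequence, data_amount) pairs (assumed layout).
    """
    total = sum(amount for _, amount in lane_samples)
    length = len(lane_samples[0][0])  # assumes equal-length Index sequences
    minimum = 1.0
    for pos in range(length):
        weight = defaultdict(float)
        for seq, amount in lane_samples:
            weight[seq[pos]] += amount
        # A base that never occurs at this position has ratio 0.
        minimum = min(minimum, min(weight[b] / total for b in BASES))
    return minimum


def fitness(lanes):
    """fitness = A + B + C for one sample arrangement set.

    lanes: dict mapping lane id -> list of (index_sequence, data_amount).
    """
    lane_totals = [sum(a for _, a in samples) for samples in lanes.values()]
    mean_total = sum(lane_totals) / len(lanes)
    # A: smallest per-lane data volume over the average (an interpretation).
    a = min(lane_totals) / mean_total
    # B: mean of the per-lane minimum base ratios, divided by the preset ratio.
    b = sum(min_base_ratio(s) for s in lanes.values()) / len(lanes) / PRESET_RATIO
    # C: penalty of -1 if any lane contains a repeated Index sequence.
    has_repeat = any(len({seq for seq, _ in s}) < len(s) for s in lanes.values())
    c = -1 if has_repeat else 0
    return a + b + c
```

With this layout, a perfectly balanced arrangement (equal lane totals, every base at 25% in every position, no repeats) scores 1 + 2 + 0 = 3.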
Thus, by determining the fitness of each sample arrangement set and selecting the sample arrangement sets whose fitness is greater than the first set value, better sample arrangement sets can be selected and the efficiency of arranging samples into lanes is improved.
S230: Cross-exchange the test channels between every two of the selected sample arrangement sets, and mutate the test channels of the sample arrangement sets obtained after the cross-exchange, to obtain a plurality of new sample arrangement sets.
S240: Take each new sample arrangement set as a sample arrangement set.
S250: Determine whether the number of returns has reached a set number, or whether the difference between the average fitness of the currently obtained sample arrangement sets and the average fitness of the previously obtained sample arrangement sets is within a preset difference range.
In the embodiments of the present application, the set number may be 100. The preset difference range may be set as required.
If yes, execute S260; if no, return to S220.
S260: Select, from the sample arrangement sets of one or more generations, the sample arrangement set with the greatest fitness as the final sample arrangement set.
In the embodiments of the present application, among the sample arrangement sets of the multiple generations, the difference between the average fitness values of every two generations is within a preset range.
In the embodiments of the present application, once the number of returns reaches the set number, the sample arrangement set with the greatest fitness may be selected from the sample arrangement sets of the current generation as the final sample arrangement set. Alternatively, if the average fitness of the sample arrangement sets has levelled off over multiple generations once the number of returns reaches the set number, the sample arrangement set with the greatest fitness is selected from the generations whose average fitness is stable as the final sample arrangement set.
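Steps S220 to S260 follow the usual shape of a genetic algorithm, and a minimal sketch of that loop is given here. Everything not stated in the text is an assumption made for illustration: the representation of an arrangement set as a list of lane numbers (one per sample), the one-point crossover, the per-sample mutation rate, the fallback to the two fittest sets when fewer than two exceed the threshold, and the function name `evolve`. The sketch also keeps the best arrangement seen in any generation, which simplifies the selection rule of S260.

```python
import random
from itertools import combinations


def evolve(population, fitness, threshold, n_lanes,
           max_rounds=100, tolerance=1e-3, mutation_rate=0.1, seed=0):
    """Return the fittest arrangement found.

    population: list of arrangements (at least two); each arrangement is a
    list whose i-th entry is the lane (0..n_lanes-1) assigned to sample i,
    with at least two samples.
    fitness: callable mapping an arrangement to a number.
    """
    rng = random.Random(seed)
    size = len(population)
    best = max(population, key=fitness)
    previous_mean = None
    for _ in range(max_rounds):  # S250: stop after the set number of returns
        # S220: keep arrangements whose fitness exceeds the threshold;
        # fall back to the two fittest so that crossover is always possible.
        ranked = sorted(population, key=fitness, reverse=True)
        selected = [p for p in ranked if fitness(p) > threshold]
        if len(selected) < 2:
            selected = ranked[:2]
        # S230: one-point crossover on every pair, then lane mutation.
        children = []
        for p1, p2 in combinations(selected, 2):
            cut = rng.randrange(1, len(p1))
            for child in (p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]):
                children.append([
                    rng.randrange(n_lanes) if rng.random() < mutation_rate else lane
                    for lane in child
                ])
        # S240: the new arrangements become the current population.
        population = children[:size]
        best = max(population + [best], key=fitness)
        # S250: stop when the average fitness has levelled off.
        mean = sum(fitness(p) for p in population) / len(population)
        if previous_mean is not None and abs(mean - previous_mean) <= tolerance:
            break
        previous_mean = mean
    return best  # S260
```

The default of 100 rounds mirrors the set number mentioned in the text; the tolerance stands in for the preset difference range, which the text leaves open.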
On the basis of the foregoing embodiments, the technical solution provided by the embodiments of the present application may further include: sequencing each sample in the final sample arrangement set. Specifically, the gene sequence of each sample in the sample arrangement set is sequenced, which facilitates analysis and study of the gene sequences.
Fig. 3a is a flowchart of a sample processing method provided by an embodiment of the present application. The method may be executed by a sample processing apparatus, the apparatus may be implemented in software and/or hardware, and the apparatus may be configured in a device such as a computer or a server. The method is applicable to matching Index sequences to samples for which no library has yet been constructed, where the data volume of the samples and their lane arrangement are already given.
As shown in Fig. 3a, the technical solution provided by the embodiments of the present application includes:
S310: Match Index sequences to the samples in each test channel of the sequencing chip, where each sample is a DNA sequence or an RNA sequence to be tested.
In the embodiments of the present application, each sample belongs to a sequencing library that has not yet been constructed: the data volume of the sample and its lane arrangement on the sequencing chip are known, but no information on the Index sequence of the sample is provided.
Table 2 shows the information of the input samples, where lane is the lane in which the sample is located on the sequencing chip. Table 3 shows the information of the Index sequences in the database.
Table 2
Sample name    Data    Lane
sample1        25      1
sample2        10      1
sample3        40      1
sample4        30      1
sample5        10      2
Table 3
Index ID    Index sequence
A01         CGCTACAT
B01         AATCCAGC
C01         CGTCTAAC
D01         AACTCGGA
Construction of the sequencing library needs to satisfy the following conditions:
A: The data volume of the samples in each lane is within a first preset data range, and the difference between the data volumes of the samples in different lanes is within a second preset data range. Specifically, the data volume of each lane satisfies 90G ≤ total data volume ≤ 130G; if there are multiple lanes, the data volumes of the lanes should not differ too much.
B: The Index sequences matched to the samples in each lane contain no repeats.
C: The base ratio at every position of the Index sequences matched to the samples in each lane is greater than or equal to the preset ratio. Specifically, at each position of the Index sequences in each lane (the length of the Index is not restricted for now), the ratios of the A, G, C and T bases must all be ≥ 12.5%.
The base ratio at each position may be calculated as follows, taking the data volume associated with each Index sequence into account: ratio of base x at a position = (data volume of the samples having base x at that position) / (total data volume), where x may be A, G, C or T. For example, as shown in Fig. 1b, the ratio of base C at the first position = (data volume of S1 + S3) / (total data volume of S1 + S2 + S3 + S4).
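The data-weighted base ratio of condition C can be illustrated with a short Python sketch. The helper names are hypothetical, and the pairing used in the usage note (the four lane-1 samples of Table 2 matched in order to A01 through D01 of Table 3) is an assumption chosen only to have concrete numbers; equal-length Index sequences are assumed as well.

```python
BASES = "AGCT"


def position_base_ratios(lane_samples):
    """Data-weighted ratio of each base at each Index position of one lane.

    lane_samples: list of (index_sequence, data_amount) pairs (assumed layout).
    Returns one dict per position, mapping base -> ratio.
    """
    total = sum(amount for _, amount in lane_samples)
    length = len(lane_samples[0][0])  # assumes equal-length Index sequences
    return [
        {
            base: sum(a for seq, a in lane_samples if seq[pos] == base) / total
            for base in BASES
        }
        for pos in range(length)
    ]


def meets_base_ratio(lane_samples, preset_ratio=0.125):
    """Condition C: every base reaches the preset ratio at every position."""
    return all(
        ratio >= preset_ratio
        for per_position in position_base_ratios(lane_samples)
        for ratio in per_position.values()
    )
```

Under the assumed pairing, the first position holds C for sample1 and sample3 and A for sample2 and sample4, so the C ratio is (25 + 40) / 105 while G and T are absent, and `meets_base_ratio` reports that condition C is not met for that lane.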
In the embodiments of the present application, Index sequences may be matched to the samples in each lane of the sequencing chip at random, or according to other rules.
S320: Determine whether the Index sequences matched to the samples in the test channel satisfy the set conditions.
If yes, execute S330; if no, return to S310.
In one implementation of the embodiments of the present application, optionally, determining whether the Index sequences matched to the samples in the test channel satisfy the set conditions includes: determining whether the matched Index sequences satisfy the following conditions:
the Index sequences matched to the samples in each test channel contain no repeats;
the base ratio at every position of the Index sequences matched to the samples in each test channel is greater than or equal to the preset ratio, where the preset ratio may be 0.125.
In the embodiments of the present application, multiple samples may be arranged in each lane, and the Index sequences matched to the samples in each lane contain no repeats. The base ratio at each position of the matched Index sequences is calculated as described in the foregoing embodiment and is not repeated here. In addition, the data volume of the samples in each lane is within the first preset data range, and the difference between the data volumes of the samples in different lanes is within the second preset data range.
S330: Determine that the samples and the Index sequences are truly matched, and sequence the samples based on the truly matched Index sequences.
In the embodiments of the present application, if the Index sequences matched to the samples in a lane satisfy the set conditions, the samples and the Index sequences are determined to be truly matched. A sequencing library can then be constructed based on the matched Index sequences, and the samples are sequenced based on the truly matched Index sequences, which facilitates analysis and study of the gene sequences.
In the technical solution provided by the embodiments of the present application, Index sequences are matched to the samples in each test channel of the sequencing chip; if the Index sequences matched to the samples in the test channel are determined to satisfy the set conditions, the samples and the Index sequences are determined to be truly matched, and the samples are sequenced based on the truly matched Index sequences. In this way, Index sequences can be matched quickly and accurately, improving efficiency.
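The match, check and retry cycle of S310 to S330 can be sketched for a single lane as follows. This is a hedged illustration rather than the method as claimed: the function names, the cap on the number of attempts (added so that the loop always terminates) and the random choice with replacement are assumptions, and equal-length Index sequences are assumed.

```python
import random


def is_valid(assignment, amounts, preset_ratio=0.125):
    """S320: no repeated Index, and every base >= preset ratio at every position.

    assignment: Index sequences chosen for the samples of one lane.
    amounts: data amounts of those samples, in the same order.
    """
    if len(set(assignment)) < len(assignment):
        return False
    total = sum(amounts)
    for pos in range(len(assignment[0])):
        for base in "AGCT":
            share = sum(a for seq, a in zip(assignment, amounts) if seq[pos] == base)
            if share / total < preset_ratio:
                return False
    return True


def match_indexes(amounts, index_pool, max_attempts=1000, seed=0):
    """Return an Index assignment for one lane, or None if none was found."""
    rng = random.Random(seed)
    for _ in range(max_attempts):
        # S310: match an Index sequence to each sample at random.
        candidate = [rng.choice(index_pool) for _ in amounts]
        # S320: check the set conditions; on failure, return to S310.
        if is_valid(candidate, amounts):
            return candidate  # S330: accept as the true match
    return None
```

Returning `None` after the attempt limit signals that the pool cannot satisfy the conditions for that lane, for example when some base never occurs at a given position in any Index sequence of the pool.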
Fig. 3b is a flowchart of a sample processing method provided by an embodiment of the present application. As shown in Fig. 3b, for samples belonging to a sequencing library that has not yet been constructed, suitable Index sequences are selected according to the number of samples to construct the sequencing library, and the library construction result and the base ratios of the Index sequences are output. For samples belonging to an already constructed sequencing library, the sequencing library is established according to the lane arrangement of the samples, and finally the library construction result and the base ratios of the Index sequences are output.
Fig. 4 is a structural block diagram of a sample processing apparatus provided by an embodiment of the present application. As shown in Fig. 4, the apparatus includes: a forming module 410, a screening module 420, an exchange/mutation module 430 and a return/selection module 440.
The forming module 410 is configured to assign at least one test channel to each sample, all samples forming a plurality of sample arrangement sets based on the assigned test channels, where each sample is a DNA sequence or an RNA sequence to be tested;
the screening module 420 is configured to select, from the plurality of sample arrangement sets, at least two sample arrangement sets that satisfy a first set condition;
the exchange/mutation module 430 is configured to cross-exchange the test channels between every two of the selected sample arrangement sets, mutate the test channels of the sample arrangement sets obtained after the cross-exchange to obtain a plurality of new sample arrangement sets, and take each new sample arrangement set as a sample arrangement set;
the return/selection module 440 is configured to return to the operation of selecting, from the plurality of sample arrangement sets, at least two sample arrangement sets that satisfy the first set condition, until a termination condition is reached, and then select a sample arrangement set that satisfies a second set condition as the final sample arrangement set.
Optionally, the screening module 420 is configured to determine the fitness of each sample arrangement set among the plurality of sample arrangement sets, and select at least two sample arrangement sets whose fitness is greater than the first set value.
Optionally, selecting, once the termination condition is reached, a sample arrangement set that satisfies the second set condition as the final sample arrangement set includes:
once the number of returns reaches the set number, or once the difference between the average fitness of the currently obtained sample arrangement sets and the average fitness of the previously obtained sample arrangement sets is within the preset difference range, selecting, from the sample arrangement sets of one or more generations, the sample arrangement set with the greatest fitness as the final sample arrangement set;
where, among the sample arrangement sets of the multiple generations, the difference between the average fitness values of every two generations is within a preset range.
Optionally, determining the fitness of each sample arrangement set among the plurality of sample arrangement sets includes:
determining the fitness of the sample arrangement set based on the normalized sample data volume in each sample arrangement set, the normalized base ratio of the oligonucleotide adapter Index sequences matched to the samples in each test channel, and the result of whether the Index sequences matched to the samples are repeated.
Optionally, determining the fitness of the sample arrangement set based on these three quantities includes:
determining the fitness of the sample arrangement set based on the following formula:
fitness = A + B + C
where fitness is the fitness of the sample arrangement set; A is the normalized sample data volume; B is the normalized base ratio of the Index sequences of the samples in the test channels;
and C is -1 if the Index sequences of the samples contain a repeat, and 0 if they do not.
Optionally, the apparatus further includes a sequencing module configured to sequence each sample in the final sample arrangement set.
The above apparatus can execute the method provided by any embodiment of the present application, and has the functional modules and beneficial effects corresponding to the execution of the method.
Fig. 5 is a structural block diagram of a sample processing apparatus provided by an embodiment of the present application. As shown in Fig. 5, the apparatus includes: a matching module 510, a judgment module 520 and a determination/sequencing module 530.
The matching module 510 is configured to match Index sequences to the samples in each test channel of the sequencing chip, where each sample is a DNA sequence or an RNA sequence to be tested;
the judgment module 520 is configured to determine whether the Index sequences matched to the samples in the test channel satisfy the set conditions;
the determination/sequencing module 530 is configured to, if so, determine that the samples and the Index sequences are truly matched, and sequence the samples based on the truly matched Index sequences.
Optionally, the judgment module 520 is configured to determine whether the matched Index sequences satisfy the following conditions:
the Index sequences matched to the samples in each test channel contain no repeats;
the base ratio at every position of the Index sequences matched to the samples in each test channel is greater than or equal to the preset ratio.
Optionally, the data volume of the samples in each test channel is within the first preset data range, and the difference between the data volumes of the samples in different test channels is within the second preset data range.
The above apparatus can execute the method provided by any embodiment of the present application, and has the functional modules and beneficial effects corresponding to the execution of the method.
Fig. 6 is a schematic structural diagram of a device provided by an embodiment of the present application. As shown in Fig. 6, the device includes:
one or more processors 610, with one processor 610 taken as an example in Fig. 6;
a memory 620.
The device may further include an input device 630 and an output device 640.
The processor 610, the memory 620, the input device 630 and the output device 640 in the device may be connected by a bus or in other ways; connection by a bus is taken as an example in Fig. 6.
As a non-transitory computer-readable storage medium, the memory 620 can be used to store software programs, computer-executable programs and modules, such as the program instructions/modules corresponding to a sample processing method in the embodiments of the present application (for example, the forming module 410, the screening module 420, the exchange/mutation module 430 and the return/selection module 440 shown in Fig. 4, or the matching module 510, the judgment module 520 and the determination/sequencing module 530 shown in Fig. 5). By running the software programs, instructions and modules stored in the memory 620, the processor 610 executes the various functional applications and data processing of the computer device, that is, implements a sample processing method of the above method embodiments, namely:
assigning at least one test channel to each sample, all samples forming a plurality of sample arrangement sets based on the assigned test channels, where each sample is a DNA sequence or an RNA sequence to be tested;
selecting, from the plurality of sample arrangement sets, at least two sample arrangement sets that satisfy a first set condition;
cross-exchanging the test channels between every two of the selected sample arrangement sets, mutating the test channels of the sample arrangement sets obtained after the cross-exchange to obtain a plurality of new sample arrangement sets, and taking each new sample arrangement set as a sample arrangement set;
returning to the operation of selecting, from the plurality of sample arrangement sets, at least two sample arrangement sets that satisfy the first set condition, until a termination condition is reached, and then selecting a sample arrangement set that satisfies a second set condition as the final sample arrangement set.
Or:
matching Index sequences to the samples in each test channel of the sequencing chip, where each sample is a DNA sequence or an RNA sequence to be tested;
determining whether the Index sequences matched to the samples in the test channel satisfy the set conditions;
if so, determining that the samples and the Index sequences are truly matched, and sequencing the samples based on the truly matched Index sequences.
The memory 620 may include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created through use of the computer device, and the like. In addition, the memory 620 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 620 optionally includes memory located remotely from the processor 610, and such remote memory may be connected to the terminal device through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The input device 630 may be used to receive input numeric or character information, and to generate key signal inputs related to user settings and function control of the computer device. The output device 640 may include a display device such as a display screen.
An embodiment of the present application provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the program implements a sample processing method provided by the embodiments of the present application:
assigning at least one test channel to each sample, all samples forming a plurality of sample arrangement sets based on the assigned test channels, where each sample is a DNA sequence or an RNA sequence to be tested;
selecting, from the plurality of sample arrangement sets, at least two sample arrangement sets that satisfy a first set condition;
cross-exchanging the test channels between every two of the selected sample arrangement sets, mutating the test channels of the sample arrangement sets obtained after the cross-exchange to obtain a plurality of new sample arrangement sets, and taking each new sample arrangement set as a sample arrangement set;
returning to the operation of selecting, from the plurality of sample arrangement sets, at least two sample arrangement sets that satisfy the first set condition, until a termination condition is reached, and then selecting a sample arrangement set that satisfies a second set condition as the final sample arrangement set.
Or:
matching Index sequences to the samples in each test channel of the sequencing chip, where each sample is a DNA sequence or an RNA sequence to be tested;
determining whether the Index sequences matched to the samples in the test channel satisfy the set conditions;
if so, determining that the samples and the Index sequences are truly matched, and sequencing the samples based on the truly matched Index sequences.
Any combination of one or more computer-readable media may be used. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example but without limitation, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate or transmit a program for use by or in conjunction with an instruction execution system, apparatus or device.
Program code contained on a computer-readable medium may be transmitted using any suitable medium, including but not limited to wireless, wire, optical cable, RF, and the like, or any suitable combination of the above.
Computer program code for performing the operations of the present application may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
Note that the above are only preferred embodiments of the present application and the technical principles applied. Those skilled in the art will understand that the present application is not limited to the specific embodiments described herein, and that various obvious changes, readjustments and substitutions can be made by those skilled in the art without departing from the protection scope of the present application. Therefore, although the present application has been described in some detail through the above embodiments, it is not limited to the above embodiments and may include further equivalent embodiments without departing from the concept of the present application; the scope of the present application is determined by the scope of the appended claims.

Claims (10)

  1. A sample processing method, comprising:
    allocating at least one test channel to each sample, all samples forming a plurality of sample arrangement sets based on the allocated test channels, wherein each sample is a DNA sequence or an RNA sequence to be tested;
    screening out, from the plurality of sample arrangement sets, at least two sample arrangement sets that meet a first set condition;
    crossing over the test channels of every two of the screened-out sample arrangement sets, and mutating the test channels of the sample arrangement sets after crossover, to obtain a plurality of new sample arrangement sets, and taking each new sample arrangement set as a sample arrangement set; and
    returning to the operation of screening out, from the plurality of sample arrangement sets, at least two sample arrangement sets that meet the first set condition, until a termination condition is reached, and then selecting a sample arrangement set that meets a second set condition as the final sample arrangement set.
  2. The method according to claim 1, wherein screening out, from the plurality of sample arrangement sets, at least two sample arrangement sets that meet the first set condition comprises:
    determining the fitness of each sample arrangement set in the plurality of sample arrangement sets, and screening out at least two sample arrangement sets whose fitness is greater than a first set value.
  3. The method according to claim 1, wherein selecting, once the termination condition is reached, a sample arrangement set that meets the second set condition as the final sample arrangement set comprises:
    once the number of returns reaches a set number, or once the difference between the average fitness of the currently obtained sample arrangement sets and the average fitness of the previously obtained sample arrangement sets is within a preset difference range, selecting, from the sample arrangement sets of one or more generations, the sample arrangement set with the greatest fitness as the final sample arrangement set;
    wherein, among the sample arrangement sets of the multiple generations, the difference between the average fitness of the sample arrangement sets of every two generations is within a preset range.
  4. The method according to claim 2, wherein determining the fitness of each sample arrangement set in the plurality of sample arrangement sets comprises:
    determining the fitness of the sample arrangement set based on a normalized value of the sample data volume in each sample arrangement set, a normalized value of the base ratio of the oligonucleotide adapter Index sequences matched to the samples in each test channel, and a result of whether the Index sequences matched to the samples are repeated.
  5. The method according to claim 4, wherein determining the fitness of the sample arrangement set based on the normalized value of the sample data volume in each sample arrangement set, the normalized value of the base ratio of the oligonucleotide adapter Index sequences of the samples in each test channel, and the result of whether the Index sequences of the samples are repeated comprises:
    determining the fitness of the sample arrangement set based on the following formula:
    fitness = A + B + C
    wherein fitness is the fitness of the sample arrangement set; A is the normalized value of the sample data volume; and B is the normalized value of the base ratio of the Index sequences of the samples in the test channel;
    wherein C is -1 if the Index sequences of the samples are repeated, and C is 0 if the Index sequences of the samples are not repeated.
  6. The method according to claim 1, further comprising: sequencing each sample in the final sample arrangement set.
  7. A sample processing method, comprising:
    matching an Index sequence to the samples in each test channel of a sequencing chip, wherein each sample is a DNA sequence or an RNA sequence to be tested;
    determining whether the Index sequences matched to the samples in the test channel meet a set condition; and
    if so, determining that the samples truly match the Index sequences, and sequencing the samples based on the truly matched Index sequences.
  8. The method according to claim 7, wherein determining whether the Index sequences matched to the samples in the test channel meet the set condition comprises:
    determining whether the matched Index sequences satisfy the following conditions:
    the Index sequences matched to the samples in each test channel are not repeated; and
    the proportions of the bases at every position of the Index sequences matched to the samples in each test channel are all greater than or equal to a preset proportion.
  9. The method according to claim 7 or 8, wherein
    the data amount of the samples in each test channel is within a first preset data range, and the difference between the data amounts of the samples in different test channels is within a second preset data range.
  10. A device, comprising:
    one or more processors; and
    a storage apparatus configured to store one or more programs,
    wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1 to 9.
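The iterative screening, crossover, mutation, and termination steps of claims 1 to 5 can be illustrated with a minimal Python sketch. It is not taken from the application and should not be read as the applicant's implementation. The toy `SAMPLES` data, the channel count `N_LANES`, the function names, the top-half selection rule (standing in for the "first set value"), the trimming of each new generation to a fixed size, and the concrete forms of A and B are all assumptions made for illustration. Only the formula fitness = A + B + C and the rule that C is -1 when an Index sequence is repeated come from the claims.

```python
import random
from collections import Counter
from itertools import combinations

# Hypothetical toy input: one (Index sequence, data amount) pair per sample.
SAMPLES = [("ACGT", 10), ("CGTA", 12), ("GTAC", 8), ("TACG", 10),
           ("ACGT", 9), ("CATG", 11), ("GTCA", 10), ("TGAC", 10)]
N_LANES = 2  # assumed number of test channels


def lane_balance(seqs):
    """Assumed form of B for one channel: 4 x the rarest base's fraction,
    averaged over Index positions (1.0 means perfectly balanced bases)."""
    if not seqs:
        return 0.0
    columns = list(zip(*seqs))
    return sum(4 * min(Counter(col)[b] for b in "ACGT") / len(col)
               for col in columns) / len(columns)


def fitness(arrangement):
    """fitness = A + B + C, where arrangement[i] is the channel of sample i."""
    lanes = [[i for i, lane in enumerate(arrangement) if lane == k]
             for k in range(N_LANES)]
    loads = [sum(SAMPLES[i][1] for i in lane) for lane in lanes]
    # A (assumed form): 1.0 when every channel carries the same data amount.
    a = 1.0 - (max(loads) - min(loads)) / sum(loads)
    # B (assumed form): mean base balance over all channels.
    b = sum(lane_balance([SAMPLES[i][0] for i in lane]) for lane in lanes) / N_LANES
    # C (claim 5): -1 if any channel holds a repeated Index sequence, else 0.
    repeated = any(len({SAMPLES[i][0] for i in lane}) < len(lane) for lane in lanes)
    return a + b + (-1.0 if repeated else 0.0)


def evolve(pop_size=20, max_generations=50, tol=1e-6, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    population = [[rng.randrange(N_LANES) for _ in SAMPLES] for _ in range(pop_size)]
    best = max(population, key=fitness)
    prev_mean = None
    for _ in range(max_generations):  # termination: set number of returns
        # Screening (assumed rule): keep the fitter half, never fewer than two.
        parents = sorted(population, key=fitness, reverse=True)[:max(2, pop_size // 2)]
        children = []
        for p, q in combinations(parents, 2):
            cut = rng.randrange(1, len(SAMPLES))  # single-point crossover
            for child in (p[:cut] + q[cut:], q[:cut] + p[cut:]):
                # Mutation: occasionally move a sample to a random channel.
                children.append([rng.randrange(N_LANES) if rng.random() < mutation_rate
                                 else lane for lane in child])
        # Assumed trimming step so the population size stays bounded.
        population = rng.sample(children, min(pop_size, len(children)))
        best = max(population + [best], key=fitness)
        mean = sum(map(fitness, population)) / len(population)
        # Termination: average fitness changed by no more than the preset range.
        if prev_mean is not None and abs(mean - prev_mean) <= tol:
            break
        prev_mean = mean
    return best, fitness(best)
```

Calling `evolve()` returns the best arrangement found across all generations together with its fitness. With the assumed forms of A and B used here, the fitness of any arrangement cannot exceed 2, and an arrangement that places two samples with the same Index sequence in one channel is penalized by 1.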
PCT/CN2020/125165 2020-08-12 2020-10-30 Sample processing method, and device WO2022032885A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010807364.6A CN111961710B (en) 2020-08-12 Sample processing method and device
CN202010807364.6 2020-08-12

Publications (1)

Publication Number Publication Date
WO2022032885A1 (en) 2022-02-17

Family

ID=73365720

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/125165 WO2022032885A1 (en) 2020-08-12 2020-10-30 Sample processing method, and device

Country Status (1)

Country Link
WO (1) WO2022032885A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111961710A (en) * 2020-08-12 2020-11-20 苏州金唯智生物科技有限公司 Sample processing method and device
CN111961710B (en) * 2020-08-12 2024-04-26 苏州金唯智生物科技有限公司 Sample processing method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106815343A (en) * 2017-01-16 2017-06-09 上海小海龟科技有限公司 A kind of data processing method and data processing equipment
CN107164464A (en) * 2017-04-27 2017-09-15 武汉华大医学检验所有限公司 A kind of method and primer for detecting the pollution of microarray dataset index sequence
WO2018197945A1 (en) * 2017-04-23 2018-11-01 Illumina Cambridge Limited Compositions and methods for improving sample identification in indexed nucleic acid libraries
CN110785813A (en) * 2017-07-31 2020-02-11 伊鲁米那股份有限公司 Sequencing system with multi-path biological sample aggregation
CN111961710A (en) * 2020-08-12 2020-11-20 苏州金唯智生物科技有限公司 Sample processing method and device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106815343A (en) * 2017-01-16 2017-06-09 上海小海龟科技有限公司 A kind of data processing method and data processing equipment
WO2018197945A1 (en) * 2017-04-23 2018-11-01 Illumina Cambridge Limited Compositions and methods for improving sample identification in indexed nucleic acid libraries
CN107164464A (en) * 2017-04-27 2017-09-15 武汉华大医学检验所有限公司 A kind of method and primer for detecting the pollution of microarray dataset index sequence
CN110785813A (en) * 2017-07-31 2020-02-11 伊鲁米那股份有限公司 Sequencing system with multi-path biological sample aggregation
CN111961710A (en) * 2020-08-12 2020-11-20 苏州金唯智生物科技有限公司 Sample processing method and device

Also Published As

Publication number Publication date
CN111961710A (en) 2020-11-20

Similar Documents

Publication Publication Date Title
Pell et al. Scaling metagenome sequence assembly with probabilistic de Bruijn graphs
JP3931214B2 (en) Data analysis apparatus and program
WO2020199336A1 (en) Genovariation recognition method and apparatus, and storage medium
CN110797088B (en) Whole genome resequencing analysis and method for whole genome resequencing analysis
WO2020132572A1 (en) Source of origin deconvolution based on methylation fragments in cell-free-dna samples
KR102425673B1 (en) How to reorder sequencing data reads
CN107832584B (en) Gene analysis method, device, equipment and storage medium of metagenome
WO2022032885A1 (en) Sample processing method, and device
EP3535678B1 (en) Systems and methods for outlier significance assessment
CN111961710B (en) Sample processing method and device
EP2665010A1 (en) Nucleic acid information processing device and processing method thereof
US20120010823A1 (en) System for the quantification of system-wide dynamics in complex networks
CN104778088A (en) Method and system for optimizing parallel I/O (input/output) by reducing inter-progress communication expense
WO2023124779A1 (en) Third-generation sequencing data analysis method and device for point mutation detection
EP2665009A1 (en) Nucleic acid information processing device and processing method thereof
CN109887544A (en) RNA sequence parallel sorting method based on Non-negative Matrix Factorization
KR20170061911A (en) Method for constructing fused regression network and fused analysis system thereof
WO2007042270A1 (en) Method of identifying pattern in a series of data
KR102447359B1 (en) Apparatus and method for predicting novel disease genes based on the integration of diverse gene-gene relations
Zhao et al. DTA-SiST: de novo transcriptome assembly by using simplified suffix trees
Varrone et al. CellCharter: a scalable framework to chart and compare cell niches across multiple samples and spatial-omics technologies
US10629292B2 (en) Generation and use of simulated genomic data
US20230212560A1 (en) Systems, methods, and media for determining relative quality of oligonucleotide preparations
Mohsen et al. Improving de novo metatranscriptome assembly via machine learning algorithms
Fink et al. 1-D random landscapes and non-random data series

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20949379

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20949379

Country of ref document: EP

Kind code of ref document: A1