WO2022032885A1 - Sample processing method, and device

Info

Publication number: WO2022032885A1
Application number: PCT/CN2020/125165
Authority: WO
Inventors: 赵文妍; 段广有; 闵文波; 方其; 张艳; 葛毅; 廖国娟
Original assignee: 苏州金唯智生物科技有限公司
Priority date: 2020-08-12
Filing date: 2020-10-30
Publication date: 2022-02-17
Also published as: CN111961710A

Abstract

Provided are a sample processing method and a device. The method comprises: allocating at least one test lane to each sample, and all samples forming a plurality of sample arrangement sets on the basis of allocated test lanes; selecting, from among the plurality of sample arrangement sets, at least two sample arrangement sets that meet a first set condition; performing cross interchange on the test lanes in every two selected sample arrangement sets, and performing test lane variation on the sample arrangement sets after being subjected to cross interchange, so as to obtain a plurality of new sample arrangement sets and take the new sample arrangement sets as sample arrangement sets; and returning to the operation of selecting, from among the plurality of sample arrangement sets, at least two sample arrangement sets that meet a first set condition, until a termination condition is met, and choosing a sample arrangement set that meets a second set condition as a final sample arrangement set.

Description

Sample processing method and device

Technical Field

The embodiments of the present application relate to data processing technologies, and in particular, to a sample processing method and device.

Background

With the advancement of technology, sequencing with a sequencer (MGI) plays a key role in the determination of cell functions, in genetic research, and in disease diagnosis.

Before a sequencer is used for sequencing, a sample needs to be prepared. The main steps of sample preparation include: fragmenting and/or screening target sequences of specified lengths; converting the target fragments into double-stranded DNA; ligating oligonucleotide adapter sequences to the ends of the target fragments; and quantifying the final sequencing library.

Currently, for a sequencing library, the arrangement of each sample's test channels on the sequencing chip can only be worked out manually, which is cumbersome and inefficient.

SUMMARY OF THE INVENTION

The embodiments of the present application provide a sample processing method and device that can quickly and accurately give the arrangement of the samples across the test channels of the sequencing chip, thereby improving efficiency.

In a first aspect, an embodiment of the present application provides a sample processing method, including:

Allocating at least one test channel to each sample, all samples forming a plurality of sample arrangement sets based on the allocated test channels; wherein the sample is the DNA sequence or RNA sequence to be tested;

Screening out at least two sample arrangement sets that meet the first set condition from the plurality of sample arrangement sets;

Cross-exchanging the test channels in every two sample arrangement sets among the screened sample arrangement sets, and mutating the test channels in the cross-exchanged sample arrangement sets to obtain multiple new sample arrangement sets, and using each new sample arrangement set as a sample arrangement set;

Returning to the operation of screening out at least two sample arrangement sets that meet the first set condition from the plurality of sample arrangement sets until the termination condition is reached, and then selecting the sample arrangement set that meets the second set condition as the final sample arrangement set.

In a second aspect, the embodiment of the present application also provides a sample processing method, including:

Match the sample in each test channel of the sequencing chip to the Index sequence; wherein, the sample is the DNA sequence or RNA sequence to be tested;

Determine whether the Index sequence matched by the sample in the test channel meets the set condition;

If so, it is determined that the sample truly matches the Index sequence, and the sample is sequenced based on the truly matched Index sequence.

In a third aspect, an embodiment of the present application also provides a device, including:

one or more processors;

storage means for storing one or more programs,

When the one or more programs are executed by the one or more processors, the one or more processors implement the methods provided by the embodiments of the present application.

In a fourth aspect, the embodiments of the present application further provide a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the methods provided by the embodiments of the present application are implemented.

In the technical solution provided by the embodiments of the present application, at least one test lane (lane) is assigned to each sample, and all samples form multiple sample arrangement sets based on the assigned lanes. At least two sample arrangement sets that meet the first set condition are screened out; the test channels in every two of the screened sample arrangement sets are cross-exchanged, and the test channels in the cross-exchanged sample arrangement sets are mutated to obtain multiple new sample arrangement sets, each of which is then taken as a sample arrangement set. The method returns to the screening operation until the termination condition is reached, and then selects the sample arrangement set that meets the second set condition as the final sample arrangement set. In other words, by screening the sample arrangement sets, cross-exchanging and mutating the screened sets, and returning to the screening operation over several iterations, a suitable sample arrangement set is selected as the final one, so that the arrangement of the samples in the sequencing chip can be given quickly and accurately, improving efficiency.

Description of the Drawings

FIG. 1a is a flowchart of a sample processing method provided by an embodiment of the present application;

FIG. 1b is a schematic diagram of an Index sequence;

FIG. 2 is a flowchart of a sample processing method provided by an embodiment of the present application;

FIG. 3a is a flowchart of a sample processing method provided by an embodiment of the present application;

FIG. 3b is a flowchart of a sample processing method provided by an embodiment of the present application;

FIG. 4 is a structural block diagram of a sample processing apparatus provided by an embodiment of the present application;

FIG. 5 is a structural block diagram of a sample processing apparatus provided by an embodiment of the present application;

FIG. 6 is a schematic structural diagram of a device provided by an embodiment of the present application.

Detailed Description

The present application will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present application, but not to limit the present application. In addition, it should be noted that, for the convenience of description, the drawings only show some but not all the structures related to the present application.

FIG. 1a is a flowchart of a sample processing method provided by an embodiment of the present application. The method may be executed by a sample processing apparatus, which may be implemented in software and/or hardware and may be configured in a device such as a computer or a server. The method can be applied to the scenario in which samples whose libraries have already been built, together with the Index sequences of their oligonucleotide adapters, are arranged into test lanes.

As shown in Figure 1a, the technical solutions provided by the embodiments of the present application include:

S110: Allocate multiple test channels for each sample, and all samples form multiple sample arrangement sets based on the assigned multiple test channels; wherein the samples are DNA sequences or RNA sequences to be tested.

In the embodiment of the present application, the sample is a DNA sequence or RNA sequence to be detected, wherein a sample from a single source is a single sample, and a sample from multiple sources is a multi-sample. The sample for sequencing can be a single sample, or some samples can be mixed together (multi-sample) for sequencing.

In the embodiments of the present application, each sample belongs to an established sequencing library in which each sample has been matched with an Index sequence, but the arrangement of the samples in the sequencing chip is not provided. Here, a lane denotes a flow cell on the sequencing chip. The sequencing library and reagents are in the lane, and the sequencing signal is scanned tile by tile on a lane. A sequencer (MGI) can be used for sequencing; the MGI recognizes the sequence of the sample in the lane through the fluorescent signal.

Table 1 shows the input sample information, where I5 is the Index sequence matched by the sample, Data is the data amount of the sample (G), and Name is the name of the sample. The content of Table 1 can be input into the device so that the input content is processed.

Table 1

Name	ID	I5	Data (G)
Sample49	162	CGTTGAGT	2
Sample50	161	TTCCTGTG	2
Sample51	160	TCGTCTCA	2
Sample52	138	GTGCTTAC	2
Sample53	29	GCACAACT	2
Sample54	277	AACGGTCA	2
Sample55	63	GATGTGTG	2
Sample56	262	TCTCCGAT	2

The establishment of a sequencing library needs to meet the following conditions:

A: The data amount of the samples in each lane is within the first preset data range, and the difference between the data amounts of the samples among the multiple lanes is within the second preset data range. Specifically, the total data amount of each lane satisfies 130G ≥ total data amount ≥ 90G; if there are multiple lanes, the data amounts of the lanes should not differ too much.

B: There is no repetition of the Index sequence matched by the samples in each lane.

C: The ratio of bases in each position of the Index sequence matched by the sample in each lane is greater than or equal to the preset ratio at the same time. Specifically, the ratio of A, G, C, and T bases in each position of the Index sequence in each lane (the length of the Index sequence is temporarily not limited) must be greater than or equal to 12.5% at the same time.

The base ratio of each position may be calculated as follows, taking into account the data amount of the samples carrying each Index sequence. Specifically: the ratio of base x at a position = the data amount of the samples whose Index sequence has base x at that position / the total data amount, where x can be A, G, C, or T. For example, as shown in FIG. 1b, the ratio of base C at the first position = the data amount of (S1+S3) / the total data amount of (S1+S2+S3+S4).
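The per-position base ratio can be sketched in Python as below. This is only an illustration under stated assumptions: the function name `base_ratios`, the `(index_sequence, data_amount)` pair layout, and the four sample sequences are hypothetical; all Index sequences in a lane are assumed to have the same length and to contain only A, C, G, and T.

```python
BASES = "ACGT"

def base_ratios(members):
    """members: list of (index_sequence, data_amount) pairs for one lane.

    Returns one dict per Index position, mapping each base to its
    data-weighted ratio at that position.
    """
    total = sum(amount for _, amount in members)
    length = len(members[0][0])
    ratios = []
    for pos in range(length):
        weight = dict.fromkeys(BASES, 0.0)
        for seq, amount in members:
            weight[seq[pos]] += amount  # weight the base by the sample's data amount
        ratios.append({base: weight[base] / total for base in BASES})
    return ratios

# Hypothetical lane with four samples S1..S4; S1 and S3 start with C.
lane = [("CGTA", 2.0), ("ACGT", 2.0), ("CTAG", 2.0), ("GATC", 2.0)]
first_position = base_ratios(lane)[0]
# Ratio of C at the first position = (S1 + S3) / (S1 + S2 + S3 + S4)
print(first_position["C"])
```

Condition C then amounts to checking that every value in every returned dict is at least the preset ratio.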

In the embodiment of the present application, a genetic algorithm can be used to initialize a lane arrangement for each sample in the sequencing library to form a sample arrangement set. Multiple lanes need to be initialized for each sample: all samples, each taking one initialized lane, form one sample arrangement set, and all samples, taking their multiple initialized lanes, form a plurality of sample arrangement sets. The basic steps of a genetic algorithm can include initialization, fitness function calculation, selection, crossover, and mutation. The elements of a sample arrangement set are the samples on their assigned test channels; in each sample arrangement set, the samples are the genes and the lane assignments are the alleles. The number of sample arrangement sets formed may be 100 or another number.
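A minimal sketch of the initialization step follows, assuming uniformly random lane assignment (the source does not say how initial lanes are chosen). The name `init_population`, the dict representation (sample name to lane number), and the `seed` parameter are illustrative assumptions.

```python
import random

def init_population(sample_names, n_lanes, pop_size=100, seed=None):
    """Build pop_size sample arrangement sets.

    Each arrangement maps a sample name (gene) to a lane number (allele).
    Lanes are drawn uniformly at random, which is an assumption.
    """
    rng = random.Random(seed)
    return [
        {name: rng.randint(1, n_lanes) for name in sample_names}
        for _ in range(pop_size)
    ]

population = init_population(["A", "B", "C", "D", "E"], n_lanes=2, seed=0)
print(len(population))
```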

S120: Screen out at least two sample arrangement sets that meet the first set condition from the plurality of sample arrangement sets.

In this embodiment of the present application, a natural selection method (for example, a tournament method) is used to select n sample arrangement sets that meet the first set condition from multiple sample arrangement sets. The first setting condition may be that the fitness is greater than the first setting value, or may also be other conditions. The calculation of the fitness may refer to the introduction of the following embodiments.
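The tournament method mentioned above can be sketched as follows. This is one common form of tournament selection, not necessarily the one used in the application; the function name, the tournament size `k`, and the `fitness_fn` callback are assumptions, and the population is assumed to hold at least `k` arrangements.

```python
import random

def tournament_select(population, fitness_fn, n_select, k=3, rng=None):
    """Pick n_select arrangements; each pick is the fittest of k random contenders."""
    rng = rng or random.Random()
    selected = []
    for _ in range(n_select):
        contenders = rng.sample(population, k)  # k distinct candidates
        selected.append(max(contenders, key=fitness_fn))
    return selected
```

A threshold on fitness (the first set value) could be applied instead of, or in addition to, the tournament.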

S130: Cross-exchange the test channels in every two sample arrangement sets in the screened sample arrangement sets, and mutate the test channels on the sample arrangement sets after the cross-exchange to obtain multiple new sample arrangement sets.

In the embodiment of the present application, among the screened sample arrangement sets, the lanes of the samples in every two sample arrangement sets are cross-exchanged, and the cross-exchanged sample arrangement sets are mutated according to a set rule to obtain multiple new sample arrangement sets; each new sample arrangement set is then used as a sample arrangement set, yielding multiple sample arrangement sets. The set rule may be set as required, wherein the mutation rate of lanes in a cross-exchanged sample arrangement set is less than or equal to a preset mutation rate, and the preset mutation rate may be 10%.

In the embodiment of the present application, this step is illustrated with an example. Suppose there are five samples A, B, C, D, and E, and the lanes assigned to the five samples are 1, 2, 1, 2, and 1, respectively, while the other lanes assigned to the samples are 2, 1, 2, 2, and 1, respectively. The two sample arrangement sets are then S1 and S2, where S1={A1, B2, C1, D2, E1} and S2={A2, B1, C2, D2, E1}; A1 indicates that sample A is in the first lane, and the other elements of the sample arrangement sets have analogous meanings. The lanes assigned to the samples in S1 and S2 are cross-exchanged. For example, cross-exchanging the lanes of samples A and B gives the cross-exchanged sample arrangement sets S3={A2, B1, C1, D2, E1} and S4={A1, B2, C2, D2, E1}. After the cross-exchange, the lanes in S3 and S4 can be mutated. For example, if the lane of sample A in S3 mutates to the first lane, the new sample arrangement set obtained from S3 after mutation is {A1, B1, C1, D2, E1}.
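The worked example can be reproduced with a short Python sketch. The dict representation and the names `cross_exchange` and `mutate` are illustrative assumptions. Treating the 10% preset mutation rate as a per-sample probability is one possible reading of the rule, not something the source specifies, and a mutated sample may by chance be re-assigned to its current lane.

```python
import random

def cross_exchange(parent1, parent2, samples_to_swap):
    """Swap the lanes of the named samples between two arrangement sets."""
    child1, child2 = dict(parent1), dict(parent2)
    for name in samples_to_swap:
        child1[name], child2[name] = parent2[name], parent1[name]
    return child1, child2

def mutate(arrangement, n_lanes, rate=0.10, rng=None):
    """Re-draw each sample's lane with probability `rate` (assumed interpretation)."""
    rng = rng or random.Random()
    mutated = dict(arrangement)
    for name in mutated:
        if rng.random() < rate:
            mutated[name] = rng.randint(1, n_lanes)
    return mutated

s1 = {"A": 1, "B": 2, "C": 1, "D": 2, "E": 1}
s2 = {"A": 2, "B": 1, "C": 2, "D": 2, "E": 1}
s3, s4 = cross_exchange(s1, s2, ["A", "B"])
print(s3)  # lanes of A and B now come from S2
print(s4)  # lanes of A and B now come from S1
```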

It should be noted that the form of the sample arrangement set is not limited to the expression above and may take other forms. For example, the sample arrangement set may be a set formed by arranging the samples according to lanes. Specifically, if there are five samples A, B, C, D, and E, and the lanes assigned to the five samples are 1, 2, 1, 2, and 1, respectively, the sample arrangement set S1 can also be written by lane, for example as lane 1: {A, C, E}; lane 2: {B, D}.

If the other lanes assigned to the five samples are 2, 1, 2, 2, and 1, respectively, the sample arrangement set S2 can likewise be written as lane 1: {B, E}; lane 2: {A, C, D}.

The lanes of the samples in S1 and S2 are then cross-exchanged, that is, the lanes of samples A and B are cross-exchanged, to obtain the sample arrangement sets after the cross-exchange.

S140: Use each new sample arrangement set as a sample arrangement set.

In the embodiment of the present application, after the test channels in every two of the screened sample arrangement sets are cross-exchanged and the test channels in the cross-exchanged sample arrangement sets are mutated, each new sample arrangement set obtained is taken as a sample arrangement set.

S150: Determine whether the termination condition is reached.

If yes, execute S160. If not, return to S120.

In this embodiment of the present application, the termination condition may be that the number of returns reaches a set number of times, or may be the difference between the average fitness of the currently obtained multiple sample permutation sets and the average fitness of the multiple sample permutation sets obtained last time within a preset difference range, or other termination conditions. The set number of times may be 100 times, and the set number of times may be set according to actual conditions.
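The two termination conditions described here can be combined in a small check such as the sketch below. The function name and the `tolerance` default are hypothetical; the source only says the preset difference range is set as required.

```python
def should_terminate(generation, avg_fitness_history, max_generations=100, tolerance=1e-3):
    """Return True when either termination condition holds.

    generation: number of returns (iterations) completed so far.
    avg_fitness_history: average fitness of each generation, oldest first.
    """
    if generation >= max_generations:
        return True
    if len(avg_fitness_history) >= 2:
        # Compare the current generation's average fitness with the previous one.
        return abs(avg_fitness_history[-1] - avg_fitness_history[-2]) <= tolerance
    return False
```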

S160: Select a sample arrangement set that meets the second set condition as a final sample arrangement set.

In this embodiment of the present application, the second setting condition may be that the fitness is the largest, or the fitness is greater than the second setting value, or other conditions.

In the embodiment of the present application, after the termination condition is reached after multiple iterations, a sample arrangement set that meets the second set condition is selected as the final sample arrangement set, and the final sample arrangement set is a better solution of the sample arrangement lane.

In the technical solution provided by the embodiment of the present application, at least one test channel is assigned to each sample, and all samples form multiple sample arrangement sets based on the assigned test channels. At least two sample arrangement sets that meet the first set condition are screened out; the test channels in every two of the screened sample arrangement sets are cross-exchanged, and the test channels in the cross-exchanged sample arrangement sets are mutated to obtain multiple new sample arrangement sets, each of which is taken as a sample arrangement set. The method returns to the screening operation until the termination condition is reached, and then selects the sample arrangement set that meets the second set condition as the final sample arrangement set. In other words, by screening the sample arrangement sets, cross-exchanging and mutating the screened sets, and returning to the screening operation over several iterations, a suitable sample arrangement set is selected as the final one, so that the arrangement of the samples in the lanes of the sequencing chip can be given quickly and accurately, improving efficiency.

FIG. 2 is a flowchart of a sample processing method provided by an embodiment of the present application. In this embodiment, optionally, screening out at least two sample arrangement sets that meet the first set condition from the plurality of sample arrangement sets includes:

The fitness of each sample arrangement set in the multiple sample arrangement sets is determined, and at least two sample arrangement sets whose fitness is greater than the first set value are screened out.

Optionally, selecting, once the termination condition is reached, the sample arrangement set that meets the second set condition as the final sample arrangement set includes:

When the number of returns reaches the set number of times, or when the difference between the average fitness of the currently obtained multiple sample arrangement sets and the average fitness of the multiple sample arrangement sets obtained last time is within the preset difference range, selecting the sample arrangement set with the greatest fitness from the multiple sample arrangement sets of one or more generations as the final sample arrangement set;

Wherein, in the multiple sample arrangement sets of multiple generations, the difference between the average fitness of the sample arrangement sets of every two generations is within a preset range.

As shown in Figure 2, the technical solutions provided by the embodiments of the present application include:

S210: Allocate at least one test channel for each sample, and all samples form multiple sample arrangement sets based on the allocated test channel; wherein the samples are DNA sequences or RNA sequences to be tested.

S220: Determine the fitness of each sample arrangement set in the plurality of sample arrangement sets, and filter out at least two sample arrangement sets whose fitness is greater than the first set value.

In an implementation of the embodiment of the present application, optionally, determining the fitness of each sample arrangement set in the multiple sample arrangement sets includes: determining the fitness of the sample arrangement set based on the normalized value of the sample data amount in each sample arrangement set, the normalized value of the base ratio of the oligonucleotide adapter Index sequences matched by the samples in each test channel, and the result of whether the Index sequences matched by the samples are repeated.

In an implementation of the embodiment of the present application, optionally, determining the fitness of the sample arrangement set based on the normalized value of the sample data amount in each sample arrangement set, the normalized value of the base ratio of the oligonucleotide adapter Index sequences of the samples in each test channel, and the result of whether the Index sequences of the samples are repeated includes: determining the fitness of the sample arrangement set based on the following formula:

fitness=A+B+C

Here, fitness is the fitness of the sample arrangement set; A is the normalized value of the sample data amount; B is the normalized value of the base ratio of the Index sequences of the samples in the lanes; C is -1 if the Index sequences of the samples are repeated, and 0 if there is no repetition.

The above formula is the fitness function. A = the minimum data amount of the samples among all lanes / the average data amount, where the average data amount may be: the data amount of all samples / the number of lanes.

In the embodiment of this application, the normalized value of the base ratio of the Index sequences of the samples in the lanes is determined as follows: in each lane, determine the minimum base ratio over all positions of the Index sequence; add up these per-lane minimums; divide the sum by the number of lanes, and then divide by the preset ratio. That is, B = the sum over all lanes of the minimum base ratio over all positions of the Index sequence in each lane / the number of lanes / the preset ratio. The preset ratio is 0.125.
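The fitness function fitness=A+B+C can be sketched in Python as follows. This is an illustrative reading of the definitions above, not the applicant's implementation: the function name and data layout are assumptions, "minimum data amount among all lanes" is taken to mean the smallest per-lane total, repetition is checked within each lane (as in condition B), an empty lane is assumed to contribute a minimum ratio of 0, and the total data amount is assumed to be non-zero.

```python
PRESET_RATIO = 0.125
BASES = "ACGT"

def fitness(arrangement, samples, n_lanes):
    """arrangement: sample name -> lane number (1..n_lanes).
    samples: sample name -> (index_sequence, data_amount)."""
    lanes = {lane: [] for lane in range(1, n_lanes + 1)}
    for name, lane in arrangement.items():
        lanes[lane].append(samples[name])

    # A: smallest per-lane data amount divided by the average per-lane amount.
    lane_totals = [sum(d for _, d in members) for members in lanes.values()]
    average = sum(lane_totals) / n_lanes
    a = min(lane_totals) / average

    # B: per lane, the smallest base ratio over all positions and bases;
    # summed over lanes, divided by the lane count, then by the preset ratio.
    min_sum = 0.0
    for members, total in zip(lanes.values(), lane_totals):
        if total == 0:
            continue  # empty lane: minimum ratio counted as 0 (assumption)
        lane_min = 1.0
        for pos in range(len(members[0][0])):
            for base in BASES:
                amount = sum(d for seq, d in members if seq[pos] == base)
                lane_min = min(lane_min, amount / total)
        min_sum += lane_min
    b = min_sum / n_lanes / PRESET_RATIO

    # C: -1 if any lane holds a repeated Index sequence, otherwise 0.
    repeated = any(
        len({seq for seq, _ in members}) < len(members)
        for members in lanes.values()
    )
    c = -1 if repeated else 0
    return a + b + c
```

Under this reading, an arrangement with perfectly balanced lanes (A = 1) whose worst base ratio in every lane equals the preset ratio (B = 1) and no repeated Index sequences (C = 0) scores 2.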

Therefore, by determining the fitness of each sample arrangement set, and screening out the sample arrangement set whose fitness is greater than the first set value, a better sample arrangement set can be screened, and the efficiency of the sample arrangement lane can be improved.

S230: Cross-exchange the test channels in every two sample arrangement sets in the screened sample arrangement sets, and mutate the test channels on the sample arrangement sets after the cross-exchange to obtain multiple new sample arrangement sets.

S240: Use each new sample arrangement set as a sample arrangement set.

S250: Determine whether the number of returns reaches a set number of times, or determine whether the difference between the average fitness of the currently obtained multiple sample arrangement sets and the average fitness of the multiple sample arrangement sets obtained last time is within a preset difference range.

In this embodiment of the present application, the set number of times may be 100 times. The preset difference range can be set as required.

If yes, go to S260, if no, go back to S220.

S260: Select the sample arrangement set with the greatest fitness among the multiple sample arrangement sets of one or more generations as the final sample arrangement set.

In the embodiment of the present application, in the multiple sample arrangement sets of multiple generations, the difference between the average fitness of the sample arrangement sets of every two generations is within a preset range.

In the embodiment of the present application, when the number of returns reaches the set number of times, the sample arrangement set with the greatest fitness may be selected from the multiple sample arrangement sets of the current generation as the final sample arrangement set. Alternatively, when the number of returns reaches the set number of times and the average fitness of the sample arrangement sets over multiple generations has become stable, the sample arrangement set with the greatest fitness is selected from the sample arrangement sets of those generations as the final sample arrangement set.

On the basis of the foregoing embodiments, the technical solutions provided in the embodiments of the present application may further include: sequencing each sample in the final sample arrangement set. Specifically, the gene sequence is sequenced for each sample in the sample arrangement set, so as to facilitate the analysis and research of the gene sequence.

FIG. 3a is a flowchart of a sample processing method provided by an embodiment of the present application. The method may be executed by a sample processing apparatus, which may be implemented in software and/or hardware and may be configured in a device such as a computer or a server. The method can be applied to a scenario in which Index sequences are matched to samples whose library has not yet been built, where the data amount of the samples and the arrangement of the samples have already been provided.

As shown in Figure 3a, the technical solutions provided by the embodiments of the present application include:

S310: Match the sample in each test channel of the sequencing chip to the Index sequence; wherein the sample is the DNA sequence or RNA sequence to be tested.

In the embodiments of the present application, each sample belongs to a sequencing library that has not yet been built, wherein the data amount of the sample and the arrangement of the sample in the sequencing chip are known, but the Index sequence of the sample is not provided.

Table 2 shows the input sample information, where lane is the lane in which the sample is located on the sequencing chip. Table 3 shows the Index sequences in the database.

Table 2

Sample name	Data	lane
sample1	25	1
sample2	10	1
sample3	40	1
sample4	30	1
sample5	10	2

Table 3

index ID	index sequence
A01	CGCTACAT
B01	AATCCAGC
C01	CGTCTAAC
D01	AACTCGGA

The establishment of the sequencing library needs to meet the following condition:

C: The ratio of bases in each position of the Index sequence matched by the sample in each lane is greater than or equal to the preset ratio at the same time. Specifically, the ratios of the A, G, C, and T bases at each position of the Index sequence in each lane (the length of the Index sequence is temporarily not limited) must all be ≥12.5% at the same time.

In the embodiment of the present application, the samples in each lane in the sequencing chip may be randomly matched to the Index sequence, or the Index sequence may be matched according to other rules.

S320: Determine whether the Index sequence matched by the samples in the test channel meets the set condition.

If yes, go to S330, if no, go back to S310.

In an implementation of the embodiment of the present application, optionally, the judging whether the Index sequence matched by the samples in the test channel meets the set condition includes: judging whether the matched Index sequence satisfies the following conditions:

There is no repetition of the Index sequence matched by the samples in each of the test channels;

The ratio of bases in each position of the Index sequence matched by the sample in each of the test channels is greater than or equal to the preset ratio at the same time. The preset ratio may be 0.125.

In the embodiment of the present application, in each lane, multiple samples can be arranged, and there is no repetition of the Index sequence matched by each sample in each lane. Wherein, the calculation method of the base ratio of each position of the Index sequence matched by the sample may refer to the above-mentioned embodiment, and will not be repeatedly described. Wherein, the data volume of the samples in each lane is within the first preset data range, and the difference between the data volumes of the samples among the multiple lanes is within the second preset data range.
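The check in S320 can be sketched for a single lane as below. The function name and the `(index_sequence, data_amount)` layout are assumptions; the lane is assumed to be non-empty with equal-length Index sequences. The usage example pairs the four lane-1 samples of Table 2 with the four Index sequences of Table 3 purely for illustration.

```python
PRESET_RATIO = 0.125
BASES = "ACGT"

def lane_meets_conditions(members, preset_ratio=PRESET_RATIO):
    """members: list of (index_sequence, data_amount) for the samples in one lane."""
    sequences = [seq for seq, _ in members]
    if len(set(sequences)) < len(sequences):
        return False  # an Index sequence is repeated within the lane
    total = sum(amount for _, amount in members)
    for pos in range(len(sequences[0])):
        for base in BASES:
            amount = sum(a for seq, a in members if seq[pos] == base)
            if amount / total < preset_ratio:
                return False  # some base is under-represented at this position
    return True

# Lane 1 of Table 2 matched one-to-one with Table 3 (illustrative pairing).
lane1 = [("CGCTACAT", 25), ("AATCCAGC", 10), ("CGTCTAAC", 40), ("AACTCGGA", 30)]
print(lane_meets_conditions(lane1))
```

With this pairing the check fails, because the first position of the four Index sequences contains only C and A, so the ratios of G and T there are 0.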

S330: Determine that the sample truly matches the Index sequence, and sequence the sample based on the truly matched Index sequence.

In the embodiment of the present application, if the Index sequence matched by the sample in the lane meets the set conditions, it is determined that the sample is truly matched with the Index sequence, a sequencing library can be established based on the matched Index sequence, and the sample is sequenced based on the truly matched Index sequence, This facilitates the analysis and research of gene sequences.
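The S310 to S330 loop (match, check, and re-match on failure) can be sketched as follows, assuming random matching. The function name, the `is_valid` callback (any per-lane check of the set conditions), and the `max_attempts` cap are hypothetical additions; the Index pool is assumed to hold at least as many sequences as the largest lane has samples.

```python
import random

def match_indexes(lanes, index_pool, is_valid, max_attempts=1000, rng=None):
    """lanes: lane number -> list of (sample_name, data_amount).
    index_pool: index ID -> index sequence.
    is_valid: callable taking a list of (index_sequence, data_amount) for one lane.
    Returns sample_name -> index ID, or None if no valid matching was found."""
    rng = rng or random.Random()
    ids = list(index_pool)
    for _ in range(max_attempts):
        matching = {}
        ok = True
        for members in lanes.values():
            # S310: draw distinct Index IDs at random for the samples in this lane.
            chosen = rng.sample(ids, len(members))
            candidate = [
                (index_pool[i], amount) for i, (_, amount) in zip(chosen, members)
            ]
            if not is_valid(candidate):  # S320: set condition not met, retry
                ok = False
                break
            matching.update({name: i for i, (name, _) in zip(chosen, members)})
        if ok:
            return matching  # S330: the matching is accepted as the true match
    return None
```

Capping the number of attempts is a design choice of this sketch: with a small Index pool, such as the four sequences of Table 3, no assignment may satisfy the base-ratio condition, and an uncapped loop would never return.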

In the technical solution provided in the embodiment of the present application, the samples in each test channel of the sequencing chip are matched to Index sequences; if the Index sequences matched by the samples in the test channel are judged to meet the set condition, it is determined that the samples truly match the Index sequences, and the samples are sequenced based on the truly matched Index sequences. In this way, Index sequences can be matched quickly and accurately, improving efficiency.

FIG. 3b is a flowchart of a sample processing method provided in the embodiment of the present application. As shown in FIG. 3b, for samples belonging to a sequencing library that has not yet been built, appropriate Index sequences are selected according to the number of samples to establish the sequencing library, and the result of the established sequencing library and the base ratio of the Index sequences are output. For samples belonging to an established sequencing library, the sequencing library is established according to the sample arrangement, and finally the result of the established sequencing library and the base ratio of the Index sequences are output.

FIG. 4 is a structural block diagram of a sample processing apparatus provided by an embodiment of the present application. As shown in FIG. 4, the apparatus includes: a forming module 410, a screening module 420, an exchange/mutation module 430, and a return/selection module 440.

The forming module 410 is configured to assign at least one test channel to each sample, and all samples form a plurality of sample arrangement sets based on the assigned test channel; wherein, the samples are DNA sequences or RNA sequences to be tested;

A screening module 420, configured to screen out at least two sample arrangement sets that meet the first set condition from the plurality of sample arrangement sets;

The swap/mutation module 430 is configured to cross-exchange the test channels in every two sample permutation sets in the screened sample permutation sets, and mutate the test channels in the cross-exchanged sample permutation sets to obtain a plurality of new sample permutation sets and using each new sample permutation set as a sample permutation set;

The return/selection module 440 is configured to return to the operation of screening out at least two sample arrangement sets that meet the first set condition from the plurality of sample arrangement sets until the termination condition is reached, and then to select the sample arrangement set that meets the second set condition as the final sample arrangement set.
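By way of illustration only, the loop carried out by modules 410 to 440 can be sketched as follows. This is a minimal sketch and not the applicant's implementation. It rests on several assumptions: each sample is assigned exactly one test channel (the application allows at least one), cross exchange is single-point, the termination condition is a set number of returns, and the second set condition is the largest fitness. The names `evolve`, `lane_balance`, `first_set_value`, `max_returns` and `mutation_rate` are hypothetical, as is the fallback used when fewer than two sets pass screening.

```python
import random


def evolve(population, fitness, n_lanes, first_set_value,
           max_returns=50, mutation_rate=0.1, rng=None):
    """Repeat screening, cross exchange and mutation; return the fittest set seen.

    population: list of arrangement sets, each a list in which entry i is the
    test channel assigned to sample i (at least two sets, two samples each).
    """
    if rng is None:
        rng = random.Random(0)
    best = max(population, key=fitness)
    for _ in range(max_returns):  # termination: a set number of returns
        # Screening: keep the sets whose fitness exceeds the first set value.
        selected = [p for p in population if fitness(p) > first_set_value]
        if len(selected) < 2:  # assumed fallback: take the two fittest sets
            selected = sorted(population, key=fitness, reverse=True)[:2]
        children = []
        while len(children) < len(population):
            a, b = rng.sample(selected, 2)
            cut = rng.randrange(1, len(a))  # single-point cross exchange
            child = a[:cut] + b[cut:]
            # Mutation: occasionally move a sample to a random test channel.
            child = [rng.randrange(n_lanes) if rng.random() < mutation_rate
                     else lane for lane in child]
            children.append(child)
        population = children
        best = max(population + [best], key=fitness)
    return best


def lane_balance(arrangement, n_lanes=3):
    """Toy fitness: 0 when samples are spread evenly, negative otherwise."""
    counts = [arrangement.count(lane) for lane in range(n_lanes)]
    return min(counts) - max(counts)


rng = random.Random(1)
start = [[rng.randrange(3) for _ in range(6)] for _ in range(8)]
best = evolve(start, lane_balance, n_lanes=3, first_set_value=-3, rng=rng)
print(best, lane_balance(best))
```

The toy fitness `lane_balance` only stands in for the fitness described in the application, so that the sketch can be run on its own.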

Optionally, the screening module 420 is configured to determine the fitness of each sample arrangement set in the plurality of sample arrangement sets, and to screen out at least two sample arrangement sets whose fitness is greater than the first set value.

Optionally, determining the fitness of each sample arrangement set in the plurality of sample arrangement sets includes:

Determining the fitness of the sample arrangement set based on the normalized value of the sample data volume in each sample arrangement set, the normalized value of the base ratio of the Index sequence of the oligonucleotide adapter matched to the samples in each test channel, and the result of whether the Index sequence matched to a sample is repeated.

Optionally, determining the fitness of the sample arrangement set based on the normalized value of the sample data volume in each sample arrangement set, the normalized value of the base ratio of the Index sequence of the oligonucleotide adapter of the samples in each test channel, and the result of whether the Index sequence of a sample is repeated includes:

Determining the fitness of the sample arrangement set based on the following formula:

fitness=A+B+C

Wherein, fitness is the fitness of the sample arrangement set; A is the normalized value of the sample data volume; B is the normalized value of the base ratio of the Index sequence of the samples in the test channel;

and C is -1 if the Index sequence of a sample is repeated, and 0 if no Index sequence of a sample is repeated.
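As a hedged illustration of the formula above, the following sketch computes fitness = A + B + C for one sample arrangement set. The application does not state how A and B are normalized, so they are taken here as already-normalized inputs. It is also assumed that "repeated" means two samples in the same test channel carrying the same Index sequence. The names `repeat_penalty`, `arrangement_fitness`, `a_norm`, `b_norm` and `lanes` are hypothetical.

```python
def repeat_penalty(lanes):
    """Return -1 if any test channel holds a repeated Index sequence, else 0.

    lanes: dict mapping a test channel id to the list of Index sequences
    of the samples placed in that channel.
    """
    for indexes in lanes.values():
        if len(set(indexes)) != len(indexes):
            return -1
    return 0


def arrangement_fitness(a_norm, b_norm, lanes):
    """fitness = A + B + C, with A and B supplied already normalized."""
    return a_norm + b_norm + repeat_penalty(lanes)


unique = {1: ["ACGT", "TGCA"], 2: ["GGAT"]}
clash = {1: ["ACGT", "ACGT"], 2: ["GGAT"]}
print(arrangement_fitness(0.5, 0.25, unique))  # C = 0
print(arrangement_fitness(0.5, 0.25, clash))   # C = -1
```

Under these assumptions, an arrangement with a repeated Index sequence always scores exactly 1 lower than the same arrangement without the repetition.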

Optionally, the device further includes a sequencing module for sequencing each sample in the final sample arrangement set.

The above apparatus can execute the method provided by any embodiment of the present application, and has corresponding functional modules and beneficial effects for executing the method.

FIG. 5 is a structural block diagram of a sample processing apparatus provided by an embodiment of the present application. As shown in FIG. 5, the apparatus includes: a matching module 510, a judgment module 520, and a determination/sequencing module 530.

Wherein, the matching module 510 is used to match the samples in each test channel of the sequencing chip to the Index sequence; wherein, the sample is the DNA sequence or RNA sequence to be tested;

The judgment module 520 is used for judging whether the Index sequence matched by the samples in the test channel meets the set condition;

The determination/sequencing module 530 is configured to, if the set condition is met, determine that the sample is truly matched with the Index sequence, and sequence the sample based on the truly matched Index sequence.

Optionally, the judgment module 520 is used to judge whether the matched Index sequence satisfies the following conditions:

At each position of the Index sequences matched by the samples in each of the test channels, the proportions of the bases are all greater than or equal to the preset ratio.

Optionally, the data volume of the samples in each of the test channels is within the first preset data range, and the difference between the data volumes of the samples in different test channels is within the second preset data range.
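The conditions above can be sketched as simple checks on one sequencing run. This is an illustrative sketch under stated assumptions and not the applicant's implementation. It assumes that the base-ratio condition requires each of A, C, G and T to reach the preset ratio at every position, that all Index sequences in a channel have the same length, and that the second preset data range can be expressed as a maximum spread between channels. The names `base_ratios_ok`, `volumes_ok`, `min_ratio`, `per_lane_range` and `max_spread` are hypothetical.

```python
from collections import Counter

BASES = "ACGT"


def base_ratios_ok(indexes, min_ratio):
    """True if each base reaches min_ratio at every position of the Index sequences."""
    for column in zip(*indexes):  # one column per position in the Index sequence
        counts = Counter(column)
        if any(counts[base] / len(column) < min_ratio for base in BASES):
            return False
    return True


def volumes_ok(lane_volumes, per_lane_range, max_spread):
    """True if every channel volume is in range and the channels differ by at most max_spread."""
    low, high = per_lane_range
    in_range = all(low <= volume <= high for volume in lane_volumes)
    return in_range and max(lane_volumes) - min(lane_volumes) <= max_spread


balanced = ["ACGT", "CGTA", "GTAC", "TACG"]
skewed = ["AAAA", "AAAC"]
print(base_ratios_ok(balanced, 0.25), base_ratios_ok(skewed, 0.1))
print(volumes_ok([100, 110, 105], (90, 120), 15))
```

A matched Index sequence would be treated as a true match only when all such checks pass for its test channel.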

FIG. 6 is a schematic structural diagram of a device provided by an embodiment of the present application. As shown in FIG. 6, the device includes:

one or more processors 610 (one processor 610 is taken as an example in FIG. 6);

memory 620;

The device may further include: an input device 630 and an output device 640.

The processor 610, the memory 620, the input device 630 and the output device 640 in the device may be connected by a bus or in other ways; connection by a bus is taken as an example in FIG. 6.

As a non-transitory computer-readable storage medium, the memory 620 can be used to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to a sample processing method in the embodiments of the present application (for example, the forming module 410, the screening module 420, the exchange/mutation module 430 and the return/selection module 440 shown in FIG. 4, or the matching module 510, the judgment module 520 and the determination/sequencing module 530 shown in FIG. 5). By running the software programs, instructions and modules stored in the memory 620, the processor 610 executes the various functional applications and data processing of the computer device, that is, implements a sample processing method of the above method embodiments, namely:

Return to the operation of screening out at least two sample arrangement sets that meet the first set condition from the plurality of sample arrangement sets until the termination condition is reached, and then select the sample arrangement set that meets the second set condition as the final sample arrangement set.

or,

The memory 620 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the computer equipment, and the like. Additionally, memory 620 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 620 may optionally include memory located remotely from the processor 610, and these remote memories may be connected to the terminal device through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.

The input device 630 may be used to receive input numerical or character information, and to generate key signal input related to user settings and function control of the computer device. The output device 640 may include a display device such as a display screen.

The embodiments of the present application provide a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, implements a sample processing method provided by the embodiments of the present application:

or,

Any combination of one or more computer-readable media may be employed. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. The computer readable storage medium can be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or a combination of any of the above. More specific examples (a non-exhaustive list) of computer readable storage media include: electrical connections having one or more wires, portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), Erasable Programmable Read Only Memory (EPROM or Flash), fiber optics, portable compact disk read only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing. In this document, a computer-readable storage medium can be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.

A computer-readable signal medium may include a propagated data signal in baseband or as part of a carrier wave with computer-readable program code embodied thereon. Such propagated data signals may take a variety of forms including, but not limited to, electromagnetic signals, optical signals, or any suitable combination of the foregoing. A computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can transmit, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any suitable medium including, but not limited to, wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for performing the operations of the present application may be written in one or more programming languages, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., through the Internet using an Internet service provider).

Note that the above are only preferred embodiments of the present application and the technical principles applied. Those skilled in the art will understand that the present application is not limited to the specific embodiments described herein, and that various obvious changes, readjustments and substitutions can be made by those skilled in the art without departing from the protection scope of the present application. Therefore, although the present application has been described in detail through the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from the concept of the present application. The scope of the present application is determined by the scope of the appended claims.

Claims

A sample processing method comprising:

Allocating at least one test channel to each sample, all samples forming a plurality of sample arrangement sets based on the allocated test channels; wherein the sample is a DNA sequence or RNA sequence to be tested;

Screening out at least two sample arrangement sets that meet the first set condition from the plurality of sample arrangement sets;

Cross-exchanging the test channels in every two sample arrangement sets among the screened sample arrangement sets, and mutating the test channels in the cross-exchanged sample arrangement sets, so as to obtain a plurality of new sample arrangement sets and use each new sample arrangement set as a sample arrangement set;

Returning to the operation of screening out at least two sample arrangement sets that meet the first set condition from the plurality of sample arrangement sets until the termination condition is reached, and then selecting the sample arrangement set that meets the second set condition as the final sample arrangement set.
The method according to claim 1, wherein the filtering out at least two sample arrangement sets that meet the first set condition from the plurality of sample arrangement sets comprises:

The fitness of each sample arrangement set in the multiple sample arrangement sets is determined, and at least two sample arrangement sets whose fitness is greater than the first set value are screened out.
The method according to claim 1, wherein selecting, once the termination condition is reached, a sample arrangement set that meets the second set condition as the final sample arrangement set comprises:

When the number of returns reaches the set number of times, or when the difference between the average fitness of the currently obtained plurality of sample arrangement sets and the average fitness of the plurality of sample arrangement sets obtained last time is within the preset difference range, selecting the sample arrangement set with the largest fitness as the final sample arrangement set from the plurality of sample arrangement sets of one or more generations;

Wherein, in the multiple sample arrangement sets of multiple generations, the difference between the average fitness of the sample arrangement sets of every two generations is within a preset range.
The method according to claim 2, wherein the determining the fitness of each sample permutation set in the plurality of sample permutation sets comprises:

Determining the fitness of the sample arrangement set based on the normalized value of the sample data volume in each sample arrangement set, the normalized value of the base ratio of the Index sequence of the oligonucleotide adapter matched to the samples in each test channel, and the result of whether the Index sequence matched to a sample is repeated.
The method according to claim 4, wherein determining the fitness of the sample arrangement set based on the normalized value of the sample data volume in each sample arrangement set, the normalized value of the base ratio of the Index sequence of the oligonucleotide adapter of the samples in each test channel, and the result of whether the Index sequence of a sample is repeated comprises:

Determining the fitness of the sample arrangement set based on the following formula:

fitness=A+B+C

Wherein, fitness is the fitness of the sample arrangement set; A is the normalized value of the sample data volume; B is the normalized value of the base ratio of the Index sequence of the samples in the test channel;

and C is -1 if the Index sequence of a sample is repeated, and 0 if no Index sequence of a sample is repeated.
The method of claim 1, further comprising: sequencing each sample in the final sample permutation set.
A sample processing method comprising:

Matching the sample in each test channel of the sequencing chip to an Index sequence; wherein the sample is a DNA sequence or RNA sequence to be tested;

Judging whether the Index sequence matched by the sample in the test channel meets the set condition;

If so, determining that the sample is truly matched with the Index sequence, and sequencing the sample based on the truly matched Index sequence.
The method according to claim 7, wherein the judging whether the Index sequence matched by the samples in the test channel meets a set condition comprises:

Judging whether the matched Index sequence satisfies the following conditions:

There is no repetition of the Index sequence matched by the samples in each of the test channels;

At each position of the Index sequences matched by the samples in each of the test channels, the proportions of the bases are all greater than or equal to the preset ratio.
The method of claim 7 or 8, wherein,

The data amount of the samples in each of the test channels is within the first preset data range, and the difference between the data amounts of the samples among the plurality of test channels is within the second preset data range.
A device comprising:

one or more processors;

storage means for storing one or more programs,

The one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-9.