CN115775591A - Primer design method, device, equipment and readable storage medium - Google Patents


Info

Publication number
CN115775591A
Authority
CN
China
Prior art keywords
target
primer
gene
mutation
gene sequence
Legal status
Granted
Application number
CN202310034710.5A
Other languages
Chinese (zh)
Other versions
CN115775591B (en)
Inventor
张家兵
李�杰
苏艳芳
黄君君
Current Assignee
Shanghai Huace Aipu Medical Laboratory Co ltd
Centre Testing International Group Co ltd
Original Assignee
Shanghai Huace Aipu Medical Laboratory Co ltd
Centre Testing International Group Co ltd
Application filed by Shanghai Huace Aipu Medical Laboratory Co ltd and Centre Testing International Group Co ltd
Priority to CN202310034710.5A
Publication of CN115775591A
Application granted
Publication of CN115775591B
Legal status: Active


Abstract

The application belongs to the technical field of gene capture and provides a primer design method, apparatus, device and readable storage medium that address the poor specificity and low sensitivity of conventional primers in the prior art. The method comprises: obtaining a target gene database, wherein the target gene database comprises target genes and hotspot mutation information of the target genes, and the hotspot mutation information comprises mutation site position information, mutation sequences and replacement sequences; designing primers for the target genes to generate target wild-type primers; mutating the gene sequences of the target wild-type primers according to the hotspot mutation information to generate target mutant primers; and converting the target wild-type primers and the target mutant primers into target primers, the set of which is taken as a target primer pool.

Description

Primer design method, device, equipment and readable storage medium
Technical Field
The application belongs to the technical field of gene capture, and particularly relates to a primer design method, a device, equipment and a readable storage medium.
Background
At present, when gene capture is performed using polymerase chain reaction (PCR) technology, the primers are designed from deoxyribonucleic acid (DNA) sequences of a reference genome. The reference genome sequence was determined by the Human Genome Project and represents the genome shared by most individuals. However, because of individual differences, the genome of a given organism may carry mutations of varying degrees relative to the reference genome. Primers designed from the reference genome in the prior art therefore tend to show poor specificity and low sensitivity when capturing target region fragments that carry mutations.
Disclosure of Invention
In view of this, embodiments of the present application provide a primer design method, apparatus, device and readable storage medium, so as to solve the problems of poor specificity and low sensitivity of the conventional primer in the prior art.
A first aspect of an embodiment of the present application provides a primer design method, including: acquiring a target gene database, wherein the target gene database comprises a target gene and hotspot mutation information of the target gene, and the hotspot mutation information comprises mutation site position information, a mutation sequence and a replacement sequence; designing primers for the target gene to generate a target wild-type primer; mutating the gene sequence of the target wild-type primer according to the hotspot mutation information to generate a target mutant primer; and converting the target wild-type primer and the target mutant primer into target primers, and taking the set of target primers as a target primer pool.
With reference to the first aspect, in a first possible implementation manner of the first aspect, when primers are designed for the target gene, the reference genome is hg19; the maximum number of mismatches is 3; the minimum primer length is 18 nt and the maximum primer length is 25 nt; and the minimum GC proportion of the primers is 20% and the maximum GC proportion is 80%.
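The length and GC limits above can be checked with a few lines of code. The sketch below is only an illustration, assuming the GC proportion is the percentage of G and C bases in the primer; the function names `gc_percent` and `passes_basic_constraints` are hypothetical and are not part of the application.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def passes_basic_constraints(seq: str, min_len: int = 18, max_len: int = 25,
                             gc_min: float = 20.0, gc_max: float = 80.0) -> bool:
    """True if primer length (nt) and GC percentage fall within the stated limits."""
    return min_len <= len(seq) <= max_len and gc_min <= gc_percent(seq) <= gc_max


# A 23-nt sequence from the embodiment below: 13 of its 23 bases are G or C.
print(round(gc_percent("TTTTGGGCTGGCCAAACTGCTGG"), 1))  # 56.5
print(passes_basic_constraints("TTTTGGGCTGGCCAAACTGCTGG"))  # True
```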
With reference to the first aspect, in a second possible implementation manner of the first aspect, designing primers for the target gene to generate a target wild-type primer includes: designing primers for the target gene to obtain initial wild-type gene sequences; and screening the initial wild-type gene sequences according to a first preset standard to generate target wild-type primers, wherein the first preset standard is: the RANK is 1, and duplicate initial wild-type gene sequences are merged.
With reference to the first aspect, in a third possible implementation manner of the first aspect, mutating the gene sequence of the target wild-type primer according to the hotspot mutation information to generate a target mutant primer includes: mutating the gene sequence of the target wild-type primer according to the hotspot mutation information to obtain an initial mutant gene sequence; and screening the initial mutant gene sequences based on a second preset standard to obtain target mutant primers, wherein the second preset standard is: the mutation site is located on the initial mutant gene sequence; the length of the initial mutant gene sequence is 17-24 bp; and duplicate initial mutant gene sequences are merged.
With reference to the first aspect, in a fourth possible implementation manner of the first aspect, when the gene sequence corresponding to the target wild-type primer is a sense strand, mutating the gene sequence of the target wild-type primer according to the hotspot mutation information to obtain an initial mutant gene sequence includes: replacing the mutation sequence in the sense strand with the replacement sequence according to the mutation site position information to obtain the initial mutant gene sequence.
With reference to the first aspect, in a fifth possible implementation manner of the first aspect, when the gene sequence corresponding to the target wild-type primer is an antisense strand, mutating the gene sequence of the target wild-type primer according to the hotspot mutation information to obtain an initial mutant gene sequence includes: generating the reverse complement of the antisense strand; replacing the mutation sequence in the reverse-complement sequence with the replacement sequence according to the mutation site position information; and reverse-complementing the substituted sequence to obtain the initial mutant gene sequence.
With reference to the first aspect, in a sixth possible implementation manner of the first aspect, converting the target wild-type primer and the target mutant primer into the target primer includes: adding a promoter at one end of the gene sequences of the target wild-type primer and the target mutant primer, and adding the gene sequence of a tail stem-loop structure at the other end, to form a first primer; and screening the first primers based on a third preset standard to obtain target primers, wherein the third preset standard is: the gene sequence of the first primer is a hotspot mutation gene sequence; the length of the gene sequence of the first primer is 17-24 bp; the self-complementarity is less than or equal to 1; the number of mismatches is 0; and the protospacer adjacent motif (PAM) is removed.
A second aspect of embodiments of the present application provides a primer design apparatus, including: the acquisition unit is used for acquiring a target gene database, the target gene database comprises a target gene and hotspot mutation information of the target gene, and the hotspot mutation information comprises: mutation site position information, mutation sequences and replacement sequences; the primer design unit is used for carrying out primer design on a target gene to generate a target wild type primer; the mutation processing unit is used for carrying out mutation processing on the gene sequence of the target wild type primer according to the hotspot mutation information to generate a target mutant type primer; and the determining unit is used for converting the target wild type primer and the target mutant type primer into a target primer and determining the target primer set as a target primer pool.
A third aspect of embodiments of the present application provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the computer program, the processor implements the steps of the method according to the first aspect or any possible implementation thereof.
A fourth aspect of embodiments of the present application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the method according to the first aspect or any possible implementation thereof.
Compared with the prior art, the embodiment of the application has the beneficial effects that:
In addition to the target wild-type primers, the target primer pool obtained by the primer design method provided in the embodiments of the present application further comprises target mutant primers. Because the mutant primers are determined from the hotspot mutation information of the target genes in the target gene database, the added target mutant primers can bind specifically to mutated gene fragments during gene capture. This enables targeted capture of low-frequency mutations and improves sensitivity and specificity, which in turn improves capture efficiency, reduces the required sample sequencing depth, and lowers sequencing cost.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application; a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram showing the composition of a target gene database provided in the examples of the present application;
FIG. 2 is a schematic diagram of the structure of a mutation site of a target gene provided in the examples of the present application;
FIG. 3 is a schematic flow chart of a primer design method provided in the examples of the present application;
FIG. 4 is a schematic diagram of the process of mutation of a sense strand gene sequence provided in the examples herein;
FIG. 5 is a schematic diagram showing a process of mutating an antisense strand gene sequence provided in the examples of the present application;
FIG. 6 is a schematic diagram of a target primer structure provided in an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a primer design apparatus provided in an embodiment of the present application;
FIG. 8 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
The technical solutions provided in the present application are explained in detail below with reference to specific examples.
A primer is a nucleic acid molecule with a specific nucleotide sequence that serves as the starting point for nucleotide polymerization; primers are widely used in polymerase chain reaction, sequencing, probe synthesis and similar applications. For example, in second-generation sequencing target region capture, when gene capture is based on PCR amplification, two or more pairs of primers are usually added to the target reaction system, and each primer pairs with the bases of the template DNA at the corresponding positions (A with T, and C with G). Under PCR conditions, each pair of primers binds to a target region of the DNA template and initiates the reaction, amplifying multiple nucleic acid fragments so that the target region fragments in the target reaction system are captured and enriched.
In the second generation sequencing target region capture technology, the gene sequence of the target region fragment and the gene sequence of the primer thereof are partially or completely complementary.
For example, suppose the two strands of the target region fragment to be captured, after denaturation (unwinding), are:
5'-ATTGCGTTATTTGGGGCCCCTTTCG-3'; and
3'-TAACGCAATAAACCCCGGGGAAAGC-5'.
Then the two primers corresponding to the target region fragment can be:
3'-TAACGCAATAAACCCCGGGGAAAGC-5'; and
5'-ATTGCGTTATTTGGGGCCCCTTTCG-3'.
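The base-pairing relationship in this example can be reproduced in code. The following is a minimal sketch, not part of the application; the helper names `complement` and `reverse_complement` are illustrative. It shows that the second strand is the base-by-base complement of the first, and how that complement reads when rewritten 5'->3'.

```python
# Maps each base to its Watson-Crick partner (A<->T, C<->G).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq: str) -> str:
    """Complementary strand, aligned base by base (a 5'->3' input reads 3'->5')."""
    return seq.translate(_COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Complementary strand rewritten in the 5'->3' direction."""
    return complement(seq)[::-1]


top_strand = "ATTGCGTTATTTGGGGCCCCTTTCG"  # written 5'->3'
print(complement(top_strand))          # TAACGCAATAAACCCCGGGGAAAGC (read 3'->5')
print(reverse_complement(top_strand))  # CGAAAGGGGCCCCAAATAACGCAAT (read 5'->3')
```

Keeping both helpers makes the strand direction explicit, which matters again when antisense primers are mutated later in the method.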
Currently, the primers used in PCR technology are usually designed from DNA sequences of a reference genome. The reference genome sequence was determined by the Human Genome Project and represents the genome shared by most individuals. However, because of individual differences, the genome of a given organism may carry mutations of varying degrees relative to the reference genome. Primers designed from the reference genome in the prior art therefore tend to show poor specificity and low sensitivity when capturing target region fragments that carry mutations.
To solve the poor specificity and low sensitivity of conventional primers when capturing target region fragments that carry mutations, embodiments of the present application provide a primer design method.
The primer design method provided in the embodiments of the present application is applied to an electronic device and comprises three parts: (I) creating a target gene database; (II) optimizing the parameters of the primer design software; and (III) designing primers. The three parts are described below.
(I) Creation of a target gene database by the electronic device
In this embodiment, the target gene database is a collection of target genes, as shown in FIG. 1, including several common target genes related to pathogenic genes. The target gene database also comprises basic parameters corresponding to all target genes. Wherein, the basic parameters include the nucleic acid type of the target gene and the Chromosome (Chr) position of the target gene in the nucleic acid type.
In addition, the target gene database comprises the target genes and the hotspot mutation information of each target gene, where the hotspot mutation information comprises mutation site position information, a mutation sequence and a replacement sequence. The mutation site position information comprises the start coordinate of the target gene corresponding to the hotspot mutation gene, and the start and end coordinates of the mutation site of the hotspot mutation gene within that target gene. After mutation, the target gene comprises a normal gene and a hotspot mutation gene. Mutations may in principle occur at any base of a DNA molecule, but in practice mutation sites are not distributed completely at random: some regions of a DNA molecule mutate far more often than average, and these regions are called mutation hotspots. For example, referring to FIG. 2 and Table 1 below, if FIG. 2 shows the gene structure of target gene 1 on chromosome 7 in Table 1 after mutation at the hotspot position, the start point position coordinate of target gene 1 of chromosome 7 is 7. Therefore, once each target gene related to a pathogenic gene has been determined in the target gene database, the corresponding mutation sequence and replacement sequence, and the position of their mutation site within the target gene, can be determined.
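The hotspot mutation information described above maps naturally onto a small record type. This is a hypothetical sketch: the class and field names are illustrative, the coordinate convention is an assumption, and the example values are invented for illustration rather than taken from Table 1.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class HotspotMutation:
    """One hotspot mutation record (hypothetical field names)."""
    gene_start: int  # start coordinate of the target gene
    site_start: int  # start coordinate of the mutation site
    site_end: int    # end coordinate of the mutation site
    ref: str         # mutation sequence found at the site
    alt: str         # replacement sequence

    def __post_init__(self) -> None:
        if self.site_end < self.site_start:
            raise ValueError("mutation site end precedes its start")

    def offset_in_gene(self) -> int:
        """Distance of the mutation site from the start of the target gene."""
        return self.site_start - self.gene_start


@dataclass
class TargetGene:
    """A target gene entry with its basic parameters and hotspot mutations."""
    name: str
    nucleic_acid_type: str
    chromosome: str
    hotspots: List[HotspotMutation] = field(default_factory=list)


# Purely illustrative values, not taken from Table 1.
gene = TargetGene("GENE_1", "DNA", "chr7",
                  [HotspotMutation(gene_start=100, site_start=107, site_end=108, ref="CT", alt="AG")])
print(gene.hotspots[0].offset_in_gene())  # 7
```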
In one example, referring to table 1, the target gene database includes:
TABLE 1
[Table 1 image not reproduced]
In some embodiments, the electronic device may mine the required target genes related to pathogenic genes from the COSMIC database according to identification information of common pathogenic-gene-related genes input by a user, and then create the target gene database from all the mined target genes. In addition, the electronic device may look up specific genes in the COSMIC database according to the target region fragments that actually need to be detected or captured, and add those genes to the target gene database.
(II) Optimization of primer design software parameters by the electronic device
In this embodiment, the primers are designed with the CHOPCHOP software, and the software parameters need to be optimized before the primers are designed.
In CHOPCHOP, the parameters that usually need to be set are given by the following command:
python chopchop.py -Target ${region} -J -P -T 1 -M NGG -maxMismatches 3 -g 20 -scoringMethod DOENCH_2016 -backbone AGGCTAGTCCGT -G hg19 -t CODING -n N -R 4 -a 20 -filterGCmin 20 -filterGCmax 80 -3 'PRODUCT_SIZE_MIN=150,PRODUCT_SIZE_MAX=290,PRIMER_MIN_SIZE=18,PRIMER_MAX_SIZE=25,PRIMER_OPT_SIZE=22,PRIMER_MIN_TM=57,PRIMER_MAX_TM=63,PRIMER_OPT_TM=60'
In this embodiment, the CHOPCHOP primer design software is configured with the following parameters:
-Target, target gene or region;
-J, create a visualization file in JSON format;
-P, design primers with Primer3 to identify mutations;
-T, mode selection (default 1, CRISPR/Cas9): 1;
-M, PAM: NGG;
-maxMismatches, maximum number of mismatches: 3;
-g, guide RNA length: 20;
-scoringMethod, scoring method: DOENCH_2016;
-backbone, backbone sequence used to penalize regions that are self-complementary to the backbone: AGGCTAGTCCGT;
-G, reference genome: hg19;
-t, target region type (default: coding region): CODING;
-n, restriction enzyme company: N;
-R, minimum length of the restriction enzyme recognition site: 4;
-a, minimum distance between primer and target site: 20;
-filterGCmin, minimum GC percentage (default 0): 20;
-filterGCmax, maximum GC percentage (default 100): 80;
-3, Primer3 options:
PRODUCT_SIZE_MIN, minimum product length: 150 bp;
PRODUCT_SIZE_MAX, maximum product length: 290 bp;
PRIMER_MIN_SIZE, minimum primer length: 18 nt;
PRIMER_MAX_SIZE, maximum primer length: 25 nt;
PRIMER_OPT_SIZE, optimal primer length: 22 nt;
PRIMER_MIN_TM, minimum primer annealing temperature: 57 °C;
PRIMER_MAX_TM, maximum primer annealing temperature: 63 °C;
PRIMER_OPT_TM, optimal primer annealing temperature: 60 °C.
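The parameter list above can be assembled programmatically so that every target is run with identical settings. The sketch below is illustrative only: the function name `build_chopchop_command` is hypothetical, and the flag spellings are copied from the parameter list above rather than checked against a particular CHOPCHOP release, so they should be compared with the installed `chopchop.py` help output before use.

```python
from __future__ import annotations

import shlex

# Values copied from the Primer3 options listed above.
PRIMER3_OPTIONS = {
    "PRODUCT_SIZE_MIN": 150,
    "PRODUCT_SIZE_MAX": 290,
    "PRIMER_MIN_SIZE": 18,
    "PRIMER_MAX_SIZE": 25,
    "PRIMER_OPT_SIZE": 22,
    "PRIMER_MIN_TM": 57,
    "PRIMER_MAX_TM": 63,
    "PRIMER_OPT_TM": 60,
}


def build_chopchop_command(region: str, genome: str = "hg19") -> list[str]:
    """Assemble the CHOPCHOP argument list for one target gene or region.

    Flag spellings follow the parameter list in the text; verify them against
    the installed chopchop.py before use.
    """
    primer3 = ",".join(f"{key}={value}" for key, value in PRIMER3_OPTIONS.items())
    return [
        "python", "chopchop.py",
        "-Target", region,
        "-J", "-P",
        "-T", "1",
        "-M", "NGG",
        "-maxMismatches", "3",
        "-g", "20",
        "-scoringMethod", "DOENCH_2016",
        "-backbone", "AGGCTAGTCCGT",
        "-G", genome,
        "-t", "CODING",
        "-n", "N",
        "-R", "4",
        "-a", "20",
        "-filterGCmin", "20",
        "-filterGCmax", "80",
        "-3", primer3,
    ]


# shlex.join quotes the Primer3 option string safely for a shell.
print(shlex.join(build_chopchop_command("GENE_1")))
```

Returning an argument list rather than a single string lets the command be passed straight to `subprocess.run` without shell quoting concerns; `GENE_1` is a placeholder target.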
In this embodiment, the electronic device configures the CHOPCHOP primer design software with the above parameters; once configured, CHOPCHOP designs primers according to the set parameters.
It should be noted that, after the parameter optimization, CHOPCHOP can design primers for each target gene in the target gene database. However, CHOPCHOP processes only one target gene per run, whereas the target gene database contains many target genes. To process the target genes in the database in batch, the parameter-optimized CHOPCHOP therefore needs to be further adapted for batch design.
In some embodiments, the electronic device may implement batch design with CHOPCHOP based on batch primer design code entered by the user.
After CHOPCHOP has been adapted for batch design, primers can be designed in batch for the target genes in the target gene database, which improves the efficiency of primer design.
In this embodiment, the optimized CHOPCHOP software capable of batch primer design is referred to as the target software, and the electronic device uses the target software for the subsequent primer design operations.
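The application does not disclose its batch design code. One plausible approach, sketched here under stated assumptions, is to loop over the target genes and launch one CHOPCHOP run per gene with a shared set of optimized arguments. The function name `run_batch`, the output directory, and the assumption that CHOPCHOP writes its result table to standard output are all hypothetical.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def run_batch(targets: dict[str, str], fixed_args: list[str],
              outdir: str = "chopchop_out", dry_run: bool = True) -> dict[str, list[str]]:
    """Build (and optionally run) one CHOPCHOP command per target gene.

    targets maps a gene name to the value passed to -Target; fixed_args holds
    the shared, already optimized parameters. With dry_run=True nothing is
    executed and only the commands are returned.
    """
    commands: dict[str, list[str]] = {}
    out = Path(outdir)
    for gene, region in targets.items():
        cmd = ["python", "chopchop.py", "-Target", region, *fixed_args]
        commands[gene] = cmd
        if not dry_run:
            out.mkdir(parents=True, exist_ok=True)
            # Assumption: CHOPCHOP prints its result table to standard output.
            result = subprocess.run(cmd, capture_output=True, text=True, check=True)
            (out / f"{gene}.tsv").write_text(result.stdout)
    return commands


shared = ["-G", "hg19", "-T", "1", "-M", "NGG"]  # abbreviated for illustration
cmds = run_batch({"GENE_1": "GENE_1", "GENE_2": "GENE_2"}, shared)
for gene, cmd in cmds.items():
    print(gene, " ".join(cmd))
```

The dry-run default makes it possible to inspect every generated command before committing compute time to the whole database.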
(III) Design of primers by the electronic device
FIG. 3 is a flowchart of a primer design method provided in an embodiment of the present application. Referring to FIG. 3, the method includes the following steps S301-S305.
S301, the electronic equipment acquires a target gene database, wherein the target gene database comprises basic parameters of a target gene and hotspot mutation information of the target gene.
It should be emphasized that the basic parameters of the target gene involved in this example include: a nucleic acid type of a target gene and a chromosomal location of the target gene in the nucleic acid type; and hot spot mutation genes corresponding to each target gene and coordinate information of the hot spot mutation genes.
The hotspot mutation information comprises: mutation site position information, mutation sequence and replacement sequence.
S302, the electronic equipment determines a wild type primer pool based on basic parameters of the target gene and a first preset standard.
The pool of wild type primers refers to the collection of target wild type primers.
The electronic equipment carries out batch design on target genes in a target gene database based on input parameters (such as a reference genome, a minimum GC ratio, a minimum distance between a primer and a target site and the like) in target software and basic parameters (such as coordinate information of the target genes) of the target genes to obtain an initial wild type gene sequence; and then, screening the initial wild type gene sequence by the electronic equipment based on a first preset standard to obtain target wild type primers, and determining a set of all the obtained target wild type primers as a wild type primer pool.
In this embodiment, the first preset criterion includes:
1) The RANK grade is 1.
2) Duplicate initial wild-type gene sequences are merged.
The RANK grade is the recommendation grade in the CHOPCHOP output, and a RANK of 1 denotes the optimal sequence. Merging duplicate gene sequences means that, when screening the initial wild-type gene sequences against the first preset standard, the electronic device merges initial wild-type gene sequences that are identical.
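The first preset standard amounts to a filter followed by de-duplication. A minimal sketch, assuming each CHOPCHOP result has already been parsed into a simple record; the class `WildTypeCandidate` and its fields are hypothetical simplifications, not CHOPCHOP's actual output format.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class WildTypeCandidate:
    """One parsed CHOPCHOP result (hypothetical, simplified fields)."""
    rank: int
    sequence: str  # target sequence, 5'->3'
    strand: str    # "+" (sense) or "-" (antisense)


def apply_first_criterion(rows: Iterable[WildTypeCandidate]) -> List[WildTypeCandidate]:
    """Keep RANK 1 rows and merge rows whose sequences are identical."""
    merged: Dict[str, WildTypeCandidate] = {}
    for row in rows:
        if row.rank != 1:
            continue
        # setdefault keeps the first occurrence and drops later duplicates.
        merged.setdefault(row.sequence.upper(), row)
    return list(merged.values())


rows = [
    WildTypeCandidate(1, "TTTTGGGCTGGCCAAACTGCTGG", "+"),
    WildTypeCandidate(1, "TTTTGGGCTGGCCAAACTGCTGG", "+"),  # duplicate from an overlapping target
    WildTypeCandidate(2, "ACGTACGTACGTACGTACGTAGG", "+"),  # not RANK 1
    WildTypeCandidate(1, "GTCCACGCTGGCCATCACGTAGG", "-"),
]
wild_type_pool = apply_first_criterion(rows)
print([r.sequence for r in wild_type_pool])
```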
After the electronic device determines the target wild-type primers through the target software and the first preset standard, it can determine whether the gene sequence corresponding to each target wild-type primer is a sense-strand or antisense-strand sequence. That is, the gene sequences corresponding to the target wild-type primers in the wild-type primer pool include target wild-type gene sequences (+), whose corresponding gene sequence is the sense strand, and target wild-type gene sequences (-), whose corresponding gene sequence is the antisense strand. The sense strand (also called the coding strand or plus strand (+)) is the DNA strand whose nucleotide sequence encodes the amino acid information of a protein; the antisense strand (also called the minus strand (-)) is the strand whose nucleotide sequence is complementary to the sense strand.
S303, the electronic equipment processes the target wild type primer in the wild type primer pool according to the hotspot mutation information of the target gene and a second preset standard to determine a mutant type primer pool.
It is understood that each target gene in the target gene database has a corresponding hot spot mutation site based on individual differences of different living bodies. In this embodiment, the target wild-type primers in the wild-type primer pool are processed according to the hot spot mutant genes corresponding to the target genes and the coordinate information of the hot spot mutant genes, so as to determine the mutant-type primer pool.
The mutant primer pool refers to a collection of target mutant primers. The mode of determining the target mutant primer will be explained below.
First, the electronic device determines an initial mutant gene sequence.
Based on the fact that the primer and the target gene in the embodiment are partially or completely complementary, after the target gene is mutated, the primer also needs to be mutated in a corresponding mode, so that the electronic device can mutate the target wild-type primer according to the hotspot mutation information of the target gene to obtain the target mutant-type primer. In other words, the electronic device may take the hot spot mutation information of the target gene as the hot spot mutation information of the target wild-type primer, and mutate the target wild-type primer to obtain the target mutant-type primer.
In this embodiment, the mutation operation of the electronic device on each target wild-type primer in the wild-type primer pool includes two different processing modes:
When the gene sequence corresponding to the target wild-type primer is the sense strand, i.e., a target wild-type gene sequence (+), referring to FIG. 4, the electronic device first obtains the target wild-type gene sequence (+) 5'-TTTTGGGCTGGCCAAACTGCTGG-3' corresponding to the target wild-type primer, and then, according to the coordinates of the hotspot mutation in the target wild-type primer and the replacement (alt) sequence, replaces the bases at the corresponding mutation site of this sequence, obtaining the initial mutant gene sequence 5'-TTTTGGGAGGGCCAAACTGCTGG-3'.
When the gene sequence corresponding to the target wild-type primer is the antisense strand, i.e., a target wild-type gene sequence (-), referring to FIG. 5, the electronic device first obtains the target wild-type gene sequence (-) 5'-GTCCACGCTGGCCATCACGTAGG-3' corresponding to the target wild-type primer; then generates its reverse complement, 5'-CCTACGTGATGGCCAGCGTGGAC-3' (paired with 3'-GGATGCACTACCGGTCGCACCTG-5'); next, determines the position coordinates of the hotspot mutation in the reverse-complement sequence together with the replacement sequence, and after replacement obtains 5'-CCTTCCAGGAAGCCTACGTGATGGCCAGCGTGGAC-3' (paired with 3'-GGAAGGTCCTTCGGATGCACTACCGGTCGCACCTG-5'); finally, reverse-complements the replaced sequence to obtain the initial mutant gene sequence 5'-GTCCACGCTGGCCATCACGTAGGCTTCCTGGAAGG-3'.
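Both mutation paths can be expressed with one substitution helper. The sketch below assumes that mutation coordinates are 0-based offsets in the orientation where the hotspot is described: directly on the sequence for a sense (+) primer, and on its reverse complement for an antisense (-) primer, as in the procedure above. The function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a 5'->3' DNA sequence, also written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def substitute(seq: str, offset: int, ref: str, alt: str) -> str:
    """Replace `ref` at 0-based `offset` with `alt`; an empty ref is an insertion."""
    if seq[offset:offset + len(ref)] != ref:
        raise ValueError(f"expected {ref!r} at offset {offset}")
    return seq[:offset] + alt + seq[offset + len(ref):]


def mutate_wild_type(seq: str, strand: str, offset: int, ref: str, alt: str) -> str:
    """Build the initial mutant gene sequence for a sense (+) or antisense (-) primer.

    For "-", offset/ref/alt are read on the reverse complement of `seq`, as in
    the antisense procedure above; the result is converted back afterwards.
    """
    if strand == "+":
        return substitute(seq, offset, ref, alt)
    if strand == "-":
        return reverse_complement(substitute(reverse_complement(seq), offset, ref, alt))
    raise ValueError("strand must be '+' or '-'")


# Sense example: CT -> AG at 0-based offset 7.
print(mutate_wild_type("TTTTGGGCTGGCCAAACTGCTGG", "+", 7, "CT", "AG"))
# TTTTGGGAGGGCCAAACTGCTGG
```

The FIG. 5 sequences can be reproduced by modeling the change as a 12-nt insertion (`alt="CCTTCCAGGAAG"`, empty `ref`) at offset 0 of the reverse complement; that modeling is an interpretation of the two sequences shown, not something the text states. The `ref` check guards against coordinates that do not match the primer.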
Finally, the electronic device screens the initial mutant gene sequences based on a second preset standard to obtain target mutant primers, and determines the set of all obtained target mutant primers as the mutant primer pool.
In this embodiment, the second preset criterion includes:
1) The mutation site is located on the initial mutant gene sequence;
2) The length of the initial mutant gene sequence is 17-24 bp;
3) Duplicate initial mutant gene sequences are merged.
Requiring the mutation site to be located on the gene sequence corresponding to the target mutant primer ensures that this gene sequence carries the hotspot mutation. Merging duplicate gene sequences means that, when screening the initial mutant gene sequences against the second preset standard, the electronic device merges initial mutant gene sequences that are identical.
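The second preset standard can be sketched as a filter over candidate records. This is a hypothetical illustration: `MutantCandidate` and its fields are invented names, and the "site on sequence" test assumes the replacement bases are tracked as a 0-based span; deletions (empty replacement) would need a different site definition.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class MutantCandidate:
    """An initial mutant gene sequence plus where its replacement bases sit."""
    sequence: str    # 5'->3'
    site_start: int  # 0-based start of the replacement bases in `sequence`
    site_len: int    # number of replacement bases


def apply_second_criterion(cands: Iterable[MutantCandidate],
                           min_len: int = 17, max_len: int = 24) -> List[str]:
    """Keep sequences that contain the mutation site and are 17-24 bp, merging duplicates."""
    kept: List[str] = []
    seen = set()
    for c in cands:
        site_on_sequence = (c.site_len > 0 and c.site_start >= 0
                            and c.site_start + c.site_len <= len(c.sequence))
        if not site_on_sequence:
            continue
        if not min_len <= len(c.sequence) <= max_len:
            continue
        if c.sequence not in seen:
            seen.add(c.sequence)
            kept.append(c.sequence)
    return kept


cands = [
    MutantCandidate("TTTTGGGAGGGCCAAACTGCTGG", 7, 2),
    MutantCandidate("TTTTGGGAGGGCCAAACTGCTGG", 7, 2),  # duplicate, merged
    MutantCandidate("GTCCACGCTGGCCATCACGTAGGCTTCCTGGAAGG", 23, 12),  # 35 bp, too long
]
print(apply_second_criterion(cands))  # ['TTTTGGGAGGGCCAAACTGCTGG']
```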
After the electronic equipment determines the target mutant type primer, the primer can capture the target region fragment generating mutation in the second generation sequencing target region capture technology.
For example, suppose the two strands of the target region fragment to be captured, after denaturation, are:
5'-ATTGCGTTATTTGGGGCCCCTTTCG-3'; and
3'-TAACGCAATAAACCCCGGGGAAAGC-5'.
In the sequence 5'-ATTGCGTTATTTGGGGCCCCTTTCG-3', experiments show that the GCG motif readily mutates to GAC. The primer designed by the conventional method is
3'-TAACGCAATAAACCCCGGGGAAAGC-5'. Because GCG readily mutates to GAC, this conventional primer cannot capture the mutated sequence 5'-ATTGACTTATTTGGGGCCCCTTTCG-3'. The method of this application additionally yields the primer 3'-TAACTGAATAAACCCCGGGGAAAGC-5', which can capture the mutated sequence 5'-ATTGACTTATTTGGGGCCCCTTTCG-3'.
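The effect of the GCG to GAC mutation on pairing can be quantified by counting mismatched positions. The sketch below is a simple position-by-position comparison (not a thermodynamic binding model); `count_mismatches` is an illustrative name. It shows that the conventional primer has two mismatches against the mutated template, while the mutant primer pairs perfectly.

```python
WATSON_CRICK = {("A", "T"), ("T", "A"), ("C", "G"), ("G", "C")}


def count_mismatches(template_5to3: str, primer_3to5: str) -> int:
    """Count positions where the primer base does not pair with the template base.

    Both strings are aligned position by position: the template is written 5'->3'
    and the primer 3'->5', as in the example above.
    """
    if len(template_5to3) != len(primer_3to5):
        raise ValueError("sequences must be the same length")
    return sum((t, p) not in WATSON_CRICK
               for t, p in zip(template_5to3.upper(), primer_3to5.upper()))


wild_template = "ATTGCGTTATTTGGGGCCCCTTTCG"
mutant_template = "ATTGACTTATTTGGGGCCCCTTTCG"
wild_primer = "TAACGCAATAAACCCCGGGGAAAGC"
mutant_primer = "TAACTGAATAAACCCCGGGGAAAGC"

print(count_mismatches(mutant_template, wild_primer))    # 2
print(count_mismatches(mutant_template, mutant_primer))  # 0
```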
S304, the electronic equipment determines a mixed type primer pool according to the wild type primer pool and the mutant type primer pool.
The electronic device mixes all target wild-type primers in the wild-type primer pool with all target mutant primers in the mutant primer pool, and determines the set of all mixed primers as the mixed primer pool.
S305, the electronic equipment determines a primer structure, and screens gene sequences in the mixed primer pool according to a third preset standard to determine a target primer.
First, the electronic device performs primer structure design on gene sequences corresponding to all primers in the mixed primer pool, that is, gene sequences corresponding to all primers in the mixed primer pool are added with gene sequences of a promoter and a tail stem-loop structure according to the primer structure, so as to obtain a first primer, which is shown in fig. 6.
The promoter may be, for example, the T7 promoter, whose sequence is:
5'-TAATACGACTCACTATAGGG-3'.
The sequence of the tail stem-loop structure is:
5'-GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTT-3'.
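The following is a minimal sketch of how a first primer could be assembled from a target sequence and the two sequences given above. It assumes the layout 5'-promoter, target sequence, tail stem-loop-3', with the promoter at the 5' end and the stem-loop at the 3' end, as in common in-vitro sgRNA templates. Fig. 6 is not reproduced here, so treat that order as an assumption. `build_first_primer` is a hypothetical helper.

```python
# Sequences as given in the description (both written 5'->3').
T7_PROMOTER = "TAATACGACTCACTATAGGG"
TAIL_STEM_LOOP = "GTTTTAGAGCTAGAAATAGCAAGTTAAAATAAGGCTAGTCCGTTATCAACTTGAAAAAGTGGCACCGAGTCGGTGCTTTTTTT"


def build_first_primer(target: str) -> str:
    """Hypothetical helper: 5'-promoter + target sequence + tail stem-loop-3' (assumed order)."""
    target = target.upper()
    if not target or set(target) - set("ACGT"):
        raise ValueError("target must be a non-empty A/C/G/T sequence")
    return T7_PROMOTER + target + TAIL_STEM_LOOP


# A 20-nt segment of the mutated sequence from the example above.
first_primer = build_first_primer("ATTGACTTATTTGGGGCCCC")
print(first_primer)
```

Validating the alphabet early keeps malformed candidates out of the later screening step.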
Then, the electronic device screens the first primers against the third preset standard to obtain the target primers, and determines the set of all target primers obtained as the target primer pool.
In this embodiment, the third preset criterion includes:
1) The gene sequence of the first primer is a common hotspot mutation gene sequence;
2) The length of the gene sequence of the first primer is 17-24 bp;
3) Self-complementarity is less than or equal to 1, where self-complementarity denotes the complementarity within the primer and between the primer and the standard backbone sequence;
4) The number of mismatches is 0;
5) The PAM is removed.
In other words, the gene sequence corresponding to each target primer contains a common hotspot mutation, its length is 17-24 bp, its self-complementarity is at most 1, and its number of mismatches is 0. "PAM removal" means that the protospacer adjacent motif (PAM) is removed from the resulting target primer.
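The five criteria of the third preset standard can be sketched as a filter applied to the target-specific portion of each first primer. The description does not define how self-complementarity, mismatch counts or hotspot membership are computed. This sketch therefore assumes they are supplied by upstream tools as fields of a hypothetical `Candidate` record. It also assumes an SpCas9-style NGG PAM that sits immediately 3' of the target sequence and is stripped before the length check.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumption: an SpCas9-style "NGG" PAM sits immediately 3' of the target sequence.
_TRAILING_PAM = re.compile(r"[ACGT]GG$")


@dataclass(frozen=True)
class Candidate:
    site: str                    # 5'->3' target sequence followed by its PAM (assumed input format)
    is_hotspot: bool             # criterion 1, assumed to be flagged upstream
    self_complementarity: float  # criterion 3, score from an upstream design tool (assumed)
    mismatches: int              # criterion 4, from an upstream alignment (assumed)


def remove_pam(site: str) -> str:
    """Criterion 5: strip a trailing NGG PAM, if present."""
    return _TRAILING_PAM.sub("", site, count=1)


def third_standard(candidates: List[Candidate]) -> List[str]:
    """Keep PAM-free target sequences that satisfy criteria 1-4."""
    kept: List[str] = []
    for c in candidates:
        spacer = remove_pam(c.site.upper())
        if (c.is_hotspot                         # 1) hotspot mutation sequence
                and 17 <= len(spacer) <= 24      # 2) 17-24 bp
                and c.self_complementarity <= 1  # 3) self-complementarity <= 1
                and c.mismatches == 0):          # 4) no mismatches
            kept.append(spacer)
    return kept


cands = [
    Candidate("ATTGACTTATTTGGGGCCCCTGG", True, 0, 0),   # passes
    Candidate("ATTGCGTTATTTGGGGCCCCTGG", False, 0, 0),  # not a hotspot sequence
    Candidate("ATTGACTTATTTGGGGCCCCTGG", True, 2, 0),   # self-complementarity too high
    Candidate("ATTGACTTATTTGGGGCCCCTGG", True, 0, 1),   # has a mismatch
    Candidate("ATTGACTTATTTTGG", True, 0, 0),           # too short once the PAM is removed
]
print(third_standard(cands))
```

In this sketch the PAM is removed before the length check, so a 20-nt target plus its 3-nt PAM is judged as 20 bp. The application does not state the order, so that ordering is an assumption.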
It should be understood that the CRISPR-Cas9 system can perform gene editing and gene capture in cells and organisms. The system was developed from studies of a bacterial defense mechanism against foreign plasmid and phage infection, which relies on clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated (Cas) nucleases.
The target primers obtained by the primer design method of this embodiment can be used in this system. For gene capture, the target primers in the target primer pool of this embodiment are added as reaction components to the target reaction system of the CRISPR-Cas9 system. The target primers then initiate a reaction that amplifies nucleic acid fragments from multiple target regions, so that the target region fragments in the reaction system are captured and enriched. Using the target primers of the target primer pool in the CRISPR-Cas9 system can markedly improve the specificity and sensitivity of gene capture and reduce primer cost.
It should be noted that after method steps S301 to S304 of the above embodiment, the target primers in the resulting mixed primer pool can already be used for gene capture. This embodiment adds step S305 on that basis, so that the target primers produced by the further processing in S305 can be used in the CRISPR-Cas9 system. Drawing on the high specificity and sensitivity of that system, the specificity and sensitivity of gene capture can be improved further.
In addition to the target wild-type primers, the target primer pool obtained by the primer pool design method of this application also contains target mutant primers. Because the target mutant primers bind specifically to variant gene fragments during capture, low-frequency mutations can be captured in a targeted way. This improves sensitivity, specificity and capture efficiency, which in turn lowers the sequencing depth required per sample and reduces sequencing cost.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Fig. 7 is a schematic diagram of a primer design apparatus according to an embodiment of the present application. As shown in Fig. 7, the apparatus includes:
- an acquisition unit, configured to acquire a target gene database that includes a target gene and hotspot mutation information of the target gene, the hotspot mutation information comprising mutation site position information, a mutation sequence and a replacement sequence;
- a primer design unit, configured to perform primer design on the target gene to generate target wild-type primers;
- a mutation processing unit, configured to perform mutation processing on the gene sequences of the target wild-type primers according to the hotspot mutation information to generate target mutant primers; and
- a determining unit, configured to convert the target wild-type primers and the target mutant primers into target primers and to determine the set of target primers as a target primer pool.
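One way to picture how the four units of Fig. 7 hand data to each other is the Python sketch below. It is illustrative only, and every name in it is hypothetical. The primer design routine is passed in as a callable because the application relies on an external design step whose interface is not described. The conversion step defaults to the identity, and only sense-strand primers with gene-relative hotspot positions are handled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Hotspot = Tuple[int, str, str]   # (0-based gene position, mutation sequence, replacement sequence)
Primer = Tuple[int, str]         # (0-based start on the gene, sense-strand sequence)


@dataclass
class PrimerDesignApparatus:
    """Illustrative wiring of the four units in Fig. 7 (all names are hypothetical)."""
    database: Dict[str, Tuple[str, List[Hotspot]]]   # gene -> (gene sequence, hotspots)
    design: Callable[[str], List[Primer]]            # pluggable primer design routine (assumed)
    convert: Callable[[str], str] = lambda s: s      # e.g. attach promoter and stem-loop

    def acquire(self, gene: str) -> Tuple[str, List[Hotspot]]:
        """Acquisition unit: return the gene sequence and its hotspot mutation information."""
        return self.database[gene]

    def design_wild_type(self, gene: str) -> List[Primer]:
        """Primer design unit: delegate to the external design routine."""
        sequence, _ = self.acquire(gene)
        return self.design(sequence)

    def mutate(self, primers: List[Primer], hotspots: List[Hotspot]) -> List[Primer]:
        """Mutation processing unit: apply every hotspot that falls inside a primer."""
        mutants: List[Primer] = []
        for start, seq in primers:
            for pos, ref, alt in hotspots:
                off = pos - start
                if off >= 0 and seq[off:off + len(ref)] == ref:
                    mutants.append((start, seq[:off] + alt + seq[off + len(ref):]))
        return mutants

    def determine_pool(self, gene: str) -> List[str]:
        """Determining unit: convert wild-type and mutant primers, merging duplicates."""
        _, hotspots = self.acquire(gene)
        wild = self.design_wild_type(gene)
        pool: List[str] = []
        for _, seq in wild + self.mutate(wild, hotspots):
            target = self.convert(seq)
            if target not in pool:
                pool.append(target)
        return pool


# Toy usage: a stand-in design routine that tiles 20-mers with a step of 5.
apparatus = PrimerDesignApparatus(
    database={"demo": ("ATTGCGTTATTTGGGGCCCCTTTCG", [(3, "GCG", "GAC")])},
    design=lambda s: [(i, s[i:i + 20]) for i in range(0, len(s) - 19, 5)],
)
print(apparatus.determine_pool("demo"))
```

Keeping the design routine and the conversion step as injected callables mirrors the division into separate units. Either can be swapped without touching the mutation logic.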
Fig. 8 is a schematic diagram of an electronic device according to an embodiment of the present application. As shown in fig. 8, the electronic apparatus 8 of this embodiment includes: a processor 80, a memory 81 and a computer program 82, such as a program of a primer design method, stored in said memory 81 and executable on said processor 80. The processor 80 implements the steps in each of the primer design method embodiments described above when executing the computer program 82. Alternatively, the processor 80 implements the functions of the modules/units in the above-described device embodiments when executing the computer program 82.
Illustratively, the computer program 82 may be partitioned into one or more modules/units that are stored in the memory 81 and executed by the processor 80 to accomplish the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing certain functions, which are used to describe the execution of the computer program 82 in the electronic device 8.
The electronic device 8 may be a computing device such as a tablet computer, desktop computer, laptop, handheld computer or cloud server. The electronic device may include, but is not limited to, the processor 80 and the memory 81. Those skilled in the art will appreciate that Fig. 8 is merely an example of the electronic device 8 and does not limit it. The device may include more or fewer components than shown, combine certain components, or use different components; for example, it may also include input/output devices, network access devices, buses, etc.
The processor 80 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. A general-purpose processor may be a microprocessor or any conventional processor.
The memory 81 may be an internal storage unit of the electronic device 8, such as a hard disk or internal memory of the electronic device 8. It may also be an external storage device of the electronic device 8, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash card provided on the electronic device 8. Further, the memory 81 may include both an internal storage unit and an external storage device of the electronic device 8. The memory 81 stores the computer program and the other programs and data required by the electronic device, and may also be used to temporarily store data that has been or will be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. For the specific working processes of the units and modules in the system, reference may be made to the corresponding processes in the foregoing method embodiments, which are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/electronic device and method may be implemented in other ways. For example, the above-described apparatus/electronic device embodiments are merely illustrative, and for example, the division of the modules or units is only one type of logical function division, and other division manners may exist in actual implementation, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
If the integrated modules/units are implemented as software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the method embodiments above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or device capable of carrying the computer program code, such as a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signals, telecommunications signals and software distribution media. It should be noted that the content of the computer-readable medium may be expanded or restricted as required by legislation and patent practice in a given jurisdiction. For example, in some jurisdictions, under legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunications signals.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A method of primer design, comprising:
obtaining a target gene database, wherein the target gene database comprises a target gene and hotspot mutation information of the target gene, and the hotspot mutation information comprises: mutation site position information, mutation sequences and replacement sequences;
carrying out primer design on the target gene to generate a target wild type primer;
carrying out mutation treatment on the gene sequence of the target wild type primer according to the hotspot mutation information to generate a target mutant type primer;
and converting the target wild type primer and the target mutant type primer into target primers, and determining a set of the target primers as a target primer pool.
2. The method according to claim 1, wherein in designing a primer for the target gene,
the reference genome of the primer is hg19;
the maximum number of mismatches of the primers is 3;
the minimum length of the primer is 18 bp, and the maximum length of the primer is 25 bp;
the minimum GC content of the primer is 20%, and the maximum GC content is 80%.
3. The method of claim 1 or 2, wherein the primer design of the target gene to generate a target wild-type primer comprises:
carrying out primer design on the target gene to obtain an initial wild type gene sequence;
screening the initial wild type gene sequence according to a first preset standard to generate the target wild type primer;
wherein the first preset standard is as follows:
RANK is 1;
repeated initial wild-type gene sequences are merged.
4. The method according to claim 1, wherein the mutating the gene sequence of the target wild-type primer according to the hot spot mutation information to generate a target mutant-type primer comprises:
carrying out mutation treatment on the gene sequence of the target wild type primer according to the hotspot mutation information to obtain an initial mutant gene sequence;
screening the initial mutant gene sequence based on a second preset standard to obtain the target mutant primer;
wherein the second preset standard is as follows:
the mutation site is located on the initial mutant gene sequence;
the length of the initial mutant gene sequence is 17-24bp;
combining the repeated initial mutant gene sequences.
5. The method according to claim 4, wherein when the gene sequence corresponding to the target wild-type primer is a sense strand, the mutating the gene sequence of the target wild-type primer according to the hot spot mutation information to obtain an initial mutant gene sequence comprises:
replacing the mutation sequence in the sense strand with the replacement sequence according to the mutation site position information, to obtain the initial mutant gene sequence.
6. The method as claimed in claim 4, wherein when the gene sequence corresponding to the target wild-type primer is an antisense strand, the mutating the gene sequence of the target wild-type primer according to the hot spot mutation information to obtain an initial mutant gene sequence comprises:
generating the reverse-complement sequence of the antisense strand;
replacing the mutation sequence in the reverse-complement sequence with the replacement sequence according to the mutation site position information;
and reverse-complementing the substituted sequence to obtain the initial mutant gene sequence.
7. The method of claim 1, wherein converting the target wild-type primer and the target mutant primer into a target primer comprises:
adding a promoter at one end of the gene sequence of the target wild type primer and the target mutant type primer, and adding a tail stem-loop structure at the other end of the gene sequence to form a first primer;
screening the first primer based on a third preset standard to obtain the target primer;
the third preset standard is as follows:
the gene sequence of the first primer is a hot spot mutation gene sequence;
the length of the gene sequence of the first primer is 17-24bp;
Self-complementarity ≤ 1;
the number of mismatches is 0;
the protospacer adjacent motif (PAM) is removed.
8. A primer design apparatus, comprising:
an obtaining unit, configured to obtain a target gene database, where the target gene database includes a target gene and hotspot mutation information of the target gene, and the hotspot mutation information includes: mutation site position information, a mutation sequence and a replacement sequence;
the primer design unit is used for carrying out primer design on the target gene to generate a target wild type primer;
a mutation processing unit, configured to perform mutation processing on the gene sequence of the target wild-type primer according to the hotspot mutation information, so as to generate a target mutant-type primer;
and the determining unit is used for converting the target wild type primer and the target mutant type primer into target primers and determining the set of the target primers as a target primer pool.
9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the steps of the method according to any of claims 1 to 7 are implemented when the computer program is executed by the processor.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
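Claims 4 to 6 describe a strand-aware substitution followed by the second preset standard. The Python sketch below illustrates that procedure. It assumes that mutation site positions are 0-based coordinates on the + (reference) strand and that mutation and replacement sequences are written on that strand. Under that assumption, an antisense primer is reverse-complemented, edited, and reverse-complemented back, as claim 6 describes. All type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

_PAIR = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_PAIR)[::-1]


@dataclass(frozen=True)
class Hotspot:
    pos: int   # 0-based position on the + (reference) strand -- assumed convention
    ref: str   # mutation sequence, written on the + strand
    alt: str   # replacement sequence, written on the + strand


@dataclass(frozen=True)
class WildTypePrimer:
    seq: str     # primer sequence written 5'->3'
    start: int   # 0-based + strand coordinate of the primer's leftmost base
    strand: str  # "+" for the sense strand, "-" for the antisense strand


def initial_mutant(primer: WildTypePrimer, hs: Hotspot) -> Optional[str]:
    """Claims 5 and 6: substitute the hotspot while working on the + strand."""
    # Antisense primers are first reverse-complemented onto the + strand (claim 6, step 1).
    plus = primer.seq if primer.strand == "+" else reverse_complement(primer.seq)
    offset = hs.pos - primer.start
    # The mutation site must lie on the sequence and match the mutation sequence.
    if offset < 0 or plus[offset:offset + len(hs.ref)] != hs.ref:
        return None
    mutated = plus[:offset] + hs.alt + plus[offset + len(hs.ref):]  # claim 5 / claim 6, step 2
    # Antisense results are reverse-complemented back (claim 6, step 3).
    return mutated if primer.strand == "+" else reverse_complement(mutated)


def second_standard(primers: Iterable[WildTypePrimer], hotspots: List[Hotspot]) -> List[str]:
    """Claim 4: keep mutants whose site lies on them, 17-24 bp long, duplicates merged."""
    kept: List[str] = []
    for p in primers:
        for hs in hotspots:
            m = initial_mutant(p, hs)
            if m is not None and 17 <= len(m) <= 24 and m not in kept:
                kept.append(m)
    return kept


wt = "ATTGCGTTATTTGGGGCCCC"
hotspot = Hotspot(103, "GCG", "GAC")
sense = WildTypePrimer(wt, 100, "+")
antisense = WildTypePrimer(reverse_complement(wt), 100, "-")
print(second_standard([sense, sense, antisense], [hotspot, Hotspot(500, "A", "G")]))
```

Working on the + strand in both cases means that hotspot records need only one orientation, which is the practical point of the reverse-complement round trip in claim 6.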
CN202310034710.5A 2023-01-10 2023-01-10 Primer design method, device, equipment and readable storage medium Active CN115775591B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310034710.5A CN115775591B (en) 2023-01-10 2023-01-10 Primer design method, device, equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN115775591A true CN115775591A (en) 2023-03-10
CN115775591B CN115775591B (en) 2023-06-09

Family

ID=85393341

Country Status (1)

Country Link
CN (1) CN115775591B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
IL117350A0 (en) * 1995-03-09 1996-07-23 Procter & Gamble Proteinase k variants having decreased adsorption and increased hydrolysis
CN101235415A (en) * 2007-01-30 2008-08-06 中山大学达安基因股份有限公司 Method for detecting gene single point mutations using TaqMan probe quantitative polymerase chain reaction
CN108179188A (en) * 2017-12-15 2018-06-19 苏州药明泽康生物科技有限公司 Novel kit for detecting gene mutations
CN115011672A (en) * 2022-06-30 2022-09-06 重庆邮电大学 Ultralow frequency gene mutation detection method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
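Liu Xin (刘欣): "Investigation of the molecular genetic mechanism of early repolarization associated with sudden cardiac death caused by mutations in the calcium channel gene CACNA1C" *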

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant