CN105260395B - STR data storage and paternity test ranking comparison method based on an inverted index structure - Google Patents
- Publication number
- CN105260395B (application number CN201510590067.XA)
- Authority
- CN
- China
- Prior art keywords
- str
- data
- sample
- index
- inverted index
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
Abstract
The invention discloses an STR data storage and paternity test ranking comparison method based on an inverted index structure, belonging to the technical field of data storage and processing. The method comprises two main parts. First, an STR data storage method based on an inverted index structure, which establishes a separate data field for each STR locus selected for the samples and stores the STR data within each data field as an inverted index. Second, a paternity test ranking comparison method which, based on the per-field inverted index structure, computes the kinship between the sample of a person searching for relatives and the samples in the database, enabling fast, stable and reliable online searching for relatives.
Description
Technical field
The invention belongs to the technical field of data storage and processing, and in particular relates to an STR data storage and paternity test ranking comparison method based on an inverted index structure.
Background art
According to incomplete statistics, about 500,000 people across the country are currently searching for relatives. Most of them are war orphans (such as the Japanese war orphans), natural-disaster orphans and abducted persons, separated from their families by historical causes, natural disasters, social problems and the like. In recent years, with the continuous development of biotechnology, searching for relatives by means of genetic technology has become increasingly feasible.
Genetic relative searching mainly uses paternity testing technology to detect human genetic markers and, through genetic analysis, to identify a genetic relationship between a suspected parent and child. DNA is the basic carrier of human genetic information, and human chromosomes consist mainly of DNA. Each human body cell has 22 pairs of autosomes and 1 pair of sex chromosomes, 46 chromosomes in total, inherited from the father and the mother. Each parent contributes half of the offspring's chromosomes, which pair up after fertilization to form the chromosomes of the offspring. Because the human chromosome set comprises about 3 billion nucleotides, and exchange and recombination before germ cell formation are random, no two people other than identical twins have the same nucleotide sequence; this is human genetic polymorphism. Despite this polymorphism, every chromosome of a person necessarily comes from one of that person's parents, which is the theoretical basis of DNA paternity testing. At present, the most widely used technique in paternity testing is identification based on short tandem repeats (STR); because of its high sensitivity, high individual specificity and fully digital results, it has become a globally accepted identification technique. A typical set of STR locus data is as follows:
Locus | STR |
D8S1179 | 13/14 |
D21S11 | 31/32 |
D7S820 | 11/12 |
CSF1PO | 10/13 |
D3S1358 | 15/16 |
D5S818 | 11/13 |
D13S317 | 8/12 |
D16S539 | 9/12 |
D2S1338 | 17/23 |
D19S433 | 14/14 |
vWA | 16/18 |
D12S391 | 18/21 |
D18S51 | 13/13 |
AMEL | X/Y |
D6S1043 | 12/19 |
FGA | 22/23 |
At present, most solutions to the paternity test problem rely on relational databases to store and compare STR data in order to determine whether sample donors are parent and child. For one locus, the STR data consist mainly of two numbers, one inherited from the father and the other from the mother. Suppose that 16 loci (including one sex locus) are tested for each sample, so that each sample has two allele values at each locus. For two tested persons with a biological parent-child relationship, at least one value must be identical at each of the 15 STR loci. To judge whether two individuals are parent and child, at most 4 comparisons are therefore needed per locus, and at most 60 comparisons for 15 loci. As the number of samples stored in the system grows, the amount of comparison computation grows with it. Therefore, although storage and comparison in a relational database solve the storage and retrieval problem of a relative-search database to some extent, human STR locus data are by nature poorly suited to storage in a table-based relational database, which greatly reduces the efficiency of comparing STR data. In addition, genetic information may mutate; once STR data are affected by mutation, relative searching and paternity testing with STR data become even more difficult.
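The pairwise check described above can be sketched as follows. This is a minimal illustration of the baseline comparison, not code from the patent; the function names `locus_matches` and `naive_parent_child`, the dict-of-tuples layout and the sample values are assumptions made for the example.

```python
from typing import Dict, Tuple

Genotype = Dict[str, Tuple[str, str]]  # locus name -> (value 1, value 2)

def locus_matches(a: Tuple[str, str], b: Tuple[str, str]) -> bool:
    """True if the two genotypes share at least one value (at most 4 comparisons)."""
    return any(x == y for x in a for y in b)

def naive_parent_child(s1: Genotype, s2: Genotype) -> bool:
    """Baseline check: every locus present in both samples must share a value."""
    shared = s1.keys() & s2.keys()
    return bool(shared) and all(locus_matches(s1[l], s2[l]) for l in shared)

# Hypothetical genotypes restricted to two loci for brevity.
child = {"D8S1179": ("13", "15"), "D7S820": ("11", "11")}
candidate = {"D8S1179": ("13", "14"), "D7S820": ("10", "11")}
unrelated = {"D8S1179": ("10", "12"), "D7S820": ("10", "11")}

print(naive_parent_child(child, candidate))  # True
print(naive_parent_child(child, unrelated))  # False
```

A search with this baseline has to run the check against every stored sample in turn, which is why the comparison cost grows with the size of the database.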
Summary of the invention
To overcome the above problems of the prior art, the object of the invention is to provide an STR data storage and paternity test ranking comparison method based on an inverted index structure. The method effectively improves the robustness and comparison efficiency of a relative-search database while ensuring the reliability of the search results.
The invention is achieved through the following technical solution:
The STR data storage and paternity test ranking comparison method based on an inverted index structure disclosed by the invention comprises the following steps:
1) STR data storage based on an inverted index structure
First, all STR data are preprocessed, and the STR data of each sample are arranged into a standard format; then, each locus is taken as one data field, and each data field stores the STR data of its own locus; finally, the STR data are stored as an inverted index;
2) Paternity test ranking comparison based on the STR data stored as an inverted index
First, the STR data of the relative-search sample are preprocessed and arranged into the standard format; then, the STR data of each locus are compared within the corresponding data field, and the results are combined into a final parent-child relationship index; finally, it is judged whether a parent-child relationship exists between samples: if the parent-child relationship index reaches a specific value, the donor of the candidate sample and the donor of the relative-search sample are considered to have a parent-child relationship; otherwise no parent-child relationship is considered to exist between them.
In step 1), the STR data are preprocessed and the STR data of each sample are arranged into a standard format, specifically as follows:
The sample data set is denoted $X = \{x_1, x_2, \ldots, x_n\}$;
where $x_i$ denotes the STR data of the $i$-th individual, $x_i = \{str_j : (v_{j1}/v_{j2})\}$; here $str_j$ denotes the name of the $j$-th STR locus, and $v_{jk}$ denotes the characteristic value of the STR at locus $j$ on the $k$-th chromosome.
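As an illustration of this standard format, the sketch below turns raw "value/value" strings into one (value 1, value 2) pair per locus. The function name `normalize_sample` and the use of a Python dict are assumptions made for the example; values are kept as strings because entries such as 30.2 and X/Y are not integers.

```python
from typing import Dict, Tuple

def normalize_sample(raw: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Turn {'D8S1179': '14/15', ...} into {'D8S1179': ('14', '15'), ...}."""
    sample = {}
    for locus, genotype in raw.items():
        # Exactly two values are expected per locus; anything else raises ValueError.
        v1, v2 = (part.strip() for part in genotype.split("/"))
        sample[locus.strip()] = (v1, v2)
    return sample

x1 = normalize_sample({"D8S1179": "14/15", "D21S11": "30.2/31", "AMEL": "X/X"})
print(x1["D21S11"])  # ('30.2', '31')
```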
In step 1), the data fields are established as follows:
The STR data of all samples are traversed to build the set of STR locus names $STR_N = \{str_1, str_2, \ldots, str_m\}$; for each $str_i$ in $STR_N$ a separate data field is established, denoted $d_i$, $i = 1, 2, \ldots, m$.
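A small sketch of this step, assuming each data field is represented as an initially empty dict keyed by STR value. The name `build_data_fields`, the sample IDs `s1` and `s2` and their locus subsets are hypothetical; they are chosen so that the two samples do not cover the same loci.

```python
def build_data_fields(samples):
    """Create one empty data field per distinct locus name found in the samples."""
    fields = {}
    for loci in samples.values():
        for locus in loci:
            fields.setdefault(locus, {})  # STR value -> sample IDs, filled later
    return fields

samples = {
    "s1": {"D8S1179": ("14", "15"), "FGA": ("19", "22")},
    "s2": {"D8S1179": ("13", "15"), "vWA": ("14", "16")},
}
fields = build_data_fields(samples)
print(sorted(fields))  # ['D8S1179', 'FGA', 'vWA']
```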
In step 1), the STR data are stored as an inverted index as follows: the sample data set $X$ is traversed, and for each $x_i$ every $str_j : (v_{j1}/v_{j2})$ is traversed. If an index for $v_{j1}$ exists in the data field $d_m$ corresponding to $str_j$, $x_i$ is added to that index; if no inverted index for $v_{j1}$ exists, the index is created and $x_i$ is added to it. $v_{j2}$ is handled in the same way.
In step 2), the STR data of the relative-search sample are preprocessed as follows:
The relative-search sample is arranged into the form $y = \{str_j : (v_{j1}/v_{j2})\}$, where $str_j$ denotes the name of the $j$-th STR locus, and $v_{jk}$ denotes the characteristic value of the STR at locus $j$ on the $k$-th chromosome.
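In step 2), the parent-child relationship index is calculated as follows:
For sample $y$, traverse each $str_j : (v_{j1}/v_{j2})$; if a data field $d_m$ corresponding to $str_j$ exists, obtain the sample sets corresponding to the indexes of $v_{j1}$ and $v_{j2}$, denoted $X_{j1}$ and $X_{j2}$ respectively;
take the union of $X_{j1}$ and $X_{j2}$, denoted $X_j = X_{j1} \cup X_{j2}$;
after obtaining the $X_j$ corresponding to each $str_j$, compute the union of the $X_j$, $X = X_1 \cup X_2 \cup \ldots \cup X_J$; each element of $X$ is a candidate sample;
for each element $x_i$ of $X$, compute $q_i = \sum_{j=1}^{J} s_{ij}$, where $s_{ij} = 1$ if $x_i \in X_j$ and $s_{ij} = 0$ otherwise, so that each locus contributes 1 when the candidate shares at least one value with $y$ (as in the per-locus scores of Table 3);
then $q_i$ is the parent-child relationship index of candidate sample $x_i$.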
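The calculation can be sketched as below, assuming the per-locus contribution is 1 when a candidate appears in $X_j$ and 0 otherwise, which is the scoring shown in Table 3. The function name `kinship_scores` and the literal index are assumptions; the index holds the D8S1179 and D7S820 values of samples 00001 to 00003 from Table 1.

```python
from collections import Counter

def kinship_scores(index, query):
    """Return q_i for every candidate: the number of loci where it shares a value with the query."""
    scores = Counter()
    for locus, (v1, v2) in query.items():
        domain = index.get(locus)
        if domain is None:          # no data field for this locus: skip it
            continue
        x_j = domain.get(v1, set()) | domain.get(v2, set())  # X_j = X_j1 ∪ X_j2
        for sample_id in x_j:       # each candidate in X_j gains one point
            scores[sample_id] += 1
    return scores

index = {
    "D8S1179": {"10": {"00003"}, "13": {"00002", "00003"},
                "14": {"00001"}, "15": {"00001", "00002"}},
    "D7S820": {"8": {"00002"}, "9": {"00002"},
               "10": {"00001"}, "11": {"00001", "00003"}},
}
y = {"D8S1179": ("13", "15"), "D7S820": ("11", "11")}
q = kinship_scores(index, y)
print(q["00001"], q["00002"], q["00003"])  # 2 1 2
```

Only samples that occur in at least one posting list are ever touched, so the candidate set $X$ is exactly the set of keys of the returned counter.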
In step 2), whether a parent-child relationship exists is judged as follows:
The candidate samples $x_i$ are sorted in descending order of $q_i$; if $q_i \geq \theta$, the donor of the relative-search sample $y$ and the donor of candidate sample $x_i$ are considered to have a parent-child relationship; otherwise, no parent-child relationship is considered to exist between them. Here $\theta$ is a threshold preset by the system, equal to the number of loci minus 1.
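The ranking and threshold decision can be sketched as follows. The function name `rank_candidates` and the returned tuple layout are assumptions; the scores are the $q_i$ values reported in Table 3 for a 16-locus panel.

```python
def rank_candidates(scores, num_loci, theta=None):
    """Sort candidates by q_i (descending) and flag those with q_i >= theta."""
    if theta is None:
        theta = num_loci - 1  # default threshold: number of loci minus 1
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(sample_id, q, q >= theta) for sample_id, q in ranked]

scores = {"00001": 10, "00002": 11, "00003": 8, "00004": 16, "00005": 10}
ranking = rank_candidates(scores, num_loci=16)
print(ranking[0])  # ('00004', 16, True)
print([sid for sid, _, is_match in ranking if is_match])  # ['00004']
```

Returning the full ranking rather than only the matches keeps near misses visible, which may be useful when a reviewer wants to inspect candidates just below the threshold.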
Compared with the prior art, the invention has the following beneficial technical effects:
1. Higher query efficiency
Traditional relative-search databases usually use a relational database and improve query efficiency through vertical partitioning, views, statistics and similar optimizations. These retrieval methods are relatively complex, however, and are hard for operation and maintenance staff without the relevant background to understand. The invention establishes a separate data field for each locus and stores the paired STR locus data in each data field with an inverted index structure, which greatly improves the query efficiency of the system.
2. Higher scalability
Traditional relative-search databases based on a relational database usually require the person searching for relatives to be tested at specific loci, which severely limits the scope of use of the database and is very inconvenient for users. The invention places no specific requirement on the loci used; to extend the system with a new locus, it is only necessary to add the corresponding data field, without modifying the underlying data structure, which greatly broadens the applicability of the system.
3. Reduced influence of gene mutation on paternity test results
Gene mutation is one of the major obstacles limiting paternity test accuracy in a relative-search database. Because of mutation, the STR locus data of parent and child do not necessarily coincide at every locus. When a relational database is used to store STR locus data, exact matching between STR locus data therefore becomes hard to carry out: the WHERE conditions of the SQL statement are difficult to match exactly, which severely limits the effectiveness of paternity testing. The invention stores STR locus data in inverted index structures in separate data fields; at query time, the comparison scores from the individual data fields are simply combined into the parent-child relationship index between samples, which indicates how likely the sample donors are to be genetically related and by which the candidates are ranked. This largely avoids the influence of gene mutation on paternity test results. The invention thus effectively improves the robustness and comparison efficiency of a relative-search database while ensuring the reliability of the search results.
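The difference between exact matching and score-based matching can be illustrated with a small sketch. The genotypes below are hypothetical, including the assumed mutation at FGA, and the function names are not from the patent.

```python
def shares_allele(a, b):
    """True if two genotypes at one locus have at least one value in common."""
    return bool(set(a) & set(b))

def exact_match(parent, child):
    """Strict rule: every locus must share a value."""
    return all(shares_allele(parent[locus], child[locus]) for locus in child)

def relationship_index(parent, child):
    """Score-based rule: count the loci that share a value."""
    return sum(shares_allele(parent[locus], child[locus])
               for locus in child if locus in parent)

parent = {"D8S1179": ("13", "15"), "D7S820": ("11", "11"), "FGA": ("24", "26")}
# Hypothetical child whose FGA value is assumed to have mutated from 24 to 25.
child = {"D8S1179": ("13", "14"), "D7S820": ("11", "12"), "FGA": ("23", "25")}

theta = len(child) - 1  # threshold: number of loci minus 1
print(exact_match(parent, child))                  # False
print(relationship_index(parent, child) >= theta)  # True
```

With the threshold set to the number of loci minus 1, a single mismatching locus is tolerated, whereas the strict rule rejects the pair outright.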
Brief description of the drawings
Fig. 1 is a schematic diagram of the overall system framework;
Fig. 2 shows the STR locus data storage based on an inverted index structure used in the invention;
Fig. 3 is a flow chart of the algorithm for STR data storage based on an inverted index structure;
Fig. 4 is a flow chart of the algorithm for paternity test ranking comparison based on the STR data stored as an inverted index;
Fig. 5 and Fig. 6 show the data structure in the D8S1179 field before and after sample No. 00002 is stored, respectively;
Fig. 7 and Fig. 8 show a prototype relative-search system designed and implemented according to the principle of the invention: Fig. 7 shows the STR data stored with the inverted index structure, and Fig. 8 shows the search result obtained by the relative-search algorithm.
Detailed description of the embodiments
The invention is described in further detail below with reference to specific embodiments, which explain the invention and do not limit it.
Referring to Fig. 1, the STR data storage and paternity test ranking comparison method based on an inverted index structure of the invention comprises two main parts. First, the STR data storage method based on an inverted index structure (see Fig. 3), which establishes a separate data field for each STR locus selected for the samples and stores the STR data within each data field as an inverted index. Second, the paternity test ranking comparison method (see Fig. 4), which, based on the per-field inverted index structure, computes the kinship between the relative-search sample and the samples in the database, enabling fast, stable and reliable online searching for relatives.
1. STR data storage method based on an inverted index structure
The STR data storage method based on an inverted index structure comprises the following steps: first, all STR data are preprocessed, and the STR data of each sample are arranged into a standard format; then, each locus is taken as one data field, and each data field stores the STR data of its own locus; finally, the STR data are stored as an inverted index. The detailed process is shown in Fig. 3, specifically:
Step 1: Data preprocessing. The data to be stored are arranged into the following form: the sample data set is denoted $X = \{x_1, x_2, \ldots, x_n\}$, where $x_i$ denotes the STR data of the $i$-th individual and can be written $x_i = \{str_j : (v_{j1}/v_{j2})\}$; here $str_j$ denotes the name of the $j$-th STR locus, and $v_{jk}$ denotes the characteristic value of the STR at locus $j$ on the $k$-th chromosome.
Step 2: Establish the data fields. The STR data of all samples are traversed to build the set of STR locus names $STR_N = \{str_1, str_2, \ldots, str_m\}$; for each $str_i$ in $STR_N$ a separate data field is established, denoted $d_i$.
Step 3: Store the data. The sample data set $X$ is traversed, and for each $x_i$ every $str_j : (v_{j1}/v_{j2})$ is traversed. If an index for $v_{j1}$ exists in the data field $d_m$ corresponding to $str_j$, $x_i$ is added to that index; if no inverted index for $v_{j1}$ exists, the index is created and $x_i$ is added to it. $v_{j2}$ is handled in the same way.
The STR locus data stored with the above method are shown in Fig. 2. D8S1179, D21S11 and so on, shown at the top, are the data fields corresponding to the STR loci; the lower left shows the data keys, whose numbers are STR values; the lower right shows the inverted index list for each data key, in which each number is the ID of a sample donor. For example, the list corresponding to STR value 12 contains the numbers 1, 5, 7, 13, 22 and so on, indicating that one chromosome of each of samples 1, 5, 7, 13 and 22 has the value 12 at locus D3S1358.
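The layout described for Fig. 2 can be pictured as a two-level lookup. The sketch below is a hypothetical in-memory rendering, not the patent's storage format; it contains only the one inverted index list that the text spells out, and the helper name `samples_with_value` is an assumption.

```python
from typing import Dict, List

# Data field (locus) -> data key (STR value) -> inverted index list of sample IDs.
index: Dict[str, Dict[str, List[int]]] = {
    "D3S1358": {
        "12": [1, 5, 7, 13, 22],  # only the IDs named in the text
    },
}

def samples_with_value(index, locus, value):
    """Return the inverted index list for one STR value in one data field."""
    return index.get(locus, {}).get(value, [])

print(samples_with_value(index, "D3S1358", "12"))  # [1, 5, 7, 13, 22]
print(samples_with_value(index, "D3S1358", "19"))  # []
```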
2. Paternity test ranking comparison method based on an inverted index structure
The STR data storage method based on an inverted index structure yields the STR locus data storage structure shown in Fig. 2. Relative searching then uses the paternity test ranking comparison method based on an inverted index structure, which mainly comprises the following steps: first, the STR data of the relative-search sample are preprocessed and arranged into the standard format; then, the STR data of each locus are compared within the corresponding data field, and the results are combined into a final parent-child relationship index; finally, it is judged whether a parent-child relationship exists between samples: if the parent-child relationship index reaches a specific value, the donor of the candidate sample and the donor of the relative-search sample are considered to have a parent-child relationship; otherwise no parent-child relationship is considered to exist between them.
Step 1: Data preprocessing. The relative-search sample is arranged into the form $y = \{str_j : (v_{j1}/v_{j2})\}$, where $str_j$ denotes the name of the $j$-th STR locus, and $v_{jk}$ denotes the characteristic value of the STR at locus $j$ on the $k$-th chromosome.
Step 2: Calculate the parent-child relationship index. For sample $y$, traverse each $str_j : (v_{j1}/v_{j2})$; if a data field $d_m$ corresponding to $str_j$ exists, obtain the sample sets corresponding to the indexes of $v_{j1}$ and $v_{j2}$, denoted $X_{j1}$ and $X_{j2}$ respectively. Take the union of $X_{j1}$ and $X_{j2}$, denoted $X_j = X_{j1} \cup X_{j2}$. After obtaining the $X_j$ corresponding to each $str_j$, compute the union of the $X_j$, $X = X_1 \cup X_2 \cup \ldots \cup X_J$; each element of $X$ is a candidate sample. For each element $x_i$ of $X$, compute $q_i = \sum_{j=1}^{J} s_{ij}$, where $s_{ij} = 1$ if $x_i \in X_j$ and $s_{ij} = 0$ otherwise (as in the per-locus scores of Table 3). Then $q_i$ is the parent-child relationship index of candidate sample $x_i$.
Step 3: Judge whether a parent-child relationship exists. The candidate samples $x_i$ are sorted in descending order of $q_i$; if $q_i \geq \theta$, the donor of the relative-search sample $y$ and the donor of candidate sample $x_i$ are considered to have a parent-child relationship; otherwise no parent-child relationship is considered to exist between them. Here $\theta$ is a threshold preset by the system; it can generally be set to the number of loci minus 1.
A specific example is as follows:
The samples to be stored are shown in Table 1, and the relative-search sample is shown in Table 2.
Table 1. Example of samples to be stored in the relative-search database
Sample ID | 00001 | 00002 | 00003 | 00004 | 00005 | …… |
D8S1179 | 14/15 | 13/15 | 10/13 | 13/15 | 13/14 | …… |
D21S11 | 30.2/31 | 29/32.2 | 30/31.2 | 29/32.2 | 29/30 | …… |
D7S820 | 10/11 | 8/9 | 11/11 | 11/11 | 10/12 | …… |
CSF1PO | 10/11 | 12/14 | 10/13 | 11/12 | 10/10 | …… |
D3S1358 | 15/16 | 16/16 | 16/17 | 15/15 | 15/16 | …… |
D5S818 | 10/11 | 11/12 | 11/13 | 10/11 | 10/13 | …… |
D13S317 | 12/12 | 11/11 | 11/12 | 10/11 | 11/11 | …… |
D16S539 | 10/13 | 9/11 | 11/12 | 11/12 | 10/12 | …… |
D2S1338 | 20/23 | 21/23 | 18/24 | 20/23 | 18/19 | …… |
D19S433 | 13/14 | 13/13 | 14/15 | 13/15.2 | 13/14 | …… |
vWA | 17/20 | 14/16 | 16/17 | 13/14 | 17/19 | …… |
D12S391 | 18/21 | 19/20 | 17/17.3 | 18/19 | 18/18 | …… |
D18S51 | 13/14 | 13/15 | 14/15 | 13/16 | 14/17 | …… |
AMEL | X/X | X/Y | X/Y | X/Y | X/Y | …… |
D6S1043 | 14/21.3 | 19/20 | 13/14 | 10/19 | 17/18 | …… |
FGA | 19/22 | 19/24 | 23/24 | 24/26 | 23/23 | …… |
Table 2. Example of a relative-search sample
Locus | STR |
D8S1179 | 13/15 |
D21S11 | 29/31 |
D7S820 | 11/11 |
CSF1PO | 11/11 |
D3S1358 | 15/15 |
D5S818 | 10/12 |
D13S317 | 10/10 |
D16S539 | 9/11 |
D2S1338 | 18/23 |
D19S433 | 13/14 |
vWA | 14/14 |
D12S391 | 18/19 |
D18S51 | 13/15 |
AMEL | X/Y |
D6S1043 | 10/18 |
FGA | 23/24 |
1. STR data storage based on an inverted index structure
Step 1: Data preprocessing. All samples to be stored are arranged into the standard format. Taking sample No. 00001 as an example, the result is:
$x_1$ = {D8S1179:(14/15), D21S11:(30.2/31), D7S820:(10/11), CSF1PO:(10/11), D3S1358:(15/16), D5S818:(10/11), D13S317:(12/12), D16S539:(10/13), D2S1338:(20/23), D19S433:(13/14), vWA:(17/20), D12S391:(18/21), D18S51:(13/14), AMEL:(X/X), D6S1043:(14/21.3), FGA:(19/22)}
Step 2: Establish the data fields. In this example the locus names of all samples are identical, so 16 data fields are established, one for each locus listed in Table 1.
Step 3: Store the data. Take the sample with ID 00002 as an example; at this point the sample with ID 00001 has already been stored in the database, as shown in Fig. 5. The first group of data, D8S1179:(13/15), is obtained first. The data field D8S1179 already exists in the database; index 15 exists but index 13 does not, so index 13 must be created, and 00002 is added to indexes 13 and 15, as shown in Fig. 6. The remaining data fields of sample No. 00002 are then traversed in turn in the same way.
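The transition from Fig. 5 to Fig. 6 can be replayed with a short sketch restricted to the D8S1179 data field. The function name `add_sample` and the use of dicts and sets are assumptions; the function also reports which indexes had to be created.

```python
def add_sample(index, sample_id, loci):
    """Insert one sample into the inverted index; return the (locus, value) indexes created."""
    created = []
    for locus, (v1, v2) in loci.items():
        domain = index.setdefault(locus, {})   # data field for this locus
        for value in (v1, v2):
            if value not in domain:            # no index yet for this value
                domain[value] = set()
                created.append((locus, value))
            domain[value].add(sample_id)
    return created

index = {}
add_sample(index, "00001", {"D8S1179": ("14", "15")})               # state of Fig. 5
new_keys = add_sample(index, "00002", {"D8S1179": ("13", "15")})    # state of Fig. 6

print(new_keys)                        # [('D8S1179', '13')]
print(sorted(index["D8S1179"]["15"]))  # ['00001', '00002']
```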
2. Paternity test ranking comparison based on an inverted index structure
Step 1: Data preprocessing.
The relative-search sample in Table 2 is arranged into the following form:
$y$ = {D8S1179:(13/15), D21S11:(29/31), D7S820:(11/11), CSF1PO:(11/11), D3S1358:(15/15), D5S818:(10/12), D13S317:(10/10), D16S539:(9/11), D2S1338:(18/23), D19S433:(13/14), vWA:(14/14), D12S391:(18/19), D18S51:(13/15), AMEL:(X/Y), D6S1043:(10/18), FGA:(23/24)}
Step 2: Calculate the parent-child relationship index.
For sample $y$, first consider the first data field, D8S1179: the union of the sample sets under its indexes 13 and 15 is $X_j$ = {00001, 00002, 00003, 00004, 00005, ...}. On this basis the union over all data fields is computed, $X$ = {00001, 00002, 00003, 00004, 00005, ...}. For each element $x_i$ of $X$ its score is calculated, as shown in Table 3:
Table 3. Sample scores
Sample ID | 00001 | 00002 | 00003 | 00004 | 00005 | …… |
D8S1179 | 1 | 1 | 1 | 1 | 1 | …… |
D21S11 | 0 | 1 | 0 | 1 | 1 | …… |
D7S820 | 1 | 0 | 1 | 1 | 0 | …… |
CSF1PO | 1 | 0 | 0 | 1 | 0 | …… |
D3S1358 | 1 | 0 | 0 | 1 | 1 | …… |
D5S818 | 1 | 1 | 0 | 1 | 1 | …… |
D13S317 | 0 | 0 | 0 | 1 | 0 | …… |
D16S539 | 0 | 1 | 1 | 1 | 0 | …… |
D2S1338 | 1 | 1 | 1 | 1 | 1 | …… |
D19S433 | 1 | 1 | 1 | 1 | 1 | …… |
vWA | 0 | 1 | 0 | 1 | 0 | …… |
D12S391 | 1 | 1 | 0 | 1 | 1 | …… |
D18S51 | 1 | 1 | 1 | 1 | 0 | …… |
AMEL | 1 | 1 | 1 | 1 | 1 | …… |
D6S1043 | 0 | 0 | 0 | 1 | 1 | …… |
FGA | 0 | 1 | 1 | 1 | 1 | …… |
$q_i$ | 10 | 11 | 8 | 16 | 10 | …… |
Step 3: Judge whether a parent-child relationship exists.
The candidate samples are sorted in descending order of score; with θ = 15, sample 00004 can be judged to have a parent-child relationship with $y$.
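The worked example can be replayed end to end with a short sketch. To keep the listing compact it loads only samples 00002 and 00004 from Table 1 together with the relative-search sample from Table 2; the helper `parse` and the variable names are assumptions, and the per-locus score is taken to be 1 when at least one value is shared.

```python
LOCI = ["D8S1179", "D21S11", "D7S820", "CSF1PO", "D3S1358", "D5S818",
        "D13S317", "D16S539", "D2S1338", "D19S433", "vWA", "D12S391",
        "D18S51", "AMEL", "D6S1043", "FGA"]

def parse(row):
    """Map a whitespace-separated row of 'a/b' genotypes onto the 16 loci."""
    return {locus: tuple(g.split("/")) for locus, g in zip(LOCI, row.split())}

stored = {
    "00002": parse("13/15 29/32.2 8/9 12/14 16/16 11/12 11/11 9/11 "
                   "21/23 13/13 14/16 19/20 13/15 X/Y 19/20 19/24"),
    "00004": parse("13/15 29/32.2 11/11 11/12 15/15 10/11 10/11 11/12 "
                   "20/23 13/15.2 13/14 18/19 13/16 X/Y 10/19 24/26"),
}
y = parse("13/15 29/31 11/11 11/11 15/15 10/12 10/10 9/11 "
          "18/23 13/14 14/14 18/19 13/15 X/Y 10/18 23/24")

# Storage: data field -> STR value -> set of sample IDs.
index = {}
for sid, loci in stored.items():
    for locus, alleles in loci.items():
        for value in alleles:
            index.setdefault(locus, {}).setdefault(value, set()).add(sid)

# Comparison: one point per locus where the candidate is in X_j.
q = {}
for locus, (v1, v2) in y.items():
    domain = index.get(locus, {})
    for sid in domain.get(v1, set()) | domain.get(v2, set()):
        q[sid] = q.get(sid, 0) + 1

theta = len(LOCI) - 1  # 15
matches = [sid for sid, s in sorted(q.items(), key=lambda kv: -kv[1]) if s >= theta]
print(q["00002"], q["00004"], matches)  # 11 16 ['00004']
```

The two totals agree with the $q_i$ row of Table 3 for these samples, and only 00004 reaches the threshold of 15.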
The database system prototype designed according to this method is shown in Fig. 7 and Fig. 8: Fig. 7 shows the STR data stored with the inverted index structure described in this patent, and Fig. 8 shows the result of the relative search.
In summary, the invention proposes an STR data storage and paternity test ranking comparison method based on an inverted index structure. By establishing data fields and building indexes on STR values, the method stores paired STR locus data in the form of an inverted index. On this basis, the paternity test ranking comparison method based on the inverted index structure calculates the parent-child relationship index between samples, which enables fast paternity test comparison, improves retrieval and query efficiency, and reduces the influence of gene mutation on paternity test comparison. Furthermore, the use of data fields greatly broadens the applicability of the method.
Claims (4)
1. An STR data storage and paternity test ranking comparison method based on an inverted index structure, characterized by comprising the following steps:
1) STR data storage based on an inverted index structure
First, all STR data are preprocessed, and the STR data of each sample are arranged into a standard format; then, each locus is taken as one data field, and each data field stores the STR data of its own locus; finally, the STR data are stored as an inverted index;
the STR data are preprocessed and the STR data of each sample are arranged into a standard format, specifically as follows:
the sample data set is denoted $X = \{x_1, x_2, \ldots, x_n\}$;
where $x_i$ denotes the STR data of the $i$-th individual, $x_i = \{str_j : (v_{j1}/v_{j2})\}$; here $str_j$ denotes the name of the $j$-th STR locus, and $v_{jk}$ denotes the characteristic value of the STR at locus $j$ on the $k$-th chromosome;
2) Paternity test ranking comparison based on the STR data stored as an inverted index
First, the STR data of the relative-search sample are preprocessed and arranged into the standard format; then, the STR data of each locus are compared within the corresponding data field, and the results are combined into a final parent-child relationship index; finally, it is judged whether a parent-child relationship exists between samples: if the parent-child relationship index reaches a specific value, the donor of the candidate sample and the donor of the relative-search sample are considered to have a parent-child relationship; otherwise no parent-child relationship is considered to exist between them;
wherein the STR data of the relative-search sample are preprocessed as follows:
the relative-search sample is arranged into the form $y = \{str_j : (v_{j1}/v_{j2})\}$, where $str_j$ denotes the name of the $j$-th STR locus, and $v_{jk}$ denotes the characteristic value of the STR at locus $j$ on the $k$-th chromosome;
the parent-child relationship index is calculated as follows:
for sample $y$, traverse each $str_j : (v_{j1}/v_{j2})$; if a data field $d_m$ corresponding to $str_j$ exists, obtain the sample sets corresponding to the indexes of $v_{j1}$ and $v_{j2}$, denoted $X_{j1}$ and $X_{j2}$ respectively;
take the union of $X_{j1}$ and $X_{j2}$, denoted $X_j = X_{j1} \cup X_{j2}$;
after obtaining the $X_j$ corresponding to each $str_j$, compute the union of the $X_j$, $X = X_1 \cup X_2 \cup \ldots \cup X_J$; each element of $X$ is a candidate sample;
for each element $x_i$ of $X$, compute $q_i = \sum_{j=1}^{J} s_{ij}$, where $s_{ij} = 1$ if $x_i \in X_j$ and $s_{ij} = 0$ otherwise;
then $q_i$ is the parent-child relationship index of candidate sample $x_i$.
2. The STR data storage and paternity test ranking comparison method based on an inverted index structure according to claim 1, characterized in that the data fields in step 1) are established as follows:
the STR data of all samples are traversed to build the set of STR locus names $STR_N = \{str_1, str_2, \ldots, str_m\}$; for each $str_i$ in $STR_N$ a separate data field is established, denoted $d_i$, $i = 1, 2, \ldots, m$.
3. The STR data storage and paternity test ranking comparison method based on an inverted index structure according to claim 1, characterized in that in step 1) the STR data are stored as an inverted index as follows: the sample data set $X$ is traversed, and for each $x_i$ every $str_j : (v_{j1}/v_{j2})$ is traversed; if an index for $v_{j1}$ exists in the data field $d_m$ corresponding to $str_j$, $x_i$ is added to that index; if no inverted index for $v_{j1}$ exists, the index is created and $x_i$ is added to it; $v_{j2}$ is handled in the same way.
4. The STR data storage and paternity test ranking comparison method based on an inverted index structure according to claim 1, characterized in that in step 2) whether a parent-child relationship exists is judged as follows:
the candidate samples $x_i$ are sorted in descending order of $q_i$; if $q_i \geq \theta$, the donor of the relative-search sample $y$ and the donor of candidate sample $x_i$ are considered to have a parent-child relationship; otherwise no parent-child relationship is considered to exist between them; here $\theta$ is a threshold preset by the system, equal to the number of loci minus 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510590067.XA CN105260395B (en) | 2015-09-16 | 2015-09-16 | The storage of STR data and paternity test sequence comparison method based on inverted index structure |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105260395A CN105260395A (en) | 2016-01-20 |
CN105260395B true CN105260395B (en) | 2018-05-01 |
Family
ID=55100087
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510590067.XA Expired - Fee Related CN105260395B (en) | 2015-09-16 | 2015-09-16 | The storage of STR data and paternity test sequence comparison method based on inverted index structure |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105260395B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110349634B (en) * | 2019-07-11 | 2022-09-16 | 顾永才 | System for searching discrete relatives by using gene technology |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20180501 Termination date: 20190916 |