CN110232140B - Disposable approximate pattern matching method with local-overall constraint - Google Patents


Publication number
CN110232140B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910530672.6A
Other languages
Chinese (zh)
Other versions
CN110232140A (en
Inventor
武优西
菅博境
于磊
成淑慧
朱昌瑞
单劲松
刘靖宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hebei University of Technology
Original Assignee
Hebei University of Technology
Application filed by Hebei University of Technology
Priority to CN201910530672.6A
Publication of CN110232140A
Application granted
Publication of CN110232140B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9014 Indexing; Data structures therefor; Storage structures hash tables
    • G06F16/9027 Trees
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a one-time approximate pattern matching method with local-global constraints, which relates to the technical field of electric digital data processing. The method uses a net tree structure to solve the one-time approximate pattern matching problem under the (δ, γ)-distance. For this problem, it overcomes the shortcoming of the prior art, which struggles to combine generality, accuracy and flexibility of solution while keeping the solution set non-redundant.

Description

Disposable approximate pattern matching method with local-overall constraint
Technical Field
The technical scheme of the invention relates to the technical field of electric digital data processing, in particular to a disposable approximate pattern matching method with local-overall constraint.
Background
With the continuous development of internet technology, the scale of data is growing dramatically, and how to find valuable information in large amounts of data with data mining techniques has become a research hotspot. Frequent pattern mining finds frequently occurring patterns in large amounts of data, and its main task is pattern matching: frequent pattern mining usually needs to calculate the support of a pattern, and calculating the support is in essence a pattern matching problem. Pattern matching is therefore the basis and core of frequent pattern mining. As science and technology have advanced, pattern matching has been widely applied in many fields, including time series matching and biological sequence matching.
Pattern matching is the process of searching a sequence S for subsequences that are the same as or similar to a pattern P. Its development shows three main trends: first, from traditional pattern matching to pattern matching with gap constraints; second, from exact pattern matching to approximate pattern matching; and third, from pattern matching without special conditions to pattern matching under the one-time condition.
Compared with traditional pattern matching, introducing gap constraints makes the problem harder to solve, but it reduces the influence of data noise and improves the flexibility of pattern matching.
Example A. Given the sequence $S = s_1s_2s_3s_4s_5s_6s_7$ = cbcabc and the pattern $P = p_1[0,1]p_2[0,1]p_3$ = c[0,1]b[0,1]c.
Pattern with gap constraints: $P = p_1[min_1,max_1]p_2 \dots [min_j,max_j]p_{j+1} \dots [min_{m-1},max_{m-1}]p_m$, where $min_j$ and $max_j$ are the minimum and maximum number of wildcards that may be matched between sub-pattern $p_j$ and sub-pattern $p_{j+1}$; the interval $[min_j, max_j]$ is called a gap constraint, and m denotes the length of the pattern P.
In Example A, for the given pattern $P = p_1[0,1]p_2[0,1]p_3$ = c[0,1]b[0,1]c, take c[0,1]b as an example: [0,1] is called a gap constraint, where 0 means that at least 0 wildcards are matched between sub-pattern c and sub-pattern b, and 1 means that at most 1 wildcard is matched between them. For ease of description, each occurrence of the pattern P in the sequence S is represented by the positions (subscripts) in S of the matched characters. In this example, the given pattern P has 2 occurrences in the given sequence S, namely <1,2,3> and <3,5,7>. <1,2,3> denotes that the subsequence $s_1s_2s_3$ is identical to the pattern $p_1p_2p_3$: there is no character between $s_1$ and $s_2$, which satisfies the gap constraint [0,1], and no character between $s_2$ and $s_3$, which also satisfies [0,1]. <3,5,7> denotes that the subsequence $s_3s_5s_7$ is identical to the pattern $p_1p_2p_3$: there is 1 character between $s_3$ and $s_5$ and 1 character between $s_5$ and $s_7$, each satisfying the gap constraint [0,1]. It follows that, compared with the wildcards "?" and "*", the gap constraint is more flexible.
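The gap-constraint definition above can be illustrated with a minimal Python sketch that enumerates exact occurrences by backtracking. The function name `exact_occurrences`, the representation of gaps as `(min, max)` tuples, and the 7-character sequence `cbcabbc` are assumptions made for illustration only; the sequence is a stand-in, not a confirmed reading of the example string in the text.

```python
def exact_occurrences(seq, subpatterns, gaps):
    """Enumerate exact occurrences (1-based positions) of a gap-constrained pattern.

    seq         -- the sequence S as a string
    subpatterns -- the sub-patterns p_1..p_m as a string of single characters
    gaps        -- list of (min_j, max_j) tuples, one per adjacent sub-pattern pair
    """
    found = []

    def extend(prefix):
        j = len(prefix)  # number of sub-patterns matched so far
        if j == len(subpatterns):
            found.append(tuple(prefix))
            return
        if j == 0:
            candidates = range(1, len(seq) + 1)
        else:
            lo, hi = gaps[j - 1]
            # min_j <= l_{j+1} - l_j - 1 <= max_j
            first = prefix[-1] + lo + 1
            last = min(prefix[-1] + hi + 1, len(seq))
            candidates = range(first, last + 1)
        for pos in candidates:
            if seq[pos - 1] == subpatterns[j]:
                extend(prefix + [pos])

    extend([])
    return found


# Hypothetical input: pattern c[0,1]b[0,1]c over an assumed sequence.
print(exact_occurrences("cbcabbc", "cbc", [(0, 1), (0, 1)]))
```

With this assumed input the sketch reports the two position tuples (1, 2, 3) and (3, 5, 7), which is the kind of result the paragraph above describes.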
Pattern matching with gap constraints is divided by matching type into exact pattern matching and approximate pattern matching. Exact pattern matching allows no errors, but noise is introduced during real data acquisition and preprocessing, and exact matching then fails to retrieve useful information, so pattern matching in an approximate sense is needed. Approximate pattern matching allows errors and can find more valuable information than exact pattern matching.
The metric distance is the key to approximate pattern matching. The Hamming distance is a commonly used metric; it is the number of positions at which two sequences of equal length have different characters. The Hamming distance, however, cannot measure the local degree of approximation between sequences, so matched sequences may be dissimilar overall and considerable noise is introduced. Another metric is the (δ, γ)-distance, originally used for music information retrieval, which avoids the noise introduced by the Hamming distance.
Example B. Given the pattern P = b[0,1]d[0,1]b.
In FIG. 1 of the drawings, (1) coincides with the given pattern P without gaps; (2) and (3) contain gaps and match the given pattern P exactly.
Using the Hamming distance as the metric distance for the given pattern P: given two sequences X and Y of equal length, the Hamming distance between X and Y is the number of positions at which the corresponding characters differ. In FIG. 1, (4), (5) and (6) cannot match the given pattern P exactly but match it approximately when the Hamming distance threshold is 1; however, the shaded segments all deviate greatly from the corresponding segments of the given pattern P, so (4), (5) and (6) are not similar to (1) as a whole.
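A minimal Python sketch of the Hamming distance described above may help show why it ignores how far apart two mismatched characters are. The function name `hamming_distance` and the sample strings are assumptions for illustration.

```python
def hamming_distance(x, y):
    """Number of positions at which two equal-length sequences differ."""
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(x, y))


# Both candidates differ from "bdb" in exactly one position, so the Hamming
# distance cannot tell a near miss ("bcb") from a large deviation ("bzb").
print(hamming_distance("bdb", "bcb"), hamming_distance("bdb", "bzb"))
```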
Using the (δ, γ)-distance as the metric distance for the given pattern P. δ-distance: given any two characters x and y in the character set Σ, the δ-distance between x and y is |x − y|, denoted $D_\delta(x, y)$. γ-distance: given two sequences of equal length $X = x_1x_2 \dots x_n$ and $Y = y_1y_2 \dots y_n$, the γ-distance of X and Y is
$$D_\gamma(X, Y) = \sum_{i=1}^{n} D_\delta(x_i, y_i),$$
where 1 ≤ i ≤ n and n represents the length of the sequences.
Given two sequences of equal length $X = x_1x_2 \dots x_n$ and $Y = y_1y_2 \dots y_n$, a local threshold δ and a global threshold γ:
(1) when $\max_{1 \le i \le n} D_\delta(x_i, y_i) \le \delta$, sequence X and sequence Y are said to satisfy the local approximation constraint, local constraint for short, denoted $X =_\delta Y$;
(2) when $D_\gamma(X, Y) \le \gamma$, sequence X and sequence Y are said to satisfy the global approximation constraint, global constraint for short, denoted $X =_\gamma Y$;
(3) when $X =_\delta Y$ and $X =_\gamma Y$, sequence X and sequence Y satisfy both the local constraint and the global constraint, called the local-global constraint, denoted $X =_{(\delta,\gamma)} Y$;
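The three definitions above translate directly into code. The following Python sketch assumes that characters are compared through their code points (the text only states |x − y|); the function names are hypothetical.

```python
def delta_distance(x, y):
    """delta-distance between two characters, assumed here to be |ord(x) - ord(y)|."""
    return abs(ord(x) - ord(y))


def gamma_distance(xs, ys):
    """gamma-distance: sum of the character-wise delta-distances."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have equal length")
    return sum(delta_distance(x, y) for x, y in zip(xs, ys))


def satisfies_local_global(xs, ys, delta, gamma):
    """True when both the local and the global constraint hold."""
    local_ok = all(delta_distance(x, y) <= delta for x, y in zip(xs, ys))
    return local_ok and gamma_distance(xs, ys) <= gamma


# One character off by 1: within delta = 1 and gamma = 1.
print(satisfies_local_global("cbb", "cbc", delta=1, gamma=1))
# Two characters off by 1 each: locally fine, but gamma = 1 is exceeded.
print(satisfies_local_global("cab", "cbc", delta=1, gamma=1))
```

Raising the global threshold to 2 in the second call would accept the pair, which mirrors the role of γ described for case (9) of FIG. 1 below.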
In FIG. 1, (7), (8) and (9) cannot match the given pattern P exactly, but can match it approximately under the (δ, γ)-distance. (7) and (8) match the given pattern P approximately when the local threshold δ is 1 and the global threshold γ is 1, and both are similar to (1) as a whole, which shows that the δ-distance measures the local degree of approximation between sequences so that the local constraint is satisfied. (9) cannot match the given pattern P approximately when δ is 1 and γ is 1, but can when δ is 1 and γ is 2, which shows that the γ-distance measures the global degree of approximation between sequences so that the global constraint is satisfied.
FIG. 1 of the drawings thus shows that, because the Hamming distance cannot measure the local degree of approximation between sequences, it leads to sequences that are dissimilar as a whole, whereas the (δ, γ)-distance measures the local and the global degree of approximation at the same time and ensures overall similarity between sequences.
In pattern matching with gap constraints, when no further condition is imposed, that is, without special conditions, characters in the sequence S may be reused many times, and the number of occurrences of the pattern P in the sequence S is exponential. In practical applications, however, each character in the sequence S represents an event that occurs at a certain time or in a certain period, so imposing the one-time condition is more reasonable because it avoids counting events repeatedly. Example C below uses the sequence S and pattern P of Example A to describe (δ, γ)-approximate pattern matching under the one-time condition in detail.
Example C. Given the sequence $S = s_1s_2s_3s_4s_5s_6s_7$ = cbcabc and the pattern $P = p_1[0,1]p_2[0,1]p_3$ = c[0,1]b[0,1]c, with local threshold δ = 1 and global threshold γ = 1.
Occurrence: given a sequence $S = s_1s_2 \dots s_n$, a pattern $P = p_1[min_1,max_1]p_2 \dots [min_{m-1},max_{m-1}]p_m$, a local threshold δ and a global threshold γ, when there are m integers $l_1, l_2, \dots, l_m$ that satisfy the following two conditions simultaneously:
(1) $1 \le l_1 < l_2 < \dots < l_m \le n$ and $min_j \le l_{j+1} - l_j - 1 \le max_j$, where $1 \le j \le m-1$;
(2) $s_{l_1}s_{l_2} \dots s_{l_m} =_{(\delta,\gamma)} p_1p_2 \dots p_m$;
then $\langle l_1, l_2, \dots, l_m \rangle$ is called an occurrence of the given pattern P in the given sequence S, denoted as [formula image BDA0002099636890000032, not recoverable].
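As a sketch of the occurrence definition, the following Python code enumerates all occurrences under the gap constraints and the (δ, γ)-distance by plain backtracking. It is a brute-force illustration, not the net tree method of the invention; the function name, the use of code points for |x − y|, and the sequence `cbcabbc` are assumptions.

```python
def approx_occurrences(seq, subpatterns, gaps, delta, gamma):
    """Enumerate occurrences (1-based positions) under gap and (delta, gamma) constraints."""
    found = []

    def extend(prefix, cost):
        j = len(prefix)  # sub-patterns matched so far
        if j == len(subpatterns):
            found.append(tuple(prefix))
            return
        if j == 0:
            candidates = range(1, len(seq) + 1)
        else:
            lo, hi = gaps[j - 1]
            first = prefix[-1] + lo + 1
            last = min(prefix[-1] + hi + 1, len(seq))
            candidates = range(first, last + 1)
        for pos in candidates:
            d = abs(ord(seq[pos - 1]) - ord(subpatterns[j]))
            # local constraint on this character, global constraint on the running sum
            if d <= delta and cost + d <= gamma:
                extend(prefix + [pos], cost + d)

    extend([], 0)
    return found


# Hypothetical input with delta = 1 and gamma = 1.
print(approx_occurrences("cbcabbc", "cbc", [(0, 1), (0, 1)], 1, 1))
```

For this assumed input the sketch finds four position tuples: (1, 2, 3), (3, 5, 6), (3, 5, 7) and (5, 6, 7).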
Pattern matching without special conditions means that the character at any position in the sequence S may be used repeatedly. FIG. 2 of the drawings shows that, without special conditions, the given pattern P has 4 occurrences in the given sequence S of Example A, namely <1,2,3>, <3,5,6>, <3,5,7> and <5,6,7>. For <3,5,6>, the subsequence $s_3s_5s_6$ is not identical to the pattern $p_1p_2p_3$, because character $s_6$ = b while sub-pattern $p_3$ = c, and b ≠ c. However, $D_\delta(s_6, p_3) = |s_6 - p_3| = |b - c| = 1 \le \delta = 1$, so the subsequence $s_3s_5s_6$ and the pattern $p_1p_2p_3$ satisfy the local constraint, i.e. $s_3s_5s_6 =_\delta p_1p_2p_3$. Meanwhile, $D_\gamma(s_3s_5s_6, p_1p_2p_3) = |s_3 - p_1| + |s_5 - p_2| + |s_6 - p_3| = 0 + 0 + 1 = 1 \le \gamma = 1$, so they also satisfy the global constraint, i.e. $s_3s_5s_6 =_\gamma p_1p_2p_3$. Therefore <3,5,6> is an occurrence of the given pattern P in the given sequence S, i.e. $s_3s_5s_6 =_{(\delta,\gamma)} p_1p_2p_3$. For <5,6,7>, the subsequence $s_5s_6s_7$ is identical to the pattern $p_1p_2p_3$: the γ-distance between them is 0 and the δ-distance between corresponding characters is also 0, so <5,6,7> is an exact-match occurrence. Exact matching is therefore a special case of approximate matching, and approximate matching is more general than exact matching. Pattern matching under the one-time condition means that the character at any position in the sequence S can be used at most once. Under the one-time condition, the given pattern P of Example A has 2 occurrences in the given sequence S, namely <1,2,3> and <5,6,7>: once <1,2,3> is selected, <3,5,6> and <3,5,7> cannot be selected, because the character $s_3$ has already been used by <1,2,3>. Compared with pattern matching without special conditions, pattern matching under the one-time condition therefore not only reduces the result set and simplifies computation, but also avoids reusing characters.
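The one-time condition can be sketched in Python as a disjointness check over the positions of the selected occurrences. The first-come selection in `select_one_time` is only a simple illustration of the condition; it is an assumption of this sketch and not the heuristic search of the invention.

```python
def is_one_time(occurrences):
    """True when no sequence position is used by more than one occurrence."""
    used = set()
    for occ in occurrences:
        if not used.isdisjoint(occ):
            return False
        used.update(occ)
    return True


def select_one_time(occurrences):
    """Keep occurrences in the given order, skipping any that reuse a position."""
    used, chosen = set(), []
    for occ in occurrences:
        if used.isdisjoint(occ):
            chosen.append(occ)
            used.update(occ)
    return chosen


all_occurrences = [(1, 2, 3), (3, 5, 6), (3, 5, 7), (5, 6, 7)]
print(is_one_time(all_occurrences))      # position 3 is shared, so not one-time
print(select_one_time(all_occurrences))  # keeps (1, 2, 3) and (5, 6, 7)
```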
Pattern matching generally needs generality, accuracy and flexibility, and its solution set should be non-redundant, which reduces the difficulty of data analysis and processing. Existing techniques, however, struggle to satisfy these requirements at the same time. Wu et al., in "A heuristic method for solving the MPMGOOC problem" (Computer Reports), studied pattern matching with gap constraints and the one-time condition based on the net tree structure. That document proposes a heuristic method that finds the optimal occurrence with a right-most parent strategy and a greedy search parent strategy; it improves the quality of the solution compared with other methods and is a useful reference for other complex problems, but it studies exact pattern matching, cannot handle data noise, and therefore lacks generality. He et al., in "SAIL-APPROX: An Efficient On-Line Algorithm for Approximate Pattern Matching with Wildcards and Length Constraints" (IEEE), studied one-time approximate pattern matching with wildcards and length constraints by dynamic programming and proved the correctness and effectiveness of the proposed method, but the document studies approximate pattern matching under the Hamming distance, which does not consider local constraints between sequences and can cause large deviations when measuring time series, and therefore lacks accuracy. Down et al., in "Approximate pattern matching with gap constraints" (Journal of Information Science), studied approximate pattern matching with gap constraints and proposed an efficient solution based on a single net tree, which can find more valuable information than exact pattern matching in many fields, but the document studies approximate pattern matching without special conditions, which allows characters at any position in the sequence to be used many times, so the number of occurrences of the pattern in the sequence grows exponentially with the pattern length and the problem becomes harder to process. Liu et al., in "An improved BM pattern matching method" (Computer Engineering), build on the BM method by checking whether the pattern contains consecutive identical characters and changing the comparison order of the pattern accordingly, which improves the matching efficiency of the BM method, but the document matches consecutive characters without gap constraints and therefore lacks flexibility.
In summary, for the one-time approximate pattern matching problem with local-global constraints, existing techniques struggle to combine generality, accuracy and flexibility of solution with a non-redundant solution set, and no good method for solving the problem exists so far.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: to provide a one-time approximate pattern matching method with local-global constraints that realizes one-time approximate pattern matching under the (δ, γ)-distance with gap constraints, and that overcomes the shortcoming of the prior art, which, for this problem, struggles to combine generality, accuracy and flexibility of solution with a non-redundant solution set.
The technical scheme adopted by the invention to solve this technical problem is as follows: the one-time approximate pattern matching method with local-global constraints solves the one-time approximate pattern matching problem under the (δ, γ)-distance by using a net tree structure, and comprises the following specific steps:
step one, determining the layer number of the net tree:
reading in a given sequence S and determining its length n, the characters of the sequence S being recorded as $s_1, s_2, \dots, s_n$; reading in a given pattern P and determining its length m, the sub-patterns of the pattern P being recorded as $p_1, p_2, \dots, p_m$, where 0 < m < n; determining the number of layers of the net tree according to the number of sub-patterns in the given pattern P, the net tree having m layers, recorded as layer 1, layer 2, …, layer m;
second, creating a net tree and calculating the tree root path array $R_j^i$ and the leaf path array $L_j^i$ of each node $n_j^i$:
Given a local threshold δ and a global threshold γ, where 0 ≤ δ ≤ γ, a net tree is created from the sequence S and the pattern P of the first step, in which the node labeled i in layer j is denoted $n_j^i$. Unlike exact pattern matching, each node $n_j^i$ in a net tree created for one-time approximate pattern matching with local-global constraints records the δ-distance $D_\delta(s_i, p_j)$ between character $s_i$ and sub-pattern $p_j$. Each node $n_j^i$ also uses a tree root path array $R_j^i$ of size γ+1 to describe the number of tree root paths $N_R(n_j^i)$ from the node to the tree root layer. The array subscripts are 0, 1, …, γ; among the paths from the current node $n_j^i$ to the tree root layer, the number of tree root paths whose γ-distance is d is recorded as $R_j^i[d]$ and initialized to 0, where 0 ≤ d ≤ γ. During the creation of the net tree, invalid parent relations and invalid nodes can be identified and deleted in advance according to the tree root path array $R_j^i$ of each node $n_j^i$. For the same reason, each node $n_j^i$ also uses a leaf path array $L_j^i$ of size γ+1 to describe the number of leaf paths $N_L(n_j^i)$ from the node to the leaf layer. The array subscripts are 0, 1, …, γ; among the paths from the current node $n_j^i$ to the leaf layer, the number of leaf paths whose γ-distance is d is recorded as $L_j^i[d]$, which is also initialized to 0, where 0 ≤ d ≤ γ;
the specific processing method is as follows:
each character $s_i$ of the given sequence S of the first step is processed in turn, and its δ-distance $D_\delta(s_i, p_j)$ to each of the sub-patterns $p_1, p_2, \dots, p_m$ of the given pattern P of the first step is calculated, where 1 ≤ j ≤ m:
when $D_\delta(s_i, p_j) > \delta$, where 1 ≤ j ≤ m, no node $n_j^i$ is created in layer j of the net tree;
when $D_\delta(s_i, p_j) \le \delta$ and j = 1, the node $n_1^i$ is created directly in the tree root layer of the net tree, and its tree root path array $R_1^i$ is calculated by formula (1):
$$R_1^i[d] = \begin{cases} 1, & d = D_\delta(s_i, p_1) \le \delta \\ 0, & \text{otherwise} \end{cases} \qquad 0 \le d \le \gamma \tag{1}$$
In formula (1), d represents the γ-distance, in the range 0 to γ, from node $n_1^i$ to the tree root layer, δ denotes the local threshold, γ denotes the global threshold, and $D_\delta(s_i, p_1)$ represents the δ-distance between character $s_i$ and sub-pattern $p_1$;
when $D_\delta(s_i, p_j) \le \delta$ and 1 < j ≤ m, the node $n_j^i$ is created in layer j of the net tree, and it is judged whether, among all nodes of layer j−1 of the net tree, there is a parent node that satisfies the gap constraint $[min_{j-1}, max_{j-1}]$ with node $n_j^i$: when no such parent node exists, node $n_j^i$ is deleted; when such a parent node exists, it is recorded as $n_{j-1}^q$, and it is further judged as follows whether a parent relationship can be established between parent node $n_{j-1}^q$ and node $n_j^i$:
when there is a d with $R_{j-1}^q[d] > 0$ and $d + D_\delta(s_i, p_j) \le \gamma$, a parent relationship can be established between parent node $n_{j-1}^q$ and node $n_j^i$, and the tree root path array $R_j^i$ of node $n_j^i$ is calculated by formula (2):
$$R_j^i[d] \leftarrow R_j^i[d] + R_{j-1}^q[d - D_\delta(s_i, p_j)], \qquad D_\delta(s_i, p_j) \le d \le \gamma \tag{2}$$
In formula (2), d represents the γ-distance, in the range 0 to γ, from node $n_j^i$ to the tree root layer, δ denotes the local threshold, γ denotes the global threshold, $D_\delta(s_i, p_j)$ represents the δ-distance between character $s_i$ and sub-pattern $p_j$, and $n_{j-1}^q$ represents a parent node in layer j−1 of the net tree that satisfies the gap constraint $[min_{j-1}, max_{j-1}]$ with node $n_j^i$ and can establish a parent relationship with it;
when there is no such d, a parent relationship cannot be established between parent node $n_{j-1}^q$ and node $n_j^i$;
all nodes of layer j−1 of the net tree are traversed in turn to calculate the tree root path array $R_j^i$ of node $n_j^i$; when $\sum_{d=0}^{\gamma} R_j^i[d] = 0$, node $n_j^i$ is deleted;
after every character of the given sequence S of the first step has been processed, the net tree is created;
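The creation of the net tree with per-node tree root path arrays can be sketched as follows. This Python code is a minimal illustration under stated assumptions: each layer is a dictionary from position i to an array of size γ+1, a child array is accumulated from every gap-compatible parent array shifted by the node's own δ-distance, characters are compared by code point, and the sequence `cbcabbc` is a hypothetical input. The function name is also an assumption.

```python
def build_root_path_arrays(seq, subpatterns, gaps, delta, gamma):
    """Return layers[j][i] = tree root path array of the node at position i in layer j+1."""
    assert 0 <= delta <= gamma
    m = len(subpatterns)
    layers = [{} for _ in range(m)]
    for i, ch in enumerate(seq, start=1):
        for j in range(m):
            d0 = abs(ord(ch) - ord(subpatterns[j]))  # delta-distance of this node
            if d0 > delta:
                continue  # local constraint violated: no node is created
            arr = [0] * (gamma + 1)
            if j == 0:
                arr[d0] = 1  # a root-layer node has exactly one root path
            else:
                lo, hi = gaps[j - 1]
                for q, parent_arr in layers[j - 1].items():
                    if not lo <= i - q - 1 <= hi:
                        continue  # gap constraint not satisfied
                    # shift the parent's counts by d0, dropping paths that exceed gamma
                    for d in range(gamma - d0 + 1):
                        arr[d + d0] += parent_arr[d]
                if sum(arr) == 0:
                    continue  # no valid parent relationship: the node is deleted
            layers[j][i] = arr
    return layers


layers = build_root_path_arrays("cbcabbc", "cbc", [(0, 1), (0, 1)], 1, 1)
print(layers[2])  # root path arrays of the leaf-layer nodes
```

For this assumed input the leaf layer holds nodes at positions 3, 6 and 7, whose arrays sum to four root paths in total, the same count a brute-force enumeration gives for this input.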
in the created net tree, the leaf path array $L_j^i$ of each node $n_j^i$ is calculated, and the calculation starts from the leaf layer:
for a leaf node $n_m^i$, its leaf path array $L_m^i$ is calculated by formula (3):
$$L_m^i[d] = \begin{cases} 1, & d = D_\delta(s_i, p_m) \le \delta \\ 0, & \text{otherwise} \end{cases} \qquad 0 \le d \le \gamma \tag{3}$$
In formula (3), d represents the γ-distance, in the range 0 to γ, from node $n_m^i$ to the leaf layer, δ denotes the local threshold, γ denotes the global threshold, and $D_\delta(s_i, p_m)$ represents the δ-distance between character $s_i$ and sub-pattern $p_m$;
for a node $n_j^i$ in layer j of the net tree, where 1 ≤ j < m, when it has a child node, the child node is recorded as $n_{j+1}^c$, and the leaf path array $L_j^i$ of node $n_j^i$ is calculated by formula (4):
$$L_j^i[d] \leftarrow L_j^i[d] + L_{j+1}^c[d - D_\delta(s_i, p_j)], \qquad D_\delta(s_i, p_j) \le d \le \gamma \tag{4}$$
In formula (4), d represents the γ-distance, in the range 0 to γ, from node $n_j^i$ to the leaf layer, δ denotes the local threshold, γ denotes the global threshold, $D_\delta(s_i, p_j)$ represents the δ-distance between character $s_i$ and sub-pattern $p_j$, and $n_{j+1}^c$ represents a child node of node $n_j^i$;
all child nodes of node $n_j^i$ are traversed in turn to calculate the leaf path array $L_j^i$ of the node;
according to the tree root path array $R_j^i$ of node $n_j^i$, the number of tree root paths $N_R(n_j^i)$ of the node is calculated by formula (5); according to the leaf path array $L_j^i$ of node $n_j^i$, the number of leaf paths $N_L(n_j^i)$ of the node is calculated by formula (6):
$$N_R(n_j^i) = \sum_{d=0}^{\gamma} R_j^i[d] \tag{5}$$
In formula (5), d represents the γ-distance, in the range 0 to γ, from node $n_j^i$ to the tree root layer, and γ represents the global threshold;
$$N_L(n_j^i) = \sum_{d=0}^{\gamma} L_j^i[d] \tag{6}$$
In formula (6), d represents the γ-distance, in the range 0 to γ, from node $n_j^i$ to the leaf layer, and γ represents the global threshold;
according to the tree root path array $R_j^i$ and the leaf path array $L_j^i$ of node $n_j^i$, the number of tree root leaf paths $N_{RL}(n_j^i)$ of the node is calculated by formula (7):
$$N_{RL}(n_j^i) = \sum_{k=D_\delta(s_i,p_j)}^{\gamma} \; \sum_{h=D_\delta(s_i,p_j)}^{\gamma + D_\delta(s_i,p_j) - k} R_j^i[k] \cdot L_j^i[h] \tag{7}$$
In formula (7), k represents the γ-distance, in the range $D_\delta(s_i, p_j)$ to γ, from node $n_j^i$ to the tree root layer, h represents the γ-distance, in the range $D_\delta(s_i, p_j)$ to $\gamma + D_\delta(s_i, p_j) - k$, from node $n_j^i$ to the leaf layer, and γ represents the global threshold;
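The index ranges described for formula (7) can be checked with a short Python sketch. Because both arrays count the node's own δ-distance, a root path of distance k and a leaf path of distance h combine to a full path of distance k + h − D, which must not exceed γ. The function names and the sample arrays are assumptions for illustration.

```python
def path_count(arr):
    """Total number of paths recorded in a root or leaf path array (formulas (5), (6))."""
    return sum(arr)


def root_leaf_path_count(root_arr, leaf_arr, d0, gamma):
    """Number of root-leaf paths through a node whose own delta-distance is d0."""
    total = 0
    for k in range(d0, gamma + 1):               # distance to the root layer
        for h in range(d0, gamma + d0 - k + 1):  # distance to the leaf layer
            total += root_arr[k] * leaf_arr[h]
    return total


# Hypothetical node with one exact root path and two leaf paths (distances 0 and 1).
print(root_leaf_path_count([1, 0], [1, 1], d0=0, gamma=1))
# Hypothetical leaf node reached by root paths of distance 0 and 1.
print(root_leaf_path_count([1, 1], [1, 0], d0=0, gamma=1))
```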
in the created net tree, the position-related number RP(i) of a node is calculated from the numbers of tree root leaf paths $N_{RL}(n_j^i)$, and the path branch number PB(i) of a node is calculated from the numbers of tree root paths $N_R(n_j^i)$:
the position-related number RP(i) of node $n_j^i$ is the sum of the numbers of tree root leaf paths $N_{RL}(n_j^i)$ of all nodes labeled i in the net tree, calculated by formula (8):
$$RP(i) = \sum_{j=1}^{m} N_{RL}(n_j^i) \tag{8}$$
In formula (8), m represents the number of layers of the net tree, and $N_{RL}(n_j^i)$ represents the number of tree root leaf paths of node $n_j^i$;
the ancestor set of node $n_j^i$ is the set formed by the nodes on all paths from the node to the tree root layer; the common ancestor set of a node set is the intersection of the ancestor sets of all elements in the node set; the path branch number PB(i) of node $n_j^i$ is the sum of the numbers of tree root paths $N_R(n_j^i)$ of all nodes labeled i in the common ancestor set A(D) of a node set D, calculated by formula (9):
$$PB(i) = \sum_{j=1}^{l} N_R(n_j^i), \qquad n_j^i \in A(D) \tag{9}$$
In formula (9), l represents the depth of the common ancestor set A(D), and $N_R(n_j^i)$ represents the number of tree root paths of node $n_j^i$;
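The two quantities defined above can be sketched in Python as simple sums. The dictionaries keyed by `(layer, position)` and the function names are assumptions of this sketch; the counts in the usage lines are made-up illustrative values.

```python
def position_related_number(n_rl, i, m):
    """RP(i): sum of root-leaf path counts over all nodes labeled i in layers 1..m."""
    return sum(n_rl.get((j, i), 0) for j in range(1, m + 1))


def path_branch_number(n_r, common_ancestors, i):
    """PB(i): sum of root path counts over nodes labeled i in a common ancestor set."""
    return sum(n_r[(j, pos)] for (j, pos) in common_ancestors if pos == i)


# Hypothetical counts: position 3 appears as a node in all three layers.
n_rl = {(1, 3): 2, (2, 3): 0, (3, 3): 1, (2, 5): 2}
print(position_related_number(n_rl, i=3, m=3))

# Hypothetical common ancestor set containing one node labeled 3 and one labeled 5.
n_r = {(1, 3): 1, (2, 5): 1}
print(path_branch_number(n_r, {(1, 3), (2, 5)}, i=3))
```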
thirdly, solving the problem of one-time approximate pattern matching under the (delta, gamma) -distance by utilizing a net tree structure:
the problem of one-time approximate pattern matching under the (delta, gamma) -distance is solved by utilizing the net trees constructed in the first step and the second step, specifically, one tree root leaf path in the net tree corresponds to one appearance, the appearance is searched, namely, the tree root leaf path meeting the local-overall constraint is searched in the net tree, based on the net tree structure, the tree root leaf path meeting the local-overall constraint is searched in the net tree by adopting a heuristic method, and the heuristic method is any one of the following four similar appearance searching strategies:
first, the right-most parent strategy and the greedy search parent strategy: the right-most parent strategy starts from the last node of the leaf layer and preferentially selects the right-most parent node of the current node that satisfies the local-overall constraint; the greedy search parent strategy also starts from the last node of the leaf layer and selects the optimal parent node of the current node that satisfies the local-overall constraint, where the optimal parent node is the parent node with the smaller position correlation number, and, when two parent nodes have the same position correlation number, the parent node with the larger number of path branches in the common ancestor set of the existing path nodes is selected as the optimal parent node;
second, the left-most parent strategy and the greedy search parent strategy: the left-most parent strategy starts from the first node of the leaf layer and preferentially selects the left-most parent node of the current node that satisfies the local-overall constraint; in this case the greedy search parent strategy starts from the first node of the leaf layer and selects the optimal parent node of the current node that satisfies the local-overall constraint, where the optimal parent node is the parent node with the smaller position correlation number;
third, the right-most child strategy and the greedy search child strategy: the right-most child strategy starts from the last node of the tree root layer and preferentially selects the right-most child node of the current node that satisfies the local-overall constraint; in this case the greedy search child strategy starts from the last node of the tree root layer and selects the optimal child node of the current node that satisfies the local-overall constraint, where the optimal child node is the child node with the smaller position correlation number;
fourth, the left-most child strategy and the greedy search child strategy: the left-most child strategy starts from the first node of the tree root layer and preferentially selects the left-most child node of the current node that satisfies the local-overall constraint; in this case the greedy search child strategy starts from the first node of the tree root layer and selects the optimal child node of the current node that satisfies the local-overall constraint, where the optimal child node is the child node with the smaller position correlation number;
thereby completing one-time approximate pattern matching under the (δ, γ)-distance by using the net tree structure;
outputting on a display all occurrences of the given pattern P of the first step in the given sequence S of the first step, i.e. all occurrences in the output result set OCC(S, P), thereby completing one-time approximate pattern matching with local-overall constraints.
The above one-time approximate pattern matching method with local-overall constraints uses VC++ 6.0 as the programming software and Visio 2013 as the drawing tool, and runs on a Pentium(R) Dual-Core 32-bit processor or better under Windows 7 or later; this software and hardware environment is well known to those skilled in the art.
The invention has the beneficial effects that: compared with the prior art, the invention has the prominent substantive characteristics as follows:
(1) based on the (δ, γ)-distance, the method of the invention studies one-time approximate pattern matching with local-overall constraints through the net tree structure, and solves the problem that the traditional Hamming distance cannot measure the local approximation between sequences. The method reads in the given sequence S and the given pattern P, determines the number of net tree layers from the number of sub-patterns of the given pattern P, creates a net tree according to the input conditions, pre-judges and deletes invalid parent relationships and invalid nodes while the net tree is being created, calculates the tree root path array and the leaf path array of each remaining node, and finally solves one-time approximate pattern matching under the (δ, γ)-distance by using the net tree structure;
(2) CN107239500A discloses a character string matching method and system in the field of character matching, in which the system matches character strings by defining several modules and connecting them in order; that invention studies exact string matching under no special condition and does not allow gaps between characters, whereas the present invention studies approximate pattern matching under the one-time condition and allows gaps between characters, which is the greatest substantive difference between the two;
(3) CN108647299A discloses a rarely-used character matching method, a character string pattern matching method and a storage medium; that invention first extracts the rarely-used characters in the pattern, then judges whether they exist in the text, and finally judges whether the pattern equals the text segment containing them, thereby realizing rarely-used character matching; that invention studies exact pattern matching of character strings under no special condition and does not allow gaps between characters, whereas the present invention studies approximate pattern matching under the one-time condition and allows gaps between characters, which is the greatest substantive difference between the two;
(4) CN106446062A discloses a retrieval system and method for continuous characters and fuzzy characters, which realizes double matching of continuous character string pattern matching and fuzzy character matching; that invention studies continuous string matching under no special condition and does not allow gaps between characters, whereas the present invention studies pattern matching under the one-time condition and allows gaps between characters, which is the greatest substantive difference between the two;
(5) CN103425739A discloses a fast string matching method, which preprocesses the target string to obtain a simple hash table for each character and then matches the string quickly through the hash table; that invention studies continuous string matching under no special condition and does not allow gaps between characters, whereas the present invention studies pattern matching under the one-time condition and allows gaps between characters, which is the greatest substantive difference between the two;
(6) CN107229759A discloses a character string pattern matching method that studies exact pattern matching under no special condition, whose result set is redundant and whose anti-interference performance is poor, whereas the present invention studies approximate pattern matching under the one-time condition, which not only simplifies the result set but can also find more valuable information than exact pattern matching, which is the greatest substantive difference between the two;
(7) CN109408681 discloses a character string matching method, which first divides the target character string and the character string to be matched into sub-character strings of a preset length, then assigns a hash value to each sub-character string in the sub-character string set, and finally calculates the similarity between the target character string and the character string to be matched from the hash sequences; when the similarity is greater than a preset similarity threshold, the two character strings are determined to match; that invention must divide the character strings in advance and cannot measure the local similarity between sequences, whereas the present invention uses the (δ, γ)-distance to measure the approximation between sequences, takes both local and overall similarity into account, does not need to divide the sequences in advance, and completes approximate pattern matching under the (δ, γ)-distance by traversing the sequence only once;
(8) CN109165326 discloses a method and apparatus for matching character strings, which segment a character string to determine the field corresponding to each word it contains, then determine the matching degree between character strings from the weight value of each field, and consider two character strings matched when the matching degree is greater than a preset threshold; that invention measures the approximation between sequences by weight values and, although it is also an approximate matching, it cannot measure the local approximation between sequences and does not allow gaps between characters, whereas the present invention measures the approximation between sequences by the (δ, γ)-distance, can measure the local and the overall approximation at the same time, and allows gaps between characters, which is the greatest substantive difference between the two;
(9) CN108920361A discloses a string matching code similarity detection method, which detects the similarity of code structures by the longest common substring method but cannot measure the local similarity between sequences, whereas the present invention uses the (δ, γ)-distance to measure the approximation between sequences and takes both local and overall similarity into account, which is the greatest substantive difference between the two;
(10) CN108920483A discloses a fast character string matching method based on a suffix array, which first uses binary search to confine the occurrence positions to an interval and then excludes from that interval the suffixes that are shorter than the pattern or whose last character differs from that of the pattern, thereby realizing fast matching of the character string; that invention studies exact pattern matching under no special condition and does not allow gaps between characters, whereas the present invention studies approximate pattern matching under the one-time condition and allows gaps between characters, which is the greatest substantive difference between the two;
(11) CN108255836A discloses a character string matching method and device, which calculates the matching value of key character strings, calculates a first edit distance between character strings by a preset rule, and finally obtains the similarity from the first edit distance and the matching value; the edit distance cannot measure the local approximation between sequences, whereas the present invention uses the (δ, γ)-distance to measure the approximation between sequences and takes both local and overall approximation into account, which is the greatest substantive difference between the two;
(12) CN106980620A discloses a method and device for matching Chinese character strings, which first judges whether the characters at the same positions of two strings are similar-shaped or similar-sounding characters, then obtains the edit distance between the characters at the same positions according to a preset edit distance rule for similar-shaped or similar-sounding characters, and finally matches the strings according to the edit distance; the edit distance cannot measure the local similarity between sequences, resulting in overall dissimilarity between the sequences.
Compared with the prior art, the method has the following remarkable progress:
(1) the method studies pattern matching with gap constraints; introducing gap constraints makes the problem harder to solve, but it also makes the matching more flexible and more widely applicable, because a user can set the size and length of the gap constraints according to the specific situation to better meet the user's requirements;
(2) based on the (δ, γ)-distance, the method studies approximate pattern matching with local-overall constraints through the net tree structure, where the δ-distance ensures that the sequences satisfy the local constraint, the γ-distance ensures that the sequences satisfy the overall constraint, and the (δ, γ)-distance measures the local and the overall approximation between sequences at the same time, which widens the field of application, reduces matching error, and has important research value;
(3) the method studies approximate pattern matching under the one-time condition; in pattern matching with gap constraints, a new occurrence arises whenever the position of any sub-pattern changes; pattern matching under no special condition allows the character at each position of the sequence to be reused many times, so the solution space grows exponentially, whereas pattern matching under the one-time condition requires that the character at each position of the sequence be used at most once, which reduces the result set, simplifies the calculation, and avoids the repeated use of characters;
(4) the embodiment is a simple description of a time series matching application, in which the sequence S represents a character-type time series, i.e. the result of symbolizing a numerical time series, and the pattern P reflects the data change rule over a period of time and can predict the trend of future data.
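The contrast drawn in item (3) between matching under no special condition and matching under the one-time condition can be illustrated with a small brute-force sketch. It is not the net tree method of the invention: the function names `occurrences` and `greedy_one_off`, the tiny input, and the use of the absolute difference of character codes as the per-character distance are assumptions made for illustration, and the greedy filter is only a naive stand-in for the heuristic strategies.

```python
from typing import List, Tuple

def occurrences(s: str, chars: str, gaps: List[Tuple[int, int]],
                delta: int, gamma: int) -> List[Tuple[int, ...]]:
    """Enumerate every occurrence (1-based positions) with no one-time restriction."""
    found: List[Tuple[int, ...]] = []

    def extend(prefix: List[int], total: int) -> None:
        j = len(prefix)                      # index of the next sub-pattern
        if j == len(chars):
            found.append(tuple(prefix))
            return
        if j == 0:
            candidates = range(1, len(s) + 1)
        else:
            lo, hi = gaps[j - 1]             # gap allowed before sub-pattern j
            first = prefix[-1] + 1 + lo
            last = min(prefix[-1] + 1 + hi, len(s))
            candidates = range(first, last + 1)
        for pos in candidates:
            d = abs(ord(s[pos - 1]) - ord(chars[j]))   # assumed character distance
            if d <= delta and total + d <= gamma:      # local and overall constraint
                extend(prefix + [pos], total + d)

    extend([], 0)
    return found

def greedy_one_off(occs: List[Tuple[int, ...]]) -> List[Tuple[int, ...]]:
    """Naive filter: keep an occurrence only if none of its positions is already used."""
    used: set = set()
    kept = []
    for occ in occs:
        if used.isdisjoint(occ):
            kept.append(occ)
            used.update(occ)
    return kept

all_occ = occurrences("aabb", "ab", [(0, 2)], delta=0, gamma=0)
print(all_occ)                  # every occurrence, positions may be shared
print(greedy_one_off(all_occ))  # occurrences that use each position at most once
```

Even on this four-character input the unrestricted result set is twice the size of the one-time result set, and the difference grows quickly with sequence length.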
Drawings
The invention is further illustrated with reference to the following figures and examples.
FIG. 1 is a comparative illustration of example B, time series measured by Hamming distance and (δ, γ) -distance, where: fig. 1(1) is bdb, fig. 1(2) is badb, fig. 1(3) is baddb, fig. 1(4) is eaddb, fig. 1(5) is bafdb, fig. 1(6) is badde, fig. 1(7) is aaddb, fig. 1(8) is bacdb, and fig. 1(9) is bacdc.
Fig. 2 shows all occurrences of a given pattern P in example C in a given sequence S.
FIG. 3 is a schematic flow chart of the computer processing method of the present invention.
Fig. 4 shows the number of net tree layers determined according to the number of sub-patterns in a given pattern P in embodiment 1 of the present invention.
Fig. 5 shows a net tree created in example 1 of the present invention.
FIG. 6 is the net tree of FIG. 5 with nodes in the leaf level that cannot be reached, invalid parent relationships, and invalid nodes removed.
FIG. 7 shows the leaf path array calculated for each node of the net tree of FIG. 6 in embodiment 1 of the present invention.
FIG. 8 shows, for embodiment 1 of the present invention, the net tree of FIG. 6 after deleting <4,6,7,8> and all nodes with the same labels, with the tree root path array of each remaining node recalculated.
FIG. 9 shows, for embodiment 1 of the present invention, the net tree of FIG. 6 after deleting <1,4,7,8> and all nodes with the same labels, with the tree root path array of each remaining node recalculated.
Fig. 10 is the net tree of fig. 9 with all nodes having the same label as that of <1,4,7,8> removed.
Detailed Description
FIG. 1 illustrates Example B. FIG. 1(1) bdb is the gapless pattern P of Example B. FIG. 1(2) badb and FIG. 1(3) baddb introduce gap constraints, which can handle noise and correct phase differences. FIG. 1(4) eaddb, FIG. 1(5) bafdb and FIG. 1(6) badde cannot exactly match the given pattern P of Example B; although they approximately match it when the Hamming distance threshold is 1, they are not similar to FIG. 1(1) bdb as a whole and deviate greatly from it: "e" in eaddb deviates greatly from "b", "f" in bafdb deviates greatly from "d", and "e" in badde deviates greatly from "b". FIG. 1(7) aaddb, FIG. 1(8) bacdb and FIG. 1(9) bacdc also cannot exactly match the given pattern P of Example B, but they can approximately match it under the (δ, γ)-distance: aaddb and bacdb match when the local threshold δ is 1 and the overall threshold γ is 1, and are similar to bdb as a whole; bacdc does not match when δ is 1 and γ is 1, but does match when δ is 1 and γ is 2. As described above, the Hamming distance cannot measure the local approximation between sequences, whereas the (δ, γ)-distance measures both the local and the overall approximation between sequences, thereby ensuring overall similarity between sequences.
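The comparison in FIG. 1 can be checked numerically with a minimal sketch. It assumes the usual reading of the two thresholds (the δ-distance is the largest per-character difference, the γ-distance is the sum of the per-character differences) and takes the per-character difference as the absolute difference of letter codes; the aligned triples such as `"edb"` for eaddb are assumed to be the characters that line up with b, d, b after the gap characters are skipped. All function names are hypothetical.

```python
def _check(x: str, y: str) -> None:
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")

def hamming(x: str, y: str) -> int:
    """Number of positions at which the characters differ."""
    _check(x, y)
    return sum(a != b for a, b in zip(x, y))

def delta_distance(x: str, y: str) -> int:
    """Largest per-position difference (local approximation)."""
    _check(x, y)
    return max(abs(ord(a) - ord(b)) for a, b in zip(x, y))

def gamma_distance(x: str, y: str) -> int:
    """Sum of per-position differences (overall approximation)."""
    _check(x, y)
    return sum(abs(ord(a) - ord(b)) for a, b in zip(x, y))

def dg_match(x: str, y: str, delta: int, gamma: int) -> bool:
    return delta_distance(x, y) <= delta and gamma_distance(x, y) <= gamma

pattern = "bdb"
# (series from FIG. 1, characters assumed to align with the pattern)
for series, picked in [("eaddb", "edb"), ("aaddb", "adb"), ("bacdc", "bcc")]:
    print(series,
          "Hamming:", hamming(picked, pattern),
          "delta:", delta_distance(picked, pattern),
          "gamma:", gamma_distance(picked, pattern))
```

Under these assumptions eaddb passes a Hamming threshold of 1 yet has a δ-distance of 3, while aaddb stays within δ = 1 and γ = 1 and bacdc needs γ = 2.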
FIG. 2 shows that the given pattern P of Example C has 4 occurrences in total in the given sequence S, where 'S' denotes the sequence S, and 'C', 'b', 'C', 'a', 'b', 'C' are respectively 7 characters in the given sequence S, whose position indices are '1', '2', '3', '4', '5', '6' and '7'. Each of the 1st, 2nd, 3rd and 4th occurrences of the given pattern P in the given sequence S is written as the position indices of its characters in S, so the 4 occurrences are <1,2,3>, <3,5,6>, <3,5,7> and <5,6,7>. These 4 occurrences include occurrences whose δ-distance and γ-distance are both 0, which illustrates that exact pattern matching is a special case of approximate pattern matching and that approximate pattern matching is more general.
FIG. 3 shows the computer processing flow used by the method of the present invention: 1) start → 2) read in the sequence S, the pattern P, the local threshold δ and the global threshold γ → 3) create a net tree according to the input conditions and calculate the tree root path array and the leaf path array of each node → 4) starting from the last node of the leaf layer, look for occurrences in the net tree from right to left → 5) find the occurrence C1 according to the right-most parent strategy and the occurrence C2 according to the greedy search parent strategy → 6) select whichever occurrence leaves the larger number of remaining occurrences, and delete that occurrence together with the nodes that have the same labels as it → 7) recalculate the tree root path array and the leaf path array of the remaining nodes of the net tree → 8) judge whether the first node of the leaf layer has been reached; if yes, execute step 9; if no, execute step 4 → 9) output all occurrences of the pattern P in the sequence S on the display → 10) end.
Example 1
Given a time series, the sequence obtained by symbolizing it with the SAX (Symbolic Aggregate approXimation) method is S = aababccc; the given pattern is P = a[0,2]b[0,2]c[0,2]c, the given local threshold is δ = 1, and the given overall threshold is γ = 1.
Step one, determining the layer number of the net tree:
reading in the given sequence S, whose length is 8, and the given pattern P = a[0,2]b[0,2]c[0,2]c, whose length is 4; the sub-patterns of the pattern P are denoted p1 = a, p2 = b, p3 = c and p4 = c, so there are 4 sub-patterns and the net tree has 4 layers, denoted the 1st layer, the 2nd layer, the 3rd layer and the 4th layer;
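Step one can be sketched as a small parser that splits the pattern into its sub-patterns and gap constraints and takes the number of sub-patterns as the number of net tree layers. The function name `parse_pattern` and the restriction to single lowercase letters are assumptions made for this illustration.

```python
import re
from typing import List, Tuple

def parse_pattern(pattern: str) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Split e.g. 'a[0,2]b[0,2]c[0,2]c' into sub-patterns and [min, max] gaps."""
    compact = pattern.replace(" ", "")
    # Each token is either one lowercase letter or a '[min,max]' gap constraint.
    tokens = re.findall(r"([a-z])|\[(\d+),(\d+)\]", compact)
    chars = [t[0] for t in tokens if t[0]]
    gaps = [(int(t[1]), int(t[2])) for t in tokens if not t[0]]
    if len(gaps) != len(chars) - 1:
        raise ValueError("expected exactly one gap constraint between sub-patterns")
    return chars, gaps

chars, gaps = parse_pattern("a[0,2]b[0,2]c[0,2]c")
layers = len(chars)          # one net tree layer per sub-pattern
print(chars, gaps, layers)
```

For the pattern of Example 1 this yields four sub-patterns, three gap constraints of [0,2], and therefore four layers.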
Step two, creating the net tree and calculating the tree root path array and the leaf path array of each node:
Given a local threshold δ and a global threshold γ, where 0 ≤ δ ≤ γ, a net tree is created from the given sequence S and the given pattern P of the first step, in which the node labeled i in layer j is denoted $n_j^i$. Unlike exact pattern matching, each node $n_j^i$ of a net tree created for one-time approximate pattern matching with local-overall constraints must record the δ-distance $D_\delta(s_i, p_j)$ between character $s_i$ and sub-pattern $p_j$, and each node $n_j^i$ uses a tree root path array of size γ+1 to describe the number of tree root paths from the node to the tree root layer. The array subscripts are 0, 1, …, γ; among the paths from the current node $n_j^i$ to the tree root layer, the number of tree root paths whose γ-distance is d is denoted $N_r(n_j^i, d)$ and is initialized to 0, where 0 ≤ d ≤ γ. While the net tree is being created, invalid parent relationships and invalid nodes can be judged and deleted in advance according to the tree root path array of each node. For the same reason, each node $n_j^i$ also uses a leaf path array of size γ+1 to describe the number of leaf paths from the node to the leaf layer. The array subscripts are 0, 1, …, γ; among the paths from the current node $n_j^i$ to the leaf layer, the number of leaf paths whose γ-distance is d is denoted $N_l(n_j^i, d)$ and is also initialized to 0, where 0 ≤ d ≤ γ;
the specific treatment method comprises the following steps:
each character $s_i$ of the given sequence S of the first step is processed in turn, and the δ-distance $D_\delta(s_i, p_j)$ between $s_i$ and each sub-pattern $p_1, p_2, \ldots, p_m$ of the given pattern P of the first step is calculated, where 1 ≤ j ≤ m:
when $D_\delta(s_i, p_j) > \delta$, where 1 ≤ j ≤ m, no node $n_j^i$ (the node labeled i in layer j) is created in layer j of the net tree;
when $D_\delta(s_i, p_j) \le \delta$ and j = 1, node $n_1^i$ is created directly in the tree root layer of the net tree, and its tree root path array $N_r(n_1^i, d)$ is calculated by formula (1):
$$N_r(n_1^i, d) = \begin{cases} 1, & d = D_\delta(s_i, p_1) \\ 0, & d \ne D_\delta(s_i, p_1) \end{cases} \qquad 0 \le d \le \gamma,\ D_\delta(s_i, p_1) \le \delta \qquad (1)$$
In formula (1), d represents the γ-distance, in the range 0 to γ, from node $n_1^i$ to the tree root layer, δ denotes the local threshold, γ denotes the global threshold, and $D_\delta(s_i, p_1)$ represents the δ-distance between character $s_i$ and sub-pattern $p_1$;
when $D_\delta(s_i, p_j) \le \delta$ and 1 < j ≤ m, node $n_j^i$ is created in layer j of the net tree, and it is judged whether any node of layer j−1 is a parent node that satisfies the gap constraint $[min_{j-1}, max_{j-1}]$ with node $n_j^i$: when no such parent node exists, node $n_j^i$ is deleted; when such a parent node exists, it is denoted $n_{j-1}^k$, and it is further judged as follows whether parent node $n_{j-1}^k$ and node $n_j^i$ can establish a parent relationship:
when $\min\{\, d : N_r(n_{j-1}^k, d) > 0 \,\} + D_\delta(s_i, p_j) \le \gamma$, parent node $n_{j-1}^k$ and node $n_j^i$ can establish a parent relationship, and the tree root path array $N_r(n_j^i, d)$ of node $n_j^i$ is calculated by formula (2):
$$N_r(n_j^i, d) = \sum_{n_{j-1}^k} N_r\bigl(n_{j-1}^k,\ d - D_\delta(s_i, p_j)\bigr), \qquad D_\delta(s_i, p_j) \le d \le \gamma,\ D_\delta(s_i, p_j) \le \delta \qquad (2)$$
In formula (2), d represents the γ-distance, in the range 0 to γ, from node $n_j^i$ to the tree root layer, δ denotes the local threshold, γ denotes the global threshold, $D_\delta(s_i, p_j)$ represents the δ-distance between character $s_i$ and sub-pattern $p_j$, and $n_{j-1}^k$ ranges over the nodes of layer j−1 of the net tree that satisfy the gap constraint $[min_{j-1}, max_{j-1}]$ with node $n_j^i$ and can establish a parent relationship with it; entries with $d < D_\delta(s_i, p_j)$ remain 0;
when $\min\{\, d : N_r(n_{j-1}^k, d) > 0 \,\} + D_\delta(s_i, p_j) > \gamma$, parent node $n_{j-1}^k$ and node $n_j^i$ cannot establish a parent relationship;
all nodes of layer j−1 of the net tree are traversed in sequence and the tree root path array $N_r(n_j^i, d)$ of node $n_j^i$ is calculated; when $\sum_{d=0}^{\gamma} N_r(n_j^i, d) = 0$, node $n_j^i$ is deleted;
after every character of the given sequence S of the first step has been processed, the net tree is created;
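The creation procedure above can be sketched in a few lines. This is an illustrative reading of the procedure rather than the patented implementation: the function name `build_root_path_arrays` is hypothetical, the δ-distance between two characters is assumed to be the absolute difference of their letter codes (matching |a-c| = 2 in Example 1), only the tree root path arrays are kept (parent links and leaf path arrays are omitted), and a parent that cannot establish a parent relationship is modeled simply as one that contributes nothing to the sum.

```python
from typing import Dict, List, Tuple

def build_root_path_arrays(s: str, chars: List[str], gaps: List[Tuple[int, int]],
                           delta: int, gamma: int) -> Dict[Tuple[int, int], List[int]]:
    """Return {(layer j, label i): tree root path array} for every surviving node."""
    arrays: Dict[Tuple[int, int], List[int]] = {}
    for i, ch in enumerate(s, start=1):            # process characters in order
        for j in range(1, len(chars) + 1):
            dist = abs(ord(ch) - ord(chars[j - 1]))
            if dist > delta:
                continue                           # local constraint fails: no node
            counts = [0] * (gamma + 1)
            if j == 1:
                if dist <= gamma:
                    counts[dist] = 1               # root layer: one path of distance dist
            else:
                lo, hi = gaps[j - 2]
                # parents k in layer j-1 with lo <= i - k - 1 <= hi
                for k in range(i - 1 - hi, i - lo):
                    parent = arrays.get((j - 1, k))
                    if parent is None:
                        continue
                    for d in range(dist, gamma + 1):
                        counts[d] += parent[d - dist]   # shift parent's paths by dist
            if any(counts):                        # all-zero array: node is deleted
                arrays[(j, i)] = counts
    return arrays

# Inputs of Example 1: S = aababccc, P = a[0,2]b[0,2]c[0,2]c, delta = gamma = 1
tree = build_root_path_arrays("aababccc", ["a", "b", "c", "c"],
                              [(0, 2), (0, 2), (0, 2)], delta=1, gamma=1)
for key in [(1, 1), (2, 2), (2, 3), (2, 4), (2, 5), (3, 5), (2, 6), (3, 6)]:
    print(key, tree[key])
```

Because characters are processed left to right and layers top to bottom, every candidate parent has already been finalized when a node is evaluated, so one pass over the sequence suffices.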
the specific implementation process of the steps is as follows:
1) For the 1st character $s_1$ = a: because $D_\delta(s_1, p_1) = 0 \le \delta$, character $s_1$ and sub-pattern $p_1$ match exactly; here j = 1, so node $n_1^1$ (the node labeled 1 in layer 1) is created, and its tree root path array calculated by formula (1) is {1,0}. Because $D_\delta(s_1, p_2) = 1 \le \delta$, character $s_1$ and sub-pattern $p_2$ match approximately; here j = 2, so node $n_2^1$ is created, but layer 1 contains no parent node that satisfies the gap constraint with node $n_2^1$, so node $n_2^1$ is deleted. In the same way, node $n_3^1$ and node $n_4^1$ are deleted.
2) For the 2nd character $s_2$ = a: because $D_\delta(s_2, p_1) = 0 \le \delta$, character $s_2$ and sub-pattern $p_1$ match exactly; here j = 1, so node $n_1^2$ (the node labeled 2 in layer 1) is created, and its tree root path array calculated by formula (1) is {1,0}. Because $D_\delta(s_2, p_2) = 1 \le \delta$, character $s_2$ and sub-pattern $p_2$ match approximately; here j = 2, so node $n_2^2$ is created. Layer 1 contains a parent node $n_1^1$ that satisfies the gap constraint with node $n_2^2$; since the smallest γ-distance of the tree root paths of $n_1^1$ plus $D_\delta(s_2, p_2)$ is 0 + 1 ≤ γ, a parent relationship is established between parent node $n_1^1$ and node $n_2^2$, and the tree root path array of node $n_2^2$ calculated by formula (2) is {0,1}. Because $D_\delta(s_2, p_3) = |a - c| = 2 > \delta$, node $n_3^2$ need not be created; similarly, node $n_4^2$ need not be created.
3) For the 3rd character $s_3$ = b: because $D_\delta(s_3, p_1) = 1 \le \delta$, character $s_3$ and sub-pattern $p_1$ match approximately; here j = 1, so node $n_1^3$ (the node labeled 3 in layer 1) is created, and its tree root path array calculated by formula (1) is {0,1}. Because $D_\delta(s_3, p_2) = 0 \le \delta$, character $s_3$ and sub-pattern $p_2$ match exactly; here j = 2, so node $n_2^3$ is created. Layer 1 contains parent nodes $n_1^1$ and $n_1^2$ that satisfy the gap constraint with node $n_2^3$; since for each of them the smallest γ-distance of its tree root paths plus $D_\delta(s_3, p_2)$ is 0 + 0 ≤ γ, parent node $n_1^1$ and parent node $n_1^2$ can both establish a parent relationship with node $n_2^3$, and the tree root path array of node $n_2^3$ calculated by formula (2) is {2,0}. Because $D_\delta(s_3, p_3) = 1 \le \delta$, character $s_3$ and sub-pattern $p_3$ match approximately; here j = 3, so node $n_3^3$ is created. Layer 2 contains a parent node $n_2^2$ that satisfies the gap constraint with node $n_3^3$; since 1 + 1 > γ, parent node $n_2^2$ and node $n_3^3$ cannot establish a parent relationship, and because every entry of the tree root path array of node $n_3^3$ is therefore 0, node $n_3^3$ is deleted. Because $D_\delta(s_3, p_4) = 1 \le \delta$, character $s_3$ and sub-pattern $p_4$ match approximately; here j = 4, so node $n_4^3$ is created, but layer 3 contains no parent node that satisfies the gap constraint with node $n_4^3$, so node $n_4^3$ is deleted.
4) For the 4th character $s_4$ = a: because $D_\delta(s_4, p_1) = 0 \le \delta$, character $s_4$ and sub-pattern $p_1$ match exactly; here j = 1, so node $n_1^4$ (the node labeled 4 in layer 1) is created, and its tree root path array calculated by formula (1) is {1,0}. Because $D_\delta(s_4, p_2) = 1 \le \delta$, character $s_4$ and sub-pattern $p_2$ match approximately; here j = 2, so node $n_2^4$ is created. Layer 1 contains parent nodes $n_1^1$, $n_1^2$ and $n_1^3$ that satisfy the gap constraint with node $n_2^4$; since 0 + 1 ≤ γ for $n_1^1$ and for $n_1^2$, while 1 + 1 > γ for $n_1^3$, parent node $n_1^1$ and parent node $n_1^2$ can establish a parent relationship with node $n_2^4$, parent node $n_1^3$ cannot, and the tree root path array of node $n_2^4$ calculated by formula (2) is {0,2}. Because $D_\delta(s_4, p_3) = |a - c| = 2 > \delta$, node $n_3^4$ need not be created; similarly, node $n_4^4$ need not be created.
5) For the 5th character $s_5$ = b: because $D_\delta(s_5, p_1) = 1 \le \delta$, character $s_5$ and sub-pattern $p_1$ match approximately; here j = 1, so node $n_1^5$ (the node labeled 5 in layer 1) is created, and its tree root path array calculated by formula (1) is {0,1}. Because $D_\delta(s_5, p_2) = 0 \le \delta$, character $s_5$ and sub-pattern $p_2$ match exactly; here j = 2, so node $n_2^5$ is created. Layer 1 contains parent nodes $n_1^2$, $n_1^3$ and $n_1^4$ that satisfy the gap constraint with node $n_2^5$; since 0 + 0 ≤ γ for $n_1^2$ and $n_1^4$ and 1 + 0 ≤ γ for $n_1^3$, all three parent nodes can establish a parent relationship with node $n_2^5$, and the tree root path array of node $n_2^5$ calculated by formula (2) is {2,1}. Because $D_\delta(s_5, p_3) = 1 \le \delta$, character $s_5$ and sub-pattern $p_3$ match approximately; here j = 3, so node $n_3^5$ is created. Layer 2 contains parent nodes $n_2^2$, $n_2^3$ and $n_2^4$ that satisfy the gap constraint with node $n_3^5$; since 1 + 1 > γ for $n_2^2$ and for $n_2^4$, while 0 + 1 ≤ γ for $n_2^3$, parent node $n_2^2$ and parent node $n_2^4$ cannot establish a parent relationship with node $n_3^5$, parent node $n_2^3$ can, and the tree root path array of node $n_3^5$ calculated by formula (2) is {0,2}. Because $D_\delta(s_5, p_4) = 1 \le \delta$, character $s_5$ and sub-pattern $p_4$ match approximately; here j = 4, so node $n_4^5$ is created, but layer 3 contains no parent node that satisfies the gap constraint with node $n_4^5$, so node $n_4^5$ is deleted.
6) For the 6 th character s6C, since Dδ(s6,p2) 1 ≦ δ, character s6And sub-pattern p2Is an approximate match, when j is 2, thus creating a node
Figure BDA00020996368900001425
At layer 1, the parent nodes that satisfy the gap constraint with node
Figure BDA00020996368900001426
are parent node
Figure BDA00020996368900001427
parent node
Figure BDA00020996368900001428
and parent node
Figure BDA00020996368900001429
Since
Figure BDA00020996368900001430
Figure BDA00020996368900001431
and
Figure BDA00020996368900001432
parent node
Figure BDA00020996368900001433
and parent node
Figure BDA00020996368900001434
cannot establish a parent relationship with node
Figure BDA00020996368900001435
whereas parent node
Figure BDA00020996368900001436
can establish a parent relationship with node
Figure BDA00020996368900001437
and, by formula (2), the tree root path array of node
Figure BDA00020996368900001438
denoted
Figure BDA00020996368900001439
is {0,1}. Since Dδ(s6, p3) = 0 ≤ δ, character s6 and sub-pattern p3 are an exact match; with j = 3, this creates node
Figure BDA00020996368900001440
At layer 2, the parent nodes that satisfy the gap constraint with node
Figure BDA00020996368900001441
are parent node
Figure BDA00020996368900001442
parent node
Figure BDA00020996368900001443
and parent node
Figure BDA00020996368900001444
Since
Figure BDA00020996368900001445
and
Figure BDA00020996368900001446
parent node
Figure BDA00020996368900001447
parent node
Figure BDA00020996368900001448
and parent node
Figure BDA00020996368900001449
can all establish a parent relationship with node
Figure BDA00020996368900001450
and, by formula (2), the tree root path array of node
Figure BDA00020996368900001451
denoted
Figure BDA00020996368900001452
is {4,3}. Since Dδ(s6, p4) = 0 ≤ δ, character s6 and sub-pattern p4 are an exact match; with j = 4, this creates node
Figure BDA00020996368900001453
At layer 3, the parent node that satisfies the gap constraint with node
Figure BDA00020996368900001454
is parent node
Figure BDA00020996368900001455
Since
Figure BDA00020996368900001456
parent node
Figure BDA00020996368900001457
can establish a parent relationship with node
Figure BDA00020996368900001458
and, by formula (2), the tree root path array of node
Figure BDA00020996368900001459
denoted
Figure BDA00020996368900001460
is {0,2};
7) For the 7th character s7 = c: since Dδ(s7, p3) = 0 ≤ δ, character s7 and sub-pattern p3 are an exact match; with j = 3, this creates node
Figure BDA00020996368900001461
At layer 2, the parent nodes that satisfy the gap constraint with node
Figure BDA00020996368900001462
are parent node
Figure BDA00020996368900001463
parent node
Figure BDA00020996368900001464
and parent node
Figure BDA00020996368900001465
Since
Figure BDA00020996368900001466
Figure BDA00020996368900001467
and
Figure BDA00020996368900001468
parent node
Figure BDA00020996368900001469
parent node
Figure BDA00020996368900001470
and parent node
Figure BDA00020996368900001471
can all establish a parent relationship with node
Figure BDA00020996368900001472
and, by formula (2), the tree root path array of node
Figure BDA00020996368900001473
denoted
Figure BDA00020996368900001474
is {2,4}. Since Dδ(s7, p4) = 0 ≤ δ, character s7 and sub-pattern p4 are an exact match; with j = 4, this creates node
Figure BDA00020996368900001475
At layer 3, the parent nodes that satisfy the gap constraint with node
Figure BDA00020996368900001476
are parent node
Figure BDA00020996368900001477
and parent node
Figure BDA00020996368900001478
Since
Figure BDA00020996368900001479
Figure BDA00020996368900001480
parent node
Figure BDA00020996368900001481
and parent node
Figure BDA00020996368900001482
can both establish a parent relationship with node
Figure BDA00020996368900001483
and, by formula (2), the tree root path array of node
Figure BDA00020996368900001484
denoted
Figure BDA00020996368900001485
is {4,5};
8) For the 8th character s8 = c: since Dδ(s8, p4) = 0 ≤ δ, character s8 and sub-pattern p4 are an exact match; with j = 4, this creates node
Figure BDA0002099636890000151
At layer 3, the parent nodes that satisfy the gap constraint with node
Figure BDA0002099636890000152
are parent node
Figure BDA0002099636890000153
parent node
Figure BDA0002099636890000154
and parent node
Figure BDA0002099636890000155
Since
Figure BDA0002099636890000156
Figure BDA0002099636890000157
and
Figure BDA0002099636890000158
parent node
Figure BDA0002099636890000159
parent node
Figure BDA00020996368900001510
and parent node
Figure BDA00020996368900001511
can all establish a parent relationship with node
Figure BDA00020996368900001512
and, by formula (2), the tree root path array of node
Figure BDA00020996368900001513
denoted
Figure BDA00020996368900001514
is {6,9};
At this point, all characters of the given sequence S have been processed and the net tree has been created. FIG. 6 of the accompanying drawings shows the final net tree, from which the nodes that cannot reach the leaf layer, the invalid parent relationships and the invalid nodes have been removed;
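The array bookkeeping used in steps 1) to 8) can be sketched as follows. This is a minimal sketch rather than the patented implementation: formulas (1) and (2) are defined elsewhere in the specification and are not restated in this passage, so the code assumes that a root-layer node has exactly one root path at γ-distance Dδ(si, p1), and that a deeper node adds up its parents' arrays shifted by its own δ-distance. The function names and the parent arrays in the demo are hypothetical; γ = 1 is inferred from the two-entry arrays in the worked example.

```python
from typing import List


def root_layer_array(delta_dist: int, gamma: int) -> List[int]:
    """Assumed form of formula (1): one root path at gamma-distance delta_dist."""
    arr = [0] * (gamma + 1)
    if 0 <= delta_dist <= gamma:
        arr[delta_dist] = 1
    return arr


def array_from_parents(parent_arrays: List[List[int]],
                       delta_dist: int, gamma: int) -> List[int]:
    """Assumed form of formula (2): shift each parent array by delta_dist, then add."""
    arr = [0] * (gamma + 1)
    for d in range(delta_dist, gamma + 1):
        arr[d] = sum(parent[d - delta_dist] for parent in parent_arrays)
    return arr


def has_root_path(arr: List[int]) -> bool:
    """Assumption: a node with an all-zero array lies on no admissible path."""
    return any(arr)


if __name__ == "__main__":
    gamma = 1
    parents = [[2, 1], [0, 1]]  # hypothetical arrays of two admissible parents
    print(array_from_parents(parents, 0, gamma))  # exact match: plain sum
    print(array_from_parents(parents, 1, gamma))  # approximate match: shifted sum
```

Under these assumptions, a parent whose array has no entry left after the shift contributes nothing, which is one way to read why some parents that satisfy the gap constraint still cannot establish a parent relationship.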
In the created net tree, for each node
Figure BDA00020996368900001515
the leaf path array
Figure BDA00020996368900001516
is calculated. For node
Figure BDA00020996368900001517
the calculation of the leaf path array
Figure BDA00020996368900001518
starts from the leaf layer:
For a leaf node
Figure BDA00020996368900001519
the leaf path array
Figure BDA00020996368900001520
is calculated by formula (3):
Figure BDA00020996368900001521
In formula (3), d denotes the γ-distance from node
Figure BDA00020996368900001522
to the leaf layer, ranging from 0 to γ; δ denotes the local threshold; γ denotes the global threshold; and Dδ(si, pm) denotes the δ-distance between character si and sub-pattern pm;
For a node at layer j of the net tree
Figure BDA00020996368900001523
where 1 ≤ j < m, when the node has a child node, the child node is denoted
Figure BDA00020996368900001524
and the leaf path array of node
Figure BDA00020996368900001525
denoted
Figure BDA00020996368900001526
is calculated by formula (4):
Figure BDA00020996368900001527
In formula (4), d denotes the γ-distance from node
Figure BDA00020996368900001528
to the leaf layer, ranging from 0 to γ; δ denotes the local threshold; γ denotes the global threshold; Dδ(si, pj) denotes the δ-distance between character si and sub-pattern pj; and
Figure BDA00020996368900001529
denotes, for node
Figure BDA00020996368900001530
one of its child nodes;
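The bottom-up pass described for formulas (3) and (4) can be sketched as below. The formula bodies themselves are not reproduced in this passage, so the sketch assumes that a leaf node has one leaf path at γ-distance Dδ(si, pm) and that an inner node sums its children's arrays shifted by its own δ-distance. The `(layer, label)` node encoding, the function name and the toy net tree are hypothetical.

```python
from typing import Dict, List, Tuple

Node = Tuple[int, int]  # (layer j, label i); a hypothetical encoding


def leaf_path_arrays(children: Dict[Node, List[Node]],
                     delta_dist: Dict[Node, int],
                     m: int, gamma: int) -> Dict[Node, List[int]]:
    """Compute leaf path arrays from the leaf layer m up to layer 1."""
    arrays: Dict[Node, List[int]] = {}
    # Deeper layers first, so every child is finished before its parents.
    for node in sorted(delta_dist, key=lambda n: -n[0]):
        dd = delta_dist[node]
        arr = [0] * (gamma + 1)
        if node[0] == m:
            arr[dd] = 1  # assumed formula (3)
        else:
            for d in range(dd, gamma + 1):  # assumed formula (4)
                arr[d] = sum(arrays[c][d - dd] for c in children.get(node, []))
        arrays[node] = arr
    return arrays


if __name__ == "__main__":
    # Toy two-layer net tree: one root-layer node with two leaf children.
    children = {(1, 1): [(2, 2), (2, 3)]}
    delta_dist = {(1, 1): 0, (2, 2): 1, (2, 3): 0}
    print(leaf_path_arrays(children, delta_dist, m=2, gamma=1))
```

The sketch relies on every created node having Dδ ≤ δ ≤ γ, which the specification guarantees by not creating nodes whose δ-distance exceeds δ.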
All child nodes of node
Figure BDA00020996368900001531
are traversed in turn to obtain, for node
Figure BDA00020996368900001532
the leaf path array
Figure BDA00020996368900001533
From the tree root path array of node
Figure BDA00020996368900001534
denoted
Figure BDA00020996368900001535
the tree root path number of node
Figure BDA00020996368900001536
denoted
Figure BDA00020996368900001537
is calculated by formula (5); from the leaf path array of node
Figure BDA00020996368900001538
denoted
Figure BDA00020996368900001539
the leaf path number of node
Figure BDA00020996368900001540
denoted
Figure BDA00020996368900001541
is calculated by formula (6):
Figure BDA00020996368900001542
In formula (5), d denotes the γ-distance from node
Figure BDA00020996368900001543
to the tree root layer, ranging from 0 to γ, and γ denotes the global threshold;
Figure BDA00020996368900001544
In formula (6), d denotes the γ-distance from node
Figure BDA00020996368900001545
to the leaf layer, ranging from 0 to γ, and γ denotes the global threshold;
From the tree root path array of node
Figure BDA00020996368900001546
denoted
Figure BDA00020996368900001547
and its leaf path array
Figure BDA00020996368900001548
the root leaf path number of node
Figure BDA00020996368900001549
denoted
Figure BDA00020996368900001550
is calculated by formula (7):
Figure BDA00020996368900001551
In formula (7), k denotes the γ-distance from node
Figure BDA00020996368900001552
to the tree root layer, ranging from Dδ(si, pj) to γ; h denotes the γ-distance from node
Figure BDA00020996368900001553
to the leaf layer, ranging from Dδ(si, pj) to γ + Dδ(si, pj) − k; and γ denotes the global threshold;
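The counting in formulas (5) to (7) can be illustrated with a short sketch. It assumes that formulas (5) and (6) sum an array over d = 0, …, γ, and that formula (7) sums the products of root-side and leaf-side counts over the stated ranges of k and h; the ranges come from the text above, while the product form is an assumption. Because both k and h include the node's own δ-distance, a pair (k, h) describes a full path of γ-distance k + h − Dδ(si, pj), which the range of h keeps at or below γ. Function names and sample arrays are hypothetical.

```python
from typing import List


def path_number(arr: List[int]) -> int:
    """Assumed formulas (5) and (6): total paths with gamma-distance 0..gamma."""
    return sum(arr)


def root_leaf_path_number(root_arr: List[int], leaf_arr: List[int],
                          delta_dist: int, gamma: int) -> int:
    """Assumed formula (7): admissible root-leaf paths through one node."""
    total = 0
    for k in range(delta_dist, gamma + 1):
        # h is capped so that k + h - delta_dist never exceeds gamma.
        for h in range(delta_dist, gamma + delta_dist - k + 1):
            total += root_arr[k] * leaf_arr[h]
    return total


if __name__ == "__main__":
    print(path_number([2, 1]))
    print(root_leaf_path_number([2, 1], [1, 1], delta_dist=0, gamma=1))
    print(root_leaf_path_number([0, 2], [0, 1], delta_dist=1, gamma=1))
```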
In the created net tree, the position-related number RP(i) of a node is calculated from the root leaf path numbers of the nodes
Figure BDA00020996368900001554
denoted
Figure BDA00020996368900001555
and the path branch number PB(i) of a node is calculated from the tree root path numbers of the nodes
Figure BDA0002099636890000161
denoted
Figure BDA0002099636890000162
as follows:
The position-related number RP(i) of node
Figure BDA0002099636890000163
is the sum of the root leaf path numbers of all nodes labeled i in the net tree
Figure BDA0002099636890000164
denoted
Figure BDA0002099636890000165
and is calculated by formula (8):
Figure BDA0002099636890000166
In formula (8), m denotes the number of layers of the net tree, and
Figure BDA0002099636890000167
denotes, for node
Figure BDA0002099636890000168
its root leaf path number;
The ancestor set of node
Figure BDA0002099636890000169
is the set of all nodes on the paths from that node to the tree root layer; the common ancestor set of a node set is the intersection of the ancestor sets of all elements of that node set. The path branch number PB(i) of node
Figure BDA00020996368900001610
is the sum of the tree root path numbers of all nodes labeled i in the common ancestor set A(D) of a node set D
Figure BDA00020996368900001611
denoted
Figure BDA00020996368900001612
and is calculated by formula (9):
Figure BDA00020996368900001613
In formula (9), l denotes the depth of the common ancestor set A(D), and
Figure BDA00020996368900001614
denotes, for node
Figure BDA00020996368900001615
its tree root path number;
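The two quantities defined by formulas (8) and (9) can be sketched as follows, reading them directly from the verbal definitions above. The `(layer, label)` node encoding, the dictionaries and the toy values are hypothetical, and the sketch assumes that a node's ancestor set does not contain the node itself, which the text leaves open.

```python
from typing import Dict, List, Set, Tuple

Node = Tuple[int, int]  # (layer j, label i); a hypothetical encoding


def position_related_number(label: int, root_leaf_count: Dict[Node, int]) -> int:
    """RP(i): sum of root leaf path numbers over all nodes labeled i."""
    return sum(count for (_, i), count in root_leaf_count.items() if i == label)


def ancestor_set(node: Node, parents: Dict[Node, List[Node]]) -> Set[Node]:
    """All nodes on any path from `node` up to the tree root layer."""
    seen: Set[Node] = set()
    stack = list(parents.get(node, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def path_branch_number(label: int, node_set: List[Node],
                       parents: Dict[Node, List[Node]],
                       root_count: Dict[Node, int]) -> int:
    """PB(i): sum of tree root path numbers of nodes labeled i in A(D)."""
    # node_set must be non-empty; A(D) is the intersection of ancestor sets.
    common = set.intersection(*(ancestor_set(n, parents) for n in node_set))
    return sum(root_count[n] for n in common if n[1] == label)


if __name__ == "__main__":
    parents = {(2, 3): [(1, 1), (1, 2)], (2, 4): [(1, 2)], (3, 5): [(2, 3), (2, 4)]}
    root_count = {(1, 1): 1, (1, 2): 1, (2, 3): 2, (2, 4): 1, (3, 5): 3}
    print(path_branch_number(3, [(3, 5)], parents, root_count))
    print(position_related_number(2, {(1, 2): 3, (2, 2): 4, (2, 3): 5}))
```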
Third step, solving the one-time approximate pattern matching problem under the (δ, γ)-distance by using the net tree structure:
The one-time approximate pattern matching problem under the (δ, γ)-distance is solved using the net tree constructed in the first and second steps. Specifically, one root leaf path in the net tree corresponds to one occurrence, so the task is to search the net tree for root leaf paths that satisfy the local-global constraint. Based on the net tree structure, a heuristic method is used for this search, and the heuristic method has four similar occurrence-search strategies. This embodiment adopts the first of them, namely the rightmost parent strategy together with the greedy search parent strategy. The rightmost parent strategy starts from the last node of the leaf layer and, subject to the local-global constraint, preferentially selects the rightmost parent node of the current node. The greedy search parent strategy also starts from the last node of the leaf layer and, subject to the local-global constraint, selects the optimal parent node of the current node, where the optimal parent node is the parent node with the smaller position-related number; when two parent nodes have the same position-related number, the one with the larger path branch number in the common ancestor set of the nodes on the existing path is selected as the optimal parent node;
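The two parent-selection rules can be sketched in a few lines. The sketch assumes that the candidate parents have already been filtered by the local-global constraint, that "rightmost" means the candidate with the largest label, and that RP and PB values are supplied as dictionaries keyed by label; the function names and the numeric values are hypothetical.

```python
from typing import Dict, List


def rightmost_parent(candidates: List[int]) -> int:
    """Rightmost parent strategy: take the admissible parent with the largest label."""
    return max(candidates)


def greedy_parent(candidates: List[int],
                  rp: Dict[int, int], pb: Dict[int, int]) -> int:
    """Greedy search parent strategy: smallest RP first, then largest PB on a tie."""
    return min(candidates, key=lambda i: (rp[i], -pb[i]))


if __name__ == "__main__":
    candidates = [5, 6, 7]
    rp = {5: 3, 6: 4, 7: 3}   # hypothetical position-related numbers
    pb = {5: 1, 6: 2, 7: 2}   # hypothetical path branch numbers
    print(rightmost_parent(candidates))
    print(greedy_parent(candidates, rp, pb))
```

The text does not say what happens when both RP and PB tie; in this sketch `min` then returns the first such candidate in list order, which is an arbitrary choice.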
The specific processing method is as follows:
First, starting from the last node of the leaf layer, an occurrence C1 is found in the net tree by the rightmost parent strategy and an occurrence C2 is found in the net tree by the greedy search parent strategy. Then the remaining occurrence numbers of the two occurrences are compared, and the occurrence with the larger remaining occurrence number is selected and added to the result set OCC(S, P). Finally, the selected occurrence and all nodes with the same labels as the nodes in that occurrence are deleted from the net tree; for the remaining nodes
Figure BDA00020996368900001616
the tree root path arrays
Figure BDA00020996368900001617
are recalculated according to formula (1) and formula (2), and for the remaining nodes
Figure BDA00020996368900001618
the leaf path arrays
Figure BDA00020996368900001619
are recalculated according to formula (3) and formula (4). This process is iterated until the first node of the leaf layer has been processed, which yields the result set OCC(S, P). Here, the remaining occurrence number of an occurrence C is obtained as follows: after C and all nodes with the same labels as the nodes in C are deleted from the net tree, it is, for the leaf nodes remaining in the net tree
Figure BDA00020996368900001620
with tree root path numbers
Figure BDA00020996368900001621
the sum of those numbers;
This completes the one-time approximate pattern matching under the (δ, γ)-distance using the net tree structure;
All occurrences of the given pattern P in the given sequence S from the first step, that is, all occurrences in the result set OCC(S, P), are output on a display, which completes the one-time approximate pattern matching with local-global constraints.
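The comparison of the two candidate occurrences and the subsequent deletion can be sketched as follows. The sample values are those of the worked embodiment that follows (remaining occurrence numbers 0 and 1). The text does not say which occurrence is kept when the two numbers are equal, so keeping C1 in that case is an assumption; the function names and the `(layer, label)` node encoding are hypothetical.

```python
from typing import Set, Tuple

Node = Tuple[int, int]           # (layer j, label i); a hypothetical encoding
Occurrence = Tuple[int, ...]     # labels of one root leaf path


def choose_occurrence(c1: Occurrence, remaining1: int,
                      c2: Occurrence, remaining2: int) -> Occurrence:
    """Keep the occurrence that leaves more root paths at the leaf layer."""
    # Assumption: on a tie the rightmost-parent result c1 is kept.
    return c2 if remaining2 > remaining1 else c1


def delete_labels(nodes: Set[Node], occurrence: Occurrence) -> Set[Node]:
    """Remove every node whose label is used by the chosen occurrence."""
    used = set(occurrence)
    return {node for node in nodes if node[1] not in used}


if __name__ == "__main__":
    chosen = choose_occurrence((4, 6, 7, 8), 0, (1, 4, 7, 8), 1)
    print(chosen)
    print(delete_labels({(1, 1), (2, 4), (2, 5), (4, 6), (4, 8)}, chosen))
```

After the deletion, the root path and leaf path arrays of the surviving nodes would have to be recomputed before the next iteration, as the paragraph above states.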
The specific implementation process of the steps is as follows:
1) From node
Figure BDA0002099636890000171
start searching for an occurrence:
In FIG. 6 of the accompanying drawings, following the rightmost parent strategy and subject to the local-global constraint, start from node
Figure BDA0002099636890000172
and search upward for its rightmost parent node
Figure BDA0002099636890000173
which is added to the existing path <8>; the existing path is now <7,8>, with a γ-distance of 0. Subject to the local-global constraint, continue searching upward for the rightmost parent node of node
Figure BDA0002099636890000174
namely
Figure BDA0002099636890000175
which is added to the existing path <7,8>; the existing path is now <6,7,8>, with a γ-distance of 1. Subject to the local-global constraint, continue searching upward for the rightmost parent node of node
Figure BDA0002099636890000176
namely
Figure BDA0002099636890000177
which is added to the existing path <6,7,8>; one occurrence, <4,6,7,8>, has thus been found by the rightmost parent strategy;
FIG. 8 of the accompanying drawings shows the net tree after the occurrence <4,6,7,8> and all nodes with the same labels as its nodes have been deleted. For the remaining nodes
Figure BDA0002099636890000178
the tree root path arrays
Figure BDA0002099636890000179
are recalculated by formula (1) and formula (2), and for each leaf node
Figure BDA00020996368900001710
the tree root path number
Figure BDA00020996368900001711
is obtained by formula (5). The sum of these numbers is 0, so the remaining occurrence number of the occurrence <4,6,7,8> is 0;
With reference to FIG. 6 and FIG. 7 of the accompanying drawings, the greedy search parent strategy searches upward for the optimal parent node of node
Figure BDA00020996368900001712
First, in FIG. 6, for each node
Figure BDA00020996368900001713
the tree root path number
Figure BDA00020996368900001714
is calculated by formula (5); in FIG. 7, for each node
Figure BDA00020996368900001715
the leaf path number
Figure BDA00020996368900001716
is calculated by formula (6); then, for each node
Figure BDA00020996368900001717
the root leaf path number
Figure BDA00020996368900001718
and the position-related number RP(i) are calculated by formula (7) and formula (8). Subject to the local-global constraint, node
Figure BDA00020996368900001719
has three parent nodes in total, namely parent node
Figure BDA00020996368900001720
parent node
Figure BDA00020996368900001721
and parent node
Figure BDA00020996368900001722
Since
Figure BDA00020996368900001723
and
Figure BDA00020996368900001724
the position-related number RP(5) of parent node
Figure BDA00020996368900001725
and the position-related number of parent node
Figure BDA00020996368900001726
are the same and smaller, so the path branch numbers of these two parent nodes in the common ancestor set of the existing path <8> are compared. In the common ancestor set of node
Figure BDA00020996368900001727
since
Figure BDA00020996368900001728
the path branch number PB(7) of parent node
Figure BDA00020996368900001729
is greater than the path branch number PB(5) of parent node
Figure BDA00020996368900001730
so node
Figure BDA00020996368900001731
is added to the existing path <8>; the existing path is now <7,8>, with a γ-distance of 0. Continue searching upward for the optimal parent node of node
Figure BDA00020996368900001732
Subject to the local-global constraint, node
Figure BDA00020996368900001733
has three parent nodes in total, namely parent node
Figure BDA00020996368900001734
parent node
Figure BDA00020996368900001735
and parent node
Figure BDA00020996368900001736
Since
Figure BDA00020996368900001737
and
Figure BDA00020996368900001738
the position-related number of parent node
Figure BDA00020996368900001739
is the smallest, so node
Figure BDA00020996368900001740
is added to the existing path <7,8>; the existing path is now <4,7,8>, with a γ-distance of 1. Continue searching upward for node
Figure BDA00020996368900001741
for its optimal parent node. Subject to the local-global constraint, node
Figure BDA00020996368900001742
has two parent nodes in total, namely parent node
Figure BDA00020996368900001743
and parent node
Figure BDA00020996368900001744
Since
Figure BDA00020996368900001745
node
Figure BDA00020996368900001746
is added to the existing path <4,7,8>; one occurrence, <1,4,7,8>, has thus been found by the greedy search parent strategy;
FIG. 9 of the accompanying drawings shows the net tree after the occurrence <1,4,7,8> and all nodes with the same labels as its nodes have been deleted. For the remaining nodes
Figure BDA00020996368900001747
the tree root path arrays
Figure BDA00020996368900001748
are recalculated by formula (1) and formula (2), and for each node of the leaf layer
Figure BDA00020996368900001749
the tree root path number
Figure BDA00020996368900001750
is obtained by formula (5). The sum of these numbers is 1, so the remaining occurrence number of the occurrence <1,4,7,8> is 1;
In summary, the occurrence <4,6,7,8> found by the rightmost parent strategy has a remaining occurrence number of 0, and the occurrence <1,4,7,8> found by the greedy search parent strategy has a remaining occurrence number of 1. The occurrence with the larger remaining occurrence number is selected, so, starting from node
Figure BDA00020996368900001751
the occurrence <1,4,7,8> is selected and added to the result set OCC(S, P). FIG. 10 of the accompanying drawings shows the net tree of FIG. 9 after the occurrence <1,4,7,8> and all nodes with the same labels have been deleted;
2) From node
Figure BDA0002099636890000181
start searching for an occurrence:
Since node
Figure BDA0002099636890000182
was already deleted from the net tree when the occurrence <1,4,7,8> was found in the previous step, the search continues directly from the next leaf node;
3) From node
Figure BDA0002099636890000183
start searching for an occurrence:
In FIG. 10 of the accompanying drawings, following the rightmost parent strategy and subject to the local-global constraint, start from node
Figure BDA0002099636890000184
and search upward for its rightmost parent node
Figure BDA0002099636890000185
which is added to the existing path <6>; the existing path is now <5,6>, with a γ-distance of 1. Subject to the local-global constraint, continue searching upward for the rightmost parent node of node
Figure BDA0002099636890000186
namely
Figure BDA0002099636890000187
which is added to the existing path <5,6>; the existing path is now <3,5,6>, with a γ-distance of 1. Subject to the local-global constraint, continue searching upward for the rightmost parent node of node
Figure BDA0002099636890000188
namely
Figure BDA0002099636890000189
which is added to the existing path <3,5,6>; one occurrence, <2,3,5,6>, has thus been found by the rightmost parent strategy;
In FIG. 10 of the accompanying drawings, the greedy search parent strategy searches upward for the optimal parent node of node
Figure BDA00020996368900001810
Since node
Figure BDA00020996368900001811
has only one parent node
Figure BDA00020996368900001812
there is no need to calculate the position-related number and the path branch number, and that parent node is added to the existing path <6>; similarly, parent node
Figure BDA00020996368900001813
and parent node
Figure BDA00020996368900001814
are selected, so one occurrence, <2,3,5,6>, is found by the greedy search parent strategy;
In summary, starting from node
Figure BDA00020996368900001815
both the rightmost parent strategy and the greedy search parent strategy find the occurrence <2,3,5,6>, which is added to the result set OCC(S, P);
The first node of the leaf layer has now been reached, so the one-time approximate pattern matching with local-global constraints is complete, and the occurrences in the result set OCC(S, P), namely <1,4,7,8> and <2,3,5,6>, are output on the display.
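The final result set can be sanity-checked with a short sketch. It assumes the usual reading of a gap [a, b] as the number of characters lying strictly between two consecutive matched positions, and it checks only the gap constraints of P = a[0,2]b[0,2]c[0,2]c and the one-time condition (no position is used by more than one occurrence). The δ-distances are not checked because the sequence S is not restated in this passage; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def satisfies_gaps(occ: Sequence[int], gaps: List[Tuple[int, int]]) -> bool:
    """Check lo <= (next position - previous position - 1) <= hi for every gap."""
    pairs = zip(occ, occ[1:])
    return all(lo <= b - a - 1 <= hi for (a, b), (lo, hi) in zip(pairs, gaps))


def is_one_time(occurrences: List[Sequence[int]]) -> bool:
    """One-time condition: each sequence position is used at most once overall."""
    used = [pos for occ in occurrences for pos in occ]
    return len(used) == len(set(used))


if __name__ == "__main__":
    gaps = [(0, 2), (0, 2), (0, 2)]
    result = [(1, 4, 7, 8), (2, 3, 5, 6)]
    print(all(satisfies_gaps(occ, gaps) for occ in result))
    print(is_one_time(result))
```

Note that the two candidates found in step 1), <4,6,7,8> and <1,4,7,8>, share positions 4, 7 and 8, which is why only one of them can enter the result set.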
FIG. 4 shows that, in this embodiment, the number of layers of the net tree is determined by the number of sub-patterns in the given pattern P = a[0,2]b[0,2]c[0,2]c. Since the given pattern P contains 4 sub-patterns, namely sub-pattern p1 = a, sub-pattern p2 = b, sub-pattern p3 = c and sub-pattern p4 = c, the net tree has 4 layers, denoted layer 1, layer 2, layer 3 and layer 4.
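Determining the number of layers from the pattern string can be sketched as follows. The textual pattern syntax (a single letter optionally followed by a gap `[min,max]`) is taken from the example pattern above; the function name and the use of a regular expression are illustrative assumptions.

```python
import re
from typing import List, Tuple


def parse_pattern(pattern: str) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Split a pattern such as 'a[0,2]b[0,2]c[0,2]c' into sub-patterns and gaps."""
    tokens = re.findall(r"([A-Za-z])(?:\[(\d+),(\d+)\])?", pattern)
    sub_patterns = [letter for letter, _, _ in tokens]
    # The last sub-pattern carries no gap, so its groups come back empty.
    gaps = [(int(lo), int(hi)) for _, lo, hi in tokens if lo != ""]
    return sub_patterns, gaps


if __name__ == "__main__":
    subs, gaps = parse_pattern("a[0,2]b[0,2]c[0,2]c")
    print(subs, gaps, len(subs))  # len(subs) is the number of net tree layers
```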
FIG. 5 shows that this embodiment creates a net tree from the given sequence S, the pattern P, the local threshold δ and the global threshold γ, calculates for each node
Figure BDA00020996368900001816
its δ-distance Dδ(si, pj), and calculates for each node
Figure BDA00020996368900001817
its tree root path array
Figure BDA00020996368900001818
In the figure, a grid-line node
Figure BDA00020996368900001819
indicates that character si and sub-pattern pj are an approximate match, that is, 0 < Dδ(si, pj) ≤ δ, and a white node
Figure BDA00020996368900001820
indicates that character si and sub-pattern pj are an exact match, that is, Dδ(si, pj) = 0. For each node
Figure BDA00020996368900001821
the first row of data on its left is the tree root path array of node
Figure BDA00020996368900001822
denoted
Figure BDA00020996368900001823
and the second row of data is the δ-distance Dδ(si, pj) between character si and sub-pattern pj. A node with Dδ(si, pj) > δ
Figure BDA00020996368900001824
need not be created; since Dδ(s4, p3) > δ, node
Figure BDA00020996368900001825
is not created. A black dotted line in the figure indicates that two nodes satisfy the gap constraint but, because of the local-global constraint, no parent relationship can be established between them. When
Figure BDA00020996368900001826
holds for node
Figure BDA00020996368900001827
no path satisfying the local-global constraint passes through the node, so the node must be deleted and need not be created; since
Figure BDA00020996368900001828
node
Figure BDA00020996368900001829
is deleted and not created.
FIG. 6 shows the net tree of FIG. 5 of the accompanying drawings after the nodes that cannot reach the leaf layer, the invalid parent relationships and the invalid nodes have been removed; in this net tree there are no invalid parent relationships or invalid nodes, and every node can reach the leaf layer. For each node
Figure BDA00020996368900001830
the first row of data on its left is the tree root path array of node
Figure BDA00020996368900001831
denoted
Figure BDA00020996368900001832
and the second row of data is the δ-distance Dδ(si, pj) between character si and sub-pattern pj.
FIG. 7 shows that, starting from the leaf nodes of the net tree
Figure BDA00020996368900001833
formula (3) and formula (4) are used to calculate, for each node in FIG. 6 of the accompanying drawings
Figure BDA0002099636890000191
the leaf path array
Figure BDA0002099636890000192
For each node
Figure BDA0002099636890000193
the data on its right is the leaf path array of node
Figure BDA0002099636890000194
denoted
Figure BDA0002099636890000195
FIG. 8 shows that, with reference to FIG. 6 and FIG. 7 of the accompanying drawings, this embodiment starts from node
Figure BDA0002099636890000196
and finds one occurrence, <4,6,7,8>, by the rightmost parent strategy; the occurrence <4,6,7,8> and all nodes with the same labels as its nodes are indicated by dashed circles. The occurrence <4,6,7,8> and all nodes with the same labels are removed from the net tree, all parent relationships involving these nodes are deleted, and for the remaining nodes
Figure BDA0002099636890000197
the tree root path arrays
Figure BDA0002099636890000198
are recalculated by formula (1) and formula (2). For each node
Figure BDA0002099636890000199
the first row of data on its left is the tree root path array of node
Figure BDA00020996368900001910
denoted
Figure BDA00020996368900001911
and the second row of data is the δ-distance Dδ(si, pj) between character si and sub-pattern pj. For the leaf nodes of the net tree
Figure BDA00020996368900001912
the tree root path numbers
Figure BDA00020996368900001913
sum to 0, so the remaining occurrence number of <4,6,7,8> is 0.
FIG. 9 shows that, with reference to FIG. 6 and FIG. 7 of the accompanying drawings, this embodiment starts from node
Figure BDA00020996368900001914
and finds one occurrence, <1,4,7,8>, by the greedy search parent strategy; the occurrence <1,4,7,8> and all nodes with the same labels as its nodes are indicated by dashed circles. The occurrence <1,4,7,8> and all nodes with the same labels are removed from the net tree, all parent relationships involving these nodes are deleted, and for the remaining nodes
Figure BDA00020996368900001915
the tree root path arrays
Figure BDA00020996368900001916
are recalculated by formula (1) and formula (2). For each node
Figure BDA00020996368900001917
the first row of data on its left is the tree root path array of node
Figure BDA00020996368900001918
denoted
Figure BDA00020996368900001919
and the second row of data is the δ-distance Dδ(si, pj) between character si and sub-pattern pj. For the leaf nodes of the net tree
Figure BDA00020996368900001920
the tree root path numbers
Figure BDA00020996368900001921
sum to 1, so the remaining occurrence number of <1,4,7,8> is 1.
FIG. 10 shows the net tree of FIG. 9 of the accompanying drawings after the occurrence <1,4,7,8> and all nodes with the same labels as its nodes have been deleted. For each node
Figure BDA00020996368900001922
the first row of data on its left is the tree root path array of node
Figure BDA00020996368900001923
denoted
Figure BDA00020996368900001924
and the second row of data is the δ-distance Dδ(si, pj) between character si and sub-pattern pj. Starting from node
Figure BDA00020996368900001925
one occurrence, <2,3,5,6>, is found by the greedy search parent strategy.
Example 2
This embodiment is the same as Example 1 except for the following. In the third step, when the one-time approximate pattern matching problem under the (δ, γ)-distance is solved using the net tree structure and a heuristic method is used to search the net tree for root leaf paths satisfying the local-global constraint, this embodiment adopts the second of the four similar occurrence-search strategies of the heuristic method, namely the leftmost parent strategy together with the greedy search parent strategy. The leftmost parent strategy starts from the first node of the leaf layer and, subject to the local-global constraint, preferentially selects the leftmost parent node of the current node; in this case the greedy search parent strategy also starts from the first node of the leaf layer and, subject to the local-global constraint, selects the optimal parent node of the current node, where the optimal parent node is the parent node with the smaller position-related number.
Example 3
This embodiment is the same as Example 1 except for the following. In the third step, when the one-time approximate pattern matching problem under the (δ, γ)-distance is solved using the net tree structure and a heuristic method is used to search the net tree for root leaf paths satisfying the local-global constraint, this embodiment adopts the third of the four similar occurrence-search strategies of the heuristic method, namely the rightmost child strategy together with the greedy search child strategy. The rightmost child strategy starts from the last node of the tree root layer and, subject to the local-global constraint, preferentially selects the rightmost child node of the current node; in this case the greedy search child strategy also starts from the last node of the tree root layer and, subject to the local-global constraint, selects the optimal child node of the current node, where the optimal child node is the child node with the smaller position-related number.
Example 4
This embodiment is the same as Example 1 except for the following. In the third step, when the one-time approximate pattern matching problem under the (δ, γ)-distance is solved using the net tree structure and a heuristic method is used to search the net tree for root leaf paths satisfying the local-global constraint, this embodiment adopts the fourth of the four similar occurrence-search strategies of the heuristic method, namely the leftmost child strategy together with the greedy search child strategy. The leftmost child strategy starts from the first node of the tree root layer and, subject to the local-global constraint, preferentially selects the leftmost child node of the current node; in this case the greedy search child strategy also starts from the first node of the tree root layer and, subject to the local-global constraint, selects the optimal child node of the current node, where the optimal child node is the child node with the smaller position-related number.
The present invention is not limited to the above embodiments, and any design that uses the design concept of the present invention and makes some simple changes should be within the scope of the present invention.

Claims (1)

1. A one-time approximate pattern matching method with local-global constraints, characterized by: the method solves the problem of one-time approximate pattern matching under the (delta, gamma) -distance by utilizing a net tree structure, and comprises the following specific steps:
First step, determining the number of layers of the net tree:
A given sequence S is read in and its length is determined to be n, the characters of the sequence S being denoted character s1, character s2, …, character sn; a given pattern P is read in and its length is determined to be m, the sub-patterns of the pattern P being denoted sub-pattern p1, sub-pattern p2, …, sub-pattern pm, where 0 < m < n; the number of layers of the net tree is determined by the number of sub-patterns in the given pattern P, so the net tree has m layers, denoted layer 1, layer 2, …, layer m;
Second step, creating the net tree and calculating, for each node
Figure FDA0002843067510000011
the tree root path array
Figure FDA0002843067510000012
and the leaf path array
Figure FDA0002843067510000013
A local threshold δ and a global threshold γ are given, where 0 ≤ δ ≤ γ, and a net tree is created from the sequence S and the pattern P of the first step; a node labeled i at layer j of the net tree is denoted
Figure FDA0002843067510000014
Unlike in exact pattern matching, each node of a net tree created for one-time approximate pattern matching with local-global constraints
Figure FDA0002843067510000015
must record the δ-distance Dδ(si, pj) between character si and sub-pattern pj, and each node
Figure FDA0002843067510000016
uses a tree root path array of size γ + 1
Figure FDA0002843067510000017
to describe, for node
Figure FDA0002843067510000018
the number of tree root paths reaching the tree root layer
Figure FDA0002843067510000019
The array subscripts are 0, 1, …, γ; among the paths from the current node
Figure FDA00028430675100000110
to the tree root layer, the number of tree root paths whose γ-distance is d is denoted
Figure FDA00028430675100000111
and is initialized to 0, where 0 ≤ d ≤ γ. During creation of the net tree, the tree root path array of each node
Figure FDA00028430675100000112
denoted
Figure FDA00028430675100000113
allows invalid parent relationships and invalid nodes in the net tree to be identified and deleted in advance. Similarly, each node
Figure FDA00028430675100000114
also uses a leaf path array of size γ + 1
Figure FDA00028430675100000115
to describe, for node
Figure FDA00028430675100000116
the number of leaf paths reaching the leaf layer
Figure FDA00028430675100000117
The array subscripts are 0, 1, …, γ; among the paths from the current node
Figure FDA00028430675100000118
to the leaf layer, the number of leaf paths whose γ-distance is d is denoted
Figure FDA00028430675100000119
and is likewise initialized to 0, where 0 ≤ d ≤ γ;
the specific procedure is as follows:
each character s_i of the sequence S given in step one is processed in turn, and the δ-distance D_δ(s_i, p_j) between s_i and each of the sub-patterns p_1, p_2, …, p_m of the pattern P given in step one is calculated, where 1 ≤ j ≤ m:
when D_δ(s_i, p_j) > δ, where 1 ≤ j ≤ m, level j of the net tree does not create the node
Figure FDA00028430675100000120
when D_δ(s_i, p_j) ≤ δ, where j = 1, the tree root layer of the net tree directly creates the node
Figure FDA00028430675100000121
and formula (1) is used to calculate, for the node
Figure FDA00028430675100000122
the tree root path array
Figure FDA00028430675100000123
Figure FDA00028430675100000124
in formula (1), d denotes the γ-distance, ranging from 0 to γ, of the paths to the tree root layer from the node
Figure FDA00028430675100000125
where δ denotes the local threshold, γ denotes the global threshold, and D_δ(s_i, p_1) denotes the δ-distance between the character s_i and the sub-pattern p_1;
when D_δ(s_i, p_j) ≤ δ, where 1 < j ≤ m, level j of the net tree creates the node
Figure FDA00028430675100000126
and it is judged whether, among all nodes of level j-1 of the net tree, there exists for the node
Figure FDA00028430675100000127
a parent node satisfying the gap constraint [min_{j-1}, max_{j-1}]: when there is no parent node that, together with the node
Figure FDA00028430675100000128
satisfies the gap constraint [min_{j-1}, max_{j-1}], the following node is deleted:
Figure FDA00028430675100000129
when there exists a parent node that, together with the node
Figure FDA00028430675100000130
satisfies the gap constraint [min_{j-1}, max_{j-1}], the parent node is denoted
Figure FDA00028430675100000131
and it is further judged, by the following method, whether the parent node
Figure FDA00028430675100000132
and the node
Figure FDA00028430675100000133
can establish a parent-child relationship:
when
Figure FDA0002843067510000021
holds, the parent node
Figure FDA0002843067510000022
and the node
Figure FDA0002843067510000023
can establish a parent-child relationship, and formula (2) is used to calculate, for the node
Figure FDA0002843067510000024
the tree root path array
Figure FDA0002843067510000025
Figure FDA0002843067510000026
in formula (2), d denotes the γ-distance, ranging from 0 to γ, of the paths to the tree root layer from the node
Figure FDA0002843067510000027
where δ denotes the local threshold, γ denotes the global threshold, D_δ(s_i, p_j) denotes the δ-distance between the character s_i and the sub-pattern p_j, and
Figure FDA0002843067510000028
denotes a parent node in level j-1 of the net tree that, together with the node
Figure FDA0002843067510000029
satisfies the gap constraint [min_{j-1}, max_{j-1}] and that, with the node
Figure FDA00028430675100000210
can establish a parent-child relationship;
when
Figure FDA00028430675100000211
holds, the parent node
Figure FDA00028430675100000212
and the node
Figure FDA00028430675100000213
cannot establish a parent-child relationship;
all nodes of level j-1 of the net tree are traversed in turn to calculate, for the node
Figure FDA00028430675100000214
the tree root path array
Figure FDA00028430675100000215
and when
Figure FDA00028430675100000216
holds, the following node is deleted:
Figure FDA00028430675100000217
after every character of the sequence S given in step one has been processed, the net tree has been created;
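The net tree creation described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation. Formulas (1) and (2) are not legible in this text, so the update rule is inferred from the explanation of formula (7), in which the root-side γ-distance starts at D_δ(s_i, p_j) and therefore includes the node's own δ-distance. The δ-distance is assumed to be the absolute difference of character codes, the gap constraint is assumed to bound the number of characters strictly between the parent position and the child position, and a node whose array is entirely zero is assumed to be the one that gets deleted. The names `delta_distance` and `build_root_path_arrays` are hypothetical.

```python
from typing import Dict, List, Tuple


def delta_distance(a: str, b: str) -> int:
    # Assumption: the delta-distance is the absolute difference of character codes.
    return abs(ord(a) - ord(b))


def build_root_path_arrays(
    seq: str,
    pattern: str,
    gaps: List[Tuple[int, int]],
    delta: int,
    gamma: int,
) -> List[Dict[int, List[int]]]:
    """Return one dict per level mapping position label i (1-based) to an
    array of size gamma + 1; entry d counts the tree root paths whose
    accumulated distance, including the node itself, equals d.
    Assumes 0 <= delta <= gamma and non-negative gap bounds."""
    m = len(pattern)
    levels: List[Dict[int, List[int]]] = [dict() for _ in range(m)]
    for i, ch in enumerate(seq, start=1):
        for j in range(m):
            dist = delta_distance(ch, pattern[j])
            if dist > delta:
                continue  # local constraint violated: no node at this level
            counts = [0] * (gamma + 1)
            if j == 0:
                counts[dist] = 1  # a root node is a path of its own
            else:
                lo, hi = gaps[j - 1]
                for k, parent in levels[j - 1].items():
                    if lo <= i - k - 1 <= hi:  # assumed gap definition
                        for d in range(dist, gamma + 1):
                            counts[d] += parent[d - dist]
            if any(counts):
                levels[j][i] = counts  # all-zero nodes are treated as invalid
    return levels
```

For the sequence "acb", the pattern "ab", one gap constraint (0, 1) and δ = γ = 1, the sketch keeps two nodes on each level; the node labeled 2 on the second level is reachable only with accumulated distance 1.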
in the created net tree, for each node
Figure FDA00028430675100000218
of the net tree, its leaf path array
Figure FDA00028430675100000219
is calculated; for the node
Figure FDA00028430675100000220
the calculation of the leaf path array
Figure FDA00028430675100000221
starts from the leaf layer:
for a leaf node
Figure FDA00028430675100000222
formula (3) is used to calculate its leaf path array
Figure FDA00028430675100000223
Figure FDA00028430675100000224
in formula (3), d denotes the γ-distance, ranging from 0 to γ, of the paths to the leaf layer from the node
Figure FDA00028430675100000225
where δ denotes the local threshold, γ denotes the global threshold, and D_δ(s_i, p_m) denotes the δ-distance between the character s_i and the sub-pattern p_m;
for a level-j node of the net tree
Figure FDA00028430675100000226
that has a child node, the child node is denoted
Figure FDA00028430675100000227
and formula (4) is used to calculate, for the node
Figure FDA00028430675100000228
the leaf path array
Figure FDA00028430675100000229
Figure FDA00028430675100000230
in formula (4), d denotes the γ-distance, ranging from 0 to γ, of the paths to the leaf layer from the node
Figure FDA00028430675100000231
where δ denotes the local threshold, γ denotes the global threshold, D_δ(s_i, p_j) denotes the δ-distance between the character s_i and the sub-pattern p_j, and
Figure FDA00028430675100000232
denotes, for the node
Figure FDA00028430675100000233
one of its child nodes;
traversing in turn, for the node
Figure FDA00028430675100000234
all of its child nodes, to calculate, for the node
Figure FDA00028430675100000235
the leaf path array
Figure FDA00028430675100000236
according to the node
Figure FDA00028430675100000237
and its tree root path array
Figure FDA00028430675100000238
formula (5) is used to calculate, for the node
Figure FDA00028430675100000239
the number of tree root paths
Figure FDA00028430675100000240
and according to the node
Figure FDA00028430675100000241
and its leaf path array
Figure FDA00028430675100000242
formula (6) is used to calculate, for the node
Figure FDA00028430675100000243
the number of leaf paths
Figure FDA00028430675100000244
Figure FDA00028430675100000245
in formula (5), d denotes the γ-distance, ranging from 0 to γ, of the paths to the tree root layer from the node
Figure FDA00028430675100000246
and γ denotes the global threshold;
Figure FDA00028430675100000247
in formula (6), d denotes the γ-distance, ranging from 0 to γ, of the paths to the leaf layer from the node
Figure FDA00028430675100000248
and γ denotes the global threshold;
according to the node
Figure FDA00028430675100000249
its tree root path array
Figure FDA00028430675100000250
and its leaf path array
Figure FDA00028430675100000251
formula (7) is used to calculate, for the node
Figure FDA00028430675100000252
the number of tree root leaf paths
Figure FDA0002843067510000031
Figure FDA0002843067510000032
in formula (7), k denotes the γ-distance, ranging from D_δ(s_i, p_j) to γ, of the paths to the tree root layer from the node
Figure FDA0002843067510000033
h denotes the γ-distance, ranging from D_δ(s_i, p_j) to γ + D_δ(s_i, p_j) − k, of the paths to the leaf layer from the node
Figure FDA0002843067510000034
and γ denotes the global threshold;
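The index ranges stated for formula (7) suggest a double sum over the two arrays, which the following Python sketch illustrates. Formula (7) itself is not legible in this text, so the product-and-sum form is an assumption derived only from the ranges of k and h given above; the function name `root_leaf_path_count` and its parameters are hypothetical. The reasoning is that both arrays count the node's own δ-distance, so a root-side distance k and a leaf-side distance h combine to a total of k + h − D_δ(s_i, p_j), which must not exceed γ.

```python
from typing import List


def root_leaf_path_count(
    root_counts: List[int],
    leaf_counts: List[int],
    own_dist: int,
    gamma: int,
) -> int:
    """Count tree root leaf paths through one node whose total distance is
    at most gamma. root_counts[k] and leaf_counts[h] are the node's tree
    root path array and leaf path array (both of size gamma + 1), and
    own_dist is the node's own delta-distance, counted in both arrays."""
    total = 0
    for k in range(own_dist, gamma + 1):
        # h may grow only until k + h - own_dist reaches gamma
        for h in range(own_dist, gamma + own_dist - k + 1):
            total += root_counts[k] * leaf_counts[h]
    return total
```

With γ = 2, an own distance of 1, a root array [0, 1, 1] and a leaf array [0, 2, 1], the pairs (k, h) = (1, 1), (1, 2) and (2, 1) are admissible, while (2, 2) would exceed the global threshold.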
in the created net tree, from the node
Figure FDA0002843067510000035
and its number of tree root leaf paths
Figure FDA0002843067510000036
the position-related number RP(i) of the node is calculated, and from the node
Figure FDA0002843067510000037
and its number of tree root paths
Figure FDA0002843067510000038
the path branch number PB(i) of the node is calculated:
for the node
Figure FDA0002843067510000039
the position-related number RP(i) is the sum, over all nodes labeled i in the net tree
Figure FDA00028430675100000310
of the number of tree root leaf paths
Figure FDA00028430675100000311
and is calculated by formula (8):
Figure FDA00028430675100000312
in formula (8), m denotes the number of layers of the net tree, and
Figure FDA00028430675100000313
denotes, for the node
Figure FDA00028430675100000314
the number of tree root leaf paths;
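The position-related number described above is a sum over the m levels, which can be sketched in Python as follows. The data layout is an assumption made for illustration: one dict per level mapping a position label i to that node's number of tree root leaf paths, with missing labels contributing zero. The function name `position_related_number` is hypothetical.

```python
from typing import Dict, List


def position_related_number(path_counts_by_level: List[Dict[int, int]], i: int) -> int:
    """Sum the tree root leaf path counts of every node labeled i,
    taking one node per level where such a node exists."""
    return sum(level.get(i, 0) for level in path_counts_by_level)
```

A label that appears on several levels accumulates the counts of all of its nodes, so a position that many candidate occurrences depend on receives a larger RP(i).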
for the node
Figure FDA00028430675100000315
the ancestor set is the set of nodes on all paths from that node to the tree root layer; the common ancestor set of a node set is the intersection of the ancestor sets of all elements of that node set; for the node
Figure FDA00028430675100000316
the path branch number PB(i) is the sum, over all nodes labeled i in the common ancestor set A(D) of a node set D
Figure FDA00028430675100000317
of the number of tree root paths
Figure FDA00028430675100000318
and is calculated by formula (9):
Figure FDA00028430675100000319
in formula (9), l denotes the depth of the common ancestor set A(D), and
Figure FDA00028430675100000320
denotes, for the node
Figure FDA00028430675100000321
the number of tree root paths;
step three, solving the one-time approximate pattern matching problem under the (δ, γ)-distance by using the net tree structure:
the one-time approximate pattern matching problem under the (δ, γ)-distance is solved by using the net tree constructed in step one and step two; specifically, one tree root leaf path in the net tree corresponds to one occurrence, so searching for an occurrence means searching the net tree for a tree root leaf path satisfying the local-overall constraints; based on the net tree structure, a heuristic method is used for this search, the heuristic method being any one of the following four similar occurrence search strategies:
first, the rightmost parent strategy and the greedy search parent strategy: the rightmost parent strategy starts from the last node of the leaf layer and selects the rightmost parent node of the current node that satisfies the local-overall constraints; the greedy search parent strategy starts from the last node of the leaf layer and selects the optimal parent node of the current node that satisfies the local-overall constraints, where the optimal parent node is the parent node with the smaller position-related number, and when two parent nodes have the same position-related number, the parent node with the larger path branch number in the common ancestor set of the nodes already on the path is selected as the optimal parent node;
second, the leftmost parent strategy and the greedy search parent strategy: the leftmost parent strategy starts from the first node of the leaf layer and preferentially selects the leftmost parent node of the current node that satisfies the local-overall constraints; in this case the greedy search parent strategy also starts from the first node of the leaf layer and selects the optimal parent node of the current node that satisfies the local-overall constraints, where the optimal parent node is the parent node with the smaller position-related number;
third, the rightmost child strategy and the greedy search child strategy: the rightmost child strategy starts from the last node of the tree root layer and preferentially selects the rightmost child node of the current node that satisfies the local-overall constraints; in this case the greedy search child strategy also starts from the last node of the tree root layer and selects the optimal child node of the current node that satisfies the local-overall constraints, where the optimal child node is the child node with the smaller position-related number;
fourth, the leftmost child strategy and the greedy search child strategy: the leftmost child strategy starts from the first node of the tree root layer and preferentially selects the leftmost child node of the current node that satisfies the local-overall constraints; in this case the greedy search child strategy also starts from the first node of the tree root layer and selects the optimal child node of the current node that satisfies the local-overall constraints, where the optimal child node is the child node with the smaller position-related number;
the one-time approximate pattern matching under the (δ, γ)-distance using the net tree structure is thereby completed;
all occurrences of the pattern P given in step one in the sequence S given in step one, i.e. all occurrences in the result set OCC(S, P), are output on a display, thereby completing the one-time approximate pattern matching with local-overall constraints.
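Whatever search strategy is chosen, a result set OCC(S, P) can be checked independently of the net tree. The Python sketch below is such a checker under stated assumptions, not part of the claimed method: the δ-distance is assumed to be the absolute difference of character codes, the gap constraint is assumed to bound the number of characters strictly between consecutive matched positions, and "one-time" is assumed to mean that no position of S is used by more than one occurrence. The function name `is_valid_one_off_result` is hypothetical; positions are 1-based.

```python
from typing import List, Sequence, Tuple


def is_valid_one_off_result(
    seq: str,
    pattern: str,
    gaps: List[Tuple[int, int]],
    delta: int,
    gamma: int,
    occurrences: Sequence[Sequence[int]],
) -> bool:
    """Check every occurrence against the gap, local and global constraints
    and check that no sequence position is reused across occurrences."""
    used = set()
    for occ in occurrences:
        if len(occ) != len(pattern) or len(set(occ)) != len(occ):
            return False
        if used & set(occ):
            return False  # assumed one-time condition: position already taken
        total = 0
        for j, pos in enumerate(occ):
            if not 1 <= pos <= len(seq):
                return False
            dist = abs(ord(seq[pos - 1]) - ord(pattern[j]))  # assumed delta-distance
            if dist > delta:
                return False  # local constraint violated
            total += dist
            if j > 0:
                lo, hi = gaps[j - 1]
                if not lo <= pos - occ[j - 1] - 1 <= hi:
                    return False  # gap constraint violated
        if total > gamma:
            return False  # global constraint violated
        used.update(occ)
    return True
```

For the sequence "acb" and the pattern "ab" with gap (0, 1) and δ = γ = 1, the occurrences (1, 2) and (1, 3) are each valid on their own, but they cannot both belong to one result set because they share position 1.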
CN201910530672.6A 2019-06-19 2019-06-19 Disposable approximate pattern matching method with local-overall constraint Active CN110232140B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910530672.6A CN110232140B (en) 2019-06-19 2019-06-19 Disposable approximate pattern matching method with local-overall constraint

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910530672.6A CN110232140B (en) 2019-06-19 2019-06-19 Disposable approximate pattern matching method with local-overall constraint

Publications (2)

Publication Number Publication Date
CN110232140A CN110232140A (en) 2019-09-13
CN110232140B true CN110232140B (en) 2021-03-02

Family

ID=67856066

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910530672.6A Active CN110232140B (en) 2019-06-19 2019-06-19 Disposable approximate pattern matching method with local-overall constraint

Country Status (1)

Country Link
CN (1) CN110232140B (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101702171A (en) * 2009-11-19 2010-05-05 新蛋信息技术(西安)有限公司 Approximating matching method for numerous character strings
US9679104B2 (en) * 2013-01-17 2017-06-13 Edico Genome, Corp. Bioinformatics systems, apparatuses, and methods executed on an integrated circuit processing platform
CN109271419B (en) * 2018-09-03 2021-06-22 河北工业大学 Online string matching method without gap constraint

Also Published As

Publication number Publication date
CN110232140A (en) 2019-09-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant