CN110232140B - Disposable approximate pattern matching method with local-overall constraint - Google Patents
- Publication number
- CN110232140B, application CN201910530672.6A
- Authority
- CN
- China
- Prior art keywords
- node
- tree
- parent
- leaf
- gamma
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
- G06F16/9014—Indexing; Data structures therefor; Storage structures hash tables
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
- G06F16/9027—Trees
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/90335—Query processing
- G06F16/90344—Query processing by using string matching techniques
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a one-time approximate pattern matching method with local-global constraints, relating to the technical field of electric digital data processing. The method uses a net tree structure to solve the one-time approximate pattern matching problem under the (δ, γ)-distance. For this problem, it overcomes the shortcoming of the prior art, which struggles to deliver generality, accuracy, and flexibility of solving at the same time while keeping the solution set non-redundant.
Description
Technical Field
The technical solution of the invention relates to the technical field of electric digital data processing, and in particular to a one-time approximate pattern matching method with local-global constraints.
Background
With the continuing development of internet technology, the scale of data is growing dramatically, and how to find valuable information in large amounts of data by data mining has become a research hotspot. Frequent pattern mining finds patterns that occur frequently in large amounts of data, and its main task is pattern matching: frequent pattern mining usually needs to calculate the support of a pattern, and support calculation is in essence a pattern matching problem. Pattern matching is therefore the basis and core of frequent pattern mining. As science and technology have advanced, pattern matching has been widely applied in many fields, not only to simple time series matching but also to biological sequence matching.
Pattern matching is the process of searching a sequence S for subsequences that are identical or similar to a pattern P. Its development has followed three main trends: (1) from traditional pattern matching to pattern matching with gap constraints; (2) from exact pattern matching to approximate pattern matching; (3) from pattern matching under no special condition to pattern matching under the one-time condition.
Compared with traditional pattern matching, introducing gap constraints makes the pattern matching problem harder to solve, but it reduces the influence of data noise and improves the flexibility of matching.
Example A. Given the sequence $S = s_1s_2s_3s_4s_5s_6s_7 = cbcabbc$ and the pattern $P = p_1[0,1]p_2[0,1]p_3 = c[0,1]b[0,1]c$.
Pattern with gap constraints: $P = p_1[min_1,max_1]p_2 \ldots [min_j,max_j]p_{j+1} \ldots [min_{m-1},max_{m-1}]p_m$, where $min_j$ and $max_j$ are the minimum and maximum numbers of wildcards that may be matched between sub-pattern $p_j$ and sub-pattern $p_{j+1}$. The interval $[min_j, max_j]$ is called a gap constraint, and $m$ is the length of the pattern P.
In Example A, the given pattern is $P = p_1[0,1]p_2[0,1]p_3 = c[0,1]b[0,1]c$. Taking $c[0,1]b$ as an example, $[0,1]$ is the gap constraint: 0 means that at least 0 wildcards may be matched between sub-pattern c and sub-pattern b, and 1 means that at most 1 wildcard may be matched between them. For ease of description, each occurrence is written as the positions in the sequence S of the characters matched by the pattern P. In this example, the given pattern P has 2 occurrences in the given sequence S, namely <1,2,3> and <3,5,7>. <1,2,3> means that the subsequence $s_1s_2s_3$ is identical to the pattern $p_1p_2p_3$: there is no character between $s_1$ and $s_2$, which satisfies the gap constraint [0,1], and no character between $s_2$ and $s_3$, which also satisfies it. <3,5,7> means that the subsequence $s_3s_5s_7$ is identical to the pattern $p_1p_2p_3$: there is 1 character between $s_3$ and $s_5$ and 1 character between $s_5$ and $s_7$, each satisfying the gap constraint [0,1]. It follows that the gap constraint is more flexible than the traditional wildcards "?" and "*".
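The occurrences in Example A can be checked with a short enumeration. The following is a minimal sketch rather than the method of the invention: the function name `exact_occurrences` is hypothetical, positions are 1-based as in the text, and the sequence is assumed to be `cbcabbc`, which is the 7-character sequence consistent with the two occurrences stated above.

```python
def exact_occurrences(seq, subpatterns, gaps):
    """Enumerate all exact occurrences of a pattern with gap constraints.

    seq         -- the sequence S as a string
    subpatterns -- the sub-patterns p_1..p_m as a string
    gaps        -- list of (min_j, max_j) pairs, one per adjacent sub-pattern pair
    Returns a list of 1-based position tuples <l_1, ..., l_m>.
    """
    results = []

    def extend(prefix):
        j = len(prefix)  # index of the next sub-pattern to place
        if j == len(subpatterns):
            results.append(tuple(prefix))
            return
        if j == 0:
            candidates = range(1, len(seq) + 1)
        else:
            lo, hi = gaps[j - 1]
            # min_j <= l_{j+1} - l_j - 1 <= max_j
            first = prefix[-1] + lo + 1
            last = min(prefix[-1] + hi + 1, len(seq))
            candidates = range(first, last + 1)
        for pos in candidates:
            if seq[pos - 1] == subpatterns[j]:
                extend(prefix + [pos])

    extend([])
    return results


# Example A (assumed sequence): pattern c[0,1]b[0,1]c
print(exact_occurrences("cbcabbc", "cbc", [(0, 1), (0, 1)]))
# -> [(1, 2, 3), (3, 5, 7)]
```

The enumeration is exponential in general and is only meant to make the gap-constraint condition concrete on a small input.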
By matching type, pattern matching with gap constraints is divided into exact pattern matching and approximate pattern matching. Exact pattern matching allows no errors, but noise is introduced during real data acquisition and preprocessing, and exact matching then fails to obtain useful information, so matching in an approximate sense is needed. Approximate pattern matching allows errors and can find more valuable information than exact pattern matching.
The distance measure is the key to approximate pattern matching. The Hamming distance is a commonly used measure: it is the number of positions at which two sequences of equal length have different characters. However, the Hamming distance cannot measure the degree of local approximation between sequences, so sequences that it accepts may be dissimilar as a whole and considerable noise is introduced. Another measure is the (δ, γ)-distance, originally used in music information retrieval, which can solve the noise problem of the Hamming distance.
Example B. The given pattern is P = b[0,1]d[0,1]b.
In Fig. 1 of the accompanying drawings, (1) coincides with the given pattern P without gaps; (2) and (3) contain gaps and match the given pattern P exactly under the gap constraints.
Using the Hamming distance as the distance measure for the given pattern P: given two sequences X and Y of equal length, the Hamming distance between X and Y is the number of positions at which their characters differ. (4), (5), and (6) in Fig. 1 cannot match the given pattern P exactly but match it approximately when the Hamming distance threshold is 1. However, their shaded segments all deviate greatly from the corresponding segments of the given pattern P, so (4), (5), and (6) are not similar to (1) in Fig. 1 as a whole.
Using the (δ, γ)-distance as the distance measure for the given pattern P. δ-distance: given any two characters x and y in the character set Σ, the δ-distance between x and y is $|x - y|$, denoted $D_\delta(x, y)$. γ-distance: given two sequences of equal length $X = x_1x_2 \ldots x_n$ and $Y = y_1y_2 \ldots y_n$, the γ-distance between X and Y is $\sum_{i=1}^{n} D_\delta(x_i, y_i)$, where $1 \le i \le n$, denoted $D_\gamma(X, Y)$, and n is the length of the sequences.
Given two sequences of equal length $X = x_1x_2 \ldots x_n$ and $Y = y_1y_2 \ldots y_n$, a local threshold δ, and a global threshold γ:
(1) when $\max_{1 \le i \le n} D_\delta(x_i, y_i) \le \delta$, the sequences X and Y are said to satisfy the local approximation constraint, called the local constraint for short and denoted $X =_\delta Y$;
(2) when $D_\gamma(X, Y) \le \gamma$, the sequences X and Y are said to satisfy the global approximation constraint, called the global constraint for short and denoted $X =_\gamma Y$;
(3) when $X =_\delta Y$ and $X =_\gamma Y$, the sequences X and Y satisfy both the local constraint and the global constraint, called the local-global constraint and denoted $X =_{(\delta,\gamma)} Y$;
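The three conditions above translate directly into code. The following is a minimal sketch with hypothetical function names; it assumes that the difference |x - y| between two characters is taken over their code points, which for letters of the same case equals the difference of their alphabet positions.

```python
def delta_distance(x, y):
    """delta-distance D_delta(x, y) = |x - y| between two characters."""
    return abs(ord(x) - ord(y))


def gamma_distance(xs, ys):
    """gamma-distance D_gamma(X, Y): sum of the delta-distances at each position."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have equal length")
    return sum(delta_distance(x, y) for x, y in zip(xs, ys))


def satisfies_local_global(xs, ys, delta, gamma):
    """True when X and Y satisfy both the local and the global constraint."""
    local_ok = all(delta_distance(x, y) <= delta for x, y in zip(xs, ys))
    return local_ok and gamma_distance(xs, ys) <= gamma


print(satisfies_local_global("cbb", "cbc", 1, 1))  # -> True  (distances 0, 0, 1)
print(satisfies_local_global("bcc", "cbc", 1, 1))  # -> False (sum of distances is 2)
print(satisfies_local_global("bcc", "cbc", 1, 2))  # -> True
print(satisfies_local_global("abc", "cbc", 1, 2))  # -> False (|a - c| = 2 exceeds delta)
```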
(7), (8), and (9) in Fig. 1 cannot match the given pattern P exactly, but can match it approximately under the (δ, γ)-distance. With local threshold δ = 1 and global threshold γ = 1, (7) and (8) match the given pattern P approximately and are both similar to (1) in Fig. 1 as a whole, which illustrates that the δ-distance measures the local approximation between sequences so that the local constraint is satisfied. (9) in Fig. 1 cannot match the given pattern P approximately when δ = 1 and γ = 1, but can when δ = 1 and γ = 2, which illustrates that the γ-distance measures the global approximation between sequences so that the global constraint is satisfied.
Fig. 1 of the accompanying drawings shows that, because the Hamming distance cannot measure the local approximation between sequences, the matched sequences can be dissimilar as a whole; the (δ, γ)-distance measures the local and the global approximation between sequences at the same time and so ensures their overall similarity.
In pattern matching with gap constraints, when no further condition is imposed, that is, under no special condition, each character of the sequence S may be reused many times, and the number of occurrences of the pattern P in S is exponential. In practical applications, however, each character in S represents an event occurring at a certain moment or in a certain period, so imposing the one-time condition is more reasonable because it avoids counting an event repeatedly. Example C below uses the sequence S and the pattern P of Example A to describe (δ, γ)-approximate pattern matching under the one-time condition in detail.
Example C. Given the sequence $S = s_1s_2s_3s_4s_5s_6s_7 = cbcabbc$, the pattern $P = p_1[0,1]p_2[0,1]p_3 = c[0,1]b[0,1]c$, the local threshold δ = 1, and the global threshold γ = 1.
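Occurrence: given a sequence $S = s_1s_2 \ldots s_n$, a pattern $P = p_1[min_1,max_1]p_2 \ldots [min_{m-1},max_{m-1}]p_m$, a local threshold δ, and a global threshold γ, if there are m integers $l_1, l_2, \ldots, l_m$ that satisfy the following two conditions simultaneously:
(1) $1 \le l_1 < l_2 < \ldots < l_m \le n$ and $min_j \le l_{j+1} - l_j - 1 \le max_j$, where $1 \le j \le m-1$;
(2) $s_{l_1}s_{l_2} \ldots s_{l_m} =_{(\delta,\gamma)} p_1p_2 \ldots p_m$;
then $<l_1, l_2, \ldots, l_m>$ is called an occurrence of the pattern P in the sequence S.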
Pattern matching under no special condition means that the character at any position in the sequence S may be reused any number of times. Fig. 2 of the accompanying drawings shows that, under no special condition, the given pattern P of Example A has 4 occurrences in the given sequence S, namely <1,2,3>, <3,5,6>, <3,5,7>, and <5,6,7>. For <3,5,6>, the subsequence $s_3s_5s_6$ is not identical to the pattern $p_1p_2p_3$: the character $s_6 = b$ and the sub-pattern $p_3 = c$, so $b \ne c$. However, $D_\delta(s_6, p_3) = |s_6 - p_3| = |b - c| = 1 \le \delta = 1$, so $s_3s_5s_6$ and $p_1p_2p_3$ satisfy the local constraint, i.e. $s_3s_5s_6 =_\delta p_1p_2p_3$. At the same time, $D_\gamma(s_3s_5s_6, p_1p_2p_3) = |s_3 - p_1| + |s_5 - p_2| + |s_6 - p_3| = 0 + 0 + 1 = 1 \le \gamma = 1$, so they also satisfy the global constraint, i.e. $s_3s_5s_6 =_\gamma p_1p_2p_3$. Therefore <3,5,6> is an occurrence of the given pattern P in the given sequence S, i.e. $s_3s_5s_6 =_{(\delta,\gamma)} p_1p_2p_3$. For <3,5,7>, the subsequence $s_3s_5s_7$ is identical to the pattern $p_1p_2p_3$: the γ-distance between the two is 0 and the δ-distance between each pair of corresponding characters is also 0, so <3,5,7> is an exact-match occurrence. Exact matching is thus a special case of approximate matching, and approximate matching is more general than exact matching.
Pattern matching under the one-time condition means that the character at any position in the sequence S may be used at most once. Under the one-time condition, the given pattern P of Example A has 2 occurrences in the given sequence S, namely <1,2,3> and <5,6,7>: once <1,2,3> is selected, <3,5,6> and <3,5,7> can no longer be selected, because the character $s_3$ has already been used by <1,2,3>. Compared with pattern matching under no special condition, pattern matching under the one-time condition therefore reduces the result set, simplifies the calculation, and avoids reusing characters.
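The four occurrences under no special condition can be reproduced by brute force. The sketch below is illustrative only and is not the net tree method of the invention; the function name `approx_occurrences` is hypothetical, the sequence is assumed to be `cbcabbc`, and character differences are assumed to be taken over code points.

```python
def approx_occurrences(seq, pat, gaps, delta, gamma):
    """Enumerate all (delta, gamma)-approximate occurrences, characters reusable.

    Returns 1-based position tuples <l_1, ..., l_m> that satisfy the gap
    constraints, the local constraint (each distance <= delta) and the
    global constraint (sum of distances <= gamma).
    """
    results = []

    def extend(prefix, used_distance):
        j = len(prefix)  # index of the next sub-pattern to place
        if j == len(pat):
            results.append(tuple(prefix))
            return
        if j == 0:
            candidates = range(1, len(seq) + 1)
        else:
            lo, hi = gaps[j - 1]
            first = prefix[-1] + lo + 1
            last = min(prefix[-1] + hi + 1, len(seq))
            candidates = range(first, last + 1)
        for pos in candidates:
            d = abs(ord(seq[pos - 1]) - ord(pat[j]))
            # prune as soon as the local or the global constraint is violated
            if d <= delta and used_distance + d <= gamma:
                extend(prefix + [pos], used_distance + d)

    extend([], 0)
    return results


# Example C (assumed sequence), delta = 1, gamma = 1
print(approx_occurrences("cbcabbc", "cbc", [(0, 1), (0, 1)], 1, 1))
# -> [(1, 2, 3), (3, 5, 6), (3, 5, 7), (5, 6, 7)]
```

Setting both thresholds to 0 reduces the search to exact matching, which mirrors the statement that exact matching is a special case of approximate matching.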
Pattern matching generally needs to be general, accurate, and flexible, and its solution set needs to be non-redundant, which reduces the difficulty of data analysis and processing; existing related technologies, however, struggle to satisfy all of these conditions at once. For example, the document published by Wu et al., "A heuristic method for solving the MPMGOOC problem" (Computer Reports), studies pattern matching with gap constraints and the one-time condition based on the net tree structure. It proposes a heuristic method that uses the right-most parent strategy and the greedy search parent strategy to find the optimal occurrence; the quality of its solutions is better than that of other methods, and it offers a reference for solving other complex problems. However, it studies exact pattern matching, cannot handle data noise, and lacks generality. The document published by He et al., "SAIL-APPROX: An Efficient On-Line Algorithm for Approximate Pattern Matching with Wildcards and Length Constraints" (IEEE), uses dynamic programming to study one-time approximate pattern matching with wildcards and length constraints, and proves the correctness and effectiveness of the proposed method. However, it studies approximate pattern matching under the Hamming distance, which does not consider the local constraints between sequences and can cause large deviations when measuring time series, so it lacks accuracy. The document published by down et al., "Approximate pattern matching with gap constraints" (Journal of Information Science), studies approximate pattern matching with gap constraints and proposes an efficient solving method based on a single net tree, which can find more valuable information in many fields than exact pattern matching. However, it studies approximate pattern matching under no special condition, which allows the character at any position in the sequence to be used many times, so the number of occurrences of the pattern in the sequence grows exponentially with the pattern length and the problem becomes more complex to process. The document published by Liu et al., "An improved BM pattern matching method" (Computer Engineering), builds on the BM method: it checks whether the pattern contains consecutive characters and changes the comparison order of the pattern accordingly, which improves the matching efficiency of the BM method. However, it matches consecutive characters without gap constraints and lacks flexibility.
In summary, for the one-time approximate pattern matching problem with local-global constraints, existing techniques struggle to deliver generality, accuracy, and flexibility of solving together with a non-redundant solution set, and no good method for solving the problem has existed so far.
Disclosure of Invention
The technical problem to be solved by the invention is: to provide a one-time approximate pattern matching method with local-global constraints that performs one-time approximate pattern matching under the (δ, γ)-distance with gap constraints, and that overcomes the shortcoming of the prior art, which for this problem struggles to deliver generality, accuracy, and flexibility of solving together with a non-redundant solution set.
The technical solution adopted by the invention to solve this technical problem is: a one-time approximate pattern matching method with local-global constraints, which uses a net tree structure to solve the one-time approximate pattern matching problem under the (δ, γ)-distance, with the following specific steps:
Step one, determining the number of layers of the net tree:
Read in the given sequence S and determine its length n, denoting the characters of S as $s_1, s_2, \ldots, s_n$; read in the given pattern P and determine its length m, denoting the sub-patterns of P as $p_1, p_2, \ldots, p_m$, where $0 < m < n$. The number of layers of the net tree is determined by the number of sub-patterns in the given pattern P: the net tree has m layers, denoted layer 1, layer 2, …, layer m;
Step two, creating the net tree:
Given a local threshold δ and a global threshold γ, where $0 \le \delta \le \gamma$, create a net tree from the sequence S and the pattern P of step one, in which the node labeled i in layer j is denoted $n_j^i$. Unlike exact pattern matching, each node $n_j^i$ in a net tree created for one-time approximate pattern matching with local-global constraints records the δ-distance $D_\delta(s_i, p_j)$ between the character $s_i$ and the sub-pattern $p_j$. In addition, each node $n_j^i$ uses a root path array $N_r(n_j^i, d)$ of size γ + 1, with array subscripts 0, 1, …, γ, to describe the number of root paths by which the node reaches the root layer: among the paths from the current node $n_j^i$ to the root layer, the number of root paths with γ-distance d is $N_r(n_j^i, d)$, which is initialized to 0, where $0 \le d \le \gamma$. While the net tree is being created, the root path array of each node allows invalid parent relations and invalid nodes to be identified and deleted in advance. For the same reason, each node $n_j^i$ also uses a leaf path array $N_l(n_j^i, d)$ of size γ + 1, with array subscripts 0, 1, …, γ, to describe the number of leaf paths by which the node reaches the leaf layer: among the paths from the current node $n_j^i$ to the leaf layer, the number of leaf paths with γ-distance d is $N_l(n_j^i, d)$, which is also initialized to 0, where $0 \le d \le \gamma$;
The specific processing method is as follows:
Process the characters $s_i$ of the given sequence S of step one in turn, calculating the δ-distance $D_\delta(s_i, p_j)$ between $s_i$ and each of the sub-patterns $p_1, p_2, \ldots, p_m$ of the given pattern P of step one, where $1 \le j \le m$:
When $D_\delta(s_i, p_j) > \delta$, where $1 \le j \le m$, the node $n_j^i$ (the node labeled i in layer j) is not created in layer j of the net tree;
When $D_\delta(s_i, p_j) \le \delta$ and $j = 1$, the node $n_1^i$ is created directly in the root layer of the net tree, and its root path array $N_r(n_1^i, d)$ is calculated by formula (1):
$$N_r(n_1^i, d) = \begin{cases} 1, & d = D_\delta(s_i, p_1) \le \delta \\ 0, & \text{otherwise} \end{cases} \qquad 0 \le d \le \gamma \qquad (1)$$
In formula (1), d is the γ-distance from node $n_1^i$ to the root layer, in the range 0 to γ; δ is the local threshold; γ is the global threshold; and $D_\delta(s_i, p_1)$ is the δ-distance between the character $s_i$ and the sub-pattern $p_1$;
When $D_\delta(s_i, p_j) \le \delta$ and $1 < j \le m$, the node $n_j^i$ is created in layer j of the net tree, and all nodes in layer j-1 are checked for parent nodes that satisfy the gap constraint $[min_{j-1}, max_{j-1}]$ with node $n_j^i$: when no node satisfies the gap constraint with node $n_j^i$, the node $n_j^i$ is deleted; when a node satisfying the gap constraint with node $n_j^i$ exists, it is denoted $n_{j-1}^{i'}$, and whether a parent relationship can be established between the parent node $n_{j-1}^{i'}$ and the node $n_j^i$ is judged as follows:
When $\sum_{d=0}^{\gamma - D_\delta(s_i, p_j)} N_r(n_{j-1}^{i'}, d) > 0$, that is, when at least one root path of the parent node still satisfies the global threshold after the distance of the current node is added, a parent relationship can be established between the parent node $n_{j-1}^{i'}$ and the node $n_j^i$, and the root path array of node $n_j^i$ is calculated by formula (2):
$$N_r(n_j^i, d) = \begin{cases} \sum_{i'} N_r(n_{j-1}^{i'}, d - D_\delta(s_i, p_j)), & D_\delta(s_i, p_j) \le d \le \gamma \\ 0, & 0 \le d < D_\delta(s_i, p_j) \end{cases} \qquad (2)$$
In formula (2), d is the γ-distance from node $n_j^i$ to the root layer, in the range 0 to γ; δ is the local threshold; γ is the global threshold; $D_\delta(s_i, p_j)$ is the δ-distance between the character $s_i$ and the sub-pattern $p_j$; and $n_{j-1}^{i'}$ is a parent node in layer j-1 of the net tree that satisfies the gap constraint $[min_{j-1}, max_{j-1}]$ with node $n_j^i$ and can establish a parent relationship with it;
All nodes of layer j-1 of the net tree are traversed in turn to calculate the root path array $N_r(n_j^i, d)$ of node $n_j^i$; when $\sum_{d=0}^{\gamma} N_r(n_j^i, d) = 0$, the node $n_j^i$ is deleted;
After every character of the given sequence S of step one has been processed, the net tree has been created;
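The construction of the root path arrays can be sketched as follows. This is an illustrative reading of the step above under stated assumptions, not the patented implementation: the function name `build_root_arrays` and the dictionary layout are hypothetical, parent links are not stored, character differences are taken over code points, and the sequence of Example C is assumed to be `cbcabbc`.

```python
def build_root_arrays(seq, pat, gaps, delta, gamma):
    """Build the net tree layer by layer and return the root path arrays.

    layers[j][i] is the root path array of the node labeled i (1-based)
    in layer j + 1: entry d counts the root paths with gamma-distance d.
    """
    m = len(pat)
    layers = [dict() for _ in range(m)]
    for i, ch in enumerate(seq, start=1):
        for j in range(m):
            dist = abs(ord(ch) - ord(pat[j]))
            if dist > delta or dist > gamma:
                continue  # local constraint violated: no node is created
            arr = [0] * (gamma + 1)
            if j == 0:
                arr[dist] = 1  # root layer: one path, of distance dist
            else:
                lo, hi = gaps[j - 1]
                for k, parent_arr in layers[j - 1].items():
                    if lo <= i - k - 1 <= hi:  # gap constraint with parent k
                        for d in range(dist, gamma + 1):
                            arr[d] += parent_arr[d - dist]
            if any(arr):  # a node without a valid root path is discarded
                layers[j][i] = arr
    return layers


layers = build_root_arrays("cbcabbc", "cbc", [(0, 1), (0, 1)], 1, 1)
print(layers[2])
# -> {3: [1, 0], 6: [0, 1], 7: [1, 1]}
print(sum(sum(arr) for arr in layers[2].values()))
# -> 4
```

The leaf layer holds four root paths in total, which agrees with the four occurrences of Example C under no special condition.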
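In the created net tree, the leaf path array $N_l(n_j^i, d)$ of each node $n_j^i$ is calculated, starting from the leaf layer, where the leaf path array of a leaf-layer node $n_m^i$ is given by formula (3):
$$N_l(n_m^i, d) = \begin{cases} 1, & d = D_\delta(s_i, p_m) \le \delta \\ 0, & \text{otherwise} \end{cases} \qquad 0 \le d \le \gamma \qquad (3)$$
In formula (3), d is the γ-distance from node $n_m^i$ to the leaf layer, in the range 0 to γ; δ is the local threshold; γ is the global threshold; and $D_\delta(s_i, p_m)$ is the δ-distance between the character $s_i$ and the sub-pattern $p_m$;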
For a node $n_j^i$ in layer j of the net tree, where $1 \le j < m$, when it has a child node, the child node is denoted $n_{j+1}^{i'}$, and the leaf path array of node $n_j^i$ is calculated by formula (4):
$$N_l(n_j^i, d) = \begin{cases} \sum_{i'} N_l(n_{j+1}^{i'}, d - D_\delta(s_i, p_j)), & D_\delta(s_i, p_j) \le d \le \gamma \\ 0, & 0 \le d < D_\delta(s_i, p_j) \end{cases} \qquad (4)$$
In formula (4), d is the γ-distance from node $n_j^i$ to the leaf layer, in the range 0 to γ; δ is the local threshold; γ is the global threshold; $D_\delta(s_i, p_j)$ is the δ-distance between the character $s_i$ and the sub-pattern $p_j$; and $n_{j+1}^{i'}$ is a child node of node $n_j^i$;
From the root path array $N_r(n_j^i, d)$ of node $n_j^i$, its root path number $N_r(n_j^i)$ is calculated by formula (5); from the leaf path array $N_l(n_j^i, d)$ of node $n_j^i$, its leaf path number $N_l(n_j^i)$ is calculated by formula (6):
$$N_r(n_j^i) = \sum_{d=0}^{\gamma} N_r(n_j^i, d) \qquad (5)$$
$$N_l(n_j^i) = \sum_{d=0}^{\gamma} N_l(n_j^i, d) \qquad (6)$$
In formula (5), d is the γ-distance from node $n_j^i$ to the root layer, in the range 0 to γ, and γ is the global threshold;
In formula (6), d is the γ-distance from node $n_j^i$ to the leaf layer, in the range 0 to γ, and γ is the global threshold;
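From the root path array $N_r(n_j^i, d)$ and the leaf path array $N_l(n_j^i, d)$ of node $n_j^i$, its root-leaf path number $N_{rl}(n_j^i)$ is calculated by formula (7):
$$N_{rl}(n_j^i) = \sum_{k = D_\delta(s_i, p_j)}^{\gamma} \; \sum_{h = D_\delta(s_i, p_j)}^{\gamma + D_\delta(s_i, p_j) - k} N_r(n_j^i, k) \times N_l(n_j^i, h) \qquad (7)$$
In formula (7), k is the γ-distance from node $n_j^i$ to the root layer, in the range $D_\delta(s_i, p_j)$ to γ; h is the γ-distance from node $n_j^i$ to the leaf layer, in the range $D_\delta(s_i, p_j)$ to $\gamma + D_\delta(s_i, p_j) - k$; and γ is the global threshold;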
In the created net tree, the position-related number RP(i) is calculated from the root-leaf path numbers $N_{rl}(n_j^i)$ of the nodes, and the path branch number PB(i) is calculated from the root path numbers $N_r(n_j^i)$ of the nodes:
The position-related number RP(i) is the sum of the root-leaf path numbers $N_{rl}(n_j^i)$ of all nodes labeled i in the net tree, calculated by formula (8):
$$RP(i) = \sum_{j=1}^{m} N_{rl}(n_j^i) \qquad (8)$$
In formula (8), m is the number of layers of the net tree, and $N_{rl}(n_j^i)$ is the root-leaf path number of node $n_j^i$;
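The leaf path arrays, the root-leaf path numbers, and the position-related numbers can be sketched together. The code below is an illustrative reading under stated assumptions rather than the patented implementation: all function names are hypothetical, the parent test is the reconstruction "some root path of the parent still fits within γ", character differences are taken over code points, and the sequence of Example C is assumed to be `cbcabbc`.

```python
def delta_dist(x, y):
    return abs(ord(x) - ord(y))


def build_net_tree(seq, pat, gaps, delta, gamma):
    """Return (root, parents): root path arrays and parent labels per layer."""
    m = len(pat)
    root = [dict() for _ in range(m)]
    parents = [dict() for _ in range(m)]
    for i in range(1, len(seq) + 1):
        for j in range(m):
            dd = delta_dist(seq[i - 1], pat[j])
            if dd > delta or dd > gamma:
                continue
            arr, plist = [0] * (gamma + 1), []
            if j == 0:
                arr[dd] = 1
            else:
                lo, hi = gaps[j - 1]
                for k, parr in root[j - 1].items():
                    # gap constraint, and at least one root path still fits
                    if lo <= i - k - 1 <= hi and any(parr[: gamma - dd + 1]):
                        plist.append(k)
                        for d in range(dd, gamma + 1):
                            arr[d] += parr[d - dd]
            if any(arr):
                root[j][i], parents[j][i] = arr, plist
    return root, parents


def leaf_arrays(seq, pat, root, parents, gamma):
    """Compute leaf path arrays from the leaf layer upwards."""
    m = len(pat)
    leaf = [dict() for _ in range(m)]
    for j in range(m - 1, -1, -1):
        for i in root[j]:
            dd = delta_dist(seq[i - 1], pat[j])
            arr = [0] * (gamma + 1)
            if j == m - 1:
                arr[dd] = 1
            else:
                for child, plist in parents[j + 1].items():
                    if i in plist:
                        for d in range(dd, gamma + 1):
                            arr[d] += leaf[j + 1][child][d - dd]
            leaf[j][i] = arr
    return leaf


def position_related_numbers(seq, pat, gaps, delta, gamma):
    """RP(i): sum over layers of the root-leaf path numbers of nodes labeled i."""
    root, parents = build_net_tree(seq, pat, gaps, delta, gamma)
    leaf = leaf_arrays(seq, pat, root, parents, gamma)
    rp = {i: 0 for i in range(1, len(seq) + 1)}
    for j in range(len(pat)):
        for i in root[j]:
            dd = delta_dist(seq[i - 1], pat[j])
            for k in range(dd, gamma + 1):
                for h in range(dd, gamma + dd - k + 1):
                    rp[i] += root[j][i][k] * leaf[j][i][h]
    return rp


print(position_related_numbers("cbcabbc", "cbc", [(0, 1), (0, 1)], 1, 1))
# -> {1: 1, 2: 1, 3: 3, 4: 0, 5: 3, 6: 2, 7: 2}
```

In this example RP(i) equals the number of approximate occurrences that use position i: position 3 appears in <1,2,3>, <3,5,6>, and <3,5,7>, and position 5 appears in <3,5,6>, <3,5,7>, and <5,6,7>.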
The ancestor set of a node $n_j^i$ is the set of nodes on all paths from that node to the root layer; the common ancestor set A(D) of a node set D is the intersection of the ancestor sets of all elements of D. The path branch number PB(i) is the sum of the root path numbers $N_r(n_j^i)$ of all nodes labeled i in the common ancestor set A(D) of a node set D, calculated by formula (9):
$$PB(i) = \sum_{j=1}^{l} N_r(n_j^i), \quad n_j^i \in A(D) \qquad (9)$$
In formula (9), l is the depth of the common ancestor set A(D), and $N_r(n_j^i)$ is the root path number of node $n_j^i$;
Step three, solving the one-time approximate pattern matching problem under the (δ, γ)-distance using the net tree structure:
The net tree constructed in steps one and two is used to solve the one-time approximate pattern matching problem under the (δ, γ)-distance. Specifically, one root-leaf path in the net tree corresponds to one occurrence, so searching for occurrences means searching the net tree for root-leaf paths that satisfy the local-global constraint. Based on the net tree structure, a heuristic method is used to search for such root-leaf paths, and the heuristic method is any one of the following four similar occurrence search strategies:
First, the right-most parent strategy and the greedy search parent strategy: the right-most parent strategy starts from the last node of the leaf layer and, provided the local-global constraint is satisfied, preferentially selects the right-most parent node of the current node; the greedy search parent strategy starts from the last node of the leaf layer and, provided the local-global constraint is satisfied, selects the optimal parent node of the current node, where the optimal parent node is the parent node with the smaller position-related number; when two parent nodes have the same position-related number, the parent node with the larger path branch number in the common ancestor set of the nodes on the existing path is selected as the optimal parent node;
Second, the left-most parent strategy and the greedy search parent strategy: the left-most parent strategy starts from the first node of the leaf layer and, provided the local-global constraint is satisfied, preferentially selects the left-most parent node of the current node; in this case the greedy search parent strategy starts from the first node of the leaf layer and, provided the local-global constraint is satisfied, selects the optimal parent node of the current node, where the optimal parent node is the parent node with the smaller position-related number;
Third, the right-most child strategy and the greedy search child strategy: the right-most child strategy starts from the last node of the root layer and, provided the local-global constraint is satisfied, preferentially selects the right-most child node of the current node; in this case the greedy search child strategy starts from the last node of the root layer and, provided the local-global constraint is satisfied, selects the optimal child node of the current node, where the optimal child node is the child node with the smaller position-related number;
Fourth, the left-most child strategy and the greedy search child strategy: the left-most child strategy starts from the first node of the root layer and, provided the local-global constraint is satisfied, preferentially selects the left-most child node of the current node; in this case the greedy search child strategy starts from the first node of the root layer and, provided the local-global constraint is satisfied, selects the optimal child node of the current node, where the optimal child node is the child node with the smaller position-related number;
One-time approximate pattern matching under the (δ, γ)-distance is thereby completed using the net tree structure;
All occurrences of the given pattern P of step one in the given sequence S of step one, that is, all occurrences in the result set OCC(S, P), are output on a display, thereby completing one-time approximate pattern matching with local-global constraints.
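The right-most parent strategy can be illustrated with a simplified search. The sketch below is not the patented procedure: it works directly on sequence positions instead of a stored net tree, it omits the greedy tie-breaking by position-related number and path branch number, the function name `one_time_rightmost_parent` is hypothetical, and the sequence of Example C is assumed to be `cbcabbc`.

```python
def one_time_rightmost_parent(seq, pat, gaps, delta, gamma):
    """Find occurrences under the one-time condition, right-most parent first.

    Starts from the last position of the sequence as a leaf candidate and
    walks towards the first sub-pattern, always trying the right-most
    admissible parent position first. Each position is used at most once.
    """
    n, m = len(seq), len(pat)
    used = [False] * (n + 1)

    def dist(i, j):
        return abs(ord(seq[i - 1]) - ord(pat[j]))

    def search(j, pos, budget):
        """Match sub-pattern j at pos, then place sub-patterns j-1, ..., 0."""
        d = dist(pos, j)
        if used[pos] or d > delta or d > budget:
            return None
        if j == 0:
            return [pos]
        lo, hi = gaps[j - 1]
        # admissible parents satisfy lo <= pos - k - 1 <= hi; right-most first
        for k in range(pos - lo - 1, max(pos - hi - 1, 1) - 1, -1):
            found = search(j - 1, k, budget - d)
            if found is not None:
                return found + [pos]
        return None

    occurrences = []
    for leaf in range(n, 0, -1):
        found = search(m - 1, leaf, gamma)
        if found is not None:
            for p in found:
                used[p] = True  # one-time condition: consume the positions
            occurrences.append(tuple(found))
    return sorted(occurrences)


print(one_time_rightmost_parent("cbcabbc", "cbc", [(0, 1), (0, 1)], 1, 1))
# -> [(1, 2, 3), (5, 6, 7)]
```

On Example C this returns the two occurrences <1,2,3> and <5,6,7> stated in the text. Because the search is heuristic, it is not guaranteed to return a maximum-size set of occurrences on every input.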
The above one-time approximate pattern matching method with local-global constraints uses VC++ 6.0 as the programming software and Visio 2013 as the drawing tool, runs on a Pentium(R) Dual-Core 32 processor or above, and uses Windows 7 or above as the operating system; this software and hardware environment is well known to those skilled in the art.
The beneficial effects of the invention are as follows. Compared with the prior art, the invention has the following prominent substantive features:
(1) Based on the (δ, γ)-distance, the method of the invention studies one-time approximate pattern matching with local-global constraints through the net tree structure, and addresses the problem that the traditional Hamming distance cannot measure the local approximation between sequences. The method reads in the given sequence S and the given pattern P, determines the number of net tree layers from the number of sub-patterns of the given pattern P, creates the net tree according to the input conditions, identifies and deletes invalid parent relations and invalid nodes in advance while creating the net tree, calculates the root path array and the leaf path array of each remaining node, and finally solves the one-time approximate pattern matching problem under the (δ, γ)-distance using the net tree structure;
(2) CN107239500A discloses a character string matching method and system, relating to the field of character matching, wherein the system realizes the matching of character strings by defining several modules and orderly connecting together, the invention researches the character string accurate matching without special conditions and does not allow gaps among characters, while the invention researches the approximate pattern matching under one-time conditions and allows gaps among characters, which is the maximum substantive difference between the two;
(3) CN108647299A discloses a rarely-used character matching method, a character string pattern matching method and a storage medium, the invention firstly extracts rarely-used characters in the pattern, then judges whether the rarely-used characters exist in the text, and finally judges whether the pattern is equal to the text segment containing the rarely-used characters, thereby realizing the rarely-used character matching, the invention researches the accurate pattern matching of the character string without special conditions, and does not allow gaps to exist among the characters, while the invention researches the approximate pattern matching under the one-time condition, and allows gaps to exist among the characters, which is the maximum substantive difference between the two;
(4) CN106446062A discloses a retrieval system and method of continuous characters and fuzzy characters, which realizes double matching of continuous character string pattern matching and fuzzy character matching, the invention researches continuous character string matching without special conditions, and does not allow gaps among characters, while the invention researches pattern matching under one-time conditions, and allows gaps among characters, which is the maximum substantive difference between the two;
(5) CN103425739A discloses a fast string matching method, which obtains a simple hash table of each character by preprocessing a target string, and then implements fast matching of the string through the hash table, but the invention researches continuous string matching without special conditions and does not allow gaps between characters, while the invention researches pattern matching under one-time conditions and allows gaps between characters, which is the most substantial difference between the two;
(6) CN107229759A discloses a character string pattern matching method. That invention studies exact pattern matching without special conditions, so its result set is redundant and its anti-interference performance is poor, whereas the present invention studies approximate pattern matching under the one-time condition, which not only simplifies the result set but can also find more valuable information than exact pattern matching; this is the greatest substantive difference between the two;
(7) CN109408681 discloses a character string matching method that first divides the target character string and the character string to be matched into substrings of preset length, then assigns a hash value to each substring in the substring set, and finally calculates the similarity between the target character string and the character string to be matched from the hash sequences; when the similarity is greater than a preset similarity threshold, the two character strings are determined to match. That invention must divide the character strings in advance and cannot measure the local similarity between sequences, whereas the present invention measures the approximation between sequences by the (δ, γ)-distance, takes both local and overall similarity into account, does not need to divide the sequences in advance, and completes approximate pattern matching under the (δ, γ)-distance in a single traversal of the sequence;
(8) CN109165326 discloses a method and apparatus for matching character strings, which segments a character string to determine the field corresponding to each word it contains and then determines the matching degree between character strings from the weight of each field; when the matching degree is greater than a preset threshold, the two character strings are considered to match. That invention measures the approximation between sequences by weights; although it is also an approximate matching method, it cannot measure the local approximation between sequences and does not allow gaps between characters, whereas the present invention measures the approximation between sequences by the (δ, γ)-distance, can measure the local and overall approximation between sequences at the same time, and allows gaps between characters, which is the greatest substantive difference between the two;
(9) CN108920361A discloses a code similarity detection method based on string matching, which detects the similarity of code structure by the longest common substring but cannot measure the local similarity between sequences, whereas the present invention measures the approximation between sequences by the (δ, γ)-distance and takes both local and overall similarity into account, which is the greatest substantive difference between the two;
(10) CN108920483A discloses a fast character string matching method based on a suffix array, which first uses binary search to confine the occurrence positions to an interval and then excludes from that interval the suffixes that are shorter than the pattern or whose last character differs from that of the pattern, thereby matching the character string quickly. That invention studies exact pattern matching without special conditions and does not allow gaps between characters, whereas the present invention studies approximate pattern matching under the one-time condition and allows gaps between characters, which is the greatest substantive difference between the two;
(11) CN108255836A discloses a character string matching method and device. That invention calculates the matching value of key character strings, calculates a first edit distance between character strings by a preset rule, and finally obtains the similarity from the first edit distance and the matching value. However, the edit distance cannot measure the local approximation between sequences, whereas the present invention measures the approximation between sequences by the (δ, γ)-distance and takes both local and overall approximation into account, which is the greatest substantive difference between the two;
(12) CN106980620A discloses a method and a device for matching Chinese character strings. That invention first judges whether the characters at the same positions of two strings are similar-looking or similar-sounding characters, then obtains the edit distance between the characters at the same positions according to a preset edit distance rule for similar-looking or similar-sounding characters, and finally matches the strings according to the edit distance. However, the edit distance cannot measure the local similarity between sequences, which can leave the sequences dissimilar overall.
Compared with the prior art, the method has the following remarkable progress:
(1) the method studies pattern matching with gap constraints; although introducing gap constraints makes the pattern matching problem harder to solve, it also makes the method more flexible and more widely applicable, since a user can set the size and length of the gap constraints according to the specific situation to better meet the user's requirements;
(2) the method is based on (delta, gamma) -distance, and researches the approximate pattern matching with local-overall constraint through a net tree structure, wherein the delta-distance ensures that the sequences meet the local constraint, the gamma-distance ensures that the sequences meet the overall constraint, and the (delta, gamma) -distance simultaneously measures the local approximation and the overall approximation between the sequences, thereby widening the application field, reducing the matching error and having important research value;
(3) the method studies approximate pattern matching under the one-time condition; in pattern matching with gap constraints, a new occurrence arises whenever the position of any sub-pattern changes; pattern matching without special conditions allows the character at each position of the sequence to be reused many times, so the solution space grows exponentially, whereas pattern matching under the one-time condition requires that the character at each position of the sequence be used at most once, which reduces the result set, simplifies the calculation, and avoids the repeated use of characters;
(4) the embodiment is simple time sequence matching application description, wherein the sequence S represents a character type time sequence, namely a result after a numerical type time sequence is symbolized, and the pattern P reflects a data change rule of a period of time and can predict the trend of future data.
Drawings
The invention is further illustrated with reference to the following figures and examples.
FIG. 1 is a comparative illustration of example B, time series measured by Hamming distance and (δ, γ) -distance, where: fig. 1(1) is bdb, fig. 1(2) is badb, fig. 1(3) is baddb, fig. 1(4) is eaddb, fig. 1(5) is bafdb, fig. 1(6) is badde, fig. 1(7) is aaddb, fig. 1(8) is bacdb, and fig. 1(9) is bacdc.
Fig. 2 shows all occurrences of a given pattern P in example C in a given sequence S.
FIG. 3 is a schematic flow chart of the computer processing method of the present invention.
Fig. 4 shows the number of net tree layers determined according to the number of sub-patterns in a given pattern P in embodiment 1 of the present invention.
Fig. 5 shows a net tree created in example 1 of the present invention.
FIG. 6 is the net tree of FIG. 5 after the nodes that cannot reach the leaf level, the invalid parent relationships, and the invalid nodes are removed.
FIG. 7 shows the leaf path array of each node calculated in the net tree of FIG. 6 in embodiment 1 of the present invention.
FIG. 8 shows the net tree of FIG. 6 in embodiment 1 of the present invention after the occurrence <4,6,7,8> and all nodes with the same labels are deleted and the tree root path array of each remaining node is recalculated.
FIG. 9 shows the net tree of FIG. 6 in embodiment 1 of the present invention after the occurrence <1,4,7,8> and all nodes with the same labels are deleted and the tree root path array of each remaining node is recalculated.
Fig. 10 is the net tree of fig. 9 with all nodes having the same label as that of <1,4,7,8> removed.
Detailed Description
The embodiment of FIG. 1 shows that bdb in FIG. 1(1) is the gapless pattern P, consistent with example B; badb in FIG. 1(2) and baddb in FIG. 1(3) introduce gap constraints, which can handle noise and correct phase differences; eaddb in FIG. 1(4), bafdb in FIG. 1(5) and badde in FIG. 1(6) cannot exactly match the given pattern P in example B, and although they can approximately match it when the Hamming distance threshold is 1, they are not similar to bdb in FIG. 1(1) as a whole and deviate greatly from it: 'e' in eaddb deviates greatly from 'b', 'f' in bafdb deviates greatly from 'd', and 'e' in badde deviates greatly from 'b'; aaddb in FIG. 1(7), bacdb in FIG. 1(8) and bacdc in FIG. 1(9) cannot exactly match the given pattern P in example B either, but can approximately match it under the (δ, γ)-distance: aaddb and bacdb approximately match the given pattern P when the local threshold δ is 1 and the global threshold γ is 1, and are similar to bdb in FIG. 1(1) as a whole; bacdc cannot approximately match the given pattern P when δ is 1 and γ is 1, but can when δ is 1 and γ is 2. In summary, the Hamming distance cannot measure the local approximation between sequences, whereas the (δ, γ)-distance measures both the local and the overall approximation between sequences, thereby ensuring overall similarity between sequences.
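The comparison above can be illustrated with a minimal sketch. It assumes, purely for simplicity, that each string of FIG. 1 is compared position by position against baddb of FIG. 1(3), that the distance between two characters is the gap between their alphabet positions, and that the δ-distance and γ-distance are the maximum and the sum of these per-position gaps; the function names are hypothetical and are not taken from the specification.

```python
def char_distance(x: str, y: str) -> int:
    """Distance between two symbols: gap between their alphabet positions (assumption)."""
    return abs(ord(x) - ord(y))


def hamming_distance(u: str, v: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(u, v) if x != y)


def delta_gamma_distance(u: str, v: str) -> tuple:
    """Return (largest per-position distance, sum of per-position distances)."""
    gaps = [char_distance(x, y) for x, y in zip(u, v)]
    return max(gaps), sum(gaps)


def approx_match(u: str, v: str, delta: int, gamma: int) -> bool:
    """True when both the local (delta) and the overall (gamma) threshold hold."""
    local, overall = delta_gamma_distance(u, v)
    return local <= delta and overall <= gamma


reference = "baddb"  # FIG. 1(3), used here as the comparison baseline (assumption)
for candidate in ["eaddb", "bafdb", "badde", "aaddb", "bacdb", "bacdc"]:
    print(candidate,
          hamming_distance(candidate, reference),
          delta_gamma_distance(candidate, reference),
          approx_match(candidate, reference, delta=1, gamma=1))
```

Under these assumptions eaddb passes a Hamming threshold of 1 yet fails the local threshold δ = 1, while bacdc fails at γ = 1 and passes at γ = 2, in line with the description of FIG. 1.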
The embodiment shown in fig. 2 shows that the given pattern P in example C has 4 occurrences in total in the given sequence S, wherein 'S' represents the sequence S, 'C', 'b', 'C', 'a', 'b', 'C' are respectively 7 characters in the given sequence S, and '1', '2', '3', '4', '5', '6' and '7' are the position indices corresponding to the 7 characters; each occurrence of the given pattern P in the given sequence S is denoted by the position indices of its characters in the given sequence S, so the 4 occurrences are <1,2,3>, <3,5,6>, <3,5,7> and <5,6,7>. These 4 occurrences include occurrences whose δ-distance and γ-distance are both 0, which illustrates that exact pattern matching is a special case of approximate pattern matching and that approximate pattern matching is more general.
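The one-time condition can be checked on the occurrences listed above with a short sketch; the function name is hypothetical, and the sketch only tests whether a chosen set of occurrences reuses any position of the sequence.

```python
def satisfies_one_time_condition(occurrences) -> bool:
    """True when no sequence position is used by more than one occurrence."""
    used = set()
    for occurrence in occurrences:
        if used.intersection(occurrence):
            return False  # some position would be used twice
        used.update(occurrence)
    return True


all_occurrences = [(1, 2, 3), (3, 5, 6), (3, 5, 7), (5, 6, 7)]
print(satisfies_one_time_condition(all_occurrences))
print(satisfies_one_time_condition([(1, 2, 3), (5, 6, 7)]))
```

Taken together, the four occurrences share positions 3, 5, 6 and 7, so they cannot all be kept under the one-time condition, whereas <1,2,3> and <5,6,7> can be kept together.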
FIG. 3 shows the flow of the computer processing employed by the method of the present invention: 1) start → 2) read in the sequence S, the pattern P, the local threshold δ and the global threshold γ → 3) create a net tree according to the input conditions and calculate the tree root path array and the leaf path array of each node → 4) starting from the last node of the leaf level, look for occurrences in the net tree from right to left → 5) find the occurrence C1 according to the rightmost parent strategy and the occurrence C2 according to the greedy search parent strategy → 6) select the occurrence with the larger remaining occurrence number, and delete that occurrence and the nodes with the same labels as the occurrence → 7) recalculate the tree root path array and the leaf path array of the remaining nodes of the net tree → 8) judge whether the first node of the leaf level has been reached; if yes, execute step 9; if no, execute step 4 → 9) output all occurrences of the pattern P in the sequence S on the display → 10) end.
Example 1
Given a time series whose sequence after symbolization by the SAX (Symbolic Aggregate approXimation) method is S = aababccc, a pattern P = a[0,2]b[0,2]c[0,2]c, a local threshold δ = 1, and a global threshold γ = 1.
Step one, determining the layer number of the net tree:
reading the given sequence S, whose length is 8, and the given pattern P = a[0,2]b[0,2]c[0,2]c, whose length is 4; the sub-patterns of the pattern P are denoted sub-pattern p1 = a, sub-pattern p2 = b, sub-pattern p3 = c and sub-pattern p4 = c; since there are 4 sub-patterns, the net tree has 4 levels, denoted the 1st level, the 2nd level, the 3rd level and the 4th level;
Step two, creating the net tree:
Given a local threshold δ and a global threshold γ, where 0 ≤ δ ≤ γ, a net tree is created from the sequence S and the pattern P of the first step, in which the node labeled i in level j is denoted $n_j^i$. Unlike exact pattern matching, each node $n_j^i$ in a net tree created for one-time approximate pattern matching with local-global constraints records the δ-distance $D_\delta(s_i, p_j)$ between the character $s_i$ and the sub-pattern $p_j$. Each node $n_j^i$ also uses a tree root path array $N_r(n_j^i, d)$ of size γ+1 to describe the number of tree root paths from the node to the tree root level. The array subscript d takes the values 0, 1, …, γ; among the paths from the current node $n_j^i$ to the tree root level, the number of tree root paths whose γ-distance is d is denoted $N_r(n_j^i, d)$ and is initialized to 0, where 0 ≤ d ≤ γ. During creation of the net tree, invalid parent relationships and invalid nodes can be identified and deleted in advance according to the tree root path array of each node. For the same reason, each node $n_j^i$ also uses a leaf path array $N_l(n_j^i, d)$ of size γ+1 to describe the number of leaf paths from the node to the leaf level. The array subscript d takes the values 0, 1, …, γ; among the paths from the current node $n_j^i$ to the leaf level, the number of leaf paths whose γ-distance is d is denoted $N_l(n_j^i, d)$ and is also initialized to 0, where 0 ≤ d ≤ γ;
the specific treatment method comprises the following steps:
each character $s_i$ of the given sequence S in the first step is processed in turn, and the δ-distance $D_\delta(s_i, p_j)$ between $s_i$ and each of the sub-patterns $p_1$, $p_2$, …, $p_m$ of the given pattern P in the first step is calculated, where 1 ≤ j ≤ m:
when $D_\delta(s_i, p_j) > \delta$, where 1 ≤ j ≤ m, no node $n_j^i$ is created in level j of the net tree;
when $D_\delta(s_i, p_j) \le \delta$ and j = 1, the node $n_1^i$ is created directly in the tree root level of the net tree, and its tree root path array $N_r(n_1^i, d)$ is calculated by formula (1):

$$N_r(n_1^i, d) = \begin{cases} 1, & d = D_\delta(s_i, p_1) \\ 0, & \text{otherwise} \end{cases} \qquad (0 \le d \le \gamma) \tag{1}$$

In formula (1), d represents the γ-distance, in the range 0 to γ, from node $n_1^i$ to the tree root level, δ denotes the local threshold, γ denotes the global threshold, and $D_\delta(s_i, p_1)$ represents the δ-distance between the character $s_i$ and the sub-pattern $p_1$;
when $D_\delta(s_i, p_j) \le \delta$ and 1 < j ≤ m, the node $n_j^i$ is created in level j of the net tree, and it is judged whether level j−1 of the net tree contains a parent node that satisfies the gap constraint $[min_{j-1}, max_{j-1}]$ with node $n_j^i$: when no parent node satisfies the gap constraint $[min_{j-1}, max_{j-1}]$ with node $n_j^i$, node $n_j^i$ is deleted; when such a parent node exists, it is denoted $n_{j-1}^k$, and whether a parent relationship can be established between parent node $n_{j-1}^k$ and node $n_j^i$ is judged as follows:
when $\sum_{d=0}^{\gamma - D_\delta(s_i, p_j)} N_r(n_{j-1}^k, d) > 0$, a parent relationship can be established between parent node $n_{j-1}^k$ and node $n_j^i$, and the tree root path array $N_r(n_j^i, d)$ of node $n_j^i$ is calculated by formula (2):

$$N_r(n_j^i, d) = \begin{cases} \sum_{k} N_r\big(n_{j-1}^k,\ d - D_\delta(s_i, p_j)\big), & D_\delta(s_i, p_j) \le d \le \gamma \\ 0, & 0 \le d < D_\delta(s_i, p_j) \end{cases} \tag{2}$$

In formula (2), d represents the γ-distance, in the range 0 to γ, from node $n_j^i$ to the tree root level, δ denotes the local threshold, γ denotes the global threshold, $D_\delta(s_i, p_j)$ represents the δ-distance between the character $s_i$ and the sub-pattern $p_j$, and $n_{j-1}^k$ represents a parent node in level j−1 of the net tree that satisfies the gap constraint $[min_{j-1}, max_{j-1}]$ with node $n_j^i$ and can establish a parent relationship with node $n_j^i$;
all nodes in level j−1 of the net tree are traversed in turn to calculate the tree root path array $N_r(n_j^i, d)$ of node $n_j^i$; when $\sum_{d=0}^{\gamma} N_r(n_j^i, d) = 0$, node $n_j^i$ is deleted.
After each character in the given sequence S in the first step is processed, a net tree is created;
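The creation procedure above can be sketched as follows. This is a minimal illustration rather than the patented implementation: it assumes the δ-distance between two characters is the gap between their alphabet positions, that 0 ≤ δ ≤ γ, and that a parent whose root paths would all exceed γ simply contributes nothing; the names `d_delta` and `root_path_arrays` are hypothetical. The sketch does not prune nodes that cannot reach the leaf level, so it may keep a few nodes that the worked example below never creates.

```python
def d_delta(s_char: str, p_char: str) -> int:
    """Delta-distance between two symbols: gap between alphabet positions (assumption)."""
    return abs(ord(s_char) - ord(p_char))


def root_path_arrays(seq, subpatterns, gaps, delta, gamma):
    """Map (level j, label i) -> tree root path array of size gamma + 1.

    gaps[j - 2] holds the (min, max) gap allowed between sub-patterns j - 1 and j.
    """
    arrays = {}
    for i, ch in enumerate(seq, start=1):
        for j, p in enumerate(subpatterns, start=1):
            dist = d_delta(ch, p)
            if dist > delta:
                continue  # no node is created in level j
            arr = [0] * (gamma + 1)
            if j == 1:
                arr[dist] = 1  # root level: one path whose distance is dist
            else:
                lo, hi = gaps[j - 2]
                # Candidate parent labels k satisfy lo <= i - k - 1 <= hi.
                for k in range(max(1, i - hi - 1), i - lo):
                    parent = arrays.get((j - 1, k))
                    if parent is None:
                        continue
                    # Shift the parent's counts by dist; counts that would
                    # exceed gamma are dropped.
                    for d in range(gamma - dist + 1):
                        arr[d + dist] += parent[d]
            if any(arr):
                arrays[(j, i)] = arr  # an all-zero array means the node is deleted
    return arrays


tree = root_path_arrays("aababccc", "abcc", [(0, 2)] * 3, delta=1, gamma=1)
print(tree[(4, 8)])  # prints [6, 9]
```

Keeping the arrays in a dictionary keyed by (level, label) makes a deleted node equivalent to a missing key, which keeps the parent lookup simple.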
the specific implementation process of the steps is as follows:
1) for the 1st character $s_1$ = a: since $D_\delta(s_1, p_1) = 0 \le \delta$, character $s_1$ and sub-pattern $p_1$ match exactly with j = 1, so node $n_1^1$ is created, and its tree root path array calculated by formula (1) is {1,0}; since $D_\delta(s_1, p_2) = 1 \le \delta$, character $s_1$ and sub-pattern $p_2$ match approximately with j = 2, so node $n_2^1$ is created, but level 1 contains no parent node satisfying the gap constraint with node $n_2^1$, so node $n_2^1$ is deleted; in the same way, nodes $n_3^1$ and $n_4^1$ are deleted;
2) for the 2nd character $s_2$ = a: since $D_\delta(s_2, p_1) = 0 \le \delta$, character $s_2$ and sub-pattern $p_1$ match exactly with j = 1, so node $n_1^2$ is created, and its tree root path array calculated by formula (1) is {1,0}; since $D_\delta(s_2, p_2) = 1 \le \delta$, character $s_2$ and sub-pattern $p_2$ match approximately with j = 2, so node $n_2^2$ is created; level 1 contains the parent node $n_1^1$ satisfying the gap constraint with node $n_2^2$; since $N_r(n_1^1, 0) = 1 > 0$, a parent relationship is established between parent node $n_1^1$ and node $n_2^2$, and the tree root path array of node $n_2^2$ calculated by formula (2) is {0,1}; since $D_\delta(s_2, p_3) = |a - c| = 2 > \delta$, node $n_3^2$ need not be created; similarly, node $n_4^2$ need not be created;
3) for the 3rd character $s_3$ = b: since $D_\delta(s_3, p_1) = 1 \le \delta$, character $s_3$ and sub-pattern $p_1$ match approximately with j = 1, so node $n_1^3$ is created, and its tree root path array calculated by formula (1) is {0,1}; since $D_\delta(s_3, p_2) = 0 \le \delta$, character $s_3$ and sub-pattern $p_2$ match exactly with j = 2, so node $n_2^3$ is created; level 1 contains the parent nodes $n_1^1$ and $n_1^2$ satisfying the gap constraint with node $n_2^3$; since both parent nodes have a tree root path whose γ-distance does not exceed γ, parent node $n_1^1$ and parent node $n_1^2$ can both establish a parent relationship with node $n_2^3$, and the tree root path array of node $n_2^3$ calculated by formula (2) is {2,0}; since $D_\delta(s_3, p_3) = 1 \le \delta$, character $s_3$ and sub-pattern $p_3$ match approximately with j = 3, so node $n_3^3$ is created; level 2 contains the parent node $n_2^2$ satisfying the gap constraint with node $n_3^3$; since $N_r(n_2^2, 0) = 0$, no parent relationship can be established between parent node $n_2^2$ and node $n_3^3$, and since the tree root path array of node $n_3^3$ is then {0,0}, node $n_3^3$ is deleted; since $D_\delta(s_3, p_4) = 1 \le \delta$, character $s_3$ and sub-pattern $p_4$ match approximately with j = 4, so node $n_4^3$ is created, but level 3 contains no parent node satisfying the gap constraint with node $n_4^3$, so node $n_4^3$ is deleted;
4) for the 4th character $s_4$ = a: since $D_\delta(s_4, p_1) = 0 \le \delta$, character $s_4$ and sub-pattern $p_1$ match exactly with j = 1, so node $n_1^4$ is created, and its tree root path array calculated by formula (1) is {1,0}; since $D_\delta(s_4, p_2) = 1 \le \delta$, character $s_4$ and sub-pattern $p_2$ match approximately with j = 2, so node $n_2^4$ is created; level 1 contains the parent nodes $n_1^1$, $n_1^2$ and $n_1^3$ satisfying the gap constraint with node $n_2^4$; since $N_r(n_1^1, 0) = N_r(n_1^2, 0) = 1 > 0$ and $N_r(n_1^3, 0) = 0$, parent node $n_1^1$ and parent node $n_1^2$ can establish a parent relationship with node $n_2^4$, while parent node $n_1^3$ cannot, and the tree root path array of node $n_2^4$ calculated by formula (2) is {0,2}; since $D_\delta(s_4, p_3) = |a - c| = 2 > \delta$, node $n_3^4$ need not be created; similarly, node $n_4^4$ need not be created;
5) for the 5th character $s_5$ = b: since $D_\delta(s_5, p_1) = 1 \le \delta$, character $s_5$ and sub-pattern $p_1$ match approximately with j = 1, so node $n_1^5$ is created, and its tree root path array calculated by formula (1) is {0,1}; since $D_\delta(s_5, p_2) = 0 \le \delta$, character $s_5$ and sub-pattern $p_2$ match exactly with j = 2, so node $n_2^5$ is created; level 1 contains the parent nodes $n_1^2$, $n_1^3$ and $n_1^4$ satisfying the gap constraint with node $n_2^5$; since each of them has a tree root path whose γ-distance does not exceed γ, parent nodes $n_1^2$, $n_1^3$ and $n_1^4$ can all establish a parent relationship with node $n_2^5$, and the tree root path array of node $n_2^5$ calculated by formula (2) is {2,1}; since $D_\delta(s_5, p_3) = 1 \le \delta$, character $s_5$ and sub-pattern $p_3$ match approximately with j = 3, so node $n_3^5$ is created; level 2 contains the parent nodes $n_2^2$, $n_2^3$ and $n_2^4$ satisfying the gap constraint with node $n_3^5$; since $N_r(n_2^2, 0) = N_r(n_2^4, 0) = 0$ and $N_r(n_2^3, 0) = 2 > 0$, parent node $n_2^2$ and parent node $n_2^4$ cannot establish a parent relationship with node $n_3^5$, while parent node $n_2^3$ can, and the tree root path array of node $n_3^5$ calculated by formula (2) is {0,2}; since $D_\delta(s_5, p_4) = 1 \le \delta$, character $s_5$ and sub-pattern $p_4$ match approximately with j = 4, so node $n_4^5$ is created, but level 3 contains no parent node satisfying the gap constraint with node $n_4^5$, so node $n_4^5$ is deleted;
6) for the 6th character $s_6$ = c: since $D_\delta(s_6, p_2) = 1 \le \delta$, character $s_6$ and sub-pattern $p_2$ match approximately with j = 2, so node $n_2^6$ is created; level 1 contains the parent nodes $n_1^3$, $n_1^4$ and $n_1^5$ satisfying the gap constraint with node $n_2^6$; since $N_r(n_1^3, 0) = N_r(n_1^5, 0) = 0$ and $N_r(n_1^4, 0) = 1 > 0$, parent node $n_1^3$ and parent node $n_1^5$ cannot establish a parent relationship with node $n_2^6$, while parent node $n_1^4$ can, and the tree root path array of node $n_2^6$ calculated by formula (2) is {0,1}; since $D_\delta(s_6, p_3) = 0 \le \delta$, character $s_6$ and sub-pattern $p_3$ match exactly with j = 3, so node $n_3^6$ is created; level 2 contains the parent nodes $n_2^3$, $n_2^4$ and $n_2^5$ satisfying the gap constraint with node $n_3^6$; since each of them has a tree root path whose γ-distance does not exceed γ, parent nodes $n_2^3$, $n_2^4$ and $n_2^5$ can all establish a parent relationship with node $n_3^6$, and the tree root path array of node $n_3^6$ calculated by formula (2) is {4,3}; since $D_\delta(s_6, p_4) = 0 \le \delta$, character $s_6$ and sub-pattern $p_4$ match exactly with j = 4, so node $n_4^6$ is created; level 3 contains the parent node $n_3^5$ satisfying the gap constraint with node $n_4^6$; since $n_3^5$ has a tree root path whose γ-distance does not exceed γ, parent node $n_3^5$ can establish a parent relationship with node $n_4^6$, and the tree root path array of node $n_4^6$ calculated by formula (2) is {0,2};
7) for the 7th character $s_7$ = c: since $D_\delta(s_7, p_3) = 0 \le \delta$, character $s_7$ and sub-pattern $p_3$ match exactly with j = 3, so node $n_3^7$ is created; level 2 contains the parent nodes $n_2^4$, $n_2^5$ and $n_2^6$ satisfying the gap constraint with node $n_3^7$; since each of them has a tree root path whose γ-distance does not exceed γ, parent nodes $n_2^4$, $n_2^5$ and $n_2^6$ can all establish a parent relationship with node $n_3^7$, and the tree root path array of node $n_3^7$ calculated by formula (2) is {2,4}; since $D_\delta(s_7, p_4) = 0 \le \delta$, character $s_7$ and sub-pattern $p_4$ match exactly with j = 4, so node $n_4^7$ is created; level 3 contains the parent nodes $n_3^5$ and $n_3^6$ satisfying the gap constraint with node $n_4^7$; since both have a tree root path whose γ-distance does not exceed γ, parent node $n_3^5$ and parent node $n_3^6$ can both establish a parent relationship with node $n_4^7$, and the tree root path array of node $n_4^7$ calculated by formula (2) is {4,5};
8) for the 8th character $s_8$ = c: since $D_\delta(s_8, p_4) = 0 \le \delta$, character $s_8$ and sub-pattern $p_4$ match exactly with j = 4, so node $n_4^8$ is created; level 3 contains the parent nodes $n_3^5$, $n_3^6$ and $n_3^7$ satisfying the gap constraint with node $n_4^8$; since each of them has a tree root path whose γ-distance does not exceed γ, parent nodes $n_3^5$, $n_3^6$ and $n_3^7$ can all establish a parent relationship with node $n_4^8$, and the tree root path array of node $n_4^8$ calculated by formula (2) is {6,9};
at this point all characters in the given sequence S have been processed and the net tree has been created; FIG. 6 of the drawings shows the finally created net tree, from which the nodes that cannot reach the leaf level, the invalid parent relationships, and the invalid nodes have been removed;
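in the created net tree, the leaf path array $N_l(n_j^i, d)$ of each node $n_j^i$ is calculated; the calculation starts from the leaf level, where the leaf path array of a node $n_m^i$ is given by formula (3):

$$N_l(n_m^i, d) = \begin{cases} 1, & d = D_\delta(s_i, p_m) \\ 0, & \text{otherwise} \end{cases} \qquad (0 \le d \le \gamma) \tag{3}$$

In formula (3), d represents the γ-distance, in the range 0 to γ, from node $n_m^i$ to the leaf level, δ denotes the local threshold, γ denotes the global threshold, and $D_\delta(s_i, p_m)$ represents the δ-distance between the character $s_i$ and the sub-pattern $p_m$;
for a node $n_j^i$ in level j of the net tree, where 1 ≤ j < m, when it has child nodes, each child node is denoted $n_{j+1}^k$, and the leaf path array $N_l(n_j^i, d)$ of node $n_j^i$ is calculated by formula (4):

$$N_l(n_j^i, d) = \begin{cases} \sum_{k} N_l\big(n_{j+1}^k,\ d - D_\delta(s_i, p_j)\big), & D_\delta(s_i, p_j) \le d \le \gamma \\ 0, & 0 \le d < D_\delta(s_i, p_j) \end{cases} \tag{4}$$

In formula (4), d represents the γ-distance, in the range 0 to γ, from node $n_j^i$ to the leaf level, δ denotes the local threshold, γ denotes the global threshold, $D_\delta(s_i, p_j)$ represents the δ-distance between the character $s_i$ and the sub-pattern $p_j$, and $n_{j+1}^k$ represents a child node of node $n_j^i$;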
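According to the tree root path array $N_r(n_j^i, d)$ of node $n_j^i$, the tree root path number $N_r(n_j^i)$ of the node is calculated by formula (5); according to the leaf path array $N_l(n_j^i, d)$ of node $n_j^i$, the leaf path number $N_l(n_j^i)$ of the node is calculated by formula (6):

$$N_r(n_j^i) = \sum_{d=0}^{\gamma} N_r(n_j^i, d) \tag{5}$$

$$N_l(n_j^i) = \sum_{d=0}^{\gamma} N_l(n_j^i, d) \tag{6}$$

In formula (5), d represents the γ-distance, in the range 0 to γ, from node $n_j^i$ to the tree root level, and γ represents the global threshold;
in formula (6), d represents the γ-distance, in the range 0 to γ, from node $n_j^i$ to the leaf level, and γ represents the global threshold;
according to the tree root path array $N_r(n_j^i, d)$ and the leaf path array $N_l(n_j^i, d)$ of node $n_j^i$, the root-leaf path number $N_{rl}(n_j^i)$ of the node is calculated by formula (7):

$$N_{rl}(n_j^i) = \sum_{k=D_\delta(s_i, p_j)}^{\gamma}\ \sum_{h=D_\delta(s_i, p_j)}^{\gamma + D_\delta(s_i, p_j) - k} N_r(n_j^i, k) \times N_l(n_j^i, h) \tag{7}$$

In formula (7), k represents the γ-distance, in the range $D_\delta(s_i, p_j)$ to γ, from node $n_j^i$ to the tree root level, h represents the γ-distance, in the range $D_\delta(s_i, p_j)$ to $\gamma + D_\delta(s_i, p_j) - k$, from node $n_j^i$ to the leaf level, and γ represents the global threshold;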
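Formulas (5) to (7) amount to summing an array and to pairing each root-side distance k with every leaf-side distance h that keeps the whole path within γ. The following minimal sketch shows this for a single node; the function names are hypothetical, the tree root path arrays are those of nodes $n_3^6$ and $n_3^5$ in the worked example, and the leaf path arrays {2,0} and {0,3} are assumed illustrative values derived by hand, since the figure that lists them is not reproduced here.

```python
def path_number(array):
    """Formulas (5) and (6): total number of paths over all gamma-distances."""
    return sum(array)


def root_leaf_path_number(root_array, leaf_array, dist, gamma):
    """Formula (7): count root-leaf paths through one node.

    dist is the node's own delta-distance; it is contained in both k and h,
    so a pair (k, h) is valid when k + h - dist <= gamma.
    """
    total = 0
    for k in range(dist, gamma + 1):
        for h in range(dist, gamma + dist - k + 1):
            total += root_array[k] * leaf_array[h]
    return total


# Root array {4,3} is given in the worked example; leaf array {2,0} is an assumption.
print(path_number([4, 3]))
print(root_leaf_path_number([4, 3], [2, 0], dist=0, gamma=1))
# Root array {0,2} is given in the worked example; leaf array {0,3} is an assumption.
print(root_leaf_path_number([0, 2], [0, 3], dist=1, gamma=1))
```

Because the node's own δ-distance appears in both arrays, the upper bound on h is raised by that amount so it is not counted twice.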
in the created net tree, the position-related number RP(i) of a node is calculated from the root-leaf path number $N_{rl}(n_j^i)$ of node $n_j^i$, and the path branch number PB(i) of a node is calculated from the tree root path number $N_r(n_j^i)$ of node $n_j^i$:
the position-related number RP(i) of node $n_j^i$ is the sum of the root-leaf path numbers $N_{rl}(n_j^i)$ of all nodes labeled i in the net tree, calculated by formula (8):

$$RP(i) = \sum_{j=1}^{m} N_{rl}(n_j^i) \tag{8}$$

in formula (8), m represents the number of levels of the net tree, and $N_{rl}(n_j^i)$ represents the root-leaf path number of node $n_j^i$;
the ancestor set of a node $n_j^i$ is the set formed by the nodes on all paths from the node to the tree root level; the common ancestor set of a node set is the intersection of the ancestor sets of all elements of the node set; the path branch number PB(i) of node $n_j^i$ is the sum of the tree root path numbers $N_r(n_j^i)$ of all nodes labeled i in the common ancestor set A(D) of a node set D, calculated by formula (9):

$$PB(i) = \sum_{j=1}^{l} N_r(n_j^i), \qquad n_j^i \in A(D) \tag{9}$$

in formula (9), l represents the depth of the common ancestor set A(D), and $N_r(n_j^i)$ represents the tree root path number of node $n_j^i$;
thirdly, solving the problem of one-time approximate pattern matching under the (delta, gamma) -distance by utilizing a net tree structure:
the problem of one-time approximate pattern matching under the (δ, γ)-distance is solved using the net tree constructed in the first step and the second step. Specifically, one root-leaf path in the net tree corresponds to one occurrence, so root-leaf paths that satisfy the local-global constraint are searched for in the net tree. Based on the net tree structure, a heuristic method is used for this search; the heuristic method has four similar occurrence-search strategies, and this embodiment adopts the first of the four, namely the rightmost parent strategy and the greedy search parent strategy: the rightmost parent strategy starts from the last node of the leaf level and, under the local-global constraint, selects the rightmost parent node of the current node; the greedy search parent strategy starts from the last node of the leaf level and, under the local-global constraint, selects the optimal parent node of the current node, where the optimal parent node is the parent node with the smaller position-related number, and when the position-related numbers of two parent nodes are the same, the parent node with the larger path branch number in the common ancestor set of the nodes of the existing path is selected as the optimal parent node;
the specific treatment method comprises the following steps:
firstly, starting from the last node of the leaf level, an occurrence C1 is found in the net tree by the rightmost parent strategy, and an occurrence C2 is found in the net tree by the greedy search parent strategy; the remaining occurrence numbers of the two occurrences are then compared, and the occurrence with the larger remaining occurrence number is selected and added to the result set OCC(S, P); finally, that occurrence and all nodes with the same labels as the nodes in the occurrence are deleted from the net tree, the tree root path arrays of the remaining nodes are recalculated according to formula (1) and formula (2), and the leaf path arrays of the remaining nodes are recalculated according to formula (3) and formula (4). The above process is iterated until the first node of the leaf level is reached, which yields the result set OCC(S, P). Here the remaining occurrence number of an occurrence C is the sum of the tree root path numbers of the leaf nodes that remain in the net tree after C and all nodes with the same labels as the nodes in C have been deleted from the net tree;
thereby completing one-time approximate pattern matching at (delta, gamma) -distance by using the net tree structure;
outputting all occurrences of a given pattern P in said first step in a given sequence S in said first step, i.e. all occurrences in the output result set OCC (S, P), on a display, thereby completing a one-time approximate pattern matching with local-global constraints.
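The rightmost parent strategy and the remaining occurrence number described above can be sketched as follows, under stated assumptions: the δ-distance is the gap between alphabet positions, 0 ≤ δ ≤ γ, deleting the nodes of an occurrence is modelled by rebuilding the net tree without those labels, and a parent is acceptable when the distance accumulated so far plus the cheapest root path through that parent does not exceed γ. All function names are hypothetical, and the greedy search parent strategy is not covered.

```python
def d_delta(a, b):
    """Delta-distance between two symbols (assumption: alphabet-position gap)."""
    return abs(ord(a) - ord(b))


def build(seq, pat, gaps, delta, gamma, banned=frozenset()):
    """Return root path arrays and established parent labels, skipping banned labels."""
    root, parents = {}, {}
    for i, ch in enumerate(seq, 1):
        if i in banned:
            continue
        for j, p in enumerate(pat, 1):
            dist = d_delta(ch, p)
            if dist > delta:
                continue
            arr, links = [0] * (gamma + 1), []
            if j == 1:
                arr[dist] = 1
            else:
                lo, hi = gaps[j - 2]
                for k in range(max(1, i - hi - 1), i - lo):
                    par = root.get((j - 1, k))
                    if par and any(par[: gamma - dist + 1]):
                        links.append(k)
                        for d in range(gamma - dist + 1):
                            arr[d + dist] += par[d]
            if any(arr):
                root[(j, i)], parents[(j, i)] = arr, links
    return root, parents


def rightmost_parent_occurrence(seq, pat, root, parents, leaf, gamma):
    """Walk from a leaf to the root level, always taking the rightmost usable parent."""
    j, i = leaf
    path, used = [i], d_delta(seq[i - 1], pat[j - 1])
    while j > 1:
        for k in sorted(parents[(j, i)], reverse=True):
            cheapest = min(d for d, c in enumerate(root[(j - 1, k)]) if c)
            if used + cheapest <= gamma:
                used += d_delta(seq[k - 1], pat[j - 2])
                j, i = j - 1, k
                path.insert(0, k)
                break
        else:
            return None  # no parent keeps the path within gamma
    return path


def remaining_occurrence_number(seq, pat, gaps, delta, gamma, occurrence):
    """Sum of root path numbers of the leaf nodes left after deleting an occurrence."""
    root, _ = build(seq, pat, gaps, delta, gamma, banned=frozenset(occurrence))
    return sum(sum(arr) for (j, _), arr in root.items() if j == len(pat))


S, P, GAPS = "aababccc", "abcc", [(0, 2)] * 3
root, parents = build(S, P, GAPS, 1, 1)
print(rightmost_parent_occurrence(S, P, root, parents, (4, 8), 1))
print(remaining_occurrence_number(S, P, GAPS, 1, 1, [4, 6, 7, 8]))
print(remaining_occurrence_number(S, P, GAPS, 1, 1, [1, 4, 7, 8]))
```

On the sequence and pattern of Example 1, this sketch returns <4,6,7,8> from the last leaf node and gives remaining occurrence numbers of 0 and 1 for <4,6,7,8> and <1,4,7,8> respectively.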
The specific implementation process of the steps is as follows:
in FIG. 6 of the drawings, according to the rightmost parent strategy and under the local-global constraint, the search starts from node $n_4^8$ and finds its rightmost parent node $n_3^7$ upward, which is added to the existing path <8>; the γ-distance of the existing path <7,8> is now 0; under the local-global constraint, the search continues upward for the rightmost parent node of node $n_3^7$, namely $n_2^6$, which is added to the existing path <7,8>; the γ-distance of the existing path <6,7,8> is now 1; under the local-global constraint, the search continues upward for the rightmost parent node of node $n_2^6$, namely $n_1^4$, which is added to the existing path <6,7,8>; at this point one occurrence <4,6,7,8> has been found according to the rightmost parent strategy;
FIG. 8 of the drawings shows the net tree after the occurrence <4,6,7,8> and the nodes with the same labels as its nodes are deleted; the tree root path arrays of the remaining nodes are recalculated by formula (1) and formula (2), and the sum of the tree root path numbers of the leaf nodes obtained by formula (5) is 0, so the remaining occurrence number of the occurrence <4,6,7,8> is 0;
with reference to FIG. 6 and FIG. 7 of the drawings, the optimal parent node of node $n_4^8$ is searched for according to the greedy search parent strategy. First, in FIG. 6 of the drawings, the tree root path number of each node is calculated by formula (5); in FIG. 7 of the drawings, the leaf path number of each node is calculated by formula (6); the root-leaf path number of each node and the position-related number RP(i) are calculated by formula (7) and formula (8). Under the local-global constraint, node $n_4^8$ has three parent nodes in total, namely parent node $n_3^5$, parent node $n_3^6$ and parent node $n_3^7$; since RP(5) = RP(7) < RP(6), the position-related number RP(5) of parent node $n_3^5$ and the position-related number RP(7) of parent node $n_3^7$ are equal and the smallest, so the path branch numbers of the two are compared in the common ancestor set of the existing path <8>; in the common ancestor set of node $n_4^8$, the path branch number PB(7) of parent node $n_3^7$ is greater than the path branch number PB(5) of parent node $n_3^5$, so node $n_3^7$ is added to the existing path <8>; the γ-distance of the existing path <7,8> is now 0; the search continues upward for the optimal parent node of node $n_3^7$; under the local-global constraint, node $n_3^7$ has three parent nodes in total, namely parent node $n_2^4$, parent node $n_2^5$ and parent node $n_2^6$; since RP(4) is the smallest of RP(4), RP(5) and RP(6), the position-related number of parent node $n_2^4$ is the minimum, so node $n_2^4$ is added to the existing path <7,8>; the γ-distance of the existing path <4,7,8> is now 1; the search continues upward for the optimal parent node of node $n_2^4$; under the local-global constraint, node $n_2^4$ has two parent nodes in total, namely parent node $n_1^1$ and parent node $n_1^2$; since RP(1) < RP(2), node $n_1^1$ is added to the existing path <4,7,8>; at this point one occurrence <1,4,7,8> has been found according to the greedy search parent strategy;
FIG. 9 shows that the occurrence <1,4,7,8> and all nodes with the same labels as its nodes are deleted from the net tree, and that the tree root path arrays of the remaining nodes are recalculated by formula (1) and formula (2). The tree root path number of each node of the leaf layer is then obtained by formula (5); these numbers sum to 1, so the number of remaining occurrences for occurrence <1,4,7,8> is 1;
in summary, the right-most parent strategy finds the occurrence <4,6,7,8>, whose number of remaining occurrences is 0, and the greedy search parent strategy finds the occurrence <1,4,7,8>, whose number of remaining occurrences is 1. The occurrence with the larger number of remaining occurrences is selected, so the occurrence found starting from leaf node 8 is <1,4,7,8>, which is added to the result set OCC(S, P). FIG. 10 shows the net tree of FIG. 9 after the occurrence <1,4,7,8> and all nodes with the same labels have been deleted;
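The comparison between the two candidate occurrences can be written as a one-line selection. The sketch below uses the two occurrences and remaining counts stated in the text; the function name `pick_occurrence` is hypothetical, and the behavior on a tie (the first listed candidate wins) is an assumption, since the text only describes the case where the counts differ or the occurrences coincide.

```python
def pick_occurrence(candidates):
    """candidates: list of (occurrence, remaining_count) pairs.

    Returns the occurrence that leaves the most remaining occurrences.
    On a tie the first listed candidate is returned (an assumption).
    """
    occurrence, _ = max(candidates, key=lambda pair: pair[1])
    return occurrence


candidates = [((4, 6, 7, 8), 0),   # found by the right-most parent strategy
              ((1, 4, 7, 8), 1)]   # found by the greedy search parent strategy
chosen = pick_occurrence(candidates)
print(chosen)  # (1, 4, 7, 8)
```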
since leaf node 7 was already deleted from the net tree when the occurrence <1,4,7,8> was found in the previous step, the search for occurrences continues directly from the next leaf node;
in FIG. 10, according to the right-most parent strategy and under the local-overall constraint, the search starts from leaf node 6 and moves upward to its right-most parent node, node 5, which is added to the existing path <6>; the γ-distance of the path <5,6> is now 1. Under the local-overall constraint, the search continues upward to the right-most parent node of node 5, node 3, which is added to the existing path <5,6>; the γ-distance of the path <3,5,6> is now 0. Under the local-overall constraint, the search continues upward to the right-most parent node of node 3, node 2, which is added to the existing path <3,5,6>; one occurrence, <2,3,5,6>, has thus been found according to the right-most parent strategy;
in FIG. 10, according to the greedy search parent strategy, the optimal parent node of leaf node 6 is searched upward. Since node 6 has only one parent node, node 5, there is no need to calculate the position-related number or the path branch number, and node 5 is added directly to the existing path <6>; in the same way, parent node 3 and parent node 2 are selected, so the greedy search parent strategy also finds the occurrence <2,3,5,6>;
in summary, starting from leaf node 6, both the right-most parent strategy and the greedy search parent strategy find the occurrence <2,3,5,6>, which is added to the result set OCC(S, P);
the first node of the leaf layer has now been reached, so the one-time approximate pattern matching with local-overall constraints is complete, and the occurrences in the result set OCC(S, P), namely <1,4,7,8> and <2,3,5,6>, are output on the display.
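Because every found occurrence is deleted together with all nodes carrying the same labels, no sequence position should appear in two occurrences of the result set. A small check of that property, under the assumption that this is what "one-time" means here, could look like this (the function name `is_one_off` is hypothetical):

```python
def is_one_off(occurrences):
    """Return True if no sequence position is used more than once."""
    used = set()
    for occ in occurrences:
        positions = set(occ)
        # Reject a repeated position inside one occurrence or across occurrences.
        if len(positions) != len(occ) or used & positions:
            return False
        used |= positions
    return True


result_set = [(1, 4, 7, 8), (2, 3, 5, 6)]
print(is_one_off(result_set))                    # True
print(is_one_off([(4, 6, 7, 8), (1, 4, 7, 8)]))  # False
```

The second call shows why <4,6,7,8> and <1,4,7,8> cannot both be kept: they share positions 4, 7 and 8.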
FIG. 4 shows that, in this embodiment, the number of layers of the net tree is determined by the number of sub-patterns in the given pattern P = a[0,2]b[0,2]c[0,2]c. The given pattern P has 4 sub-patterns, namely sub-pattern p_1 = a, sub-pattern p_2 = b, sub-pattern p_3 = c and sub-pattern p_4 = c, so the net tree has 4 layers, denoted layer 1, layer 2, layer 3 and layer 4.
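As an illustration of how the layer count follows from the pattern, the sketch below splits a pattern string of the form used in this embodiment into sub-patterns and gap constraints. It assumes single-character sub-patterns and gaps written as `[min,max]`; the function name `parse_pattern` is hypothetical.

```python
import re


def parse_pattern(pattern):
    """Split e.g. 'a[0,2]b[0,2]c[0,2]c' into sub-patterns and gap constraints."""
    # Each token is one character optionally followed by a gap '[min,max]'.
    tokens = re.findall(r"([^\[\]])(?:\[(\d+),(\d+)\])?", pattern)
    subpatterns = [ch for ch, _, _ in tokens]
    # Every sub-pattern except the last is expected to carry a gap constraint;
    # int('') raises ValueError if one is missing.
    gaps = [(int(lo), int(hi)) for _, lo, hi in tokens[:-1]]
    return subpatterns, gaps


subpatterns, gaps = parse_pattern("a[0,2]b[0,2]c[0,2]c")
num_layers = len(subpatterns)
print(subpatterns, gaps, num_layers)
# ['a', 'b', 'c', 'c'] [(0, 2), (0, 2), (0, 2)] 4
```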
FIG. 5 shows that, in this embodiment, a net tree is created from the given sequence S, the pattern P, the local threshold δ and the global threshold γ, and that the δ-distance D_δ(s_i, p_j) and the tree root path array of each node are calculated. A node drawn with grid lines indicates that character s_i approximately matches sub-pattern p_j, i.e. 0 < D_δ(s_i, p_j) ≤ δ; a white node indicates that character s_i exactly matches sub-pattern p_j, i.e. D_δ(s_i, p_j) = 0. The first row of data to the left of each node is the tree root path array of that node, and the second row is the δ-distance D_δ(s_i, p_j) between character s_i and sub-pattern p_j. A node with D_δ(s_i, p_j) > δ does not need to be created; for example, since D_δ(s_4, p_3) > δ, node 4 of layer 3 is not created. A black dotted line in the figure indicates that two nodes satisfy the gap constraint but cannot establish a parent relationship because of the local-overall constraint. A node through which no path satisfying the local-overall constraint passes must be deleted rather than kept in the net tree; one node in the figure is deleted for this reason.
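The node-creation pass described for FIG. 5 can be sketched as follows. Several points are assumptions made for illustration only: the δ-distance is taken as the absolute difference of character codes, a gap constraint [min, max] between a parent at position k and a child at position i is read as min ≤ i - k - 1 ≤ max, and the names `build_root_arrays`, `char_dist` and `excluded` are hypothetical. The toy sequence "abdc" is not the sequence of the embodiment. The sketch does not prune nodes that cannot reach the leaf layer.

```python
def build_root_arrays(seq, subpatterns, gaps, delta, gamma, dist, excluded=()):
    """Return {(layer, label): tree root path array} for a net tree.

    Labels are 1-based positions in seq, layers are 1-based sub-pattern
    indices, and arr[d] counts the paths from the root layer to the node
    whose accumulated gamma-distance is exactly d (0 <= d <= gamma).
    """
    m = len(subpatterns)
    arrays = {}
    for i, ch in enumerate(seq, start=1):
        if i in excluded:
            continue  # label already used by a deleted occurrence
        for j in range(1, m + 1):
            cost = dist(ch, subpatterns[j - 1])
            if cost > min(delta, gamma):
                continue  # local constraint violated: node is never created
            arr = [0] * (gamma + 1)
            if j == 1:
                arr[cost] = 1
            else:
                lo, hi = gaps[j - 2]
                # Candidate parents sit at positions k with lo <= i - k - 1 <= hi.
                for k in range(max(1, i - hi - 1), i - lo):
                    parent = arrays.get((j - 1, k))
                    if parent is None:
                        continue
                    # Shift the parent's counts by this node's own delta-distance.
                    for d in range(gamma + 1 - cost):
                        arr[d + cost] += parent[d]
            if any(arr):
                arrays[(j, i)] = arr  # keep only nodes reached by a valid path
    return arrays


def char_dist(a, b):
    # Assumed delta-distance: absolute difference of character codes.
    return abs(ord(a) - ord(b))


subpatterns = ["a", "b", "c", "c"]
gaps = [(0, 2), (0, 2), (0, 2)]
arrays = build_root_arrays("abdc", subpatterns, gaps, 1, 1, char_dist)
leaf_total = sum(sum(arr) for (layer, _), arr in arrays.items() if layer == 4)
print(arrays[(4, 4)], leaf_total)  # [0, 1] 1
```

Passing the labels of a found occurrence as `excluded` and summing the leaf-layer arrays again gives the kind of remaining-occurrence count used when two strategies are compared.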
FIG. 6 shows the net tree of FIG. 5 after the nodes that cannot reach the leaf layer, the invalid parent relationships and the invalid nodes have been removed. The resulting net tree contains no invalid parent relationships or invalid nodes, and every node can reach the leaf layer. The first row of data to the left of each node is the tree root path array of that node, and the second row is the δ-distance D_δ(s_i, p_j) between character s_i and sub-pattern p_j.
FIG. 7 shows that, starting from the leaf nodes of the net tree, the leaf path array of each node in FIG. 6 is calculated by formula (3) and formula (4); the data to the right of each node is the leaf path array of that node.
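The bottom-up calculation of leaf path arrays can be sketched on an explicit layered structure. The sketch assumes that the array index is the accumulated γ-distance from the node down to the leaf layer, including the node's own δ-distance; the function name `leaf_path_arrays` and the toy nodes are hypothetical and do not correspond to the figures.

```python
def leaf_path_arrays(layers, children, cost, gamma):
    """Compute, bottom-up, how many paths lead from each node to the leaf layer.

    result[node][d] is the number of node-to-leaf paths whose accumulated
    gamma-distance (including the node's own delta-distance) is exactly d.
    """
    result = {}
    last = len(layers) - 1
    for depth in range(last, -1, -1):
        for node in layers[depth]:
            arr = [0] * (gamma + 1)
            c = cost[node]
            if depth == last:
                if c <= gamma:
                    arr[c] = 1  # a leaf reaches the leaf layer by itself
            else:
                for child in children.get(node, ()):
                    for d in range(gamma + 1 - c):
                        arr[d + c] += result[child][d]
            result[node] = arr
    return result


# Toy three-layer structure with hypothetical nodes and delta-distances.
layers = [["r1", "r2"], ["m"], ["leaf_a", "leaf_b"]]
children = {"r1": ["m"], "r2": ["m"], "m": ["leaf_a", "leaf_b"]}
cost = {"r1": 0, "r2": 1, "m": 0, "leaf_a": 0, "leaf_b": 1}
result = leaf_path_arrays(layers, children, cost, gamma=1)
print(result["m"], result["r1"], result["r2"])  # [1, 1] [1, 1] [0, 1]
```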
FIG. 8 shows that, in this embodiment and in conjunction with FIG. 6 and FIG. 7, one occurrence <4,6,7,8> is found starting from leaf node 8 according to the right-most parent strategy; the occurrence <4,6,7,8> and all nodes with the same labels are indicated by dashed circles. The occurrence <4,6,7,8> and all nodes with the same labels are removed from the net tree, all parent relationships involving those nodes are deleted, and the tree root path arrays of the remaining nodes are recalculated by formula (1) and formula (2). The first row of data to the left of each node is the tree root path array of that node, and the second row is the δ-distance D_δ(s_i, p_j) between character s_i and sub-pattern p_j. The tree root path numbers of the leaf nodes of the net tree sum to 0, so the number of remaining occurrences for <4,6,7,8> is 0.
FIG. 9 shows that, in this embodiment and in conjunction with FIG. 6 and FIG. 7, one occurrence <1,4,7,8> is found starting from leaf node 8 according to the greedy search parent strategy; the occurrence <1,4,7,8> and all nodes with the same labels are indicated by dashed circles. The occurrence <1,4,7,8> and all nodes with the same labels are removed from the net tree, all parent relationships involving those nodes are deleted, and the tree root path arrays of the remaining nodes are recalculated by formula (1) and formula (2). The first row of data to the left of each node is the tree root path array of that node, and the second row is the δ-distance D_δ(s_i, p_j) between character s_i and sub-pattern p_j. The tree root path numbers of the leaf nodes of the net tree sum to 1, so the number of remaining occurrences for <1,4,7,8> is 1.
FIG. 10 shows the net tree of FIG. 9 after the occurrence <1,4,7,8> and all nodes with the same labels have been deleted. The first row of data to the left of each node is the tree root path array of that node, and the second row is the δ-distance D_δ(s_i, p_j) between character s_i and sub-pattern p_j. Starting from leaf node 6, one occurrence <2,3,5,6> is found according to the greedy search parent strategy.
Example 2
This embodiment is the same as Example 1 except for the third step. When the one-time approximate pattern matching problem under the (δ, γ)-distance is solved with the net tree structure, and a heuristic method is used to search the net tree for tree root leaf paths satisfying the local-overall constraint, this embodiment adopts the second of the four similar occurrence-search strategies of the heuristic method, namely the left-most parent strategy and the greedy search parent strategy. The left-most parent strategy starts from the first node of the leaf layer and, under the local-overall constraint, preferentially selects the left-most parent node of the current node. The greedy search parent strategy in this case also starts from the first node of the leaf layer and, under the local-overall constraint, selects the optimal parent node of the current node, where the optimal parent node is the parent node with the smaller position-related number.
Example 3
This embodiment is the same as Example 1 except for the third step. When the one-time approximate pattern matching problem under the (δ, γ)-distance is solved with the net tree structure, and a heuristic method is used to search the net tree for tree root leaf paths satisfying the local-overall constraint, this embodiment adopts the third of the four similar occurrence-search strategies of the heuristic method, namely the right-most child strategy and the greedy search child strategy. The right-most child strategy starts from the last node of the tree root layer and, under the local-overall constraint, preferentially selects the right-most child node of the current node. The greedy search child strategy in this case also starts from the last node of the tree root layer and, under the local-overall constraint, selects the optimal child node of the current node, where the optimal child node is the child node with the smaller position-related number.
Example 4
This embodiment is the same as Example 1 except for the third step. When the one-time approximate pattern matching problem under the (δ, γ)-distance is solved with the net tree structure, and a heuristic method is used to search the net tree for tree root leaf paths satisfying the local-overall constraint, this embodiment adopts the fourth of the four similar occurrence-search strategies of the heuristic method, namely the left-most child strategy and the greedy search child strategy. The left-most child strategy starts from the first node of the tree root layer and, under the local-overall constraint, preferentially selects the left-most child node of the current node. The greedy search child strategy in this case also starts from the first node of the tree root layer and, under the local-overall constraint, selects the optimal child node of the current node, where the optimal child node is the child node with the smaller position-related number.
The present invention is not limited to the above embodiments, and any design that uses the design concept of the present invention and makes some simple changes should be within the scope of the present invention.
Claims (1)
1. A one-time approximate pattern matching method with local-global constraints, characterized by: the method solves the problem of one-time approximate pattern matching under the (delta, gamma) -distance by utilizing a net tree structure, and comprises the following specific steps:
step one, determining the number of layers of the net tree:
a given sequence S is read in and its length is determined to be n, the characters of the sequence S being denoted character s_1, character s_2, …, character s_n; a given pattern P is read in and its length is determined to be m, the sub-patterns of the pattern P being denoted sub-pattern p_1, sub-pattern p_2, …, sub-pattern p_m, where 0 < m < n; the number of layers of the net tree is determined by the number of sub-patterns in the given pattern P, so the net tree has m layers, denoted layer 1, layer 2, …, layer m;
step two, creating the net tree:
a local threshold δ and a global threshold γ are given, where 0 ≤ δ ≤ γ, and a net tree is created from the sequence S and the pattern P of step one, each node being identified by its label i and its layer j in the net tree. Unlike exact pattern matching, each node of a net tree created for one-time approximate pattern matching with local-overall constraints must record the δ-distance D_δ(s_i, p_j) between character s_i and sub-pattern p_j, and each node uses a tree root path array of size γ + 1 to describe the number of tree root paths by which it reaches the tree root layer. The array subscripts are 0, 1, …, γ; the entry with subscript d is the number of paths from the current node to the tree root layer whose γ-distance is d, where 0 ≤ d ≤ γ, and every entry is initialized to 0. During creation of the net tree, invalid parent relationships and invalid nodes can be identified and deleted in advance according to the tree root path array of each node. Likewise, each node uses a leaf path array of size γ + 1 to describe the number of leaf paths by which it reaches the leaf layer; the array subscripts are 0, 1, …, γ, the entry with subscript d is the number of paths from the current node to the leaf layer whose γ-distance is d, where 0 ≤ d ≤ γ, and every entry is also initialized to 0;
the specific treatment method comprises the following steps:
the characters s_i of the given sequence S of step one are processed in turn, and for each character the δ-distance D_δ(s_i, p_j) to each of the sub-patterns p_1, p_2, …, p_m of the given pattern P of step one is calculated, where 1 ≤ j ≤ m:
when D_δ(s_i, p_j) > δ, where 1 ≤ j ≤ m, no node labeled i is created in layer j of the net tree;
when D_δ(s_i, p_j) ≤ δ and j = 1, a node labeled i is created directly in the tree root layer of the net tree, and its tree root path array is calculated by formula (1);
in formula (1), d denotes the γ-distance, in the range 0 to γ, from the node to the tree root layer, δ denotes the local threshold, γ denotes the global threshold, and D_δ(s_i, p_1) denotes the δ-distance between character s_i and sub-pattern p_1;
when D_δ(s_i, p_j) ≤ δ and 1 < j ≤ m, a node labeled i is created in layer j of the net tree, and it is determined whether any node of layer j-1 of the net tree is a parent node that satisfies the gap constraint [min_{j-1}, max_{j-1}] with this node: when no such parent node exists, the node is deleted; when such a parent node exists, it is recorded as a candidate parent node, and whether a parent relationship can be established between it and the node is then determined as follows:
when the local-overall constraint permits it, a parent relationship can be established between the parent node and the node, and the tree root path array of the node is calculated by formula (2);
in formula (2), d denotes the γ-distance, in the range 0 to γ, from the node to the tree root layer, δ denotes the local threshold, γ denotes the global threshold, D_δ(s_i, p_j) denotes the δ-distance between character s_i and sub-pattern p_j, and the remaining symbol denotes a parent node in layer j-1 of the net tree that satisfies the gap constraint [min_{j-1}, max_{j-1}] with the node and can establish a parent relationship with it;
all nodes of layer j-1 of the net tree are traversed in turn to calculate the tree root path array of the node; when no path satisfying the local-overall constraint reaches the node, the node is deleted;
After each character in the given sequence S in the first step is processed, a net tree is created;
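One reading of formula (1) and formula (2) that is consistent with the symbol descriptions above is sketched below. The notation is an assumption introduced only for this sketch: n_j^i stands for the node labeled i in layer j, B_r(n)[d] for the entry with subscript d of the tree root path array of node n, and Par(n_j^i) for the set of parent nodes in layer j-1 that satisfy the gap constraint and can establish a parent relationship with n_j^i.

```latex
% Assumed notation: B_r(n)[d] = number of tree root paths of node n with gamma-distance d
% Reading of formula (1), root layer (j = 1), for 0 <= d <= gamma:
B_r(n_1^i)[d] =
\begin{cases}
1, & d = D_\delta(s_i, p_1) \\
0, & \text{otherwise}
\end{cases}

% Reading of formula (2), layers 1 < j <= m, for 0 <= d <= gamma:
B_r(n_j^i)[d] =
\begin{cases}
\displaystyle\sum_{n_{j-1}^{k} \in \mathrm{Par}(n_j^i)} B_r(n_{j-1}^{k})\bigl[d - D_\delta(s_i, p_j)\bigr],
  & D_\delta(s_i, p_j) \le d \le \gamma \\
0, & 0 \le d < D_\delta(s_i, p_j)
\end{cases}
```

Under this reading, each node shifts the counts of its parents by its own δ-distance, so a path whose accumulated γ-distance would exceed γ simply falls off the end of the array.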
in the created net tree, the leaf path array of each node of the net tree is calculated, the calculation starting from the leaf layer, where the leaf path array of a leaf node is given by formula (3);
in formula (3), d denotes the γ-distance, in the range 0 to γ, from the node to the leaf layer, δ denotes the local threshold, γ denotes the global threshold, and D_δ(s_i, p_m) denotes the δ-distance between character s_i and sub-pattern p_m;
for a node in layer j of the net tree, when a child node exists, the child node is recorded, and the leaf path array of the node is calculated by formula (4);
in formula (4), d denotes the γ-distance, in the range 0 to γ, from the node to the leaf layer, δ denotes the local threshold, γ denotes the global threshold, D_δ(s_i, p_j) denotes the δ-distance between character s_i and sub-pattern p_j, and the remaining symbol denotes a child node of the node;
from the tree root path array of a node, the tree root path number of the node is calculated by formula (5); from the leaf path array of a node, the leaf path number of the node is calculated by formula (6);
in formula (5), d denotes the γ-distance, in the range 0 to γ, from the node to the tree root layer, and γ denotes the global threshold;
in formula (6), d denotes the γ-distance, in the range 0 to γ, from the node to the leaf layer, and γ denotes the global threshold;
from the tree root path array and the leaf path array of a node, the tree root leaf path number of the node is calculated by formula (7);
in formula (7), k denotes the γ-distance, in the range D_δ(s_i, p_j) to γ, from the node to the tree root layer, h denotes the γ-distance, in the range D_δ(s_i, p_j) to γ + D_δ(s_i, p_j) - k, from the node to the leaf layer, and γ denotes the global threshold;
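The counting described for formulas (5) to (7) can be sketched as follows, assuming that formulas (5) and (6) sum the respective arrays over d and that formula (7) multiplies compatible entries over the stated ranges of k and h. The function names and the sample arrays are hypothetical. The node's own δ-distance is contained in both k and h, which is why the upper bound for h is γ + D_δ(s_i, p_j) - k.

```python
def root_path_count(root_arr):
    """Reading of formula (5): total number of tree root paths of a node."""
    return sum(root_arr)


def leaf_path_count(leaf_arr):
    """Reading of formula (6): total number of leaf paths of a node."""
    return sum(leaf_arr)


def root_leaf_path_count(root_arr, leaf_arr, cost, gamma):
    """Reading of formula (7): root-to-leaf paths through a node within gamma.

    cost is the node's own delta-distance; it is included in both the root
    path distance k and the leaf path distance h, so the full path has
    gamma-distance k + h - cost, which must not exceed gamma.
    """
    total = 0
    for k in range(cost, gamma + 1):
        for h in range(cost, gamma + cost - k + 1):
            total += root_arr[k] * leaf_arr[h]
    return total


# Hypothetical node with delta-distance 1 and global threshold 2.
n_root = root_path_count([0, 1, 1])
n_leaf = leaf_path_count([0, 1, 1])
n_both = root_leaf_path_count([0, 1, 1], [0, 1, 1], cost=1, gamma=2)
print(n_root, n_leaf, n_both)  # 2 2 3
```

Of the four combinations of root and leaf paths in the example, the one with k = 2 and h = 2 is excluded because its total γ-distance would be 3.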
in the created net tree, the position-related number RP(i) of a node is calculated from the tree root leaf path numbers of the nodes, and the path branch number PB(i) of a node is calculated from the tree root path numbers of the nodes:
the position-related number RP(i) of a node labeled i is the sum of the tree root leaf path numbers of all nodes labeled i in the net tree, and is calculated by formula (8);
in formula (8), m denotes the number of layers of the net tree, and the remaining symbol denotes the tree root leaf path number of a node labeled i;
the ancestor set of a node is the set formed by the nodes on all paths from that node to the tree root layer; the common ancestor set of a node set is the intersection of the ancestor sets of all elements of the node set; the path branch number PB(i) of a node labeled i is the sum of the tree root path numbers of all nodes labeled i in the common ancestor set A(D) of a node set D, and is calculated by formula (9);
in formula (9), l denotes the depth of the common ancestor set A(D), and the remaining symbol denotes the tree root path number of a node labeled i;
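The definitions of RP(i), ancestor set, common ancestor set and PB(i) can be sketched as below. The sketch assumes that a node is not a member of its own ancestor set, which the text does not state explicitly; the function names, the `(layer, label)` node encoding and all numbers in the toy structure are hypothetical.

```python
def position_related_number(label, root_leaf_counts):
    """RP(i): sum of tree root leaf path numbers over all nodes labeled i."""
    return sum(n for (_, lab), n in root_leaf_counts.items() if lab == label)


def ancestor_set(node, parents):
    """All nodes on any path from node up to the root layer (node excluded)."""
    seen, stack = set(), [node]
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def common_ancestor_set(nodes, parents):
    """Intersection of the ancestor sets of all given nodes."""
    sets = [ancestor_set(n, parents) for n in nodes]
    return set.intersection(*sets) if sets else set()


def path_branch_number(label, nodes, parents, root_counts):
    """PB(i): sum of tree root path numbers of nodes labeled i in A(D)."""
    common = common_ancestor_set(nodes, parents)
    return sum(root_counts[n] for n in common if n[1] == label)


# Hypothetical parent relationships and counts, keyed by (layer, label).
parents = {(4, 8): [(3, 5), (3, 7)], (3, 5): [(2, 4)],
           (3, 7): [(2, 4), (2, 5)], (2, 4): [(1, 1)], (2, 5): [(1, 1)]}
root_counts = {(3, 5): 1, (3, 7): 3, (2, 4): 1, (2, 5): 1, (1, 1): 1}
root_leaf_counts = {(3, 5): 2, (2, 5): 1, (3, 7): 3}

pb7 = path_branch_number(7, [(4, 8)], parents, root_counts)
pb5 = path_branch_number(5, [(4, 8)], parents, root_counts)
rp5 = position_related_number(5, root_leaf_counts)
print(pb7, pb5, rp5)  # 3 2 3
```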
thirdly, solving the problem of one-time approximate pattern matching under the (delta, gamma) -distance by utilizing a net tree structure:
the one-time approximate pattern matching problem under the (δ, γ)-distance is solved with the net tree constructed in step one and step two. Specifically, one tree root leaf path in the net tree corresponds to one occurrence, so searching for an occurrence means searching the net tree for a tree root leaf path that satisfies the local-overall constraint. Based on the net tree structure, a heuristic method is used to search the net tree for tree root leaf paths satisfying the local-overall constraint, the heuristic method being any one of the following four similar occurrence-search strategies:
first, the right-most parent strategy and the greedy search parent strategy: the right-most parent strategy starts from the last node of the leaf layer and, under the local-overall constraint, preferentially selects the right-most parent node of the current node; the greedy search parent strategy starts from the last node of the leaf layer and, under the local-overall constraint, selects the optimal parent node of the current node, where the optimal parent node is the parent node with the smaller position-related number, and, when two parent nodes have the same position-related number, the parent node with the larger path branch number in the common ancestor set of the nodes of the existing path is selected as the optimal parent node;
second, the left-most parent strategy and the greedy search parent strategy: the left-most parent strategy starts from the first node of the leaf layer and, under the local-overall constraint, preferentially selects the left-most parent node of the current node; the greedy search parent strategy in this case starts from the first node of the leaf layer and, under the local-overall constraint, selects the optimal parent node of the current node, where the optimal parent node is the parent node with the smaller position-related number;
third, the right-most child strategy and the greedy search child strategy: the right-most child strategy starts from the last node of the tree root layer and, under the local-overall constraint, preferentially selects the right-most child node of the current node; the greedy search child strategy in this case starts from the last node of the tree root layer and, under the local-overall constraint, selects the optimal child node of the current node, where the optimal child node is the child node with the smaller position-related number;
fourth, the left-most child strategy and the greedy search child strategy: the left-most child strategy starts from the first node of the tree root layer and, under the local-overall constraint, preferentially selects the left-most child node of the current node; the greedy search child strategy in this case starts from the first node of the tree root layer and, under the local-overall constraint, selects the optimal child node of the current node, where the optimal child node is the child node with the smaller position-related number;
the one-time approximate pattern matching under the (δ, γ)-distance is thereby completed with the net tree structure;
all occurrences of the given pattern P of step one in the given sequence S of step one, i.e. all occurrences in the result set OCC(S, P), are output on a display, thereby completing the one-time approximate pattern matching with local-overall constraints.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910530672.6A CN110232140B (en) | 2019-06-19 | 2019-06-19 | Disposable approximate pattern matching method with local-overall constraint |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110232140A (en) | 2019-09-13 |
CN110232140B (en) | 2021-03-02 |
Family
ID=67856066
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910530672.6A Active CN110232140B (en) | 2019-06-19 | 2019-06-19 | Disposable approximate pattern matching method with local-overall constraint |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110232140B (en) |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101702171A (en) * | 2009-11-19 | 2010-05-05 | 新蛋信息技术(西安)有限公司 | Approximating matching method for numerous character strings |
US9679104B2 (en) * | 2013-01-17 | 2017-06-13 | Edico Genome, Corp. | Bioinformatics systems, apparatuses, and methods executed on an integrated circuit processing platform |
CN109271419B (en) * | 2018-09-03 | 2021-06-22 | 河北工业大学 | Online string matching method without gap constraint |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||