CN117874088B - Data fuzzy matching method, device, equipment and medium - Google Patents
- Publication number
- CN117874088B
- Authority
- CN
- China
- Prior art keywords
- matching
- operator
- rule
- matched
- atomic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24553—Query execution of query operations
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/24558—Binary matching operations
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2468—Fuzzy queries
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Automation & Control Theory (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
An embodiment of the invention discloses a data fuzzy matching method, device, equipment and medium. The method comprises: acquiring an engineering data source to be matched and forming a data list to be matched from it; configuring matching keywords according to atomic keywords and matching rules; and performing fuzzy matching between the data list to be matched and the matching keywords using string pattern matching, so as to obtain a matching result for the engineering data source to be matched. Because the matching keywords are configured from atomic keywords and matching rules, and the fuzzy matching is based on a stack matching mode and a binary tree matching mode, the embodiment addresses the loss of accuracy and efficiency caused by differences between engineering data sources during identification. The data list to be matched does not need to be acquired repeatedly during matching, which reduces acquisition operations and improves matching efficiency and accuracy.
Description
Technical Field
The present invention relates to the field of railway signal data processing technologies, and in particular, to a method, an apparatus, a device, and a medium for fuzzy matching of data.
Background
At present, train control data are fully digitized and the signal data related to railway design are provided in electronic form. Equipment suppliers process the standardized electronic engineering data according to the engineering design scheme, and tool software automatically converts it into the data format required by the basic software. This makes it possible to apply more advanced algorithms to train control data processing, and keyword matching is one effective approach. Whatever technique is used, a matching operation is required. The exact matching currently in use is simple and easy to implement, but it fails often, so its matching accuracy is low, and it cannot cope with the loss of accuracy and efficiency caused by differences between engineering data sources during identification. In addition, when fuzzy matching is performed in the prior art, the data source to be matched has to be acquired frequently during the matching process, so matching failures are common and manual assistance is needed after each failure, which keeps the degree of automation of signal data processing low. Fuzzy matching is therefore an effective solution, and a technique that raises the effective identification rate of train control data processing in the signalling field is needed to address these problems.
Disclosure of Invention
In view of the above, the invention provides a data fuzzy matching method, device, equipment and medium that address the loss of accuracy and efficiency caused by differences between engineering data sources during identification. The matching target does not need to be acquired frequently during matching, which reduces the number of operations on the matching target and improves the efficiency and accuracy of data source matching.
According to an aspect of the present invention, an embodiment of the present invention provides a data fuzzy matching method, including:
Acquiring engineering data sources to be matched, and forming a data list to be matched from the engineering data sources to be matched;
Configuring matching keywords according to pre-constructed atomic keywords and pre-defined matching rules; the matching keywords are expression forms composed of the atomic keywords and rule operators contained in the matching rules;
Performing fuzzy matching between the data list to be matched and the matching keywords using string pattern matching, so as to obtain a matching result for the engineering data source to be matched; the string pattern matching modes include stack matching and binary tree matching.
According to another aspect of the present invention, an embodiment of the present invention further provides a data fuzzy matching apparatus, including:
The construction module is used for acquiring engineering data sources to be matched and forming a data list to be matched from the engineering data sources to be matched;
The configuration module is used for configuring the matching keywords according to the pre-constructed atomic keywords and the pre-defined matching rules; the matching keywords are expression forms composed of the atomic keywords and rule operators contained in the matching rules;
The matching module is used for performing fuzzy matching between the data list to be matched and the matching keywords using string pattern matching, so as to obtain a matching result for the engineering data source to be matched; the string pattern matching modes include stack matching and binary tree matching.
According to another aspect of the present invention, an embodiment of the present invention further provides an electronic device, including:
at least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
The memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the data fuzzy matching method of any of the embodiments of the present invention.
According to another aspect of the present invention, an embodiment of the present invention further provides a computer readable storage medium, where computer instructions are stored, where the computer instructions are configured to cause a processor to implement the data fuzzy matching method according to any embodiment of the present invention.
According to the technical scheme, the matching keywords are configured from pre-constructed atomic keywords and predefined matching rules, so the matching keywords can be flexibly extended. On this basis, the engineering data source to be matched is formed into a data list to be matched, and this list is fuzzy-matched against the configured matching keywords based on a stack matching mode and a binary tree matching mode to obtain the matching result of the engineering data source to be matched. This addresses the loss of accuracy and efficiency caused by differences between engineering data sources during identification. The data list to be matched does not need to be acquired frequently during matching, which reduces acquisition operations and improves matching efficiency and accuracy.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a data fuzzy matching method according to an embodiment of the present invention;
FIG. 2 is a flowchart of another method for fuzzy matching of data according to an embodiment of the present invention;
FIG. 3 is a schematic representation of a post-construction binary tree expression according to one embodiment of the present invention;
FIG. 4 is a flowchart of a method for performing fuzzy matching of data by using a stack matching method according to an embodiment of the present invention;
FIG. 5 is a flow chart of a fuzzy matching method for engineering data using binary tree matching according to an embodiment of the present invention;
FIG. 6 is a block diagram illustrating a data fuzzy matching device according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In an embodiment, fig. 1 is a flowchart of a data fuzzy matching method according to an embodiment of the present invention, where the method may be performed by a data fuzzy matching device, and the data fuzzy matching device may be implemented in hardware and/or software, and the data fuzzy matching device may be configured in an electronic device.
As shown in fig. 1, the data fuzzy matching method in this embodiment includes the following specific steps:
S110, acquiring engineering data sources to be matched, and forming a data list to be matched from the engineering data sources to be matched.
The engineering data source to be matched is the target data source on which matching is to be performed. It may be provided by different design units and may include, but is not limited to, a data information table for a given engineering project, train control engineering data and interface data for high-speed railway signalling. The data information table may include, but is not limited to, an entry information table and an exit information table for a given railway line, and each information table contains an identification area and a data area. Illustratively, the route information table may include the route type, signal information, transponder information, switch information, line speed and so on; this embodiment is not limited in this regard.
In this embodiment, the data list to be matched can be understood as a matching pile list, that is, the headers extracted from the engineering data source to be matched. Each header may contain one or more matching piles, a level may be set for each matching pile, and each level may carry information such as its name. For example, in the route information table for a station, the route type is one level and the start signal is another level, and the start signal may be further divided into the name of the start signal, the display of the start signal, and so on.
In this embodiment, the engineering data source to be matched generally consists of data tables, each with an identification area and a data area. The identification area is the header of the data table, and the data area is the data content under each header. The list header of the engineering data source to be matched is read, and the one or more matching piles in that header form the data list to be matched.
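As a rough illustration of step S110, the sketch below flattens a multi-level table header into a list of matching piles, each with its level. The `MatchingPile` class, the nested-dictionary header layout and the sample header names are assumptions made for this example; the patent does not prescribe a data structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchingPile:
    path: tuple  # header names from the top level down to this cell
    level: int   # 1 for a top-level header, 2 for a sub-header, ...


def build_match_list(header, parent=()):
    """Flatten a nested header {name: {sub_name: {...}}} into matching piles."""
    piles = []
    for name, children in header.items():
        path = parent + (name,)
        piles.append(MatchingPile(path, len(path)))
        piles.extend(build_match_list(children, path))
    return piles


# Hypothetical identification area of a route information table.
header = {
    "route type": {},
    "start signal": {"name": {}, "display": {}},
}
match_list = build_match_list(header)
```

Because the list is built once from the header, later matching steps can work on `match_list` alone without re-reading the data source.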
S120, configuring matching keywords according to pre-constructed atomic keywords and pre-defined matching rules; the matching keywords are in the form of expressions consisting of atomic keywords and rule operators contained in the matching rules.
The pre-constructed atomic keywords may be understood as atomic keywords constructed by analyzing signal features contained in the engineering data source to be matched, and the atomic keywords may form an atomic keyword table.
In one embodiment, pre-constructing the atomic keywords includes analyzing the signal characteristics contained in the engineering data source to be matched and constructing atomic keywords from the analysis result. In this embodiment, the analysis is based on the names in the engineering data and interface data provided by each design unit. The character string of each such name contains one or more atomic keywords specific to the signalling discipline, from which the atomic keywords can be constructed. Illustratively, according to the signalling discipline, the atomic keyword set is defined as {a1 = "approach", a2 = "number", a3 = "type", a4 = "start", a5 = "signalizer", a6 = "unit"}.
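The atomic keyword set above can be held in a small lookup table. The following sketch is only an illustration: the dictionary name, the helper `atoms_in` and the use of case-insensitive substring search to detect a keyword inside a header name are assumptions, not details given in the patent.

```python
# Atomic keyword table from the example set; the ids a1..a6 follow the text.
ATOMIC_KEYWORDS = {
    "a1": "approach",
    "a2": "number",
    "a3": "type",
    "a4": "start",
    "a5": "signalizer",
    "a6": "unit",
}


def atoms_in(name):
    """Return the ids of the atomic keywords that occur in a header name.

    Assumption: an atomic keyword "occurs" when it is a case-insensitive
    substring of the name.
    """
    lowered = name.lower()
    return [key for key, word in ATOMIC_KEYWORDS.items() if word in lowered]
```

For instance, a header named "Start signalizer" would be reported as containing `a4` and `a5` under this substring assumption.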
In this embodiment, a predefined matching rule comprises a rule name, a rule symbol, a rule attribute and a rule meaning, and operator priorities are predefined between the rule symbols. The matching rule definition table provided by an embodiment of the invention is shown in Table one.
Table one: match rule definition table
In this embodiment, the rules must define an operation priority. For any two operators α and β, taken in their order of occurrence, the following principles apply: 1) the @ operator has a lower priority than all other operators; 2) the operator has a higher priority than the @ operator but a lower priority than the other operators; 3) provided conditions 1 and 2 are satisfied, operations inside brackets take priority.
To make the predefined operator priorities between rule symbols easier to follow, the operator definition table provided by an embodiment of the invention is shown in Table two.
Table two: operator definition table
In this embodiment, the following can be derived from the above operation rules: the operator may not be separated by brackets; brackets "()" must be paired, so that when ")" is encountered the matching "(" is removed together with it; and if no priority relation exists between two operators encountered during the operation, the expression has been constructed incorrectly.
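The bracket-pairing constraint can be checked before any matching is attempted. The function below is a minimal sketch of such a check; its name and the decision to return a boolean rather than raise an error are assumptions for illustration.

```python
def brackets_balanced(expression):
    """Return True when every ")" closes an earlier "(" and none is left open."""
    depth = 0
    for ch in expression:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                # A ")" appeared with no "(" left to pair with.
                return False
    return depth == 0
```

An expression that fails this check would be rejected as incorrectly constructed before it reaches the stack or binary tree matcher.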
In this embodiment, the association relationships between atomic keywords may be established first and then combined according to the predefined matching rules to obtain the matching keywords. In other words, a matching keyword is built by combining atomic keywords from the atomic keyword table with the defined rules. A construction example of matching keywords provided by an embodiment of the invention is shown in Table three.
Table three: construction example of matching keywords
In this embodiment, as Table three shows, a matching keyword is an expression composed of atomic keywords and rules.
S130, carrying out fuzzy matching on a data list to be matched and a matching keyword based on a pattern matching mode of the character string so as to obtain a matching result of the engineering data source to be matched; the pattern matching mode of the character string comprises the following steps: stack matching and binary tree matching.
String pattern matching here means the string matching mode applied to the matching keyword. The data list to be matched and the matching keyword may be fuzzy-matched using either stack matching or binary tree matching.
In this embodiment, with stack matching, an operator stack and an operand stack are set up, and stack parsing is performed on the matching keyword and the data list to be matched according to whether the current character string in the matching keyword is an atomic keyword or an operator, and according to the operator priorities of the rule symbols in the predefined matching rules, so as to obtain the matching result of the engineering data source to be matched. Specifically, each time an operator is popped, the corresponding operands are popped as well. If an operand is an atomic keyword, it is first matched against the matching pile list to obtain a first matching result; the operator is then applied to the first matching result according to its priority to obtain a second matching result. Parsing continues layer by layer until every character string in the matching keyword has been parsed, which yields the final matching result. In some embodiments, with binary tree matching, the prefix expression in the character string of the matching keyword is first converted into a suffix (postfix) expression, a binary tree expression is constructed from the suffix expression, and the data list to be matched is fuzzy-matched against the binary tree expression using post-order traversal and the operation priorities of the rule symbols in the predefined matching rules, so as to obtain the corresponding matching result.
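The binary tree route can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the keyword is assumed to be written with operators between operands, only the operators "|" and ":" are handled, their meanings (union and intersection of matched piles) and relative priorities are invented for the example because Tables one and two are not reproduced here, and an atomic keyword is assumed to match a pile when it is a substring of the pile name.

```python
import re

# Assumed priorities: ":" binds tighter than "|".
PRIORITY = {"|": 1, ":": 2}


class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right


def tokenize(expr):
    # Split on operators and brackets, keep them, drop empty pieces.
    return [t.strip() for t in re.split(r"([|:()])", expr) if t.strip()]


def to_postfix(tokens):
    """Convert the token list to postfix order with an operator stack."""
    output, ops = [], []
    for tok in tokens:
        if tok == "(":
            ops.append(tok)
        elif tok == ")":
            while ops[-1] != "(":
                output.append(ops.pop())
            ops.pop()  # discard the paired "("
        elif tok in PRIORITY:
            while ops and ops[-1] != "(" and PRIORITY[ops[-1]] >= PRIORITY[tok]:
                output.append(ops.pop())
            ops.append(tok)
        else:
            output.append(tok)
    while ops:
        output.append(ops.pop())
    return output


def build_tree(postfix):
    """Build the expression tree: operators are inner nodes, keywords are leaves."""
    stack = []
    for tok in postfix:
        if tok in PRIORITY:
            right, left = stack.pop(), stack.pop()
            stack.append(Node(tok, left, right))
        else:
            stack.append(Node(tok))
    return stack.pop()


def evaluate(node, piles):
    """Post-order traversal: evaluate both children, then apply the operator."""
    if node.left is None:  # leaf: match the atomic keyword against each pile
        return {i for i, pile in enumerate(piles) if node.value in pile}
    left, right = evaluate(node.left, piles), evaluate(node.right, piles)
    return left | right if node.value == "|" else left & right


# Hypothetical matching pile names.
piles = ["start signal/display", "start signal/highest code sequence", "route type"]
tree = build_tree(to_postfix(tokenize("start:(display|code sequence)")))
result = evaluate(tree, piles)  # indices of the piles that satisfy the keyword
```

The pile list is passed in once and only read, which mirrors the stated goal of not re-acquiring the data list during matching.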
According to the technical scheme, the matching keywords are configured from pre-constructed atomic keywords and predefined matching rules, so the matching keywords can be flexibly extended. On this basis, the engineering data source to be matched is formed into a data list to be matched, and this list is fuzzy-matched against the configured matching keywords based on a stack matching mode and a binary tree matching mode to obtain the matching result of the engineering data source to be matched. This addresses the loss of accuracy and efficiency caused by differences between engineering data sources during identification. The data list to be matched does not need to be acquired frequently during matching, which reduces acquisition operations and improves the efficiency and accuracy of data source matching.
In an embodiment, fig. 2 is a flowchart of another data fuzzy matching method according to an embodiment of the present invention. On the basis of the foregoing embodiments, this embodiment further refines the steps of acquiring the engineering data source to be matched and forming the data list to be matched from it; configuring matching keywords according to pre-constructed atomic keywords and predefined matching rules; and performing fuzzy matching between the data list to be matched and the matching keywords using string pattern matching so as to obtain the matching result of the engineering data source to be matched.
As shown in fig. 2, the data fuzzy matching method in this embodiment may specifically include the following steps:
S210, acquiring engineering data sources to be matched, and reading list heads corresponding to the engineering data sources to be matched.
The list header is the identification area of the table corresponding to the engineering data source to be matched; the identification area may include, but is not limited to, the route type, start signal, end signal, passing transponder, switch and line speed.
In this embodiment, the engineering data source to be matched is acquired and the identification area of its table is read. The list header contains at least one matching pile, each matching pile has a level, and each level carries its own level information.
S220, forming a data list to be matched by at least two matching piles.
In this embodiment, at least two matching piles are arranged in rows and columns to form the data list to be matched. The matching piles contained in a data list to be matched according to an embodiment of the invention are shown in Table four.
Table four: matching pile content contained in data list to be matched
S230, establishing association relations among the atomic keywords.
In this embodiment, association relationships are established between the atomic keywords; that is, the keywords are related to one another so that operation rules can be formed.
S240, combining the association relationships according to the predefined matching rules to obtain the matching keywords; the definition of a predefined matching rule comprises a rule name, a rule symbol, a rule attribute and a rule meaning, and operator priorities are predefined between the rule symbols.
In this embodiment, the association relationships are combined according to the predefined matching rules to obtain the matching keywords. That is, a user with a matching requirement can, based on the characteristics of the signalling discipline, compose matching keywords from atomic keywords according to the established keyword matching rules. The definition of a predefined matching rule comprises a rule name, a rule symbol, a rule attribute and a rule meaning, and operator priorities are predefined between the rule symbols.
Table five gives the route information table in item c and Table six gives the route information table in item d, both provided by embodiments of the invention. In the route information table of engineering item c, the fields identified for the code sequence and the transponder link information are shown in Table five:
Table five: specific content of the route information table in item c
In item c, in the route information provided by the design unit, the matching pile for the code sequence has 2 levels, "initial signal machine/highest code sequence", and the matching pile for the transponder link has 2 levels, "transponder group/transponder group unit number/link distance".
Table six: specific contents of the route information table in item d
In item d, in the route information provided by the design unit, the matching pile for the code sequence has 2 levels, "start signal machine/display", and the matching pile for the transponder link has 1 level, "passing transponder". Based on the atomic keywords, the rules and the actual conditions of the engineering projects, the expressions defining the matching keywords can be written as:
self_rule = {
    u'start: shows the |code sequence',
    u'transponder $@ transponder: link connection number',
}
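Before such an expression can be parsed, it has to be split into atomic keywords and rule symbols. The sketch below shows one way to do that; it assumes the rule symbols are the single characters that appear in the example expressions (":", "|", "$", "@") plus brackets, and it makes no claim about what each symbol means. The names `OPERATOR_CHARS` and `tokenize` are hypothetical.

```python
import re

# Single-character symbols seen in the example expressions, plus brackets.
OPERATOR_CHARS = ":|$@()"


def tokenize(expression):
    """Split a matching keyword into operand strings and one-character symbols."""
    pattern = "([" + re.escape(OPERATOR_CHARS) + "])"
    # The capturing group keeps the symbols; blank pieces are discarded.
    return [tok.strip() for tok in re.split(pattern, expression) if tok.strip()]


tokens = tokenize("transponder $@ transponder: link connection number")
```

Each resulting token is either an operand to be looked up in the data list to be matched or a symbol to be handled by priority.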
S250, performing a data preprocessing operation on the matching keywords to obtain the preprocessed target matching keywords; the preprocessing operation at least includes processing the rule symbols in the matching keywords in sequence.
In this embodiment, a data preprocessing operation is performed on the matching keywords to obtain the preprocessed target matching keywords, where the preprocessing operation at least includes processing the rule symbols in the matching keywords in sequence.
S260, according to the type of the character string analyzed by the target matching key word, the operation symbol priority of the rule symbol in the predefined matching rule, a preset operator stack and a preset operand stack, performing stack analysis operation on the target matching key word and the data list to be matched, and using the stack analysis operation as a fuzzy matching process; wherein the type includes an atomic key or an operator.
In this embodiment, stack parsing operation may be performed on the target matching keyword and the data list to be matched as a fuzzy matching process according to the type of the character string parsed by the target matching keyword, the operation symbol priority of the rule symbol in the predefined matching rule, the preset operator stack and the preset operand stack; wherein the type includes an atomic key or an operator.
Specifically, according to the type of the character string parsed by the target matching key, the operation symbol priority of the rule symbol in the predefined matching rule, the preset operator stack and the preset operand stack, performing stack parsing operation on the target matching key and the data list to be matched, including: reading a current character string analyzed by the target matching keyword, judging whether the current character string is an atomic keyword, and if so, executing S2601; if not, S2602 is executed.
In this embodiment, the current character string parsed from the target matching keyword is read, and it is judged whether the current character string is an atomic keyword. If yes, the atomic keyword is placed in the preset operand stack and matched against the data list to be matched to obtain a first matching result. If not, the current character string is determined to be an operation operator and is placed in the preset operator stack; a comparison result between the first priority of the operation operator and the second priority of the stack top operator in the preset operator stack is determined according to the predefined matching rule; a pop operation is executed on the operator with the higher priority according to the comparison result; and a matching operation is executed on the first matching result according to the operation rule of the operator with the higher priority to obtain a second matching result. The next character string is then read and taken as the current character string, and the process returns to step S260 until all character strings of the matching keyword have been read, so as to obtain the matching result of the engineering data source to be matched.
S2601, placing the atomic keywords in a preset operand stack, and matching the atomic keywords with a data list to be matched to obtain a first matching result.
In this embodiment, when the current character string is an atomic keyword, the atomic keyword is placed in a preset operand stack, and the atomic keyword is matched with a data list to be matched to obtain a first matching result.
S2602, determining that the current character string is an operation operator, placing the operation operator into a preset operator stack, determining a comparison result between a first priority of the operation operator and a second priority of a stack top operator in the preset operator stack according to a predefined matching rule, executing a pop operation on the operator with high priority according to the comparison result, executing a matching operation according to the operation rule with high priority and the first matching result to obtain a second matching result, continuing to read the next character string, taking the next character string as the current character string, returning to read the current character string in the target matching keyword, and judging whether the current character string is an atomic keyword or not until the character strings of the target matching keyword are completely read, so as to obtain a matching result of the engineering data source to be matched.
In this embodiment, when the current character string is not an atomic keyword, the current character string is determined to be an operation operator and is placed in the preset operator stack; a comparison result between the first priority of the operation operator and the second priority of the stack top operator in the preset operator stack is determined according to the predefined matching rule; a pop operation is executed on the operator with the higher priority according to the comparison result; and a matching operation is executed on the first matching result according to the operation rule of the operator with the higher priority to obtain a second matching result. The next character string is then read and taken as the current character string, and the process returns to reading the current character string in the target matching keyword and judging whether it is an atomic keyword, until all character strings of the matching keyword have been read, so as to obtain the matching result of the engineering data source to be matched.
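Steps S2601 and S2602 can be sketched as a two-stack evaluation. This is only an illustrative sketch under stated assumptions, not the patented implementation: the rule symbols `&` (both must match) and `|` (either may match), their priorities, the parentheses, the substring test used for atomic keywords, and the names `match_atomic`, `apply_operator` and `stack_match` are all hypothetical. A matching result is represented as the set of indices of entries in the data list to be matched.

```python
# Hypothetical binary rule symbols and priorities (higher binds tighter).
PRIORITY = {"|": 1, "&": 2}


def match_atomic(keyword, data_list):
    """First matching result: indices of entries containing the atomic keyword."""
    return {i for i, entry in enumerate(data_list) if keyword in entry}


def apply_operator(op, left, right):
    """Second matching result: combine two results by the operator's rule."""
    return left & right if op == "&" else left | right


def stack_match(tokens, data_list):
    operators, operands = [], []  # preset operator stack and operand stack

    def reduce_top():
        # Pop one operator and its two operands, push the combined result.
        op = operators.pop()
        right = operands.pop()
        left = operands.pop()
        operands.append(apply_operator(op, left, right))

    for token in tokens:
        if token == "(":
            operators.append(token)
        elif token == ")":
            while operators[-1] != "(":
                reduce_top()
            operators.pop()  # discard the opening parenthesis
        elif token in PRIORITY:
            # Operators on the stack with higher or equal priority are applied first.
            while (operators and operators[-1] != "("
                   and PRIORITY[operators[-1]] >= PRIORITY[token]):
                reduce_top()
            operators.append(token)
        else:
            # Atomic keyword: match it once and keep the result as an operand.
            operands.append(match_atomic(token, data_list))
    while operators:
        reduce_top()
    return operands.pop()
```

In this sketch the data list is scanned once per atomic keyword, and all later operator steps work only on the intermediate matching results.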
In an embodiment, to illustrate fuzzy matching of the data list to be matched and the matching keyword in the stack matching mode, Table seven shows the steps of stack-based expression parsing.
Table seven: step of stack parsing expression
S270, converting the prefix expression form in the character string corresponding to the matching keyword into the suffix expression form of the matching keyword.
In the embodiment, when fuzzy matching is performed by using a binary tree matching mode, firstly converting a prefix expression form in a character string corresponding to a matching keyword into a suffix expression form of the matching keyword, and specifically, reading a current unit in the prefix expression in the character string corresponding to the matching keyword; wherein the current unit includes: an atomic keyword or a matching operator, and directly outputting the atomic keyword under the condition that the current unit is the atomic keyword; and under the condition that the current unit is a matching operator, corresponding push and pop operations are executed according to the comparison of the priority of the current matching operator and the priority of the stack top operator so as to finally obtain the suffix expression form of the matching keyword.
In an embodiment, converting the prefix expression form in the character string corresponding to the matching keyword into the suffix expression form of the matching keyword includes:
reading a current unit in a prefix expression in the character string corresponding to the matching keyword; wherein the current unit includes: an atomic keyword or matching operator;
Under the condition that the current unit is an atomic keyword, directly outputting the atomic keyword;
Under the condition that the current unit is a matching operator, the matching operator is used as the current matching operator, and under the condition that the priority of the current matching operator is higher than that of the stack top operator, the current matching operator executes a push operation;
Under the condition that the priority of the current matching operator is lower than that of the stack top operator, the stack top operator is popped, it is judged whether the stack top operator is a limit operator, and a push operation is executed if the stack top operator is a limit operator;
And under the condition that the priority of the current matching operator is equal to that of the stack top operator, a pop operation is executed; the next unit is then read and taken as the current unit, and the process returns to the operation of judging whether the current unit is an atomic keyword, until the character strings of all the matching keywords have been converted, so as to obtain the suffix expression form corresponding to the matching keyword.
In this embodiment, the current unit in the prefix expression in the character string corresponding to the matching keyword is read; wherein the current unit includes: an atomic keyword or a matching operator. Under the condition that the current unit is an atomic keyword, the atomic keyword is directly output. Under the condition that the current unit is a matching operator, the matching operator is taken as the current matching operator, and if the priority of the current matching operator is higher than that of the stack top operator, a push operation is executed on the current matching operator. If the priority of the current matching operator is lower than that of the stack top operator, the stack top operator is popped, it is judged whether the stack top operator is a limit operator, and a push operation is executed if the stack top operator is a limit operator. If the priority of the current matching operator is equal to that of the stack top operator, a pop operation is executed. The next unit is then read and taken as the current unit, and the process returns to the operation of judging whether the current unit is an atomic keyword, until the character strings of all the matching keywords have been converted, so as to obtain the suffix expression form corresponding to the matching keyword.
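The conversion to the suffix expression form can be sketched with a single operator stack. The sketch below uses the standard operator-stack conversion, which the steps above closely resemble; it is not taken from the source. It assumes, hypothetically, that binary operators are written between their operands, that the unary operator `!` precedes its operand, that the limit operators are parentheses, and that the symbols and priorities are as listed; the name `to_suffix` is also an assumption.

```python
# Hypothetical rule symbols and priorities (higher binds tighter).
PRIORITY = {"|": 1, "&": 2, "!": 3}
UNARY = {"!"}


def to_suffix(tokens):
    """Convert a token list into suffix (postfix) order."""
    output, stack = [], []
    for token in tokens:
        if token == "(":
            # Opening limit operator: always pushed.
            stack.append(token)
        elif token == ")":
            # Closing limit operator: pop until the matching opening one.
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()
        elif token in PRIORITY:
            if token not in UNARY:
                # Pop stack-top operators of higher or equal priority to the output.
                while (stack and stack[-1] != "("
                       and PRIORITY[stack[-1]] >= PRIORITY[token]):
                    output.append(stack.pop())
            # A unary operator is pushed directly, since its operand follows it.
            stack.append(token)
        else:
            # Atomic keyword: output directly.
            output.append(token)
    while stack:
        output.append(stack.pop())
    return output
```

With these assumed symbols, the tokens `a`, `|`, `b`, `&`, `c` become `a b c & |`, because `&` has the higher priority and is therefore emitted first.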
S280, constructing a binary tree expression form of the matching keyword according to the suffix expression form and rule attributes in a predefined matching rule; wherein the rule attribute at least comprises a double-element operator, a single-element operator and a limit operator.
In the implementation, a binary tree expression form of the matching key word is constructed according to the suffix expression form and rule attributes in a predefined matching rule, specifically, a current unit in the suffix expression form is read, whether the current unit is an atomic key word is judged, a corresponding node is constructed according to whether the current unit is the atomic key word, and the binary tree expression form of the matching key word is obtained by combining the rule attributes in the predefined matching rule on the basis; wherein the rule attribute at least comprises a double-element operator, a single-element operator and a limit operator.
In an embodiment, constructing a binary tree expression form of the matching keyword according to the suffix expression form and rule attributes in a predefined matching rule, including:
reading a current unit in the suffix expression form, and judging whether the current unit is an atomic key word or not;
if the current unit is an atomic keyword, constructing a first node corresponding to the atomic keyword, and executing a push operation for the atomic keyword;
If the current unit is not an atomic keyword, determining that the current unit is a matching operator, taking the matching operator as the current matching operator, constructing a second node corresponding to the current matching operator, and then judging the rule attribute of the current matching operator: if the rule attribute is a unary operator, performing one pop operation, fixing the popped node as the left child node of the second node, and executing a push operation after the binary tree is constructed; if the rule attribute is a binary operator, performing two pop operations, taking the popped nodes as child nodes in the reverse of the popping order, and executing a push operation after the binary tree is constructed; then continuing to read in the next unit, taking the next unit as the current unit, and returning to the step of judging whether the current unit is an atomic keyword, until all units in the suffix expression form have been read, so as to obtain the constructed binary tree expression.
In this embodiment, units are read in one by one from the constructed suffix expression. If the unit is an atomic keyword, a node is constructed and pushed onto the stack. Otherwise a node is constructed for the operator: if it is a unary operator, the stack is popped once, the popped node is fixed as the left child node, and the constructed binary tree is pushed onto the stack; if it is a binary operator, the stack is popped twice, the popped nodes are used as child nodes in the reverse of the popping order, and the constructed binary tree is pushed onto the stack; and so on, until the final expression binary tree is obtained. Illustratively, the steps of constructing the expression binary tree in this embodiment are shown in Table eight. To facilitate understanding of the constructed binary tree expression, fig. 3 is a schematic diagram of the constructed binary tree expression according to an embodiment of the present invention.
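The tree construction described above can be sketched as follows. This is an illustrative sketch only: the `Node` class, the function name `build_tree`, and the operator sets (`!` as the unary operator, `&` and `|` as binary operators) are hypothetical stand-ins for the rule attributes of the predefined matching rule.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical rule attributes of the rule symbols.
UNARY = {"!"}
BINARY = {"&", "|"}


@dataclass
class Node:
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_tree(suffix):
    """Build an expression binary tree from a suffix expression."""
    stack = []
    for unit in suffix:
        node = Node(unit)
        if unit in UNARY:
            # Pop once; the popped subtree is always the left child.
            node.left = stack.pop()
        elif unit in BINARY:
            # Pop twice; the first popped subtree is the right child,
            # the second is the left child (reverse of the popping order).
            node.right = stack.pop()
            node.left = stack.pop()
        # Atomic keywords become leaf nodes; every node is pushed back.
        stack.append(node)
    return stack.pop()
```

For the suffix expression `a b ! &`, the root is `&`, its left child is the leaf `a`, and its right child is the `!` node whose only (left) child is the leaf `b`.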
Table eight: step of constructing expression binary tree
S290, performing fuzzy matching on the data list to be matched and the binary tree expression form by using a post-order (subsequent) traversal method and the operation priority of the rule symbols in the predefined matching rule, so as to obtain the matching result of the engineering data source to be matched.
In this embodiment, a post-order (subsequent) traversal method and the operation priority of the rule symbols in the predefined matching rule are used to perform fuzzy matching on the data list to be matched and the binary tree expression form, so as to obtain the matching result of the engineering data source to be matched. Specifically, if a child node is an atomic keyword, the atomic keyword is first matched against the matching pile list to obtain a first matching result, and matching operations are then performed on the first matching result according to the operation priority of the rule symbols in the predefined matching rule until the traversal is completed. This can be understood as follows: the matching keyword is converted into a tree; then, after at least one matching pile in the data list to be matched is read in, the matching operations are performed layer by layer according to the structure of the binary tree and the operator calculation rules, and the result is recorded.
In one embodiment, the binary tree representation includes leaf nodes, intermediate nodes, and root nodes; correspondingly, fuzzy matching is carried out on the data list to be matched and the binary tree expression form by adopting a subsequent traversal mode and the operation priority of rule symbols in a predefined matching rule, and the fuzzy matching method comprises the following steps:
Searching leaf nodes in the binary tree expression form, and judging whether the leaf nodes are atomic keywords or not; wherein the intermediate node and the leaf node are respectively a left child node or a right child node; the matching priority of the left child node is higher than that of the right child node;
Under the condition that the leaf node is an atomic key, matching the leaf node with at least one matching pile in a data list to be matched to obtain a third matching result, returning the third matching result to an intermediate node corresponding to the leaf node in the binary tree expression form, executing matching of the intermediate node and the third matching result according to the operation priority of a rule symbol in a predefined matching rule corresponding to the intermediate node to obtain a fourth matching result, until traversing to a root node, and executing matching of the operation priority of the root node corresponding to the fourth matching result to obtain a fuzzy matching target matching result;
The intermediate node and the root node are in the form of operation operators, the operation operators are expressed as rule symbols, and each rule symbol corresponds to a corresponding operation priority level.
In this embodiment, the intermediate node and the leaf node may be a left child node or a right child node, respectively; the matching priority of the left child node is higher than that of the right child node, and it can be understood that the leaf nodes are divided into a left leaf node and a right leaf node; the middle nodes are divided into a left middle node and a right middle node; the matching priority of the left leaf node is higher than that of the right leaf node; the matching priority of the left intermediate node is higher than the matching priority of the right intermediate node.
In this embodiment, a leaf node in the binary tree expression is searched, and a subsequent traversal operation is performed from the leaf node; in this embodiment, the leaf nodes are divided into left and right leaf nodes; the middle nodes are divided into a left middle node and a right middle node; the matching priority of the left leaf node is higher than that of the right leaf node; the matching priority of the left middle node is higher than that of the right middle node; it can be understood that when at least one matching pile in the data list to be matched is matched with the constructed binary tree, the left leaf node is matched first, then the right leaf node is matched, and similarly, when the matching of the intermediate node is performed, the matching of the left intermediate node is performed first, and then the matching of the right intermediate node is performed. In the embodiment, searching for leaf nodes in the binary tree expression form, and judging whether the leaf nodes are atomic keywords or not; and under the condition that the leaf node is an atomic key, matching the leaf node with at least one matching pile in a data list to be matched to obtain a third matching result, returning the third matching result to an intermediate node corresponding to the leaf node in a binary tree expression form, executing the matching of the intermediate node and the third matching result according to the operation priority of a rule symbol in a predefined matching rule corresponding to the intermediate node to obtain a fourth matching result, traversing the fourth matching result to a root node, and executing the matching of the operation priority corresponding to the root node to obtain a fuzzy matching target matching result.
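The traversal described above can be sketched as a recursive post-order evaluation, in which the left subtree is matched before the right subtree and each operator node combines the results returned by its children. The sketch rests on assumptions that are not in the source: the `Node` class, the function name `match_tree`, the symbols `&`, `|` and `!`, the substring test for leaf nodes, and the representation of a matching result as a set of indices into the data list to be matched.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def match_tree(node, data_list):
    """Post-order matching: children first (left before right), then the operator."""
    if node.left is None and node.right is None:
        # Leaf node: an atomic keyword matched against every matching pile.
        return {i for i, entry in enumerate(data_list) if node.value in entry}
    left = match_tree(node.left, data_list)
    if node.value == "!":
        # Hypothetical unary operator: entries that do NOT match the child.
        return set(range(len(data_list))) - left
    right = match_tree(node.right, data_list)
    # Hypothetical binary operators: "&" intersects, "|" unites.
    return left & right if node.value == "&" else left | right
```

For example, with the data list `["transponder link number", "transponder type", "signal link"]`, the tree for `transponder & !type` returns `{0}`, that is, only the first entry.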
According to the technical scheme provided by the embodiment of the invention, the association relations among the atomic keywords are established and combined according to the predefined matching rule to obtain the matching keyword, so that the matching keyword is easy to remember and can be flexibly extended. A stack parsing operation is performed on the matching keyword and the data list to be matched by setting an operator stack and an operand stack, based on whether the current character string in the matching keyword is an atomic keyword or an operation operator and on the operation symbol priority of the rule symbols in the predefined matching rule; or the prefix expression form in the character string corresponding to the matching keyword is converted into the suffix expression form of the matching keyword, a binary tree expression form of the matching keyword is constructed according to the suffix expression form and the rule attributes in the predefined matching rule, and fuzzy matching is performed on the data list to be matched and the binary tree expression form by using a post-order (subsequent) traversal method and the operation priority of the rule symbols in the predefined matching rule. This addresses the problem that accuracy and efficiency cannot be improved because of differences in the engineering data identification process. In addition, the matching process does not need to acquire the data list to be matched frequently, so fewer acquisition operations on the data list to be matched are needed, and the matching efficiency and accuracy are improved.
In an embodiment, in order to better understand that the stack matching method is used to perform the fuzzy matching of engineering data on the data list to be matched, fig. 4 is a flowchart of a method for performing the fuzzy matching of data by using the stack matching method according to an embodiment of the present invention, in this embodiment, the method for performing the fuzzy matching of data by using the stack matching method includes the following specific steps:
a1, acquiring the engineering data source to be matched, and forming a data list to be matched from the engineering data source to be matched.
a2, configuring a matching keyword according to the pre-constructed atomic keywords and the predefined matching rule; the matching keyword is an expression composed of the atomic keywords and the rule operators contained in the matching rule.
a3, setting an operator stack and an operand stack.
a4, performing a stack parsing operation on the matching keyword and the data list to be matched, based on whether the current character string in the matching keyword is an atomic keyword or an operation operator and on the operation symbol priority of the rule symbols in the predefined matching rule.
Specifically, each time an operation operator is popped, the corresponding operands are popped as well; if an operand is an atomic keyword, the atomic keyword is matched against the matching pile list to obtain a first matching result, and the operation of the popped operator is applied to the first matching result to obtain a second matching result; parsing proceeds layer by layer in this way until all character strings in the matching keyword have been parsed, so as to obtain the final matching result.
In an embodiment, in order to better understand that the binary tree matching method is adopted to perform the fuzzy matching of the engineering data on the data list to be matched, fig. 5 is a flowchart of a method for performing the fuzzy matching of the engineering data by adopting the binary tree matching method, in this embodiment, the method for performing the fuzzy matching of the engineering data by adopting the binary tree matching method specifically includes the following steps:
b1, acquiring the engineering data source to be matched, and forming a data list to be matched from the engineering data source to be matched.
b2, configuring a matching keyword according to the pre-constructed atomic keywords and the predefined matching rule; the matching keyword is an expression composed of the atomic keywords and the rule operators contained in the matching rule.
b3, converting the prefix expression form in the character string corresponding to the matching keyword into the suffix expression form of the matching keyword.
b4, constructing a binary tree expression form of the matching keyword according to the suffix expression form and the rule attributes in the predefined matching rule.
b5, performing fuzzy matching on the data list to be matched and the binary tree expression form by using a post-order (subsequent) traversal method and the operation priority of the rule symbols in the predefined matching rule.
Specifically, if a child node is an atomic keyword, the atomic keyword is matched against the matching pile list to obtain a first matching result, and matching operations are performed on the first matching result according to the operation priority of the rule symbols in the predefined matching rule until the traversal is completed, so as to obtain the final matching result.
In this embodiment, it can be obtained through experiments that the computational complexity of both the stack matching mode and the binary tree matching mode is m·log(n), where m is the length of the matching pile and n is the length of the matching keyword, that is, the number of keywords and operators it contains. It can be seen that the improved matching operation does not need to read the matching pile frequently, and the matching method is abstracted into the processing of an expression, so that fuzzy matching based on atomic keywords, matching rule operations and matching piles is more general and efficient. Current digitized data sources are stored on a server or a management platform, where access is slow; applying the improved fuzzy matching method greatly reduces the number of times the matching piles are acquired and makes the matching itself more efficient, so that the matching process is more effective.
In an embodiment, fig. 6 is a block diagram of a data fuzzy matching device according to an embodiment of the present invention. The device is suitable for fuzzy matching of data sources of high-speed rail signal engineering data, and may be implemented by hardware and/or software. The device can be configured in an electronic device to implement the data fuzzy matching method in the embodiments of the invention.
As shown in fig. 6, the apparatus includes: a construction module 610, a configuration module 620, and a matching module 630.
The construction module 610 is configured to obtain an engineering data source to be matched, and form a data list to be matched from the engineering data source to be matched;
A configuration module 620, configured to configure matching keywords according to pre-constructed atomic keywords and pre-defined matching rules; the matching keywords are expression forms composed of the atomic keywords and rule operators contained in the matching rules;
The matching module 630 is configured to perform fuzzy matching on the to-be-matched data list and the matching keyword based on a pattern matching manner of the character string, so as to obtain a matching result of the to-be-matched engineering data source; the pattern matching mode of the character string comprises the following steps: stack matching and binary tree matching.
According to the embodiment of the invention, the configuration module configures the matching keyword from the pre-constructed atomic keywords and the predefined matching rule, so that the matching keyword is easy to remember and can be flexibly extended. On this basis, the matching module performs fuzzy matching between the data list to be matched, formed from the engineering data source to be matched, and the configured matching keyword, based on the stack matching mode and the binary tree matching mode, so as to obtain the matching result of the engineering data source to be matched. This addresses the problem that accuracy and efficiency cannot be improved because of differences in the engineering data identification process. In addition, the matching process does not need to acquire the data list to be matched frequently, so fewer acquisition operations on the data list to be matched are needed, and the matching efficiency and accuracy are improved.
In one embodiment, the construction module 610 includes:
the reading unit is used for reading the list head corresponding to the engineering data source to be matched; the list head comprises at least one matching pile, each matching pile is provided with a grade, and each grade corresponds to corresponding grade information;
And the composition unit is used for composing the at least two matching piles into the data list to be matched.
In an embodiment, the pre-building of the atomic key comprises: analyzing signal characteristics contained in the engineering data source to be matched, and constructing an atomic keyword according to the analysis result; accordingly, the configuration module 620 includes:
a relationship establishing unit, configured to establish an association relationship between the atomic keywords;
the combination unit is used for combining the association relations according to the predefined matching rule to obtain the matching keyword; wherein the definition of the predefined matching rule includes a rule name, a rule symbol, a rule attribute and a rule meaning, and corresponding operation symbol priorities are predefined among the rule symbols.
In an embodiment, the pattern matching mode of the character string is a stack matching mode; accordingly, the matching module 630 includes:
the preprocessing unit is used for carrying out data preprocessing operation on the matching keywords to obtain target matching keywords after preprocessing operation; wherein the preprocessing operation at least comprises: sequentially processing rule symbols in the matching keywords;
the matching unit is used for executing stack analysis operation on the target matching key word and the data list to be matched according to the type of the character string analyzed by the target matching key word, the operation symbol priority of a rule symbol in a predefined matching rule, a preset operator stack and a preset operand stack, so as to be used as a fuzzy matching process; wherein the string type includes an atomic key or an operation operator.
In an embodiment, the matching unit comprises:
the reading subunit is used for reading the current character string in the target matching keyword and judging whether the current character string is an atomic keyword or not;
the first matching subunit is used for placing the atomic keywords into a preset operand stack if the current character string is the atomic keywords, and matching the atomic keywords with the data list to be matched to obtain a first matching result;
And the second matching subunit is used for determining that the current character string is an operation operator if the current character string is not an atomic keyword, placing the operation operator into a preset operator stack, determining a comparison result between a first priority of the operation operator and a second priority of a stack top operator in the preset operator stack according to the predefined matching rule, executing a pop operation on the operator with high priority according to the comparison result, executing a matching operation with the first matching result according to the operator operation rule with high priority to obtain a second matching result, continuing to read the next character string, taking the next character string as the current character string, and returning to the step of judging whether the current character string is the atomic keyword or not until the character string of the target matching keyword is completely read.
In an embodiment, the pattern matching mode of the character string is a binary tree matching mode; correspondingly, the matching module 630 further includes:
the conversion unit is used for converting the prefix expression form in the character string corresponding to the matching keyword into the suffix expression form of the matching keyword;
The construction unit is used for constructing a binary tree expression form of the matching keyword according to the suffix expression form and rule attributes in the predefined matching rule; wherein the rule attribute at least comprises a binary operator, a unary operator and a limit operator;
And the matching unit is used for carrying out fuzzy matching on the data list to be matched and the binary tree expression form by adopting a subsequent traversing method and the operation priority of rule symbols in the predefined matching rule.
In an embodiment, the conversion unit comprises:
A reading subunit, configured to read a current unit in a prefix expression in the character string corresponding to the matching keyword; wherein the current unit includes: an atomic keyword or matching operator;
the first output subunit is configured to directly output an atomic keyword if the current unit is the atomic keyword;
The second output subunit is configured to take the matching operator as a current matching operator if the current unit is the matching operator, and execute a push operation if the priority of the current matching operator is greater than the priority of the top of stack operator;
Under the condition that the priority of the current matching operator is lower than that of the stack top operator, the stack top operator is popped, it is judged whether the stack top operator is a limit operator, and a push operation is executed if the stack top operator is a limit operator;
And under the condition that the priority of the current matching operator is equal to that of the stack top operator, a pop operation is executed; the next unit is then read and taken as the current unit, and the process returns to the operation of judging whether the current unit is an atomic keyword, until the character strings of all the matching keywords have been converted, so as to obtain the suffix expression form corresponding to the matching keyword.
In an embodiment, the building unit comprises:
a reading subunit, configured to read a current unit in the suffix expression form, and determine whether the current unit is an atomic keyword;
The first construction subunit is configured to construct a first node corresponding to an atomic keyword if the current unit is the atomic keyword, and execute a stacking operation of the atomic keyword;
A second construction subunit, configured to determine that the current unit is a matching operator if the current unit is not an atomic keyword, take the matching operator as the current matching operator, construct a second node corresponding to the current matching operator, and then judge the rule attribute of the current matching operator: if the rule attribute is a unary operator, perform one pop operation, fix the popped node as the left child node of the second node, and execute a push operation after the binary tree is constructed; if the rule attribute is a binary operator, perform two pop operations, take the popped nodes as child nodes in the reverse of the popping order, and execute a push operation after the binary tree is constructed; then continue to read in the next unit, take the next unit as the current unit, and return to the step of judging whether the current unit is an atomic keyword, until all units in the suffix expression form have been read, so as to obtain the constructed binary tree expression.
In one embodiment, the binary tree expression includes leaf nodes, intermediate nodes, and root nodes; correspondingly, the matching unit comprises:
A searching subunit, configured to search for a leaf node in the binary tree expression form and perform a post-order traversal starting from the leaf node; wherein each intermediate node and each leaf node is either a left child node or a right child node, and the matching priority of a left child node is higher than that of a right child node;
A matching subunit, configured to: when the leaf node is an atomic keyword, match the leaf node against at least one matching pile in the data list to be matched to obtain a third matching result; return the third matching result to the intermediate node corresponding to the leaf node in the binary tree expression form; match the intermediate node with the third matching result, according to the operation priority of the rule symbol in the predefined matching rule that corresponds to the intermediate node, to obtain a fourth matching result; and continue until the traversal reaches the root node, whereupon the fourth matching result is matched according to the operation priority corresponding to the root node to obtain the target fuzzy matching result;
The intermediate nodes and the root node take the form of operators; the operators are expressed as rule symbols, and each rule symbol has a corresponding operation priority.
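The post-order matching carried out by the searching and matching subunits can be illustrated with the following hedged Python sketch. It is an assumption of this sketch, not a statement of the embodiment, that an atomic keyword matches a data-list entry by substring containment, that a matching result is the set of indices of matched entries, and that the hypothetical rule symbols `&`, `|` and `!` mean intersection, union and complement respectively. The names `Node` and `evaluate` are likewise illustrative.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def evaluate(node, items):
    """Post-order evaluation: left subtree, then right subtree, then the node."""
    if node.left is None and node.right is None:
        # Leaf node: an atomic keyword matched against every entry (assumed
        # here to be a simple substring test).
        return {i for i, text in enumerate(items) if node.value in text}
    left = evaluate(node.left, items)
    if node.value == "!":
        # Unary operator: only the left child is populated.
        return set(range(len(items))) - left
    right = evaluate(node.right, items)
    if node.value == "&":
        return left & right
    if node.value == "|":
        return left | right
    raise ValueError(f"unknown rule symbol: {node.value}")


if __name__ == "__main__":
    items = ["speed sensor A", "speed limit", "door sensor"]
    tree = Node("&", Node("sensor"), Node("!", Node("door")))
    print(evaluate(tree, items))
```

Because the left subtree is always evaluated before the right subtree, the sketch respects the stated rule that a left child node has a higher matching priority than a right child node; the result returned at the root corresponds to the target fuzzy matching result.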
The data fuzzy matching device provided by the embodiment of the invention can execute the data fuzzy matching method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
In an embodiment, fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. The electronic device 10 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the invention described and/or claimed herein.
As shown in fig. 7, the electronic device 10 includes at least one processor 11, and a memory, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, etc., communicatively connected to the at least one processor 11, in which the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the various methods and processes described above, such as the data fuzzy matching method.
In some embodiments, the data fuzzy matching processing method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more steps of the data fuzzy matching method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the data fuzzy matching method by any other suitable means (e.g. by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data fuzzy matching apparatus, such that the computer programs, when executed by the processor, cause the functions/operations specified in the flowchart and/or block diagram to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system that overcomes the high management difficulty and weak service scalability of traditional physical hosts and VPS services.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved, and no limitation is imposed herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.
Claims (9)
1. A method for fuzzy matching of data, comprising:
Acquiring engineering data sources to be matched, and forming a data list to be matched from the engineering data sources to be matched;
Configuring matching keywords according to pre-constructed atomic keywords and pre-defined matching rules; the matching keywords are expression forms composed of the atomic keywords and rule operators contained in the matching rules;
performing fuzzy matching between the data list to be matched and the matching keywords based on a string pattern matching mode, so as to obtain a matching result for the engineering data source to be matched; the string pattern matching mode comprises: a stack matching mode and a binary tree matching mode;
wherein the string pattern matching mode is the binary tree matching mode; correspondingly, performing fuzzy matching between the data list to be matched and the matching keywords based on the string pattern matching mode comprises:
Converting the prefix expression form of the character string corresponding to the matching key word into the suffix expression form of the matching key word;
constructing a binary tree expression form of the matching keyword according to the suffix expression form and rule attributes in the predefined matching rule; wherein the rule attribute at least comprises a binary operator, a unary operator and a limit operator;
performing fuzzy matching between the data list to be matched and the binary tree expression form by using a post-order traversal and the operation priority of rule symbols in the predefined matching rule;
the converting the prefix expression form of the character string corresponding to the matching keyword into the suffix expression form of the matching keyword includes:
Reading a current unit in a prefix expression in the character string corresponding to the matching keyword; wherein the current unit includes: an atomic keyword or matching operator;
If the current unit is an atomic keyword, directly outputting the atomic keyword;
when the current unit is a matching operator, taking the matching operator as the current matching operator, and performing a push operation on the current matching operator when its priority is higher than that of the stack-top operator;
when the priority of the current matching operator is lower than that of the stack-top operator, popping the stack-top operator, determining whether the stack-top operator is a limit operator, and performing a push operation if the stack-top operator is a limit operator;
when the priority of the current matching operator is equal to that of the stack-top operator, performing a pop operation, reading the next unit, taking the next unit as the current unit, and returning to the operation of determining whether the current unit is an atomic keyword, until the character strings of all the matching keywords have been converted, so as to obtain the suffix expression form corresponding to the matching keywords;
the binary tree expression form comprises leaf nodes, intermediate nodes and a root node; correspondingly, performing fuzzy matching between the data list to be matched and the binary tree expression form by using a post-order traversal and the operation priority of rule symbols in the predefined matching rule comprises:
searching for a leaf node in the binary tree expression form, and determining whether the leaf node is an atomic keyword; wherein each intermediate node and each leaf node is either a left child node or a right child node, and the matching priority of a left child node is higher than that of a right child node;
when the leaf node is an atomic keyword, matching the leaf node against at least one matching pile in the data list to be matched to obtain a third matching result, returning the third matching result to the intermediate node corresponding to the leaf node in the binary tree expression form, and matching the intermediate node with the third matching result, according to the operation priority of the rule symbol in the predefined matching rule that corresponds to the intermediate node, to obtain a fourth matching result, until the traversal reaches the root node, whereupon the fourth matching result is matched according to the operation priority corresponding to the root node to obtain the target fuzzy matching result;
wherein the intermediate nodes and the root node take the form of operators, the operators are expressed as rule symbols, and each rule symbol has a corresponding operation priority.
2. The method of claim 1, wherein the composing the engineering data source to be matched into a data list to be matched comprises:
reading a list head corresponding to the engineering data source to be matched; the list head comprises at least one matching pile, each matching pile is provided with a grade, and each grade corresponds to corresponding grade information;
And forming the at least one matching pile into the data list to be matched.
3. The method of claim 1, wherein the pre-construction of the atomic keywords comprises: analyzing signal characteristics contained in the engineering data source to be matched, and constructing an atomic keyword according to an analysis result; correspondingly, the configuring the matching keywords according to the pre-constructed atomic keywords and the pre-defined matching rules comprises the following steps:
Establishing an association relation between the atom keywords;
combining the association relations according to the predefined matching rules to obtain the matching keywords; wherein the definition of each predefined matching rule includes a rule name, a rule symbol, a rule attribute and a rule meaning, and a corresponding operator priority is predefined among the rule symbols.
4. The method of claim 1, wherein the string pattern matching mode is the stack matching mode; correspondingly, performing fuzzy matching between the data list to be matched and the matching keywords based on the string pattern matching mode comprises:
performing a data preprocessing operation on the matching keywords to obtain preprocessed target matching keywords; wherein the preprocessing operation at least comprises: processing the rule symbols in the matching keywords in sequence;
performing, as the fuzzy matching process, a stack parsing operation on the target matching keyword and the data list to be matched according to the string type parsed from the target matching keyword, the operator priority of rule symbols in the predefined matching rule, a preset operator stack and a preset operand stack; wherein the string type comprises an atomic keyword or an operator.
5. The method according to claim 4, wherein performing the stack parsing operation on the target matching keyword and the data list to be matched according to the string type parsed from the target matching keyword, the operator priority of rule symbols in the predefined matching rule, the preset operator stack and the preset operand stack comprises:
reading a current character string parsed from the target matching keyword, and determining whether the current character string is an atomic keyword;
if the current character string is an atomic keyword, placing the atomic keyword into the preset operand stack, and matching the atomic keyword against the data list to be matched to obtain a first matching result;
if the current character string is not an atomic keyword, determining that the current character string is an operator, placing the operator into the preset operator stack, determining, according to the predefined matching rule, a comparison result between a first priority of the operator and a second priority of the stack-top operator in the preset operator stack, popping the higher-priority operator according to the comparison result, performing a matching operation between the higher-priority operator and the first matching result according to the operation rule of the higher-priority operator to obtain a second matching result, continuing to read the next character string, taking the next character string as the current character string, and returning to the step of determining whether the current character string is an atomic keyword, until all the character strings parsed from the target matching keyword have been read.
6. The method of claim 1, wherein said constructing a binary tree representation of the matching key from the suffix expression form and rule attributes in the predefined matching rule comprises:
Reading a current unit in the suffix expression form, and judging whether the current unit is an atomic keyword or not;
if the current unit is an atomic keyword, constructing a first node corresponding to the atomic keyword, and performing a push operation for the atomic keyword;
if the current unit is not an atomic keyword, determining that the current unit is a matching operator, taking the matching operator as the current matching operator, constructing a second node corresponding to the current matching operator, and then determining the rule attribute of the current matching operator; if the rule attribute is a unary operator, performing one pop operation, fixing the popped node as the left child node to construct the left child node of the binary tree, and performing a push operation; if the rule attribute is a binary operator, performing two pop operations, assigning the popped nodes in reverse pop order, beginning with the right child node, and performing a pop operation after constructing the right child node of the binary tree; continuing to read the next unit, taking the next unit as the current unit, and returning to the step of determining whether the current unit is an atomic keyword, until all units in the suffix expression form have been read, so as to obtain the constructed binary tree expression form.
7. A data fuzzy matching device, comprising:
The construction module is used for acquiring engineering data sources to be matched and forming a data list to be matched from the engineering data sources to be matched;
The configuration module is used for configuring the matching keywords according to the pre-constructed atomic keywords and the pre-defined matching rules; the matching keywords are expression forms composed of the atomic keywords and rule operators contained in the matching rules;
The matching module is used for performing fuzzy matching between the data list to be matched and the matching keywords based on a string pattern matching mode, so as to obtain a matching result for the engineering data source to be matched; the string pattern matching mode comprises: a stack matching mode and a binary tree matching mode;
The string pattern matching mode is the binary tree matching mode; correspondingly, the matching module comprises:
the conversion unit is used for converting the prefix expression form in the character string corresponding to the matching keyword into the suffix expression form of the matching keyword;
The construction unit is used for constructing a binary tree expression form of the matching keyword according to the suffix expression form and rule attributes in the predefined matching rule; wherein the rule attribute at least comprises a binary operator, a unary operator and a limit operator;
the matching unit is used for performing fuzzy matching between the data list to be matched and the binary tree expression form by using a post-order traversal and the operation priority of rule symbols in the predefined matching rule;
Wherein the conversion unit includes:
A reading subunit, configured to read a current unit in a prefix expression in the character string corresponding to the matching keyword; wherein the current unit includes: an atomic keyword or matching operator;
the first output subunit is configured to directly output an atomic keyword if the current unit is the atomic keyword;
The second output subunit is configured to take the matching operator as the current matching operator if the current unit is a matching operator, and to perform a push operation on the current matching operator if its priority is higher than that of the stack-top operator;
When the priority of the current matching operator is lower than that of the stack-top operator, the stack-top operator is popped, it is determined whether the stack-top operator is a limit operator, and a push operation is performed if the stack-top operator is a limit operator;
When the priority of the current matching operator is equal to that of the stack-top operator, a pop operation is performed, the next unit is read and taken as the current unit, and the process returns to the operation of determining whether the current unit is an atomic keyword, until the character strings of all the matching keywords have been converted, so as to obtain the suffix expression form corresponding to the matching keywords;
The binary tree expression form comprises leaf nodes, intermediate nodes and root nodes; correspondingly, the matching unit comprises:
A searching subunit, configured to search for a leaf node in the binary tree expression form and perform a post-order traversal starting from the leaf node; wherein each intermediate node and each leaf node is either a left child node or a right child node, and the matching priority of a left child node is higher than that of a right child node;
A matching subunit, configured to: when the leaf node is an atomic keyword, match the leaf node against at least one matching pile in the data list to be matched to obtain a third matching result; return the third matching result to the intermediate node corresponding to the leaf node in the binary tree expression form; match the intermediate node with the third matching result, according to the operation priority of the rule symbol in the predefined matching rule that corresponds to the intermediate node, to obtain a fourth matching result; and continue until the traversal reaches the root node, whereupon the fourth matching result is matched according to the operation priority corresponding to the root node to obtain the target fuzzy matching result;
The intermediate nodes and the root node take the form of operators; the operators are expressed as rule symbols, and each rule symbol has a corresponding operation priority.
8. An electronic device, the electronic device comprising:
at least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
The memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the data fuzzy matching method of any of claims 1-6.
9. A computer readable storage medium storing computer instructions for causing a processor to implement the data fuzzy matching method of any one of claims 1-6 when executed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410275647.9A CN117874088B (en) | 2024-03-12 | 2024-03-12 | Data fuzzy matching method, device, equipment and medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410275647.9A CN117874088B (en) | 2024-03-12 | 2024-03-12 | Data fuzzy matching method, device, equipment and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117874088A CN117874088A (en) | 2024-04-12 |
CN117874088B CN117874088B (en) | 2024-05-17 |
Family
ID=90579584
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410275647.9A Active CN117874088B (en) | 2024-03-12 | 2024-03-12 | Data fuzzy matching method, device, equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117874088B (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112508440A (en) * | 2020-12-18 | 2021-03-16 | 深圳市赛为智能股份有限公司 | Data quality evaluation method and device, computer equipment and storage medium |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090024366A1 (en) * | 2007-07-18 | 2009-01-22 | Microsoft Corporation | Computerized progressive parsing of mathematical expressions |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112508440A (en) * | 2020-12-18 | 2021-03-16 | 深圳市赛为智能股份有限公司 | Data quality evaluation method and device, computer equipment and storage medium |
Non-Patent Citations (5)
Title |
---|
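A binary-tree-based method for converting infix expressions into prefix expressions; Hu Yun (胡云); Journal of Chengdu University (Natural Science Edition); 2012-09-30 (No. 3); full text * |
Validity checking of logical expressions based on the operator-precedence algorithm; Wu Xiaojun (吴小钧); 小型微型计算机系统 (Mini-Micro Systems); 2002-10-21 (No. 10); full text * |
Parsing and creating long logical expressions with the shunting-yard algorithm; Zhang Wenxiao (张文晓); 信息与电脑(理论版) (Information & Computer, Theory Edition); 2020-03-25 (No. 6); full text * |
Application of stacks in expression evaluation; Li Cheng (李橙); 电脑知识与技术 (Computer Knowledge and Technology); 2014-12-05 (No. 34); full text * |
Research and implementation of an arithmetic expression evaluation algorithm for floating-point data; Yang Aili (杨爱丽); 电脑知识与技术 (Computer Knowledge and Technology); 2018-06-05 (No. 16); full text * |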
Also Published As
Publication number | Publication date |
---|---|
CN117874088A (en) | 2024-04-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111460083A (en) | Document title tree construction method and device, electronic equipment and storage medium | |
CN113590796B (en) | Training method and device for ranking model and electronic equipment | |
CN113657100B (en) | Entity identification method, entity identification device, electronic equipment and storage medium | |
CN112115232A (en) | Data error correction method and device and server | |
CN113033194B (en) | Training method, device, equipment and storage medium for semantic representation graph model | |
CN112559717B (en) | Search matching method, device, electronic equipment and storage medium | |
CN113033204A (en) | Information entity extraction method and device, electronic equipment and storage medium | |
CN114495143A (en) | Text object identification method and device, electronic equipment and storage medium | |
CN112989235A (en) | Knowledge base-based internal link construction method, device, equipment and storage medium | |
CN112560425B (en) | Template generation method and device, electronic equipment and storage medium | |
CN113836316A (en) | Processing method, training method, device, equipment and medium for ternary group data | |
CN112948573A (en) | Text label extraction method, device, equipment and computer storage medium | |
CN110750632B (en) | Improved Chinese ALICE intelligent question-answering method and system | |
CN117874088B (en) | Data fuzzy matching method, device, equipment and medium | |
US20240221727A1 (en) | Voice recognition model training method, voice recognition method, electronic device, and storage medium | |
CN115936018A (en) | Method and device for translating terms, electronic equipment and storage medium | |
CN113553833B (en) | Text error correction method and device and electronic equipment | |
CN115604115A (en) | Configuration information analysis method and device, electronic equipment and storage medium | |
CN112560466B (en) | Link entity association method, device, electronic equipment and storage medium | |
CN114970531A (en) | Intention identification and named entity extraction method and device based on instant messaging message | |
CN114417862A (en) | Text matching method, and training method and device of text matching model | |
CN112528600A (en) | Text data processing method, related device and computer program product | |
CN112818167A (en) | Entity retrieval method, entity retrieval device, electronic equipment and computer-readable storage medium | |
CN117874308B (en) | Train control data acquisition method and device, electronic equipment and storage medium | |
CN113656467B (en) | Method and device for sorting search results and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||