CN115796166B - Regular expression testing method and system for intelligent logistics control system - Google Patents


Publication number
CN115796166B
Authority
CN
China
Prior art keywords
character
character string
node
string set
strings
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310063491.3A
Other languages
Chinese (zh)
Other versions
CN115796166A (en)
Inventor
郑黎晓
陈祖希
骆翔宇
周长利
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huaqiao University
Original Assignee
Huaqiao University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huaqiao University
Priority to CN202310063491.3A
Publication of CN115796166A
Application granted
Publication of CN115796166B
Legal status: Active

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a regular expression testing method and system for an intelligent logistics control system, and belongs to the technical field of electronics. In the method, the meta-characters of a given regular expression are first identified, their priority relation is determined, and a priority table is constructed from that relation. The regular expression is then converted into suffix (postfix) form based on the priority table, and an abstract syntax tree is constructed from the suffix form. The abstract syntax tree is traversed, and a character string set is created for each visited node. The character string set corresponding to the root node of the abstract syntax tree is extracted and collated to obtain the test character strings. Finally, the regular expression is tested with the test character strings. The method improves the accuracy of regular expression testing while improving the description efficiency of the regular expression, and thereby supports effective and reliable application of the intelligent logistics robot control system.

Description

Regular expression testing method and system for intelligent logistics control system
Technical Field
The invention relates to the technical field of electronics, in particular to a regular expression test method and system of an intelligent logistics control system.
Background
Regular expressions are widely used in intelligent logistics robot control systems to describe the pattern rules of information such as waybill numbers, mailing addresses and telephone numbers, so that such information can be recognized intelligently and matched automatically. However, because regular expression syntax is compact and uses a large number of meta-characters, developers are prone to errors when defining and using regular expressions. Even a very short regular expression can be difficult to read and understand quickly. An erroneous regular expression causes defects in the logistics control system that uses it and can produce unpredictable behavior and results after deployment. Ensuring the correctness of regular expressions is therefore an important precondition for their reliable application.
Disclosure of Invention
The invention aims to provide a regular expression testing method and system of an intelligent logistics control system, which can improve the accuracy of regular expression testing while improving the description efficiency of the regular expression.
In order to achieve the above object, the present invention provides the following solutions:
A regular expression testing method of an intelligent logistics control system comprises the following steps:
Step 1, preprocessing a given regular expression and identifying its meta-characters;
Step 2, determining the priority relation of the meta-characters, and constructing a priority table based on the priority relation;
Step 3, converting the regular expression into a regular expression in suffix form by using a meta-character stack, based on the priority table;
Step 4, constructing an abstract syntax tree based on the regular expression in suffix form;
Step 5, traversing the abstract syntax tree, and creating a character string set for each node visited during the traversal;
Step 6, extracting, from the character string sets, the set corresponding to the root node of the abstract syntax tree, and collating it to obtain the test character strings;
Step 7, completing the test of the given regular expression based on the test character strings.
Preferably, step 3 specifically includes:
Step 3.1, defining a meta-character stack and initializing it;
Step 3.2, setting the suffix expression E' to be initially empty, selecting a symbol that does not appear in the given regular expression, pushing the symbol onto the meta-character stack, and assigning it the lowest priority;
Step 3.3, scanning the characters of the given regular expression from left to right, and denoting the currently scanned character as ch; if ch is an ordinary character, appending ch to the suffix expression E'; if ch is a meta-character, performing the following operations:
if ch is a left bracket, pushing ch onto the meta-character stack;
if ch is a right bracket, the current top element of the meta-character stack is, by the bracket pairing rule, the left bracket matching ch; popping this left bracket from the meta-character stack to remove the bracket pair;
if ch is an operator whose priority is higher than that of the top element (the most recently pushed element) of the meta-character stack, pushing ch onto the meta-character stack;
if ch is an operator whose priority is lower than that of the top element of the meta-character stack, successively popping from the meta-character stack the operators whose priority is higher than that of ch, appending them to the suffix expression E' in the order in which they are popped, and then pushing ch onto the meta-character stack; the ordinary characters comprise digits and letters; the meta-characters are operators that describe different operations on character strings;
Step 3.4, repeating steps 3.2 to 3.3 until all characters in the given regular expression have been scanned, thereby obtaining the regular expression in suffix form.
Preferably, step 4 specifically includes:
Step 4.1, defining a node stack and initializing it to be empty;
Step 4.2, scanning the characters of the regular expression in suffix form from left to right, and denoting the currently scanned character as ch';
Step 4.3, if ch' is an ordinary character, constructing a leaf node from ch' and pushing the leaf node onto the node stack;
Step 4.4, if ch' is a meta-character, performing the following operations:
if ch' is a binary operator, constructing a non-leaf node from ch'; the operands adjacent to ch' in the suffix-form regular expression are already stored in the node stack, so two nodes are popped from the node stack in succession to serve as the right child and the left child of ch' respectively, and the non-leaf node constructed from ch' is pushed onto the node stack;
if ch' is a unary operator, constructing a non-leaf node from ch', popping one node from the node stack to serve as the left child of ch', setting the right child of ch' to an empty node, and pushing the non-leaf node constructed from ch' onto the node stack;
Step 4.5, repeating steps 4.2 to 4.4 until all characters in the regular expression in suffix form have been scanned, and taking the top element of the node stack as the root node of the abstract syntax tree.
Preferably, the step 5 specifically includes:
when traversing the abstract syntax tree, if the currently visited node n is an empty node, constructing the character string set S(n) as S(n) = {ε}, where ε is the empty character string;
if the currently visited node n is a leaf node, constructing the character string set S(n) as S(n) = {ch'}, where ch' is the character of node n;
if the currently visited node n is a non-leaf node, constructing the character string set according to the type of the character θ in node n.
Preferably, constructing the character string set according to the type of the character θ in node n specifically includes:
if the character θ in node n is the selection operator, constructing the character string set S(n) as S(n) = S(nl) ∪ S(nr), where nl is the left child of node n, nr is the right child of node n, S(nl) is the character string set corresponding to the left child, and S(nr) is the character string set corresponding to the right child;
if the character θ in node n is the connection operator, constructing the character string set S(n) by connecting the character strings in S(nl) with the character strings in S(nr);
if the character θ in node n is the zero-or-one repetition operator, constructing the character string set S(n) from all character strings corresponding to the left child node together with the empty string: S(n) = S(nl) ∪ {ε};
if the character θ in node n is the arbitrary-repetition operator, constructing the character string set S(n) by repeating the character strings corresponding to the left child node;
if the character θ in node n is the one-or-more repetition operator, constructing the character string set S(n) by repeating the character strings corresponding to the left child node.
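The node cases above can be sketched as a recursive post-order traversal. This is a minimal illustration and not the patent's implementation: the `Node` class, the `string_set` function, the `COMBINERS` table and the symbols `|` and `?` are all assumed names. Only the selection and zero-or-one cases are filled in; the connection and repetition cases would be added to the table as further entries.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Set


@dataclass
class Node:
    value: Optional[str]              # None marks the empty node (assumption)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


Combiner = Callable[[Set[str], Set[str]], Set[str]]

# Hypothetical operator symbols; the patent's own symbol table is not reproduced.
COMBINERS: Dict[str, Combiner] = {
    "|": lambda s_nl, s_nr: s_nl | s_nr,   # selection: S(n) = S(nl) ∪ S(nr)
    "?": lambda s_nl, s_nr: s_nl | {""},   # zero-or-one: S(n) = S(nl) ∪ {ε}
}


def string_set(n: Node, combiners: Dict[str, Combiner] = COMBINERS) -> Set[str]:
    """Compute S(n) by visiting both children before the node itself."""
    if n.value is None:                        # empty node: S(n) = {ε}
        return {""}
    if n.left is None and n.right is None:     # leaf node: S(n) = {ch'}
        return {n.value}
    s_nl = string_set(n.left, combiners)
    s_nr = string_set(n.right, combiners)
    return combiners[n.value](s_nl, s_nr)


# Tree for the expression a|b? : selection of 'a' and an optional 'b'.
tree = Node("|", Node("a"), Node("?", Node("b"), Node(None)))
print(sorted(string_set(tree)))  # ['', 'a', 'b']
```

Keeping the operator handling in a lookup table means each operator strategy can be developed and checked in isolation before being plugged into the traversal.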
Preferably, constructing the character string set S(n) by connecting the character strings in S(nl) with the character strings in S(nr) specifically includes:
sorting the character strings in the character string set S(nr) by length;
connecting, in the sorted order, the first character string in S(nl) with the first character string in S(nr) and adding the resulting new character string to a character string set S(n'); connecting the second character string in S(nl) with the second character string in S(nr) and adding the resulting new character string to S(n'); and so on, until the character strings in S(nl) or in S(nr) have all been processed;
if the number of character strings in S(nl) equals the number of character strings in S(nr), taking the set S(n') formed by these connections as the character string set S(n);
if the number of character strings in S(nl) is greater than the number of character strings in S(nr), arbitrarily selecting one character string from S(nr), connecting it to each unprocessed character string in S(nl), and adding the resulting new character strings to S(n') to obtain the character string set S(n);
if the number of character strings in S(nl) is smaller than the number of character strings in S(nr), arbitrarily selecting one character string from S(nl), connecting it with each unprocessed character string in S(nr), and adding the resulting new character strings to S(n') to obtain the character string set S(n).
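The pairing strategy for the connection operator might look as follows in code. This is a hedged sketch: the function name `concat_sets` is invented for illustration, the opposite sort directions follow the variant given in the detailed description (S(nl) descending, S(nr) ascending by length), and the "arbitrarily selected" filler string is simply taken to be the first one after sorting.

```python
from typing import Iterable, Set


def concat_sets(s_nl: Iterable[str], s_nr: Iterable[str]) -> Set[str]:
    """Pair strings position by position instead of taking the full cross product."""
    # Sort the two sides in opposite directions of length (ties broken alphabetically,
    # which is an assumption made here only to keep the result deterministic).
    left = sorted(s_nl, key=lambda s: (-len(s), s))    # longest first
    right = sorted(s_nr, key=lambda s: (len(s), s))    # shortest first

    # Connect the i-th string of S(nl) with the i-th string of S(nr).
    result = {a + b for a, b in zip(left, right)}

    paired = min(len(left), len(right))
    if len(left) > paired:
        # l > m: connect one chosen string of S(nr) to every unprocessed string of S(nl).
        filler = right[0]
        result |= {a + filler for a in left[paired:]}
    elif len(right) > paired:
        # l < m: connect one chosen string of S(nl) with every unprocessed string of S(nr).
        filler = left[0]
        result |= {filler + b for b in right[paired:]}
    return result


print(sorted(concat_sets({"a", "bc"}, {"x"})))        # ['ax', 'bcx']
print(sorted(concat_sets({"a"}, {"w", "x", "yz"})))   # ['aw', 'ax', 'ayz']
```

With this strategy the result holds at most max(l, m) strings rather than l × m, which keeps the generated test set small while still using every string from both children at least once.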
Preferably, if the character θ in node n is the arbitrary-repetition operator, constructing the character string set S(n) by repeating the character strings corresponding to the left child node specifically includes:
adding the empty string ε to the initial character string set to obtain a first character string set;
adding each character string in the character string set S(nl) to the first character string set to obtain a second character string set;
making r copies of the character strings in the character string set S(nl) to obtain r copies of the string list;
sorting each copy by length;
connecting the corresponding character strings of the r sorted copies, and adding the connected character strings to the second character string set to obtain the character string set S(n).
Preferably, if the character θ in node n is the one-or-more repetition operator, constructing the character string set S(n) by repeating the character strings corresponding to the left child node specifically includes:
adding each character string in the character string set S(nl) to the initial character string set to obtain a first character string set;
making r copies of the character strings in the character string set S(nl) to obtain r copies of the string list;
sorting each copy by length;
connecting the corresponding character strings of the r sorted copies, and adding the connected character strings to the first character string set to obtain the character string set S(n).
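Both repetition strategies share the same "r copies" step, so they can be sketched together. The names `repeat_r`, `star_set` and `plus_set` are invented for this illustration, and the alternating sort direction (first copy descending, second ascending, and so on) follows the detailed description rather than the wording of this passage.

```python
from typing import Iterable, List, Set


def repeat_r(s_nl: Iterable[str], r: int) -> Set[str]:
    """Make r copies of S(nl), sort them in alternating length order, join position-wise."""
    strings = list(s_nl)
    copies: List[List[str]] = []
    for i in range(r):
        # Even-numbered copies are sorted longest first, odd-numbered shortest first.
        copies.append(sorted(strings, key=lambda s: (len(s), s), reverse=(i % 2 == 0)))
    # The k-th result is the connection of the k-th string of every copy.
    return {"".join(parts) for parts in zip(*copies)}


def star_set(s_nl: Iterable[str], r: int = 2) -> Set[str]:
    """Arbitrary repetition: zero times (ε), once, and r > 1 times."""
    strings = set(s_nl)
    return {""} | strings | repeat_r(strings, r)


def plus_set(s_nl: Iterable[str], r: int = 2) -> Set[str]:
    """One-or-more repetition: once, and r > 1 times."""
    strings = set(s_nl)
    return strings | repeat_r(strings, r)


print(sorted(star_set({"a", "bc"}, r=2)))  # ['', 'a', 'abc', 'bc', 'bca']
print(sorted(plus_set({"a", "bc"}, r=3)))  # ['a', 'abca', 'bc', 'bcabc']
```

Because the copies are joined position by position, the r-fold step adds at most as many strings as S(nl) contains, regardless of r.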
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
In the regular expression testing method of the intelligent logistics control system provided by the invention, the meta-characters of the given regular expression are identified, their priority relation is determined, and a priority table is constructed based on the priority relation. Based on the priority table, the regular expression is converted into a regular expression in suffix form, from which an abstract syntax tree is constructed. The abstract syntax tree is then traversed, a character string set is created for each node visited during the traversal, and the character string set corresponding to the root node of the abstract syntax tree is extracted and collated to obtain the test character strings. Finally, the regular expression is tested with the test character strings. The accuracy of regular expression testing is thereby improved while the description efficiency of the regular expression is improved, which helps developers or testers find possible errors in the definitions of regular expressions that describe information such as waybill numbers, mailing addresses and telephone numbers, and supports effective and reliable application of the intelligent logistics robot control system.
Corresponding to the regular expression test method of the intelligent logistics control system, the invention also provides the following two implementation systems:
One of them is a regular expression testing system of an intelligent logistics control system, the system comprising:
the preprocessing module is used for preprocessing a given regular expression and identifying to obtain meta characters;
the priority table construction module is used for determining the priority relation of the meta characters and constructing a priority table based on the priority relation;
the suffix form conversion module is used for converting the regular expression into a regular expression in a suffix form by adopting a meta character stack based on the priority table;
the grammar tree construction module is used for constructing an abstract grammar tree based on the regular expression in the suffix form;
the character string set construction module is used for traversing the abstract syntax tree and creating a character string set for the accessed node in the traversing process;
the test character string extraction module is used for extracting a character string set corresponding to the abstract syntax tree root node in the character string set, and sorting the character string set to obtain a test character string;
and the test module is used for completing the test of the given regular expression based on the test character string.
Another is an electronic device, comprising: a processor and a memory;
the processor is connected with the memory; the memory stores a computer software program; the computer software program is used for implementing the regular expression test method of the intelligent logistics control system provided by the above; the processor is configured to retrieve and execute the computer software program stored in the memory.
The technical effects achieved by the two implementation systems provided by the invention are the same as those achieved by the regular expression testing method of the intelligent logistics control system provided by the invention, and are therefore not described again here.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a regular expression test method of an intelligent logistics control system provided by the invention;
FIG. 2 is a flowchart of generating a regular expression in suffix form provided by an embodiment of the present invention;
FIG. 3 is a flow chart of the abstract syntax tree construction provided by the embodiment of the invention;
FIG. 4 is a flowchart of generating a character string set according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention aims to provide a regular expression testing method and system of an intelligent logistics control system, which can improve the accuracy of regular expression testing while improving the description efficiency of the regular expression.
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
As shown in FIG. 1, the regular expression testing method of the intelligent logistics control system provided by the invention comprises the following steps:
Step 1, preprocessing the regular expression, identifying the ordinary characters and meta-characters in the expression, and determining the priority relation between the meta-characters. Specifically:
Step 1.1, scanning the regular expression from left to right and identifying the ordinary characters that appear in the character strings defined or matched by the regular expression. Ordinary characters generally comprise digits (0, 1, 2, 3, ..., 9), lowercase letters (a, b, c, d, ..., z), uppercase letters (A, B, C, D, ..., Z), common Chinese characters and the like.
Step 1.2, scanning the regular expression from left to right and identifying the meta-characters, which have special meanings in the expression. Meta-characters generally comprise operators that describe different operations on character strings, such as the connection operation, the selection (or) operation, the arbitrary-repetition operation, the one-or-more repetition operation and the zero-or-one repetition operation, as well as the auxiliary bracket symbols. The connection and selection operations are binary operators; the zero-or-one repetition, one-or-more repetition and arbitrary-repetition operations are unary operators. Several common meta-characters are described in Table 1.
Table 1. Common meta-characters
[Table provided as image SMS_1 in the original publication; its contents are not reproduced in this text.]
Step 1.3, determining the priority relation among the meta-characters, specifically: the auxiliary bracket symbols have the highest priority; the repetition operators (zero-or-one repetition, one-or-more repetition and arbitrary repetition) have the next-highest priority; the connection operation has lower priority than the repetition operations; and the selection (or) operation has the lowest priority. The priority relation between two adjacent meta-characters θ1 and θ2, taken in the left-to-right evaluation order of the expression, is shown in Table 2, where rows represent θ1 and columns represent θ2.
Table 2. Meta-character priority table
[Table provided as image SMS_2 in the original publication; its contents are not reproduced in this text.]
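As a rough illustration of Step 1, the classification of characters and the priority ordering could be encoded as below. Everything here is an assumption made for the sketch: the concrete symbols (`*`, `+`, `?`, `|`, brackets, and `.` as an explicit connection marker), the numeric ranks, and the names `classify` and `compare`. The ranks only reproduce the ordering stated in Step 1.3; the actual cell values of Table 2 are not available in this text.

```python
from typing import List, Tuple

# Hypothetical meta-character symbols (the patent's Table 1 is not reproduced).
UNARY = {"*", "+", "?"}      # arbitrary, one-or-more, zero-or-one repetition
BINARY = {".", "|"}          # connection (explicit marker) and selection
BRACKETS = {"(", ")"}
META = UNARY | BINARY | BRACKETS

# Assumed numeric ranks: brackets > repetition > connection > selection.
PRIORITY = {"(": 4, ")": 4, "*": 3, "+": 3, "?": 3, ".": 2, "|": 1}


def classify(regex: str) -> Tuple[List[str], List[str]]:
    """Scan left to right and split the characters into ordinary and meta-characters."""
    ordinary = [ch for ch in regex if ch not in META]
    meta = [ch for ch in regex if ch in META]
    return ordinary, meta


def compare(theta1: str, theta2: str) -> str:
    """Return '>', '<' or '=' for the priority of theta1 relative to theta2."""
    p1, p2 = PRIORITY[theta1], PRIORITY[theta2]
    return ">" if p1 > p2 else "<" if p1 < p2 else "="


print(classify("a(b|c)*"))   # (['a', 'b', 'c'], ['(', '|', ')', '*'])
print(compare("*", "|"))     # >
```

A plain rank per symbol cannot express relations that depend on which symbol comes first, so a full two-dimensional table like Table 2 may encode more than this sketch does.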
Step 2, converting the preprocessed regular expression into a regular expression in suffix form by using a meta-character stack. As shown in FIG. 2, this step may be implemented as follows:
Step 2.1, defining a meta-character stack and initializing it: the suffix-form regular expression E' is initialized to empty, a special auxiliary meta-character is pushed onto the meta-character stack, and this meta-character is given the lowest priority.
Step 2.2, scanning the characters of the regular expression from left to right, denoting the currently scanned character as ch, and processing ch according to its type:
If ch is an ordinary character, ch is appended to the suffix expression E'.
If ch is the auxiliary left bracket, ch is pushed onto the meta-character stack.
If ch is the auxiliary right bracket, one element is popped from the meta-character stack; by construction, the popped top element is the left bracket matching ch, and the bracket pair is thereby removed.
If ch is an operator whose priority is higher than that of the top element of the meta-character stack, ch is pushed onto the meta-character stack.
If ch is an operator whose priority is lower than that of the top element of the meta-character stack, the operators with higher priority than ch are popped from the meta-character stack in succession and appended to the suffix expression E' in the order in which they are popped, and ch is then pushed onto the meta-character stack.
Step 2.3, repeating steps 2.1 and 2.2 until all characters in the regular expression have been scanned; E' is then the suffix-form expression converted from the regular expression.
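The stack-based conversion can be sketched as follows, under several stated assumptions. The symbols, the numeric priorities, the sentinel `#` and the function names are illustrative only. The sketch makes connection explicit by inserting a `.` marker first, which the text does not describe. It also pops operators of equal priority (left associativity) and, on a right bracket, first flushes any operators still above the matching left bracket; both behaviours are choices of this sketch, since the text covers only the strictly higher and strictly lower cases. Well-formed input without escape sequences is assumed.

```python
PRIORITY = {"|": 1, ".": 2, "*": 3, "+": 3, "?": 3}   # assumed ranks
SENTINEL = "#"                                        # assumed not to occur in the regex


def add_explicit_concat(regex: str) -> str:
    """Insert '.' wherever two adjacent symbols are implicitly connected."""
    out = []
    for i, ch in enumerate(regex):
        out.append(ch)
        if i + 1 < len(regex):
            nxt = regex[i + 1]
            if ch not in "(|" and nxt not in ")|*+?":
                out.append(".")
    return "".join(out)


def to_postfix(regex: str) -> str:
    """Convert an infix regular expression to suffix form with a meta-character stack."""
    stack = [SENTINEL]          # the sentinel acts as the lowest-priority element
    out = []                    # the suffix expression E'
    for ch in add_explicit_concat(regex):
        if ch == "(":
            stack.append(ch)
        elif ch == ")":
            while stack[-1] != "(":          # flush operators inside the brackets
                out.append(stack.pop())
            stack.pop()                      # discard the matching left bracket
        elif ch in PRIORITY:
            # Pop operators that bind at least as tightly as ch, then push ch.
            while stack[-1] in PRIORITY and PRIORITY[stack[-1]] >= PRIORITY[ch]:
                out.append(stack.pop())
            stack.append(ch)
        else:
            out.append(ch)                   # ordinary character
    while stack[-1] != SENTINEL:             # flush the remaining operators
        out.append(stack.pop())
    return "".join(out)


print(to_postfix("ab|c"))       # ab.c|
print(to_postfix("a(b|c)*d"))   # abc|*.d.
```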
Step 3, constructing the abstract syntax tree of the suffix-form regular expression by using a node stack. The flow of constructing the abstract syntax tree from the suffix-form regular expression with the node stack is shown in FIG. 3. The abstract syntax tree of a regular expression is a labeled, ordered binary tree whose internal nodes represent operators and whose leaf nodes represent operands. The abstract syntax tree represents the structure and the order of operations of the regular expression in tree form and is an intuitive description of the regular expression. The specific steps of constructing an abstract syntax tree from the suffix-form regular expression E' are as follows.
Step 3.1, defining a node stack and initializing it to be empty.
Step 3.2, scanning the characters of the suffix expression E' from left to right, denoting the currently scanned character as ch, and processing ch according to its type:
If ch is an ordinary character, a leaf node is constructed from ch and pushed onto the node stack.
If ch is a binary operator (selection or connection), a non-leaf node is constructed from ch, two nodes are popped from the node stack in succession to serve as the right child and the left child of ch respectively, and the non-leaf node constructed from ch is then pushed onto the node stack.
If ch is a unary operator (arbitrary repetition, zero-or-one repetition, or one-or-more repetition), a non-leaf node is constructed from ch, one node is popped from the node stack to serve as the left child of ch, the right child of ch is an empty node, and the non-leaf node constructed from ch is pushed onto the node stack.
Step 3.3, repeating steps 3.1 and 3.2 until all characters in the suffix expression E' have been scanned; the top element of the node stack is then the root node of the abstract syntax tree.
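A minimal sketch of the node-stack construction is given below. The `Node` class, the `build_ast` function and the operator symbols (including `.` as an explicit connection marker) are assumptions for illustration, and the input is assumed to be a well-formed suffix expression.

```python
from dataclasses import dataclass
from typing import List, Optional

BINARY = {".", "|"}        # assumed symbols for connection and selection
UNARY = {"*", "+", "?"}    # assumed symbols for the repetition operators


@dataclass
class Node:
    value: Optional[str]              # None marks the empty node (assumption)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_ast(postfix: str) -> Node:
    """Build the abstract syntax tree from a suffix-form regular expression."""
    stack: List[Node] = []
    for ch in postfix:
        if ch in BINARY:
            right = stack.pop()       # the first pop yields the right child
            left = stack.pop()        # the second pop yields the left child
            stack.append(Node(ch, left, right))
        elif ch in UNARY:
            operand = stack.pop()
            stack.append(Node(ch, operand, Node(None)))   # right child is empty
        else:
            stack.append(Node(ch))    # ordinary character becomes a leaf
    return stack[-1]                  # the top of the stack is the root


root = build_ast("abc|*.d.")          # suffix form of a(b|c)*d
print(root.value, root.left.value, root.right.value)   # . . d
```

The pop order matters: because the right operand was pushed last, taking the first popped node as the left child would silently reverse every connection.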
Step 4, traversing the abstract syntax tree in post-order, and creating a corresponding character string set for each visited node according to different strategies during the traversal. As shown in FIG. 4, this step may be implemented as follows:
Let the currently visited node be n and denote the character string set generated for it as S(n). The set is generated according to whether n is an empty node, a leaf node or a non-leaf node, as follows.
Step 4.1, if n is an empty node, the character string set S(n) = {ε} is constructed, where ε is the empty string, which contains no characters.
Step 4.2, if n is a leaf node and the character in n is ch', the character string set S(n) = {ch'} is constructed; the set contains only the single character string ch'.
Step 4.3, if n is a non-leaf node, let the left child of n be node nl and the right child be node nr. Because the traversal is post-order, the character string sets S(nl) and S(nr) of the left and right children are already available. The character string set S(n) of node n is generated with one of the following construction strategies, depending on the character θ in n.
1) If θ is the select operator, construct S (n) as the union of S (nl) and S (nr).
2) If θ is a join operator, the specific steps are as follows:
first, the l character strings in S (nl) are ordered from large to small according to the length, and the m character strings in S (nr) are ordered from small to large according to the length. Or the l character strings in S (nl) are ordered from small to large according to the length, and the m character strings in S (nr) are ordered from large to small according to the length.
And secondly, connecting the first character string in S (nl) with the first character string in S (nr) according to the arranged order, and adding the obtained new character string into the set S (n). Connecting the second character string in S (nl) with the second character string in S (nr), and adding the obtained new character string into the set S (n), repeating the process until the character string in S (nl) or S (nr) is processed.
Third, if l=m, the second step is followed by the completion of all the strings in S (nl) and S (nr), and the completion may be completed. If l > m, after the second step, the character strings in S (nl) are not connected, and then one character string is arbitrarily selected from S (nr), and is respectively connected to the unprocessed character strings in S (nl), and the new character strings obtained by connection are added into the set S (n). If l < m, at this time, there are still strings in S (nr) that are not connected, one string is arbitrarily selected from S (nl), and each string is connected to an unprocessed string in S (nr), and the new string obtained by connection is added to the set S (n).
3) If θ is zero-order or one-time operator, according to the construction mode of the abstract syntax tree, the right child is a null node, and all character strings and null strings corresponding to the left child node are added into a set S (n), namely a union set of S (n) being S (nl) and { epsilon } is constructed, wherein epsilon represents the null string.
4) If θ is any repeating operator, according to the construction mode of the abstract syntax tree, the right child is a null node, S (n) is constructed according to the repeating operation on the character string corresponding to the left child node, and three typical repeating times are selected in order to control the scale of the character string finally generated and reflect the characteristics of any repeating operator: repeating zero times, repeating once, and repeating r >1 times, r can be set as desired.
In the first step, the null string epsilon is added to S (n), embodying zero repetitions.
And secondly, adding each character string in S (nl) into S (n), and repeating the embodiment once.
Thirdly, r copies of the character string in S (nl) are respectively recorded as S (nl) 1 and S (nl) 2. The strings in S (nl) 1 are then ordered from large to small in length, the strings in S (nl) 2 are ordered from small to large in length, the strings in S (nl) 3 are ordered from large to small in length, and r backups are ordered in the same manner. Sequentially connecting the r backups S (nl) 1 and S (nl) 2. The first character string in S (nl) r after sequencing, adding the obtained new character string into the set S (n), sequentially connecting the r backups S (nl) 1 and S (nl) 2. The second character string in S (nl) r after sequencing, adding the obtained new character string into the set S (n), and repeating the process until the character strings are all processed. This step is repeated r (r > 1) times.
5) If θ is the one-or-more repetition operator, then by construction of the abstract syntax tree its right child is an empty node, and S (n) is constructed by repeating the character strings corresponding to the left child node. To control the scale of the finally generated character strings while still reflecting the characteristics of the operator, two typical repetition counts are selected: one repetition and r > 1 repetitions, where r can be set as required.
In the first step, each character string in S (nl) is added to S (n), reflecting one repetition.
In the second step, r copies of the character strings in S (nl) are made and denoted S (nl)_1, S (nl)_2, ..., S (nl)_r. The strings in S (nl)_1 are sorted by length from longest to shortest, the strings in S (nl)_2 from shortest to longest, the strings in S (nl)_3 from longest to shortest, and so on in the same manner until all r copies are sorted. The first character strings of the sorted copies S (nl)_1, S (nl)_2, ..., S (nl)_r are then connected in order and the resulting new character string is added to S (n); the second character strings of the sorted copies are connected in the same way and the resulting new character string is added to S (n); and the process is repeated until all character strings have been processed. This step reflects r (r > 1) repetitions.
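As a non-limiting illustration, the construction of S (n) for the two repetition operators in items 4) and 5) can be sketched in Python as follows. The function names `repeat_r`, `star_set` and `plus_set`, the default value r = 2, and the alphabetical tie-breaking among strings of equal length are assumptions made only for this sketch and are not prescribed by the method.

```python
def repeat_r(strings, r):
    """Build the strings that reflect r repetitions (r > 1)."""
    copies = []
    for i in range(r):
        # Copy 1 is sorted long-to-short, copy 2 short-to-long, copy 3
        # long-to-short, and so on. Ties are broken alphabetically
        # (an assumption) so that the result is deterministic.
        copies.append(sorted(strings, key=lambda s: (len(s), s),
                             reverse=(i % 2 == 0)))
    # Connect the i-th string of every sorted copy.
    return {"".join(parts) for parts in zip(*copies)}


def star_set(s_nl, r=2):
    """S(n) for the zero-or-more operator: zero, one and r repetitions."""
    return {""} | set(s_nl) | repeat_r(s_nl, r)


def plus_set(s_nl, r=2):
    """S(n) for the one-or-more operator: one and r repetitions."""
    return set(s_nl) | repeat_r(s_nl, r)
```

For example, with S (nl) = {"a", "bcd"} and r = 2, the sorted copies are ["bcd", "a"] and ["a", "bcd"], so the strings reflecting two repetitions are "bcda" and "abcd". Each of them joins a long string with a short one, which keeps the generated lengths moderate.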
Step 5, the character string set corresponding to the root node of the abstract syntax tree, obtained after the post-order traversal and processing, is sorted out to obtain the final test character strings. Specifically, duplicate character strings are deleted, the number of character strings is counted, the character strings are ordered by length from shortest to longest, and so on.
The sorted-out character string set is provided to a developer or tester as the test strings of the regular expression, and the developer or tester judges whether these character strings meet expectations, so as to check whether the definition of the regular expression contains errors or defects.
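The sorting-out step can be illustrated by the following minimal Python sketch. The helper names `finalize` and `check` are hypothetical. The automatic comparison against Python's `re.fullmatch` is an assumption added here only as an aid to the manual judgment described above; it is not part of the described method, and Python's regular expression dialect may differ from the one used in the control system.

```python
import re


def finalize(strings):
    """Delete duplicates, order by length (shortest first) and count."""
    ordered = sorted(set(strings), key=lambda s: (len(s), s))
    return ordered, len(ordered)


def check(pattern, test_strings):
    """Pair each test string with whether the whole string matches pattern."""
    compiled = re.compile(pattern)
    return [(s, compiled.fullmatch(s) is not None) for s in test_strings]
```

A tester could call `finalize` on the string set of the root node and then review the output of `check` line by line; a generated string that unexpectedly fails to match, or an unintended string that matches, points to a possible defect in the definition of the regular expression.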
Based on the above description, the regular expression test method of the intelligent logistics control system provided by the invention has the following advantages over the prior art:
1. The method generates the test character strings by traversing the abstract syntax tree, which increases the generation speed of the test strings and better reflects the syntactic structure of the regular expression, so that the generated test strings are more representative than randomly generated ones.
2. For the zero-or-more repetition operator and the one-or-more repetition operator, which could otherwise generate infinitely many character strings, the invention selects several typical repetition cases, which reduces the number of generated character strings while still exhibiting the repetition characteristics, so that the scale of the final test set can be controlled.
3. For the connection operation, the invention adopts a strategy of incomplete connection of the left and right operands, which ensures that every character string in the left and right operands participates in the connection operation while avoiding combinatorial explosion, so that the number of finally generated character strings is not excessive and the test overhead is reduced.
4. When processing the connection and repetition operations, the invention sorts the character strings by length so that shorter character strings are connected with longer ones, which keeps the length of the finally generated character strings moderate, facilitates judgment by developers or testers, and improves test efficiency.
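The incomplete connection strategy named in advantages 3 and 4 can be sketched in Python as follows. This is only an illustration under stated assumptions: the name `concat_sets` is hypothetical, the left operand is assumed to be sorted from shortest to longest and the right operand from longest to shortest, and the "arbitrarily selected" string used for the leftover strings is taken to be the first one of the shorter list.

```python
def concat_sets(s_nl, s_nr):
    """Incomplete connection of S(nl) and S(nr) without a full cross product."""
    # Assumed orderings: short-to-long on the left, long-to-short on the
    # right, so that a short string is paired with a long one.
    left = sorted(s_nl, key=lambda s: (len(s), s))
    right = sorted(s_nr, key=lambda s: (len(s), s), reverse=True)
    if not left or not right:
        return set()
    # Pair the i-th string of the left list with the i-th of the right list.
    result = {l + r for l, r in zip(left, right)}
    paired = min(len(left), len(right))
    if len(left) > len(right):
        filler = right[0]  # arbitrary choice (assumption: the first string)
        result |= {l + filler for l in left[paired:]}
    elif len(right) > len(left):
        filler = left[0]   # arbitrary choice (assumption: the first string)
        result |= {filler + r for r in right[paired:]}
    return result
```

With S (nl) = {"a", "bb", "ccc"} and S (nr) = {"x", "yy"}, the sketch yields the three strings "ayy", "bbx" and "cccyy" instead of the six strings of the full cross product, and every string of both operands takes part in at least one connection.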
In addition, the invention also provides two implementation systems corresponding to the above regular expression test method of the intelligent logistics control system, as follows:
One of them is a regular expression test system of the intelligent logistics control system, which includes:
A preprocessing module, used for preprocessing a given regular expression and identifying the meta-characters.
A priority table construction module, used for determining the priority relation of the meta-characters and constructing a priority table based on the priority relation.
A suffix form conversion module, used for converting the regular expression into a regular expression in suffix form by means of a meta-character stack, based on the priority table.
A syntax tree construction module, used for constructing an abstract syntax tree based on the regular expression in suffix form.
A character string set construction module, used for traversing the abstract syntax tree and creating a character string set for each node accessed during the traversal.
A test character string extraction module, used for extracting, from the character string sets, the character string set corresponding to the root node of the abstract syntax tree, and sorting it out to obtain the test character strings.
A test module, used for completing the test of the given regular expression based on the test character strings.
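By way of illustration only, the work of the suffix form conversion module and the syntax tree construction module can be sketched in Python as below. Several points are assumptions of this sketch rather than statements of the specification: the preprocessing is assumed to have already inserted an explicit connection operator written ".", the numeric priority values are illustrative, operators of equal priority are popped before the new operator is pushed, "#" is the sentinel symbol that does not occur in the expression, and the names `to_postfix`, `build_ast` and `Node` are hypothetical. The input is assumed to be well formed.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed priority table: a larger value binds tighter. The sentinel '#'
# and the left bracket share the lowest value so no operator pops them.
PRIORITY = {"#": 0, "(": 0, "|": 1, ".": 2, "*": 3, "+": 3, "?": 3}
UNARY = {"*", "+", "?"}
BINARY = {"|", "."}


def to_postfix(regex: str) -> str:
    """Convert an infix regular expression to suffix form."""
    stack, out = ["#"], []          # sentinel is pushed first
    for ch in regex:
        if ch == "(":
            stack.append(ch)
        elif ch == ")":
            while stack[-1] != "(":  # emit operators inside the brackets
                out.append(stack.pop())
            stack.pop()              # discard the matching left bracket
        elif ch in PRIORITY:         # an operator
            while PRIORITY[stack[-1]] >= PRIORITY[ch]:
                out.append(stack.pop())
            stack.append(ch)
        else:                        # ordinary character (digit or letter)
            out.append(ch)
    while stack[-1] != "#":          # flush the remaining operators
        out.append(stack.pop())
    return "".join(out)


@dataclass
class Node:
    value: Optional[str]             # None marks the empty node
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_ast(postfix: str) -> Node:
    """Build the abstract syntax tree from a suffix-form expression."""
    stack = []
    for ch in postfix:
        if ch in BINARY:
            right = stack.pop()      # the right child is popped first
            left = stack.pop()
            stack.append(Node(ch, left, right))
        elif ch in UNARY:
            left = stack.pop()
            stack.append(Node(ch, left, Node(None)))  # empty right child
        else:
            stack.append(Node(ch))   # leaf node
    return stack[-1]                 # the stack top is the root node
```

For the input "a.b*|c" the sketch produces the suffix form "ab*.c|", and the resulting tree has the selection operator at its root, the connection operator as its left child, and the leaf "c" as its right child.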
The other is an electronic device, comprising a processor and a memory.
The processor is connected with the memory. The memory stores a computer software program, which is used for implementing the above regular expression test method of the intelligent logistics control system. The processor is used to retrieve and execute the computer software program stored in the memory.
In this specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another. Since the system disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant points can be found in the description of the method.
The principles and embodiments of the present invention have been described herein with reference to specific examples, which are intended only to assist in understanding the method of the present invention and its core ideas. Meanwhile, those of ordinary skill in the art may, in light of the ideas of the present invention, make changes to the specific embodiments and the scope of application. In view of the foregoing, the content of this description should not be construed as limiting the invention.

Claims (3)

1. The regular expression testing method of the intelligent logistics control system is characterized by comprising the following steps of:
step 1, preprocessing a given regular expression, and identifying the meta characters; the regular expression being used to describe the pattern rules of express waybill numbers, mailing addresses and telephone numbers;
step 2, determining the priority relation of the meta characters, and constructing a priority table based on the priority relation;
step 3, converting the regular expression into a regular expression in a suffix form by adopting a meta character stack based on the priority table;
step 3.1, defining a meta character stack, and initializing the meta character stack;
step 3.2, setting a suffix expression E' to be initially empty, selecting a symbol which does not appear in a given regular expression, stacking the symbol, and enabling the priority of the symbol to be lowest;
Step 3.3, scanning characters in a given regular expression from left to right, and setting the currently scanned characters as ch; if the character ch is a normal character, connecting the character ch into the suffix expression E'; if the character ch is a meta character, the following operations are performed:
if the character ch is a left bracket, pressing the character ch into the meta-character stack;
if the character ch is a right bracket, setting the current stack top element of the meta-character stack as a left bracket matched with the character ch according to a bracket pairing rule so as to pop up the left bracket from the meta-character stack for bracket removing operation;
if the character ch is an operator and the priority of the character ch is higher than the priority of the last entered element in the current meta-character stack, pushing the character ch into the meta-character stack;
if the character ch is an operator and the priority of the character ch is lower than the priority of the last element entering in the current meta-character stack, continuously popping the operator with higher priority than the character ch from the meta-character stack, sequentially connecting the operator with the suffix expression E' according to the pop-up sequence, and pushing the character ch into the meta-character stack; the common characters comprise numbers and letters; the meta-character is an operator for describing different operations on the character string;
Step 3.4, repeating the steps 3.2 to 3.3 until all characters in the given regular expression are scanned, and obtaining the regular expression in the suffix form;
step 4, constructing an abstract syntax tree based on the regular expression in the suffix form;
step 4.1, defining a node stack and initializing to be empty;
step 4.2, scanning characters in the regular expression in the suffix form from left to right, and setting the currently scanned characters as ch';
step 4.3, if the character ch 'is a common character, constructing the character ch' into a leaf node, and pushing the leaf node into the node stack;
step 4.4, if the character ch' is a meta character, executing the following operations:
if the character ch' is a binary operator, constructing the character ch' into a non-leaf node; since the operands adjacent to the character ch' in the regular expression in the suffix form are stored in the node stack, popping two nodes in succession from the node stack to serve as the right child and the left child of the character ch', and pushing the non-leaf node constructed from the character ch' into the node stack;
if the character ch' is a unary operator, constructing the character ch' into a non-leaf node, popping one node from the node stack as the left child of the character ch', constructing the right child of the character ch' as an empty node, and pushing the non-leaf node constructed from the character ch' into the node stack;
repeating the steps 4.2 to 4.4 until all characters in the regular expression in the suffix form are scanned, and taking the stack top element in the node stack as the root node of the abstract syntax tree;
step 5, traversing the abstract syntax tree, and creating a character string set for the accessed node in the traversing process;
when traversing the abstract syntax tree, if a node n in the abstract syntax tree which is accessed currently is an empty node, constructing a character string set S (n) as S (n) = { epsilon }, wherein epsilon is an empty character string;
if the node n in the currently accessed abstract syntax tree is a leaf node, constructing a character string set S (n) as S (n) = { ch '}, wherein ch' is a character of the node n;
if the node n in the currently accessed abstract syntax tree is a non-leaf node, constructing a character string set according to the character type of the character theta in the node n;
if the character θ in the node n is a selection operator, constructing a character string set S (n) as S (n) = S (nl) ∪ S (nr); nl is the left child of node n, nr is the right child of node n, the set of strings corresponding to the left child of node n is S (nl), and the set of strings corresponding to the right child of node n is S (nr);
if the character θ in the node n is a connection operator, the character string set S (n) is constructed by connecting the character strings in the character string set S (nl) corresponding to the left child of the node n and the character string set S (nr) corresponding to the right child of the node n;
if the character θ in the node n is a zero-or-one operator, the character string set S (n) is constructed from all the character strings corresponding to the left child node together with the empty string, as follows: S (n) = S (nl) ∪ {ε};
if the character theta in the node n is any repeated operator, constructing a character string set S (n) according to repeated operation on the character string corresponding to the left child node;
if the character theta in the node n is one or more repeated operators, constructing a character string set S (n) according to repeated operation on the character string corresponding to the left child node;
the character string set S (n) is constructed through the connection of the character strings in the character string set S (nl) corresponding to the left child of the node n and the character string set S (nr) corresponding to the right child of the node n, and the specific steps are as follows:
sequencing the character strings in the character string set S (nr) according to the length;
connecting a first character string in the character string set S (nl) with a first character string in the character string set S (nr) according to the arranged order, adding the obtained new character string into the character string set S (n '), connecting a second character string in the character string set S (nl) with a second character string in the character string set S (nr), adding the obtained new character string into the character string set S (n'), and so on until the character string in the character string set S (nl) or the character string set S (nr) is processed;
If the number of the character strings in the character string set S (nl) is equal to the number of the character strings in the character string set S (nr), the character string set S (n') formed by connecting the character string set S (nl) and the character string set S (nr) is used as a character string set S (n);
if the number of the character strings in the character string set S (nl) is larger than the number of the character strings in the character string set S (nr), arbitrarily selecting one character string from the character string set S (nr), respectively connecting the character strings to unprocessed character strings in the character string set S (nl), and adding new character strings obtained by connection into the set S (n') to obtain a character string set S (n);
if the number of the character strings in the character string set S (nl) is smaller than the number of the character strings in the character string set S (nr), arbitrarily selecting one character string from the character string set S (nl), respectively connecting the character strings with unprocessed character strings in the character string set S (nr), and adding new character strings obtained by connection into the set S (n') to obtain a character string set S (n);
if the character θ in the node n is any repeating operator, a string set S (n) is constructed according to the repeating operation on the string corresponding to the left child node, and specifically includes:
adding the empty string ε into the initial character string set to obtain a first character string set;
adding each character string in the character string set S (nl) into the first character string set to obtain a second character string set;
making r copies of the character strings in the character string set S (nl) to obtain r copies of the character strings;
then, sorting the character strings in each copy according to length;
connecting the corresponding character strings in the r sorted copies, and then adding the connected character strings into the second character string set to obtain the character string set S (n);
if the character θ in the node n is one or more repeated operators, a character string set S (n) is constructed according to repeated operations on the character string corresponding to the left child node, and specifically includes:
adding each character string in the character string set S (nl) into the initial character string set to obtain a first character string set;
making r copies of the character strings in the character string set S (nl) to obtain r copies of the character strings;
then, sorting the character strings in each copy according to length;
connecting the corresponding character strings in the r sorted copies, and then adding the connected character strings into the first character string set to obtain the character string set S (n);
step 6, extracting a character string set corresponding to the abstract syntax tree root node in the character string set, and sorting to obtain a test character string;
step 7, completing the test of the given regular expression based on the test character string.
2. The regular expression test system of the intelligent logistics control system is characterized by comprising the following components: the system comprises a preprocessing module, a priority table construction module, a suffix form conversion module, a grammar tree construction module, a character string set construction module, a test character string extraction module and a test module;
the preprocessing module is used for preprocessing a given regular expression and identifying the meta characters; the regular expression being used to describe the pattern rules of express waybill numbers, mailing addresses and telephone numbers;
the priority table construction module is used for determining the priority relation of the meta characters and constructing a priority table based on the priority relation;
the suffix form conversion module is used for converting the regular expression into a regular expression in a suffix form by adopting a meta character stack based on the priority table; based on the priority table, converting the regular expression into a regular expression in a suffix form by adopting a meta character stack, and specifically comprising the following steps:
step 3.1, defining a meta character stack, and initializing the meta character stack;
step 3.2, setting a suffix expression E' to be initially empty, selecting a symbol which does not appear in a given regular expression, stacking the symbol, and enabling the priority of the symbol to be lowest;
Step 3.3, scanning characters in a given regular expression from left to right, and setting the currently scanned characters as ch; if the character ch is a normal character, connecting the character ch into the suffix expression E'; if the character ch is a meta character, the following operations are performed:
if the character ch is a left bracket, pressing the character ch into the meta-character stack;
if the character ch is a right bracket, setting the current stack top element of the meta-character stack as a left bracket matched with the character ch according to a bracket pairing rule so as to pop up the left bracket from the meta-character stack for bracket removing operation;
if the character ch is an operator and the priority of the character ch is higher than the priority of the last entered element in the current meta-character stack, pushing the character ch into the meta-character stack;
if the character ch is an operator and the priority of the character ch is lower than the priority of the last element entering in the current meta-character stack, continuously popping the operator with higher priority than the character ch from the meta-character stack, sequentially connecting the operator with the suffix expression E' according to the pop-up sequence, and pushing the character ch into the meta-character stack; the common characters comprise numbers and letters; the meta-character is an operator for describing different operations on the character string;
Step 3.4, repeating the steps 3.2 to 3.3 until all characters in the given regular expression are scanned, and obtaining the regular expression in the suffix form;
the grammar tree construction module is used for constructing an abstract grammar tree based on the regular expression in the suffix form; constructing an abstract syntax tree based on the regular expression in the suffix form, which specifically comprises the following steps:
step 4.1, defining a node stack and initializing to be empty;
step 4.2, scanning characters in the regular expression in the suffix form from left to right, and setting the currently scanned characters as ch';
step 4.3, if the character ch 'is a common character, constructing the character ch' into a leaf node, and pushing the leaf node into the node stack;
step 4.4, if the character ch' is a meta character, executing the following operations:
if the character ch' is a binary operator, constructing the character ch' into a non-leaf node; since the operands adjacent to the character ch' in the regular expression in the suffix form are stored in the node stack, popping two nodes in succession from the node stack to serve as the right child and the left child of the character ch', and pushing the non-leaf node constructed from the character ch' into the node stack;
if the character ch' is a unary operator, constructing the character ch' into a non-leaf node, popping one node from the node stack as the left child of the character ch', constructing the right child of the character ch' as an empty node, and pushing the non-leaf node constructed from the character ch' into the node stack;
repeating the steps 4.2 to 4.4 until all characters in the regular expression in the form of the suffix are scanned, and taking the stack top element in the node stack as a tree root node of the abstract syntax tree;
the character string set construction module is used for traversing the abstract syntax tree and creating a character string set for the accessed node in the traversing process; when traversing the abstract syntax tree, if a node n in the abstract syntax tree which is accessed currently is an empty node, constructing a character string set S (n) as S (n) = { epsilon }, wherein epsilon is an empty character string;
if the node n in the currently accessed abstract syntax tree is a leaf node, constructing a character string set S (n) as S (n) = { ch '}, wherein ch' is a character of the node n;
if the node n in the currently accessed abstract syntax tree is a non-leaf node, constructing a character string set according to the character type of the character theta in the node n;
if the character θ in the node n is a selection operator, constructing a character string set S (n) as S (n) = S (nl) ∪ S (nr); nl is the left child of node n, nr is the right child of node n, the set of strings corresponding to the left child of node n is S (nl), and the set of strings corresponding to the right child of node n is S (nr);
If the character θ in the node n is a connection operator, the character string set S (n) is constructed by connecting the character strings in the character string set S (nl) corresponding to the left child of the node n and the character string set S (nr) corresponding to the right child of the node n;
if the character θ in the node n is a zero-or-one operator, the character string set S (n) is constructed from all the character strings corresponding to the left child node together with the empty string, as follows: S (n) = S (nl) ∪ {ε};
if the character theta in the node n is any repeated operator, constructing a character string set S (n) according to repeated operation on the character string corresponding to the left child node;
if the character theta in the node n is one or more repeated operators, constructing a character string set S (n) according to repeated operation on the character string corresponding to the left child node;
the character string set S (n) is constructed through the connection of the character strings in the character string set S (nl) corresponding to the left child of the node n and the character string set S (nr) corresponding to the right child of the node n, and the specific steps are as follows:
sequencing the character strings in the character string set S (nr) according to the length;
connecting a first character string in the character string set S (nl) with a first character string in the character string set S (nr) according to the arranged order, adding the obtained new character string into the character string set S (n '), connecting a second character string in the character string set S (nl) with a second character string in the character string set S (nr), adding the obtained new character string into the character string set S (n'), and so on until the character string in the character string set S (nl) or the character string set S (nr) is processed;
If the number of the character strings in the character string set S (nl) is equal to the number of the character strings in the character string set S (nr), the character string set S (n') formed by connecting the character string set S (nl) and the character string set S (nr) is used as a character string set S (n);
if the number of the character strings in the character string set S (nl) is larger than the number of the character strings in the character string set S (nr), arbitrarily selecting one character string from the character string set S (nr), respectively connecting the character strings to unprocessed character strings in the character string set S (nl), and adding new character strings obtained by connection into the set S (n') to obtain a character string set S (n);
if the number of the character strings in the character string set S (nl) is smaller than the number of the character strings in the character string set S (nr), arbitrarily selecting one character string from the character string set S (nl), respectively connecting the character strings with unprocessed character strings in the character string set S (nr), and adding new character strings obtained by connection into the set S (n') to obtain a character string set S (n);
if the character θ in the node n is any repeating operator, a string set S (n) is constructed according to the repeating operation on the string corresponding to the left child node, and specifically includes:
adding the empty string ε into the initial character string set to obtain a first character string set;
adding each character string in the character string set S (nl) into the first character string set to obtain a second character string set;
making r copies of the character strings in the character string set S (nl) to obtain r copies of the character strings;
then, sorting the character strings in each copy according to length;
connecting the corresponding character strings in the r sorted copies, and then adding the connected character strings into the second character string set to obtain the character string set S (n);
if the character θ in the node n is one or more repeated operators, a character string set S (n) is constructed according to repeated operations on the character string corresponding to the left child node, and specifically includes:
adding each character string in the character string set S (nl) into the initial character string set to obtain a first character string set;
making r copies of the character strings in the character string set S (nl) to obtain r copies of the character strings;
then, sorting the character strings in each copy according to length;
connecting the corresponding character strings in the r sorted copies, and then adding the connected character strings into the first character string set to obtain the character string set S (n);
the test character string extraction module is used for extracting a character string set corresponding to the abstract syntax tree root node in the character string set, and sorting the character string set to obtain a test character string;
the test module is used for completing the test of the given regular expression based on the test character string.
3. An electronic device, comprising: a processor and a memory;
the processor is connected with the memory; the memory stores a computer software program; the computer software program is used for implementing the regular expression testing method of the intelligent logistics control system as set forth in claim 1; the processor is configured to retrieve and execute the computer software program stored in the memory.
CN202310063491.3A 2023-02-06 2023-02-06 Regular expression testing method and system for intelligent logistics control system Active CN115796166B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310063491.3A CN115796166B (en) 2023-02-06 2023-02-06 Regular expression testing method and system for intelligent logistics control system

Publications (2)

Publication Number Publication Date
CN115796166A CN115796166A (en) 2023-03-14
CN115796166B true CN115796166B (en) 2023-05-09

Family

ID=85429827

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310063491.3A Active CN115796166B (en) 2023-02-06 2023-02-06 Regular expression testing method and system for intelligent logistics control system

Country Status (1)

Country Link
CN (1) CN115796166B (en)

Also Published As

Publication number Publication date
CN115796166A (en) 2023-03-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant