Disclosure of Invention
The invention aims to provide a method and an apparatus for constructing a byte stream state machine, an electronic device, and a storage medium, so as to overcome, at least to some extent, the performance degradation caused by excessive occupation of a device's running memory, which results from the limitations and defects of the related art.
According to an aspect of the present invention, there is provided a method for constructing a state machine of a byte stream, comprising the steps of:
constructing a dictionary tree containing a plurality of character strings; each character in the character string is correspondingly stored in a node of the dictionary tree, and the node represents the character string formed by the characters from a root node to the node;
determining a longest suffix node of each node and the position of each node in the byte stream state machine, and determining the offset and the state transition node of the corresponding node according to the longest suffix node;
and traversing the nodes of the dictionary tree in breadth-first order to generate byte stream node information of the byte stream state machine.
In an exemplary embodiment, the determining the longest suffix node of each node includes:
determining the root node as a longest suffix node of the node in a case where the node is connected to the root node.
In an exemplary embodiment, the dictionary tree includes a root node and a plurality of nodes, and the determining the longest suffix node of each node includes:
determining the longest suffix node of the node from the longest suffix node of the parent node of the node when the node is not directly connected to the root node; wherein two nodes connected in sequence are a parent node and a child node, respectively.
In an exemplary embodiment, the determining the longest suffix node of the node from the longest suffix node of the parent node of the node comprises: determining, when the longest suffix node of the parent node has a child node storing the same character as the node, that this child node is the longest suffix node of the node.
In an exemplary embodiment, the byte stream node includes type information and address information of a character string.
In an exemplary embodiment, the determining the position of the nodes in the byte stream state machine includes:
determining, with the root node as the initial position, the position of the current node as the sum of the position of the previous node and the storage size of the previous node.
In an exemplary embodiment, the method for constructing the byte stream state machine further comprises: writing the byte stream nodes into a file for storage.
According to another aspect of the present invention, there is provided a byte stream state machine building apparatus, including:
the dictionary tree module is used for constructing a dictionary tree containing a plurality of character strings; each character in the character string is correspondingly stored in a node of the dictionary tree, and the node represents the character string formed by the characters from a root node to the node;
the state transition module is used for determining the longest suffix node of each node and the position of each node in the byte stream state machine, and determining the offset and the state transition node of the corresponding node according to the longest suffix node;
and the state machine module is used for traversing the nodes of the dictionary tree in breadth-first order to generate byte stream node information of the byte stream state machine.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the above-described method.
According to another aspect of the present invention, there is provided a computer readable storage medium storing a computer program, characterized in that the computer program realizes the above method when executed by a processor.
In the method for constructing a byte stream state machine provided by the invention, a dictionary tree containing a plurality of character strings is constructed, the state transition of each node of the dictionary tree is determined, and the byte stream state machine composed of byte stream nodes is then generated by a breadth-first traversal. The method stores the node data of the state machine as a byte stream in which the node data are stored contiguously, without gaps, which saves storage space; for devices that are sensitive to memory occupation, such as smart terminals, this can improve device performance.
Detailed Description
In order to make the objects, features and advantages of the present invention more apparent and understandable, embodiments and technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings. Example embodiments and examples, however, may be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments and examples are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments and examples to those skilled in the art. The described features, structures, or characteristics of the invention may be combined in any suitable manner in one or more embodiments and examples. In the following description, numerous specific details are provided to provide a thorough understanding of embodiments and examples of the invention. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the invention.
Furthermore, the drawings are merely schematic illustrations of the invention and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Although the steps of the method of the present invention are depicted in the drawings in a particular order, this does not require or imply that the steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the steps. For example, some steps may be decomposed, and some steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
A state machine is a control center composed of a state register and combinational logic; according to control signals, it transitions between preset states, coordinates the actions of the relevant signals, and completes specific operations. A state machine can manage a plurality of states, each of which is capable of implementing a service, and switching among a group of states is performed by the state machine. Taking a smart device running the Android system as an example, the state of each service in the system can be maintained by a state machine, which performs state switching, state keeping, and similar tasks. For example, buttons such as data, WLAN, flashlight, sound, and Bluetooth are provided in the pull-down menu of the home page of the Launcher (the desktop launcher of the Android system); when, for example, the Bluetooth toggle button is triggered, the Android system state machine switches to the Bluetooth state machine. All functional modules in the Android system are combined into a tree-structured state machine, through which the corresponding operations can be executed in response to a user's trigger request.
In such a tree-structured state machine, the related data are stored in a double-array dictionary tree (double-array trie). Because the character codes of the data are not contiguous, this structure occupies a relatively large memory space, so performance may degrade on devices that are sensitive to memory occupation, such as smart terminals.
In view of the above problems in the related art, the present invention provides a method and an apparatus for constructing a byte stream state machine, an electronic device, and a storage medium. In the method for constructing the byte stream state machine, a dictionary tree containing a plurality of character strings is constructed, wherein each character of each character string is stored in a corresponding node of the dictionary tree; the longest suffix node of each node is determined, and the offset and the state transition node of the corresponding node are determined according to the longest suffix node; and the nodes of the dictionary tree are traversed in breadth-first order to generate the byte stream nodes of the byte stream state machine. The method stores the node data of the state machine as a byte stream in which the node data are stored contiguously, without gaps, which saves storage space and can improve the performance of devices that are sensitive to memory occupation, such as smart terminals.
An embodiment of the present invention provides a method for constructing a byte stream state machine, and fig. 1 is a schematic flow chart of the method for constructing a byte stream state machine according to the embodiment of the present invention. As shown in fig. 1, the method for constructing the byte stream state machine includes the following steps:
Step S11: constructing a dictionary tree containing a plurality of character strings; each character in the character string is correspondingly stored in a node of the dictionary tree, and the node represents the character string formed by the characters from a root node to the node;
A dictionary tree, also known as a prefix tree, trie, or word-lookup tree, is a tree-shaped data structure used for one-to-many string matching. Dictionary trees are typically used for counting, sorting, and storing large numbers of character strings: by exploiting the common prefixes of the strings, they reduce query time and minimize unnecessary string comparisons. The structure consists of a root node at the top level and successive levels of parent and child nodes below it. The root node stores no character, and every other node stores exactly one character; concatenating the characters along the path from the root node to a given node yields the character string corresponding to that node, and the child nodes of any one node all store different characters. In step S11, a dictionary tree is constructed from the set of input character strings: each character of each string is stored in a node of the dictionary tree, and each node represents the character string formed by the characters on the path from the root node to that node.
The character strings used for constructing the dictionary tree consist of digits, letters, and/or underscores. In an exemplary embodiment, the character strings used for constructing the dictionary tree are { a, ab, bad, bc, bca, c, caa }. As shown in fig. 2, each vertical branch of the dictionary tree represents at least one character string: for example, the leftmost branch of the dictionary tree shown in fig. 2 represents the character strings a and ab, and the rightmost branch represents the character strings c and caa. The top level in the vertical direction is the root node, which stores no character, and any two nodes that are directly connected in the vertical direction are a parent node and a child node.
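As an illustration only, the construction of step S11 can be sketched in Python as follows. The names TrieNode, build_trie, and collect_words are hypothetical and do not appear in the embodiment; the sketch assumes that the child nodes of a node are kept in a dictionary keyed by character.

```python
class TrieNode:
    """One node of the dictionary tree: a single character plus its children."""

    def __init__(self, char=""):
        self.char = char        # the root node keeps the empty string
        self.children = {}      # character -> TrieNode
        self.is_word = False    # True if a dictionary string ends at this node


def build_trie(words):
    """Insert every string character by character, sharing common prefixes."""
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            if ch not in node.children:
                node.children[ch] = TrieNode(ch)
            node = node.children[ch]
        node.is_word = True
    return root


def collect_words(node, prefix=""):
    """Rebuild the stored strings by concatenating characters along each path."""
    found = [prefix] if node.is_word else []
    for ch, child in node.children.items():
        found.extend(collect_words(child, prefix + ch))
    return found


root = build_trie(["a", "ab", "bad", "bc", "bca", "c", "caa"])
print(sorted(root.children))   # characters stored directly under the root node
print(collect_words(root))     # strings represented by the marked nodes
```

In this sketch the strings a and ab share the node for a, which is the prefix sharing that the leftmost branch of fig. 2 illustrates.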
Step S13: determining a longest suffix node of each node and the position of each node in the byte stream state machine, and determining the offset and the state transition node of the corresponding node according to the longest suffix node;
In an exemplary embodiment, the dictionary tree includes a root node and a plurality of nodes, and the longest suffix node of each node is determined from the longest suffix node of its parent node, where two nodes connected in sequence are a parent node and a child node, respectively. For a child node directly under the root node, the longest suffix node is the root node, so the state transition of that node jumps directly to the root node. For any other child node, the longest suffix node of its parent node is determined first; when that longest suffix node has a child node storing the same character as the child node, that child is taken as the longest suffix node of the child node. When no such child exists, the root node is taken as the longest suffix node of the child node. Once the longest suffix node of each node has been determined, it is used as the state transition node of that node. In the dictionary tree shown in fig. 3, the longest suffix nodes of the nodes a, b, and c directly under the root node are all the root node. In the character string ab, the longest suffix node of the parent node a of the child node b is the root node, which has a child node b storing the same character as the child node b, so that node b is the longest suffix node of the child node b. In the character string bad, the longest suffix node of the parent node b of the child node a is the root node, which has a child node a storing the same character, so that node a is the longest suffix node of the child node a; the longest suffix node of the parent node of the child node d is that node a, whose child nodes do not include a node storing d, so the root node is taken as the longest suffix node of the child node d.
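The determination of the longest suffix nodes can be sketched in Python as follows. This is a minimal sketch rather than the embodiment itself: the function name build_suffix_links is hypothetical, each node is identified by the string it represents (the root node is the empty string), and the sketch follows the chain of suffix nodes upward before falling back to the root node, as the buildSuffix procedure of the program code does.

```python
from collections import deque


def build_suffix_links(words):
    """Map every trie node (named by its prefix string) to its longest suffix node."""
    # children[prefix] maps a character to the child prefix; "" is the root node.
    children = {"": {}}
    for w in words:
        for i in range(len(w)):
            parent, child = w[:i], w[:i + 1]
            children.setdefault(child, {})
            children[parent][w[i]] = child

    suffix = {}
    queue = deque()
    # Nodes directly under the root node always fall back to the root node.
    for child in children[""].values():
        suffix[child] = ""
        queue.append(child)

    # Breadth-first: a parent's suffix node is known before its children are handled.
    while queue:
        cur = queue.popleft()
        for ch, child in children[cur].items():
            queue.append(child)
            s = suffix[cur]
            # Walk up the suffix chain until some node has a child for ch.
            while ch not in children[s] and s != "":
                s = suffix[s]
            suffix[child] = children[s].get(ch, "")
    return suffix


links = build_suffix_links(["a", "ab", "bad", "bc", "bca", "c", "caa"])
for node in sorted(links):
    print(repr(node), "->", repr(links[node]))
```

For the example strings, the sketch assigns node b as the longest suffix node of ab and the root node as the longest suffix node of bad, which agrees with the description of fig. 3.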
In an exemplary embodiment, determining the position of each node in the byte stream state machine includes: following the breadth-first traversal strategy, storing the nodes of the dictionary tree in the byte stream structure level by level, from top to bottom and from left to right, so that nodes of the same depth are stored consecutively and the last node of one level is immediately followed by the first node of the next level. The root node is at position 0 in the byte stream structure, and the position of every other node is the sum of the position of the previous node and the storage size of the previous node.
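The position rule can be illustrated with a short Python sketch. The function name node_positions is hypothetical, and the storage sizes in the usage line are made-up values chosen only to show the arithmetic; they are not sizes produced by the embodiment.

```python
from itertools import accumulate


def node_positions(sizes):
    """Given node storage sizes in breadth-first order, return each node's position.

    The first node (the root node) is at position 0; every later node starts
    where the previous node ends, so the nodes are stored without gaps.
    """
    if not sizes:
        return []
    return [0] + list(accumulate(sizes[:-1]))


# Hypothetical storage sizes, in bytes, of four nodes in breadth-first order.
print(node_positions([4, 6, 7, 5]))
```

Because each position is a running sum, the positions can be computed in the same single breadth-first pass that determines the longest suffix nodes.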
Step S15: traversing the nodes of the dictionary tree in breadth-first order to generate byte stream node information of the byte stream state machine.
Breadth-first traversal is a traversal algorithm for connected graphs and is the prototype of many important graph algorithms: Dijkstra's single-source shortest path algorithm and Prim's minimum spanning tree algorithm both adopt ideas similar to it. Breadth-first traversal is an uninformed (blind) search method that systematically expands and checks all nodes in the graph in order to find a result. Starting from the root node, it visits the nodes of the tree (graph) level by level along its width, and the algorithm terminates when all nodes have been visited.
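A minimal Python sketch of this traversal, applied to the example strings, is given below. The function name bfs_prefixes is hypothetical, and the dictionary tree is represented by nested dictionaries purely for brevity.

```python
from collections import deque


def bfs_prefixes(words):
    """Return the strings represented by the trie nodes in breadth-first order."""
    trie = {}
    for w in words:
        node = trie
        for ch in w:
            node = node.setdefault(ch, {})

    order = []
    queue = deque([("", trie)])          # start at the root node (empty string)
    while queue:
        prefix, node = queue.popleft()   # first in, first out: level by level
        order.append(prefix)
        for ch, child in node.items():   # children keep their insertion order
            queue.append((prefix + ch, child))
    return order


print(bfs_prefixes(["a", "ab", "bad", "bc", "bca", "c", "caa"]))
```

All nodes of one depth appear in the result before any node of the next depth, which is the order in which the nodes are laid out in the byte stream.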
In the above embodiment, the byte stream nodes of the byte stream state machine shown in fig. 4 can be obtained by flattening the nodes of the dictionary tree according to the breadth-first traversal strategy; that is, the nodes of the dictionary tree are arranged sequentially, level by level, to generate the byte stream nodes. In an exemplary embodiment, a byte stream node of the byte stream state machine is shown in fig. 5 and comprises the following fields:
field 1: a flag indicating whether the node corresponds to a complete character string (word) of the dictionary, 1 bit; 1 indicates that it does, and 0 indicates that it does not;
field 2: the length of the character string of the node, 7 bits, giving a maximum length of 127 characters;
field 3: the Unicode code of the character, 16 bits;
field 4: the number of bytes occupied by the offset from the node to its longest suffix node, 2 bits, where 01 denotes 1 byte, 10 denotes 2 bytes, and 11 denotes 3 bytes, the corresponding numbers of significant bits of the offset being 6, 14, and 22, respectively;
field 5: the offset to the longest suffix node, 6, 14, or 22 bits;
field 6: the number of bytes occupied by the number of child nodes of the node, 2 bits, where 01 denotes 1 byte, 10 denotes 2 bytes, and 11 denotes 3 bytes, the corresponding numbers of significant bits of the number of child nodes being 6, 14, and 22, respectively;
field 7: the number of child nodes, 6, 14, or 22 bits;
field 8: the offset from the node to its first child node, 16 bits;
field 9: an extension field of 8 bits (1 byte) representing the type of the stored dictionary word;
field 10: an extension field of 3 bytes representing the address information (id) of the stored word.
The extension fields are attributes of the character string (word) nodes of the dictionary tree. In addition to the matched character string, a byte stream state machine constructed by this method also outputs the ID and the type corresponding to the character string, so that secondary processing such as merging and analysis can be carried out effectively on the matched character strings.
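The variable-length encoding shared by the suffix offset and the number of child nodes (a 2-bit byte count followed by 6, 14, or 22 value bits) can be sketched in Python as follows. The function names encode_field and decode_field are hypothetical; the sketch assumes that the value bits are stored most significant byte first, as in the writeBytes procedure of the program code.

```python
def encode_field(value):
    """Encode a value as 1-3 bytes: 2-bit byte count, then 6/14/22 value bits."""
    if not 0 <= value <= 0x3FFFFF:
        raise ValueError("value does not fit in 22 bits")
    if value <= 0x3F:
        n = 1
    elif value <= 0x3FFF:
        n = 2
    else:
        n = 3
    raw = value.to_bytes(n, "big")
    # The top two bits of the first byte are free, so the byte count fits there.
    return bytes([(n << 6) | raw[0]]) + raw[1:]


def decode_field(buf, pos=0):
    """Decode one field starting at pos; return (value, position after the field)."""
    n = buf[pos] >> 6
    if n == 0:
        raise ValueError("byte count 00 is not used by this encoding")
    value = buf[pos] & 0x3F
    for b in buf[pos + 1:pos + n]:
        value = (value << 8) | b
    return value, pos + n


for v in (0, 63, 64, 0x3FFF, 0x4000):
    encoded = encode_field(v)
    print(v, encoded.hex(), decode_field(encoded))
```

Small offsets and child counts, which are the common case in a breadth-first layout, therefore cost a single byte each.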
In an exemplary embodiment, the method for constructing the byte stream state machine may further include: writing the byte stream nodes into a file for storage. Through this step the state machine is converted into byte stream form and stored, so that when it is used the byte stream state machine can be loaded directly without regenerating the state machine, which saves working time and improves the working efficiency of the system.
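Storing and reloading the byte stream can be sketched in Python as follows. The function names save_stream and load_stream, the file name, and the placeholder bytes are all hypothetical; the sketch only shows that the stored stream is read back unchanged and does not reproduce the file layout of the embodiment.

```python
import os
import tempfile


def save_stream(path, data):
    """Write the already-encoded byte stream nodes to a file in one pass."""
    with open(path, "wb") as f:
        f.write(data)


def load_stream(path):
    """Read the byte stream back; the dictionary tree does not have to be rebuilt."""
    with open(path, "rb") as f:
        return f.read()


# Placeholder bytes standing in for encoded byte stream nodes.
stream = bytes([0x81, 0x00, 0x61, 0x40, 0x40])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "state_machine.bin")
    save_stream(path, stream)
    restored = load_stream(path)

print(restored == stream)
```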
In an exemplary embodiment, the program code for implementing the method of constructing the state machine of the byte stream is as follows:
// Node information of the tree-structured AC automaton
Node
    byte depth;      // bit 7: blue (word) flag; bits 0-6: string length
    char c;
    int id;          // (type << 24) | id, see addWord
    Node suffix;
    List children;
    int offset;
// bit operation constants
int S_BIT8 = 1 << 7;
int BITS6 = 0x3F;
int BITS7 = 0x7F;
int BITS8 = 0xFF;
int BITS14 = 0x3FFF;
int BITS22 = 0x3FFFFF;
int BITS24 = 0xFFFFFF;
// node type (flag): number of bytes of extension information written for a blue node
// no extension information
int NODE_WORD = 0;
// extension type only, 1 byte
int NODE_TYPE = 1;
// extension ID only, 3 bytes
int NODE_ID = 3;
// extension type and ID, 4 bytes
int NODE_TYPE_ID = 4;
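procedure addWord(word, id, type)
    // start at the root node and descend one character at a time
    node ← root
    i ← 0
    while i < word.len - 1
        // create (or reuse) a gray node
        node ← node.addNode(word[i], false)
        // record the depth of the node, keeping an existing blue flag
        node.depth ← ((node.depth & S_BIT8) | ((i + 1) & BITS7))
        i ← i + 1
    // create (or reuse) the blue node for the last character
    node ← node.addNode(word[word.len - 1], true)
    node.depth ← (S_BIT8 | (word.len & BITS7))
    node.id ← ((type << 24) | (id & BITS24))
    return node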
procedure addNode(c, isBlue)
    node ← null
    // if the current node has no child list yet, create an empty one
    if this.children = null
        this.children ← empty list
    else
        i ← 0
        while i < this.children.size
            tempNode ← this.children.get(i)
            // a child node for character c already exists: exit the loop early
            if tempNode.c = c
                node ← tempNode
                break
            i ← i + 1
    // no child node for character c exists under the current node:
    // create a new node and add it to the child node list
    if node = null
        node ← new Node(c)
        this.children.add(node)
    // if the node is a blue node, set bit 7 of depth to 1
    if isBlue is true
        node.depth ← node.depth | S_BIT8
    return node
procedure nextNode(c)
    if this.children ≠ null
        i ← 0
        while i < this.children.size
            node ← this.children[i]
            if node.c = c
                return node
            i ← i + 1
    // no matching child: the root node returns itself, any other node returns null
    if this is root
        return this
    else
        return null
procedure buildSuffix()
    queue ← empty queue
    // the root node is the 1st node written, so its offset is 0; for flag, see the node type constants
    size ← root.byteSpace(0, flag)
    i ← 0
    /**
      The longest suffix node of every child node directly under the root node is the root node.
      size is the position of the current node in the byte stream.
    */
    while i < root.children.size
        node ← root.children[i]
        node.suffix ← root
        size ← size + node.byteSpace(size, flag)
        queue.push(node)
        i ← i + 1
    /**
      Traverse the nodes of the tree in breadth-first order to construct the longest suffix nodes
      and calculate the position of each node in the byte stream, see byteSpace.
    */
    while not queue.isEmpty()
        curNode ← queue.shift()
        if curNode.children ≠ null
            k ← 0
            suffixNode ← null
            while k < curNode.children.size
                node ← curNode.children[k]
                queue.push(node)
                parentSuffixNode ← curNode.suffix
                // follow the suffix chain until a node with a child for node.c is found;
                // root.nextNode never returns null, so the loop stops at the root node at the latest
                while (suffixNode ← parentSuffixNode.nextNode(node.c)) = null
                    parentSuffixNode ← parentSuffixNode.suffix
                node.suffix ← suffixNode
                size ← (size + node.byteSpace(size, flag))
                k ← k + 1
procedure byteSpace(offset, flag)
    // the position of the current node in the byte stream
    this.offset ← offset
    // 1 byte of depth plus 2 bytes of character
    size ← 3
    childrenCount ← 0
    // calculate the number of bytes occupied by the offset of the current node from its longest suffix node
    // (the root node has no suffix node; 1 byte is counted for it)
    if this.suffix = null or this.suffix.offset = 0
        size ← (size + 1)
    else
        size ← (size + bytes(offset - this.suffix.offset))
    if this.children ≠ null
        childrenCount ← this.children.size
    // calculate the number of bytes occupied by the child node count of the current node
    size ← (size + bytes(childrenCount))
    // a node with child nodes stores a 2-byte offset to its first child node
    if childrenCount > 0
        size ← (size + 2)
    // a blue node may also need the bytes occupied by the extension type or ID, see flag
    if this is blue
        size ← (size + flag)
    return size
procedure bytes(count)
    if count <= BITS6
        return 1
    else if count <= BITS14
        return 2
    else if count <= BITS22
        return 3
    else
        return error
procedure writeToParcel(file, int flag)
    queue ← empty queue
    node ← null
    i ← 0
    while i < root.children.size
        node ← root.children[i]
        queue.push(node)
        i ← i + 1
    while not queue.isEmpty()
        curNode ← queue.shift()
        count ← 0
        if curNode.children ≠ null
            count ← curNode.children.size
            k ← 0
            while k < count
                node ← curNode.children[k]
                queue.push(node)
                k ← k + 1
        // write 1 byte of depth data
        file.writeByte(curNode.depth)
        // write the 2-byte character
        file.writeChar(curNode.c)
        // write the 1-3 byte offset of the current node from its longest suffix node
        if curNode.suffix.offset = 0
            file.writeByte(1 << 6)
        else
            writeBytes(file, curNode.offset - curNode.suffix.offset)
        // write the 1-3 byte number of child nodes of the current node
        writeBytes(file, count)
        // write the 2-byte offset of the first child node from the current node
        if count > 0
            file.writeShort(curNode.children[0].offset - curNode.offset)
        // only blue nodes write extension information
        if curNode is blue
            if flag = NODE_TYPE
                // write 1 byte of type data
                file.writeByte((curNode.id >> 24) & BITS8)
            else if flag = NODE_ID
                // write 3 bytes of id data
                file.writeByte((curNode.id >>> 16) & BITS8)
                file.writeByte((curNode.id >>> 8) & BITS8)
                file.writeByte(curNode.id & BITS8)
            else if flag = NODE_TYPE_ID
                // write 4 bytes of type and id data
                file.writeInt(curNode.id)
            else
                // no extension information, no data needs to be written
procedure writeBytes(file, int data)
    byteCount ← bytes(data)
    i ← byteCount
    while i > 0
        if i = byteCount
            // first byte: 2-bit byte count followed by the top 6 value bits
            file.writeByte((byteCount << 6) | ((data >> ((i - 1) * 8)) & BITS6))
        else
            file.writeByte((data >> ((i - 1) * 8)) & BITS8)
        i ← (i - 1)
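The packing of the depth byte and of the id word used in the program code above can be checked with a short Python sketch. The function names pack_depth, unpack_depth, pack_extension, and unpack_extension are hypothetical; the bit positions follow the constants S_BIT8, BITS7, and BITS24 of the program code.

```python
WORD_FLAG = 1 << 7      # same value as S_BIT8
LENGTH_MASK = 0x7F      # same value as BITS7
ID_MASK = 0xFFFFFF      # same value as BITS24


def pack_depth(length, is_word):
    """Combine the word flag (bit 7) and the string length (bits 0-6) in one byte."""
    if not 0 <= length <= LENGTH_MASK:
        raise ValueError("length must fit in 7 bits")
    return (WORD_FLAG if is_word else 0) | length


def unpack_depth(byte):
    """Return (length, is_word) from a packed depth byte."""
    return byte & LENGTH_MASK, bool(byte & WORD_FLAG)


def pack_extension(word_type, word_id):
    """Combine a 1-byte type and a 3-byte id into one 32-bit value."""
    return ((word_type & 0xFF) << 24) | (word_id & ID_MASK)


def unpack_extension(value):
    """Return (word_type, word_id) from a packed 32-bit value."""
    return (value >> 24) & 0xFF, value & ID_MASK


print(hex(pack_depth(3, True)), unpack_depth(pack_depth(3, True)))
print(hex(pack_extension(2, 5)), unpack_extension(pack_extension(2, 5)))
```

Keeping the flag and the length in one byte is what limits the string length to 127 characters.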
Another embodiment of the present invention provides a byte stream state machine building apparatus, and fig. 6 is a structural diagram of a byte stream state machine building apparatus according to an embodiment of the present invention. As shown in fig. 6, the constructing apparatus 60 of the byte stream state machine includes:
a dictionary tree module 62 for constructing a dictionary tree containing a plurality of character strings; each character in the character string is correspondingly stored in a node of the dictionary tree, and the node represents the character string formed by the characters from a root node to the node;
a state transition module 64, configured to determine a longest suffix node of each node and a position of each node in the byte stream state machine, and determine an offset and a state transition node of a corresponding node according to the longest suffix node;
a state machine module 66, configured to traverse the nodes of the dictionary tree in breadth-first order to generate byte stream node information of the byte stream state machine.
The details of each module/unit in the above apparatus have been described in the corresponding method section and are not repeated here. It should be noted that although several modules or units of the apparatus for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the invention, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided and embodied by a plurality of modules or units.
In addition to the above-described methods and apparatus, embodiments of the invention may also be a computer program product comprising computer program instructions which, when executed by a processor, cause the processor to perform the steps in the methods according to various embodiments of the invention described in the "exemplary methods" section above of this specification.
The program code of the computer program product for carrying out operations of embodiments of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Another embodiment of the present invention provides an electronic device, which may be used to perform all or part of the steps of the method described in this exemplary embodiment. The device comprises: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform steps in a method according to various embodiments of the present invention described in the "exemplary method" section above.
Another embodiment of the present invention provides a computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, cause the processor to perform the steps in the method according to various embodiments of the present invention described in the "exemplary method" above in this specification.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The basic principles of the present invention have been described above with reference to specific embodiments, but it should be noted that the advantages, effects, etc. mentioned in the present invention are only examples and are not limiting, and they must not be considered as necessarily possessed by every embodiment of the present invention. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the invention is not limited to the specific details described above.
The block diagrams of devices, apparatuses, and systems involved in the present invention are given only as illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. These devices, apparatuses, and systems may be connected, arranged, and configured in any manner, as will be appreciated by those skilled in the art. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including, but not limited to," and are used interchangeably therewith. The words "or" and "and" as used herein mean, and are used interchangeably with, the term "and/or," unless the context clearly dictates otherwise. The phrase "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to".
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.