CN108681554A - Matching method, apparatus, and device using regular expressions


Publication number: CN108681554A
Authority: CN (China)
Prior art keywords: regular expression, compiled, finite automata, regular, expression
Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application number: CN201810290338.3A
Other languages: Chinese (zh)
Other versions: CN108681554B (en)
Inventor: 温悦
Current Assignee: Advanced New Technologies Co Ltd; Advantageous New Technologies Co Ltd (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original Assignee: Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201810290338.3A
Publication of CN108681554A
Application granted
Publication of CN108681554B
Legal status: Active
Anticipated expiration

Abstract

This specification discloses a matching method, apparatus, and device using regular expressions. The method combines a large number of first regular expressions into a small number of second regular expressions, compiles each second regular expression into a corresponding finite automaton, and finally matches text using the finite automata. Because the number of second regular expressions obtained by combination is smaller than the number of first regular expressions, the number of matching passes over each text is effectively reduced and matching efficiency improves.

Description

Matching method, apparatus, and device using regular expressions
Technical field
This specification relates to the field of computer technology, and in particular to a matching method, apparatus, and device using regular expressions.
Background
At present, matching data in text with regular expressions is a common method in the industry.
The number of regular expressions used for matching typically ranges from a few to several dozen, and in the prior art every text must be filtered once with each regular expression.
That is, if there are N regular expressions, each text is filtered N times. When the volume of text is large and the regular expressions are numerous, this causes serious performance problems. A matching method using regular expressions is therefore needed to improve matching efficiency.
Summary of the invention
This specification provides a matching method, apparatus, and device using regular expressions, to solve the problem of low efficiency when matching with regular expressions in the prior art.
This specification provides a matching method using regular expressions, including:
determining the first regular expressions;
combining the first regular expressions to obtain at least one second regular expression, where the number of second regular expressions is smaller than the number of first regular expressions;
compiling each second regular expression into a finite automaton;
matching the text to be matched using the finite automaton to obtain a result.
This specification provides a matching apparatus using regular expressions, including:
a determining module, which determines the first regular expressions;
a combining module, which combines the first regular expressions to obtain at least one second regular expression, where the number of second regular expressions is smaller than the number of first regular expressions;
a compiling module, which compiles each second regular expression into a finite automaton;
a matching module, which matches the text to be matched using the finite automaton to obtain a result.
This specification provides a matching device using regular expressions. The device includes one or more memories and processors; the memory stores a program that is configured to be executed by the one or more processors to perform the following steps:
determining the first regular expressions;
combining the first regular expressions to obtain at least one second regular expression, where the number of second regular expressions is smaller than the number of first regular expressions;
compiling each second regular expression into a finite automaton;
matching the text to be matched using the finite automaton to obtain a result.
At least one of the above technical solutions adopted in this specification can achieve the following beneficial effects:
This specification combines a large number of first regular expressions into a small number of second regular expressions, compiles each second regular expression into a corresponding finite automaton, and finally matches text using the finite automata. Because the number of second regular expressions obtained by combination is smaller than the number of first regular expressions, the number of matching passes over each text is effectively reduced and matching efficiency improves.
Brief description of the drawings
The drawings described here provide a further understanding of this specification and form part of it. The illustrative embodiments of this specification and their descriptions are used to explain this specification and do not unduly limit it. In the drawings:
Fig. 1 is a schematic diagram of the matching process using regular expressions provided in this specification;
Fig. 2 is a schematic diagram of compiling each first regular expression into a separate NFA;
Fig. 3 is a schematic diagram of compiling into an NFA the second regular expression obtained by combining the first regular expressions;
Fig. 4 is a state transition diagram using the finite automaton corresponding to the regular expression ab(c|d) as an example;
Fig. 5 is a schematic structural diagram of the matching apparatus using regular expressions provided in this specification;
Fig. 6 is a schematic structural diagram of the matching device using regular expressions provided in this specification.
Detailed description
In the prior art, when text is matched using regular expressions, each regular expression must be compiled into a corresponding finite automaton, and all texts are then filtered with each finite automaton in turn. This specification instead combines the regular expressions into a small number of combined regular expressions and filters the text with those, which can improve matching efficiency.
To help those skilled in the art better understand the technical solutions in one or more embodiments of this specification, those technical solutions are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of this specification, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in this specification without creative effort shall fall within the scope of protection of this specification.
Fig. 1 is a schematic diagram of the matching process using regular expressions provided in this specification, which specifically includes the following steps:
S100: Determine the first regular expressions.
In this specification, regular expressions can be processed by a regular expression engine, for example compiled or used to match text.
The first regular expressions described in this specification are the original, unprocessed regular expressions used to match text. In practice there may be a dozen or even several dozen of them, for example ab(c|d), ab(e|f), and ab(g|h).
S102: Combine the first regular expressions to obtain at least one second regular expression, where the number of second regular expressions is smaller than the number of first regular expressions.
In this specification, the first regular expressions can be combined by the regular expression engine, or by other hardware or software.
When combining, any syntax supported by the regular expression engine can be used, as long as the number of second regular expressions obtained is smaller than the number of first regular expressions. Specifically, the first regular expressions can be concatenated in any order, with every two adjacent first regular expressions joined by a specified connecting character, to obtain a second regular expression. The specified connecting character can be "|".
Continuing the example above, suppose the first regular expressions are ab(c|d), ab(e|f), and ab(g|h). These three first regular expressions can be combined into (ab(c|d))|(ab(e|f)) and ab(g|h), giving two second regular expressions. They can also be combined into a single second regular expression, (ab(c|d))|(ab(e|f))|(ab(g|h)).
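The joining step can be sketched in Python. This is a minimal illustration rather than the patent's implementation: the function name `combine` and its `group_size` parameter are hypothetical, and wrapping every first regular expression in parentheses (including one that ends up alone in its group) is an assumption made so that "|" cannot change the meaning of an individual expression.

```python
import re

def combine(first_patterns, group_size=None):
    """Join first regular expressions with '|' into fewer second expressions.

    group_size is a hypothetical knob: how many first expressions go into
    each second expression. By default everything is joined into one.
    """
    if group_size is None:
        group_size = len(first_patterns)
    combined = []
    for i in range(0, len(first_patterns), group_size):
        chunk = first_patterns[i:i + group_size]
        # Parenthesize each first expression so it stays a distinct subexpression.
        combined.append("|".join(f"({p})" for p in chunk))
    return combined

first = ["ab(c|d)", "ab(e|f)", "ab(g|h)"]
one = combine(first)                # a single second regular expression
two = combine(first, group_size=2)  # two second regular expressions
```

Here `one` holds the single combined expression from the example above, and `two` holds the two-expression variant, so either result has fewer entries than `first`.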
S104: Compile each second regular expression into a finite automaton.
In the embodiments of this specification, the regular expression engine can compile each second regular expression into a finite automaton using a specified algorithm. Specifically, Thompson's construction algorithm can be used to compile each second regular expression into a corresponding nondeterministic finite automaton (NFA), merging the repeated nodes in the NFA. Alternatively, the powerset construction algorithm can be used to compile each second regular expression into a corresponding deterministic finite automaton (DFA), merging the repeated nodes in the DFA. Other compilation algorithms can also be used, as long as they compile the second regular expression into a finite automaton.
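As a rough illustration of compiling an expression into an NFA, the following Python sketch builds Thompson-style fragments for ab(c|d) and simulates them. It is a simplified sketch under stated assumptions: the class `NFA` and its methods `lit`, `cat`, and `alt` are hypothetical names, only literals, concatenation, and alternation are covered (no repetition operators and no pattern parser), and it makes no attempt at the node merging described above.

```python
class NFA:
    """Tiny Thompson-style NFA builder; fragments are (start, accept) pairs."""

    def __init__(self):
        self.edges = {}  # (state, char) -> set of next states
        self.eps = {}    # state -> set of states reachable by an epsilon move
        self.n = 0       # number of states created so far

    def new_state(self):
        self.n += 1
        return self.n - 1

    def lit(self, ch):
        # Fragment for a single character: start --ch--> accept.
        s, a = self.new_state(), self.new_state()
        self.edges.setdefault((s, ch), set()).add(a)
        return s, a

    def cat(self, x, y):
        # Concatenation: epsilon move from x's accept to y's start.
        self.eps.setdefault(x[1], set()).add(y[0])
        return x[0], y[1]

    def alt(self, x, y):
        # Alternation: new start branches to both, both join a new accept.
        s, a = self.new_state(), self.new_state()
        self.eps.setdefault(s, set()).update({x[0], y[0]})
        self.eps.setdefault(x[1], set()).add(a)
        self.eps.setdefault(y[1], set()).add(a)
        return s, a

    def closure(self, states):
        # All states reachable from `states` through epsilon moves only.
        stack, seen = list(states), set(states)
        while stack:
            for t in self.eps.get(stack.pop(), ()):
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen

    def accepts(self, frag, text):
        # Simulate the NFA by tracking the set of currently active states.
        cur = self.closure({frag[0]})
        for ch in text:
            nxt = set()
            for s in cur:
                nxt |= self.edges.get((s, ch), set())
            cur = self.closure(nxt)
        return frag[1] in cur

nfa = NFA()
# ab(c|d) assembled by hand from literal, concatenation, and alternation fragments.
frag = nfa.cat(nfa.cat(nfa.lit("a"), nfa.lit("b")),
               nfa.alt(nfa.lit("c"), nfa.lit("d")))
```

Calling `nfa.accepts(frag, "abc")` walks the text once while following every possible path in parallel, which is the property that lets a single automaton stand in for several alternatives.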
Whether the target is an NFA or a DFA, the step of merging repeated nodes is performed by the regular expression engine, through the corresponding algorithm, while it compiles the second regular expression. Its purpose is to reuse repeated nodes as far as possible. The following uses an NFA as an example, as shown in Figs. 2 and 3.
Fig. 2 is a schematic diagram of compiling each first regular expression into a separate NFA. In Fig. 2, if the three first regular expressions ab(c|d), ab(e|f), and ab(g|h) are not combined, each of them must be compiled separately, giving three corresponding NFAs, and the text must then be matched with each of the three NFAs in turn. This is the prior-art method.
Fig. 3 is a schematic diagram of compiling into an NFA the second regular expression obtained by combining the first regular expressions. In Fig. 3, the three first regular expressions ab(c|d), ab(e|f), and ab(g|h) have been combined into one second regular expression, (ab(c|d))|(ab(e|f))|(ab(g|h)). When this second regular expression is compiled with Thompson's construction algorithm, the repeated nodes "a" and "b" are found in the NFA, so according to the algorithm the regular expression engine merges these repeated nodes and obtains the NFA shown in Fig. 3. The text is then matched with the NFA shown in Fig. 3: each text needs to be matched only once rather than three times, and because the repeated nodes in the NFA have been merged, matching efficiency improves further.
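The effect of sharing the repeated "a" and "b" nodes can be illustrated with the powerset (subset) construction, where the sharing falls out naturally. The sketch below is an assumption-laden illustration, not the engine described here: `build_union_nfa`, `eps_closure`, `subset_construction`, and `dfa_accepts` are hypothetical helpers, and the NFA is built by hand for branches of the form "prefix followed by one of several characters" instead of being parsed from a pattern.

```python
from collections import deque

def build_union_nfa(branches):
    """Hand-built NFA for a union of (prefix, last_chars) branches."""
    trans = {}        # (state, char) -> set of next states
    eps = {0: set()}  # state 0 is the shared start; it branches by epsilon moves
    accepting = set()
    next_id = 1
    for prefix, last_chars in branches:
        state = next_id
        next_id += 1
        eps[0].add(state)
        for ch in prefix:
            trans.setdefault((state, ch), set()).add(next_id)
            state = next_id
            next_id += 1
        for ch in last_chars:
            trans.setdefault((state, ch), set()).add(next_id)
            accepting.add(next_id)
            next_id += 1
    return trans, eps, 0, accepting

def eps_closure(states, eps):
    stack, seen = list(states), set(states)
    while stack:
        for t in eps.get(stack.pop(), ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return frozenset(seen)

def subset_construction(trans, eps, start, accepting):
    """Convert the NFA to a DFA whose states are sets of NFA states."""
    start_set = eps_closure({start}, eps)
    dfa = {}  # frozenset of NFA states -> {char: frozenset of NFA states}
    queue = deque([start_set])
    while queue:
        cur = queue.popleft()
        if cur in dfa:
            continue
        moves = {}
        for (s, ch), targets in trans.items():
            if s in cur:
                moves.setdefault(ch, set()).update(targets)
        dfa[cur] = {ch: eps_closure(t, eps) for ch, t in moves.items()}
        queue.extend(dfa[cur].values())
    return dfa, start_set, {s for s in dfa if s & accepting}

def dfa_accepts(dfa, start, accepting, text):
    state = start
    for ch in text:
        state = dfa[state].get(ch)
        if state is None:
            return False
    return state in accepting

# (ab(c|d))|(ab(e|f))|(ab(g|h)) expressed as three prefix/suffix branches.
nfa = build_union_nfa([("ab", "cd"), ("ab", "ef"), ("ab", "gh")])
dfa, start, finals = subset_construction(*nfa)
```

In the resulting DFA the three separate "a" edges and three separate "b" edges of the NFA collapse into one of each, so a text such as "abg" is checked in a single left-to-right pass.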
S106: Match the text to be matched using the finite automaton to obtain a result.
With the above method, combining the first regular expressions yields a small number of second regular expressions. Compared with the prior art, the method provided in this specification therefore uses fewer finite automata, each text is matched fewer times, and matching text with regular expressions becomes more efficient. Moreover, when a second regular expression is compiled into a finite automaton, the compilation algorithm automatically merges repeated nodes, which further improves matching efficiency.
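The reduction in matching passes can be made concrete with a small Python sketch. It is only an illustration: `match_separately` and `match_combined` are hypothetical helpers, the sample texts are invented, and Python's `re` module is a backtracking engine rather than the finite automaton described here, so the sketch counts passes and says nothing about actual speed.

```python
import re

first = ["ab(c|d)", "ab(e|f)", "ab(g|h)"]
texts = ["abc", "abf", "abh", "xyz"]  # invented sample inputs

def match_separately(patterns, texts):
    """Prior-art style: every text is tried against every expression."""
    compiled = [re.compile(p) for p in patterns]
    passes, hits = 0, []
    for t in texts:
        hit = False
        for rx in compiled:
            passes += 1
            if rx.fullmatch(t):
                hit = True
        hits.append(hit)
    return hits, passes

def match_combined(patterns, texts):
    """Combined style: every text is tried once against one joined expression."""
    rx = re.compile("|".join(f"({p})" for p in patterns))
    passes, hits = 0, []
    for t in texts:
        passes += 1
        hits.append(rx.fullmatch(t) is not None)
    return hits, passes

separate_hits, separate_passes = match_separately(first, texts)
combined_hits, combined_passes = match_combined(first, texts)
```

Both functions report the same hits, while the pass count drops from the number of texts times the number of first regular expressions to the number of texts.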
In addition, in a practical application scenario, a corresponding processing strategy may be set for each first regular expression. When a text is matched, if a result is matched by some first regular expression, the processing strategy corresponding to that first regular expression can be used to process the text and/or the result. So that the text and/or the matched result can still be processed in this way after the text is matched by the method of combining first regular expressions shown in Fig. 1, this specification can combine the first regular expressions by making each first regular expression a subexpression of the second regular expression. That is, after combination, each first regular expression is a subexpression of the second regular expression. For example, the three subexpressions ab(c|d), ab(e|f), and ab(g|h) of the second regular expression (ab(c|d))|(ab(e|f))|(ab(g|h)) in the example above are exactly the three first regular expressions before combination.
Further, before each second regular expression is compiled into a finite automaton, a one-to-one identifier can be set for each subexpression in that second regular expression. This is equivalent to setting a one-to-one identifier for each first regular expression contained in the second regular expression.
Specifically, the identifier that corresponds one-to-one to each subexpression in the second regular expression can be a capturing group. A capturing group essentially saves a subexpression of the second regular expression in a numbered or explicitly named group (usually held in memory) so that it can be referenced later.
After a corresponding capturing group has been set for each subexpression in the second regular expression, when the second regular expression is compiled, each final accepting state of the resulting finite automaton corresponds to a capturing group, as shown in Fig. 4.
Fig. 4 is a state transition diagram using the finite automaton corresponding to the regular expression ab(c|d) as an example. In Fig. 4, states 0, 1, 2, 3, and 4 are all accepting states, but only states 3 and 4 are final accepting states (also called terminals). The state transition table is as follows:
State     a              b              c                    d
State 0   State 1        Not accepted   Not accepted         Not accepted
State 1   Not accepted   State 2        Not accepted         Not accepted
State 2   Not accepted   Not accepted   State 3 (terminal)   State 4 (terminal)
State 3   Not accepted   Not accepted   Not accepted         Not accepted
State 4   Not accepted   Not accepted   Not accepted         Not accepted
Table 1
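Table 1 can be executed directly as a table-driven automaton. The following Python sketch is a minimal illustration; the names `TRANSITIONS`, `FINAL_STATES`, and `run` are hypothetical, and a missing dictionary entry stands for a "Not accepted" cell.

```python
# Transition table for ab(c|d), transcribed from Table 1.
TRANSITIONS = {
    0: {"a": 1},
    1: {"b": 2},
    2: {"c": 3, "d": 4},
    3: {},
    4: {},
}
FINAL_STATES = {3, 4}  # the terminal states

def run(text):
    """Return the final accepting state reached for text, or None on failure."""
    state = 0
    for ch in text:
        nxt = TRANSITIONS[state].get(ch)
        if nxt is None:  # a "Not accepted" cell in Table 1
            return None
        state = nxt
    return state if state in FINAL_STATES else None
```

Because `run` returns which terminal was reached rather than a plain yes or no, the caller can tell the c branch (state 3) from the d branch (state 4), which is the property that the capturing-group identifiers rely on.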
Similarly, if capturing groups are set for the subexpressions of the second regular expression (ab(c|d))|(ab(e|f))|(ab(g|h)), then after that second regular expression is compiled into a finite automaton, each final accepting state in the finite automaton also corresponds to at least one capturing group. This identifies that the final accepting state was reached through the subexpression corresponding to that capturing group, so it can be determined that a result matched at that final accepting state was obtained with the subexpression corresponding to the capturing group.
Finally, after a result is matched using the finite automaton, the first regular expression that matched the result can be determined from the capturing group corresponding to the final accepting state reached when the result was matched. That is, the subexpression corresponding to that capturing group is exactly the first regular expression that matched the result. The text and/or the result can then be processed according to the processing strategy set in advance for that first regular expression.
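The capturing-group idea can be sketched with named groups in Python's `re` module. This is an illustration under assumptions, not the patent's engine: `re` is a backtracking matcher rather than a finite automaton, the rule names and the strategy functions in `STRATEGIES` are invented, and `handle` is a hypothetical helper.

```python
import re

# Each first regular expression gets its own name (the one-to-one identifier).
FIRST = {"rule_cd": "ab(c|d)", "rule_ef": "ab(e|f)", "rule_gh": "ab(g|h)"}

# Hypothetical processing strategies, one per first regular expression.
STRATEGIES = {
    "rule_cd": lambda text: f"log:{text}",
    "rule_ef": lambda text: f"review:{text}",
    "rule_gh": lambda text: f"block:{text}",
}

# Second regular expression: every first expression becomes a named group.
COMBINED = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in FIRST.items()))

def handle(text):
    """Match once, find which named group matched, apply its strategy."""
    m = COMBINED.fullmatch(text)
    if m is None:
        return None
    for name in FIRST:
        # Only the group for the alternative that matched is not None.
        if m.group(name) is not None:
            return name, STRATEGIES[name](text)
    return None
```

A call such as `handle("abf")` performs a single match against the combined expression and still recovers which first regular expression was responsible, so the per-expression strategy can be applied as before.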
The above is the matching method using regular expressions provided by one or more embodiments of this specification. Based on the same idea, this specification also provides a corresponding matching apparatus using regular expressions, as shown in Fig. 5.
The determining module 501 determines the first regular expressions;
the combining module 502 combines the first regular expressions to obtain at least one second regular expression, where the number of second regular expressions is smaller than the number of first regular expressions;
the compiling module 503 compiles each second regular expression into a finite automaton;
the matching module 504 matches the text to be matched using the finite automaton to obtain a result.
The combining module 502 is specifically configured to concatenate the first regular expressions in any order, with every two adjacent first regular expressions joined by a specified connecting character, to obtain a second regular expression.
Each first regular expression is a subexpression of the second regular expression.
The apparatus further includes:
a setting module 505, which, for each second regular expression, sets a one-to-one identifier for each subexpression in that second regular expression.
The identifier includes a capturing group.
Each final accepting state in the finite automaton corresponds to at least one capturing group.
The matching module 504 is further configured to determine the first regular expression that matched the result according to the capturing group corresponding to the final accepting state reached when the finite automaton matched the result.
The compiling module 503 is specifically configured to compile the second regular expression into a nondeterministic finite automaton (NFA) using Thompson's construction algorithm and merge the repeated nodes in the NFA; or to compile the second regular expression into a deterministic finite automaton (DFA) using the powerset construction algorithm and merge the repeated nodes in the DFA.
This specification also correspondingly provides a matching device using regular expressions, as shown in Fig. 6. An application is installed on the device. The device includes one or more memories and processors; the memory stores a program that is configured to be executed by the one or more processors to perform the following steps:
determining the first regular expressions;
combining the first regular expressions to obtain at least one second regular expression, where the number of second regular expressions is smaller than the number of first regular expressions;
compiling each second regular expression into a finite automaton;
matching the text to be matched using the finite automaton to obtain a result.
In the 1990s, an improvement to a technology could be clearly distinguished as either a hardware improvement (for example, an improvement to a circuit structure such as a diode, transistor, or switch) or a software improvement (an improvement to a method flow). With the development of technology, however, many of today's improvements to method flows can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. It therefore cannot be said that an improvement to a method flow cannot be implemented with a hardware entity module. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic function is determined by the user programming the device. Designers program a digital system onto a single PLD themselves, without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, instead of making integrated circuit chips by hand, this programming is now mostly done with "logic compiler" software, which is similar to the software compiler used in program development. The source code to be compiled must also be written in a specific programming language, called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); the most commonly used at present are VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog. Those skilled in the art should also understand that a hardware circuit implementing a logical method flow can easily be obtained simply by lightly programming the method flow in one of the above hardware description languages and programming it into an integrated circuit.
A controller can be implemented in any suitable manner. For example, a controller can take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (such as software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller can also be implemented as part of the control logic of a memory. Those skilled in the art also know that, besides implementing a controller purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same function in the form of logic gates, switches, an application-specific integrated circuit, a programmable logic controller, an embedded microcontroller, and the like. Such a controller can therefore be regarded as a hardware component, and the means it contains for implementing various functions can also be regarded as structures within the hardware component. Indeed, the means for implementing various functions can be regarded both as software modules implementing the method and as structures within the hardware component.
The systems, apparatuses, modules, or units described in the above embodiments can be implemented by a computer chip or entity, or by a product with a certain function. A typical implementing device is a computer. Specifically, the computer can be, for example, a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, e-mail device, game console, tablet computer, wearable device, or a combination of any of these devices.
For convenience of description, the above apparatus is described in terms of separate units divided by function. When this specification is implemented, the functions of the units can of course be realized in one or more pieces of software and/or hardware.
Those skilled in the art should understand that embodiments of this specification may be provided as a method, a system, or a computer program product. This specification may therefore take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, this specification may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, and optical storage) that contain computer-usable program code.
This specification is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to one or more embodiments of this specification. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in them, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may include non-persistent memory in computer-readable media, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include persistent and non-persistent, removable and non-removable media, which can store information by any method or technology. The information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined here, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", and any of their variants are intended to cover non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to that process, method, commodity, or device. Without further limitation, an element qualified by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, commodity, or device that includes the element.
This specification can be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. One or more embodiments of this specification can also be practiced in distributed computing environments, where tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules can be located in both local and remote computer storage media, including storage devices.
The embodiments in this specification are described progressively. For identical or similar parts, the embodiments can be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the system embodiment is substantially similar to the method embodiment, its description is relatively brief; for relevant details, refer to the description of the method embodiment.
Specific embodiments of this specification have been described above. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.
The above are merely one or more embodiments of this specification and are not intended to limit it. For those skilled in the art, one or more embodiments of this specification can have various modifications and variations. Any modification, equivalent replacement, or improvement made within the spirit and principle of one or more embodiments of this specification shall be included within the scope of the claims of this specification.

Claims (13)

1. A matching method using regular expressions, comprising:
determining a plurality of first regular expressions;
combining the first regular expressions to obtain at least one second regular expression, wherein the number of second regular expressions is less than the number of first regular expressions;
compiling each second regular expression into a finite automaton; and
matching a text to be matched using the finite automaton to obtain a result.
2. The method of claim 1, wherein combining the first regular expressions to obtain at least one second regular expression specifically comprises:
concatenating the first regular expressions in any order, with every two adjacent first regular expressions joined by a specified connector character, to obtain the second regular expression.
3. The method of claim 1 or 2, wherein each first regular expression is a subexpression of the second regular expression; and
before each second regular expression is compiled into a finite automaton, the method further comprises:
for each second regular expression, setting a one-to-one corresponding identifier for each subexpression in the second regular expression.
4. The method of claim 3, wherein the identifier comprises a capture group; and
each final accepting state of the finite automaton corresponds to at least one capture group.
5. The method of claim 4, further comprising:
determining the first regular expression that matched the result according to the capture group corresponding to the final accepting state reached when the finite automaton matched the result.
6. The method of claim 1, wherein compiling each second regular expression into a finite automaton specifically comprises:
compiling the second regular expression into a nondeterministic finite automaton (NFA) using the Thompson construction algorithm, and merging duplicate nodes in the NFA; or
compiling the second regular expression into a deterministic finite automaton (DFA) using the powerset construction algorithm, and merging duplicate nodes in the DFA.
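The DFA branch of claim 6, together with the tagged accepting states of claims 4 and 5, can be pictured with the following minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the NFA for the combined expression (ab)|(ac) is written by hand instead of being generated by Thompson construction, and the identifiers `group1` and `group2`, the state numbering, and all function names are hypothetical. The claims do not specify how duplicate nodes are merged; the sketch only shows the collapse of identical state sets that powerset construction performs naturally.

```python
from collections import deque

# Hand-built NFA for the combined expression (ab)|(ac); a Thompson-style
# compiler would normally generate it. State 0 is the start state, None
# marks an epsilon edge, and each accepting state carries the identifier
# of the subexpression it belongs to. All names here are illustrative.
NFA = {
    0: {None: {1, 4}},
    1: {"a": {2}},
    2: {"b": {3}},
    4: {"a": {5}},
    5: {"c": {6}},
}
NFA_ACCEPT = {3: "group1", 6: "group2"}


def epsilon_closure(states):
    """Return all NFA states reachable through epsilon edges only."""
    stack, seen = list(states), set(states)
    while stack:
        for nxt in NFA.get(stack.pop(), {}).get(None, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return frozenset(seen)


def powerset_construction():
    """Build DFA transitions and tag each accepting DFA state."""
    start = epsilon_closure({0})
    dfa, accept, queue = {}, {}, deque([start])
    while queue:
        current = queue.popleft()
        if current in dfa:
            continue  # identical state sets collapse into one DFA node
        dfa[current] = {}
        tags = {NFA_ACCEPT[s] for s in current if s in NFA_ACCEPT}
        if tags:
            accept[current] = tags
        symbols = {c for s in current for c in NFA.get(s, {}) if c is not None}
        for c in symbols:
            moved = {t for s in current for t in NFA.get(s, {}).get(c, ())}
            dfa[current][c] = epsilon_closure(moved)
            queue.append(dfa[current][c])
    return start, dfa, accept


def run(text):
    """Return the identifiers of the accepting state reached, if any."""
    start, dfa, accept = powerset_construction()
    state = start
    for ch in text:
        state = dfa[state].get(ch)
        if state is None:
            return set()
    return accept.get(state, set())
```

In this sketch the two separate `a` edges of the NFA become a single DFA transition, so the shared prefix is read only once, and the identifier set attached to the accepting DFA state indicates which subexpression produced the match.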
7. A matching device using regular expressions, comprising:
a determining module, configured to determine a plurality of first regular expressions;
a combining module, configured to combine the first regular expressions to obtain at least one second regular expression, wherein the number of second regular expressions is less than the number of first regular expressions;
a compiling module, configured to compile each second regular expression into a finite automaton; and
a matching module, configured to match a text to be matched using the finite automaton to obtain a result.
8. The device of claim 7, wherein the combining module is specifically configured to concatenate the first regular expressions in any order, with every two adjacent first regular expressions joined by a specified connector character, to obtain the second regular expression.
9. The device of claim 7 or 8, wherein each first regular expression is a subexpression of the second regular expression; and
the device further comprises:
a setting module, configured to, for each second regular expression, set a one-to-one corresponding identifier for each subexpression in the second regular expression.
10. The device of claim 9, wherein the identifier comprises a capture group; and
each final accepting state of the finite automaton corresponds to at least one capture group.
11. The device of claim 10, wherein the matching module is further configured to determine the first regular expression that matched the result according to the capture group corresponding to the final accepting state reached when the finite automaton matched the result.
12. The device of claim 7, wherein the compiling module is specifically configured to: compile the second regular expression into a nondeterministic finite automaton (NFA) using the Thompson construction algorithm, and merge duplicate nodes in the NFA; or compile the second regular expression into a deterministic finite automaton (DFA) using the powerset construction algorithm, and merge duplicate nodes in the DFA.
13. Matching equipment using regular expressions, the equipment comprising one or more memories and one or more processors, wherein the memories store a program configured to be executed by the one or more processors to perform the following steps:
determining a plurality of first regular expressions;
combining the first regular expressions to obtain at least one second regular expression, wherein the number of second regular expressions is less than the number of first regular expressions;
compiling each second regular expression into a finite automaton; and
matching a text to be matched using the finite automaton to obtain a result.
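The four claimed steps (determine, combine, compile, match) can be illustrated with a short Python sketch. Several assumptions apply: the three first regular expressions and their names are invented for illustration, the connector character is taken to be `|`, and named capture groups stand in for the per-subexpression identifiers. Python's `re` module is a backtracking engine rather than a compiled finite automaton, so the sketch shows only the combination and identification logic, not the automaton itself.

```python
import re

# Hypothetical first regular expressions, keyed by an illustrative rule name.
FIRST_PATTERNS = {
    "phone": r"\d{3}-\d{4}",
    "email": r"[a-z]+@[a-z]+\.com",
    "hexid": r"0x[0-9a-f]+",
}


def combine(patterns):
    """Join the first expressions into one second expression.

    Each subexpression is wrapped in a named capture group (its identifier)
    and adjacent subexpressions are joined with the '|' connector.
    """
    return "|".join(f"(?P<{name}>{body})" for name, body in patterns.items())


def match_all(compiled, text):
    """Scan the text once and report (rule name, matched text) pairs."""
    results = []
    for m in compiled.finditer(text):
        # lastgroup names the capture group that produced this match.
        results.append((m.lastgroup, m.group()))
    return results


second = combine(FIRST_PATTERNS)   # one second expression from three first ones
compiled = re.compile(second)      # compiled once, reused for every text
hits = match_all(compiled, "call 555-1234 or mail bob@example.com, id 0x1f")
print(hits)
```

The text is scanned once against a single combined expression instead of once per first regular expression, and the capture group name recovers which first regular expression produced each match.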
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810290338.3A CN108681554B (en) 2018-04-03 2018-04-03 Matching method, device and equipment using regular expression

Publications (2)

Publication Number Publication Date
CN108681554A (en) 2018-10-19
CN108681554B (en) 2021-08-24

Family

ID=63800248

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810290338.3A Active CN108681554B (en) 2018-04-03 2018-04-03 Matching method, device and equipment using regular expression

Country Status (1)

Country Link
CN (1) CN108681554B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101853301A (en) * 2010-05-25 2010-10-06 华为技术有限公司 Regular expression matching method and system
CN103617226A (en) * 2013-11-25 2014-03-05 华为技术有限公司 Regular expression matching method and device
US20140289264A1 (en) * 2013-03-21 2014-09-25 Hewlett-Packard Development Company, L.P. One pass submatch extraction

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109474644A (en) * 2019-01-11 2019-03-15 深圳前海微众银行股份有限公司 Safety protecting method, device, equipment, WAF and readable storage medium storing program for executing
CN109474644B (en) * 2019-01-11 2021-04-23 深圳前海微众银行股份有限公司 Security protection method, device, equipment, WAF and readable storage medium
CN113703737A (en) * 2021-08-31 2021-11-26 深信服科技股份有限公司 Register transmission level code generation method, device, equipment and medium

Also Published As

Publication number Publication date
CN108681554B (en) 2021-08-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20201028
Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands
Applicant after: Innovative advanced technology Co.,Ltd.
Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands
Applicant before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20201028
Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands
Applicant after: Advanced innovation technology Co.,Ltd.
Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands
Applicant before: Alibaba Group Holding Ltd.

GR01 Patent grant