CN107038161A - Device and method for filtering data - Google Patents

Info

Publication number
CN107038161A
Authority
CN
China
Prior art keywords
data
filtering rule
filtered
rule
abstract syntax
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510408180.1A
Other languages
Chinese (zh)
Other versions
CN107038161B (en
Inventor
丁崔灿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201510408180.1A (CN107038161B)
Priority to PCT/CN2016/088302 (WO2017008650A1)
Publication of CN107038161A
Application granted
Publication of CN107038161B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G06F16/24578 Query processing with adaptation to user needs using ranking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The purpose of the present application is to provide a device and method for filtering data. Each time initial data to be filtered is obtained, it is converted into structured data to be filtered and matched in real time against the corresponding filtering rules, so that the filtering result is obtained immediately, which solves the real-time problem. Arithmetic operations, string operations, relational operations, logical operations, regular expression operations, and set operations are supported, and an extension interface is reserved. The filtering rules take the form of simple operation expressions with variables, which solves the problems that filtering rules are complicated to describe, hard to extend, and hard to manage.

Description

Device and method for filtering data
Technical field
The present application relates to the computer field, and in particular to a technique for filtering out, in real time and from massive data, the data that satisfies preset filtering rules.
Background
With the explosive growth of information technology, data volumes increase daily, and the demands that various fields place on the processing of massive data keep rising.
For filtering out, from massive data, the data that satisfies preset filtering rules, the prior art offers the following approaches:
Filtering valid data with SQL (Structured Query Language) statements on a relational in-memory database. However, this approach requires the massive data to be cached in logical tables of the in-memory database, which occupies a large amount of memory, and the periodic execution of SQL statements can hardly meet real-time requirements;
A massive data storage scheme based on HBase (a distributed, column-oriented open-source database) that filters valid data with a Map-Reduce algorithm (a programming model for parallel computation over large data sets). However, a Map-Reduce task is a post-hoc computation mode similar to batch processing: it can only compute matching results periodically over the massive data already stored in HBase, so real-time behavior is hard to guarantee. Complex Map-Reduce tasks must also be written as extensions, which makes it difficult to meet the need for a large number of filtering rules that change in real time and use many kinds of computation;
Filtering valid data with a pattern-matching algorithm based on a CEP (complex event processing) engine, an approach better suited to the monitoring and decision control of enterprise application systems. However, mature CEP engines are mostly commercial software with a high cost of use, and each CEP engine has its own way of describing pattern rules; for example, Drools uses an XML format and Esper uses the EPL format. Meeting the needs of different systems therefore requires writing a large amount of adaptation code, and non-standard matching algorithms must be written as extensions, which increases implementation difficulty. In addition, because CEP engine implementations differ, performance monitoring and tuning of a CEP engine is difficult.
Summary of the invention
The technical problem to be solved by the present application is how to filter out, in real time and from massive data, the data that satisfies preset filtering rules without occupying a large amount of memory, while meeting the need for a large number of filtering rules that change in real time and use many kinds of computation.
To achieve the above object, the present application provides a method for filtering data, wherein the method comprises:
Obtaining initial data to be filtered and converting it into structured data to be filtered, wherein the structured data to be filtered comprises a data domain identifier and a data body in key-value pair form;
Loading filtering rules, wherein each filtering rule comprises a rule domain identifier, a rule name, and a rule operation expression, and establishing a first rule list of the filtering rules indexed by the domain identifier of the filtering rules;
Obtaining the structured data to be filtered, and obtaining from the first rule list, according to the data domain identifier, several filtering rules whose rule domain identifier corresponds to the data domain identifier;
Performing parallel matching computation on the structured data to be filtered using the obtained filtering rules.
Further, obtaining the initial data to be filtered comprises:
Obtaining the initial data to be filtered from distributed message middleware.
Further, converting the initial data to be filtered into structured data to be filtered further comprises:
Sending the structured data to be filtered to a blocking queue;
Obtaining the structured data to be filtered comprises:
Obtaining the structured data to be filtered from the blocking queue.
Further, performing parallel matching computation on the structured data to be filtered using the obtained filtering rules comprises:
Compiling the obtained filtering rules to build runnable abstract syntax trees;
Taking the data body of the structured data to be filtered as the input parameter, traversing the runnable abstract syntax trees, and performing parallel matching computation with them.
Further, compiling the obtained filtering rules to build runnable abstract syntax trees comprises:
Analyzing the rule expression of each obtained filtering rule to convert it into an abstract syntax tree;
Performing a precomputation pass on the abstract syntax tree to obtain the runnable abstract syntax tree;
Wherein performing a precomputation pass on the abstract syntax tree comprises:
Creating a run stack for the abstract syntax tree and pushing the elements of the abstract syntax tree onto the run stack;
When the element is an operator, popping the operator's two operands from the run stack and computing the result;
When the element is a special element, converting the special element into a programming-language data structure element and then pushing it onto the run stack.
Further, performing parallel matching computation with the runnable abstract syntax trees comprises:
Replacing the variables in the runnable abstract syntax tree with the parameters in the data body;
Performing matching computation on the runnable abstract syntax tree using the run stack.
Further, the method also comprises:
Adding a filtering rule, deleting a filtering rule, or modifying and recompiling an existing filtering rule.
Further, establishing the first rule list of the filtering rules indexed by the domain identifier of the filtering rules further comprises:
Establishing a second rule list of the filtering rules indexed by the rule name of the filtering rules;
Adding a filtering rule, deleting a filtering rule, or modifying and recompiling an existing filtering rule comprises at least one of the following:
Adding the new filtering rule to the second rule list;
Deleting the corresponding filtering rule from the second rule list;
Looking up a filtering rule in the second rule list, and modifying and recompiling the filtering rule found.
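The rule maintenance operations above can be sketched as follows. This is a minimal illustration only: the class `RuleRegistry`, its method names, and the rule fields are hypothetical, and keeping the domain-indexed first rule list in sync with the name-indexed second rule list is an assumption that the text does not spell out.

```python
class RuleRegistry:
    """Hypothetical holder for both rule lists described in the text."""

    def __init__(self):
        self.by_name = {}     # second rule list: rule name -> rule
        self.by_domain = {}   # first rule list: domain id -> {rule name -> rule}

    def add(self, name, domain_id, expression):
        if name in self.by_name:
            raise KeyError(f"rule already exists: {name}")
        rule = {"name": name, "domain_id": domain_id, "expression": expression}
        self.by_name[name] = rule
        self.by_domain.setdefault(domain_id, {})[name] = rule

    def delete(self, name):
        rule = self.by_name.pop(name)
        domain_rules = self.by_domain[rule["domain_id"]]
        del domain_rules[name]
        if not domain_rules:                      # drop empty domain buckets
            del self.by_domain[rule["domain_id"]]

    def modify(self, name, expression):
        # Both lists share the same rule object, so one update is enough.
        # Recompiling the expression into a runnable AST would happen here.
        self.by_name[name]["expression"] = expression


registry = RuleRegistry()
registry.add("cpu-high", "cpu_usage", "value > 80")
registry.add("latency-slow", "site_latency", "value > 500")
registry.modify("cpu-high", "value > 90")
registry.delete("latency-slow")
```

Looking rules up by their unique name makes each maintenance operation a single dictionary access, while matching still goes through the domain index.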
Further, each filtering rule also comprises information on the notifier bound to the filtering rule;
The method also comprises:
Sending the structured data to be filtered that satisfies a filtering rule to the notifier bound to that filtering rule, ready for sending.
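One way to picture the notifier binding is the sketch below. The `ListNotifier` class, the `dispatch` function, and the use of plain Python callables as stand-ins for compiled rules are all assumptions made for illustration; the text does not describe the notifier interface.

```python
class ListNotifier:
    """Hypothetical notifier that buffers matched records until they are sent."""

    def __init__(self):
        self.pending = []

    def notify(self, rule_name, record):
        self.pending.append((rule_name, record))


def dispatch(rules, record):
    """Hand the record to the notifier of every rule it satisfies."""
    for rule in rules:
        if rule["predicate"](record):
            rule["notifier"].notify(rule["name"], record)


ops_team = ListNotifier()
audit_log = ListNotifier()
rules = [
    # Predicates stand in for compiled rule expressions (assumption).
    {"name": "cpu-high", "predicate": lambda r: r["value"] > 80, "notifier": ops_team},
    {"name": "cpu-critical", "predicate": lambda r: r["value"] > 95, "notifier": audit_log},
]
dispatch(rules, {"instanceId": "AY123456", "value": 92})
```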
According to another aspect of the present application, an equipment for filtering data is also provided, wherein the equipment comprises:
A first device for obtaining initial data to be filtered and converting it into structured data to be filtered, wherein the structured data to be filtered comprises a data domain identifier and a data body in key-value pair form;
A second device for loading filtering rules, wherein each filtering rule comprises a rule domain identifier, a rule name, and a rule operation expression, and for establishing a first rule list of the filtering rules indexed by the domain identifier of the filtering rules;
A third device for obtaining the structured data to be filtered and for obtaining from the first rule list, according to the data domain identifier, several filtering rules whose rule domain identifier corresponds to the data domain identifier;
A fourth device for performing parallel matching computation on the structured data to be filtered using the obtained filtering rules.
Further, the first device comprises:
A unit for obtaining the initial data to be filtered from distributed message middleware.
Further, the first device comprises:
A unit for sending the structured data to be filtered to a blocking queue;
The third device comprises:
A unit for obtaining the structured data to be filtered from the blocking queue.
Further, the fourth device comprises:
A unit for compiling the obtained filtering rules to build runnable abstract syntax trees;
A unit for taking the data body of the structured data to be filtered as the input parameter, traversing the runnable abstract syntax trees, and performing parallel matching computation with them.
Further, the unit for compiling the obtained filtering rules to build runnable abstract syntax trees comprises:
A module for analyzing the rule expression of each obtained filtering rule to convert it into an abstract syntax tree;
A module for performing precomputation on the abstract syntax tree to obtain the runnable abstract syntax tree, wherein the module is used for:
Creating a run stack for the abstract syntax tree and pushing the elements of the abstract syntax tree onto the run stack,
When the element is an operator, popping the operator's two operands from the run stack and computing the result,
When the element is a special element, converting the special element into a programming-language data structure element and then pushing it onto the run stack.
Further, the unit for taking the data body of the structured data to be filtered as the input parameter, traversing the runnable abstract syntax trees, and performing parallel matching computation with them comprises:
A module for replacing the variables in the runnable abstract syntax tree with the parameters in the data body;
A module for performing matching computation on the runnable abstract syntax tree using the run stack.
Further, the equipment also comprises:
A fifth device for adding a filtering rule, deleting a filtering rule, or modifying and recompiling an existing filtering rule.
Further, the second device also comprises:
A unit for establishing a second rule list of the filtering rules indexed by the rule name of the filtering rules;
The fifth device comprises:
A unit for adding the new filtering rule to the second rule list;
A unit for deleting the corresponding filtering rule from the second rule list;
A unit for looking up a filtering rule in the second rule list and for modifying and recompiling the filtering rule found.
Further, each filtering rule also comprises information on the notifier bound to the filtering rule;
The equipment also comprises:
A sixth device for sending the structured data to be filtered that satisfies a filtering rule to the notifier bound to that filtering rule, ready for sending.
Compared with the prior art, the equipment and method for filtering data provided according to an embodiment of the present application use a streaming computation mode: data is neither cached in memory nor persisted. Each time initial data to be filtered is obtained, it is converted into structured data to be filtered and matched in real time against the corresponding filtering rules, so the filtering result is obtained immediately, which solves the real-time problem of filtering massive streaming data;
Further, the equipment and method for filtering data provided according to an embodiment of the present application support arithmetic operations, string operations, relational operations, logical operations, regular expression operations, and set operations, and reserve an extension interface. The filtering rules take the form of simple operation expressions with variables, which solves the problems that filtering rules are complicated to describe, hard to extend, and hard to manage;
In addition, the equipment and method for filtering data provided according to an embodiment of the present application are independently designed and developed, so the cost is relatively low, and any code path can be monitored and tuned.
Brief description of the drawings
Other features, objects, and advantages of the present application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 shows a schematic diagram of an equipment for filtering data provided according to one aspect of the present application;
Fig. 2 shows a schematic diagram of an equipment for filtering data provided according to a preferred embodiment of the present application;
Fig. 3 shows a schematic diagram of an equipment for filtering data provided according to another preferred embodiment of the present application;
Fig. 4 shows a flow chart of a method for filtering data provided according to one aspect of the present application;
Fig. 5 shows a flow chart of a method for filtering data provided according to a preferred embodiment of the present application;
Fig. 6 shows a flow chart of a method for filtering data provided according to another preferred embodiment of the present application;
Fig. 7 shows a schematic diagram of a system, provided according to a preferred embodiment of the present application, that includes the equipment for filtering data;
Figs. 8 to 10 show schematic diagrams of performing parallel matching computation on structured data to be filtered using the obtained filtering rules in a specific scenario of the present application.
The same or similar reference numerals in the drawings denote the same or similar parts.
Detailed description
The present application is described in further detail below with reference to the accompanying drawings.
Fig. 1 shows a schematic diagram of an equipment for filtering data provided according to one aspect of the present application, wherein the equipment 1 comprises a first device 11, a second device 12, a third device 13, and a fourth device 14.
Specifically, the first device 11 is used for obtaining initial data to be filtered and converting it into structured data to be filtered, wherein the structured data to be filtered comprises a data domain identifier and a data body in key-value pair form. The second device 12 is used for loading filtering rules, wherein each filtering rule comprises a rule domain identifier, a rule name, and a rule operation expression, and for establishing a first rule list of the filtering rules indexed by the domain identifier of the filtering rules. The third device 13 is used for obtaining the structured data to be filtered and for obtaining from the first rule list, according to the data domain identifier, several filtering rules whose rule domain identifier corresponds to the data domain identifier. The fourth device 14 is used for performing parallel matching computation on the structured data to be filtered using the obtained filtering rules.
Further, the first device 11 is used for obtaining initial data to be filtered and converting it into structured data to be filtered, wherein the structured data to be filtered comprises a data domain identifier and a data body in key-value pair form.
The data domain identifier indicates the category of the structured data to be filtered. Categories include, for example and without limitation, the CPU usage of a host or the access latency of a website. The data domain identifier may be expressed with numbers or text; any other form of identifier that a computer can recognize may also serve as an embodiment of the data domain identifier and is incorporated herein by reference. The data body in key-value pair form records the details of the structured data to be filtered as key-value pairs. A data body is, for example (by way of example only and without limitation): instanceId=AY123456, clusterId=hangzhou, value=92, bizTime=1427041923825, unit=Percent. The left side of each equals sign is the key, the right side is the value, and the two sides together form one key-value pair of the data body. A data body may contain one or more key-value pairs; their number is not limited.
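The structured record described above can be sketched as follows. This is an illustration under assumptions: the comma-separated `key=value` text is treated as the wire format, and the names `StructuredData`, `parse_data_body`, and the domain identifier `"cpu_usage"` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Union

Value = Union[int, float, str]


@dataclass(frozen=True)
class StructuredData:
    domain_id: str           # data domain identifier (category of the record)
    body: Dict[str, Value]   # data body as key-value pairs


def _coerce(text: str) -> Value:
    """Turn numeric-looking text into int or float; leave other text as str."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text


def parse_data_body(raw: str) -> Dict[str, Value]:
    """Parse 'k1=v1, k2=v2' into a dict (assumed input format)."""
    body: Dict[str, Value] = {}
    for pair in raw.split(","):
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"not a key=value pair: {pair!r}")
        body[key.strip()] = _coerce(value.strip())
    return body


record = StructuredData(
    domain_id="cpu_usage",  # hypothetical domain identifier
    body=parse_data_body(
        "instanceId=AY123456, clusterId=hangzhou, value=92, "
        "bizTime=1427041923825, unit=Percent"
    ),
)
```

Converting numeric text to numbers at this stage lets a rule such as `value > 80` compare numbers rather than strings later on.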
Preferably, the initial data to be filtered is obtained from massive data, and the first device 11 further comprises a unit for obtaining the initial data to be filtered from distributed message middleware. Preferably, the distributed message middleware is MetaQ, a distributed message middleware with a queue model. MetaQ has the following characteristics: it guarantees strict message ordering, and it provides rich message pull modes, efficient horizontal scaling of subscribers, a real-time message subscription mechanism, and the capacity to accumulate hundreds of millions of messages. By using the data sharding feature of a MetaQ cluster, multiple instances of the equipment 1 can form a cluster of functionally identical peer nodes with load balancing, which meets the scalability, high availability, and performance requirements that arise with massive data.
Preferably, the first device 11 may further comprise a unit for sending the structured data to be filtered to a blocking queue; correspondingly, the third device 13 comprises a unit for obtaining the structured data to be filtered from the blocking queue.
Here, when the queue is full, the blocking queue blocks further enqueue operations until it is no longer full. Specifically, the first device 11 sends the structured data to be filtered to the blocking queue, where it waits. The third device 13 obtains the structured data to be filtered from the blocking queue in waiting order, and each item is deleted from the blocking queue once it has been obtained. When the structured data waiting in the blocking queue fills the queue, the blocking queue blocks the first device 11 from enqueuing further data. This prevents excessive memory use when processing capacity is insufficient, smooths load peaks during the filtering of massive data, and avoids processing failures.
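The back-pressure behavior described above can be illustrated with Python's standard `queue.Queue`, whose `put` blocks while the queue is full and whose `get` removes the item it returns. The capacity of 2, the `None` sentinel, and the record contents are assumptions chosen to keep the sketch small.

```python
import queue
import threading

# Small capacity so the producer is forced to wait (assumed value).
buffer = queue.Queue(maxsize=2)
consumed = []


def producer(items):
    """Stands in for the first device: enqueue structured records."""
    for item in items:
        buffer.put(item)   # blocks while the queue is full
    buffer.put(None)       # sentinel: no more data


def consumer():
    """Stands in for the third device: dequeue records in waiting order."""
    while True:
        item = buffer.get()  # blocks while the queue is empty; removes the item
        if item is None:
            break
        consumed.append(item)


items = [{"domain": "cpu_usage", "value": v} for v in (91, 42, 87, 95, 10)]
t_prod = threading.Thread(target=producer, args=(items,))
t_cons = threading.Thread(target=consumer)
t_prod.start()
t_cons.start()
t_prod.join()
t_cons.join()
```

Because the producer can never be more than two items ahead of the consumer, memory use stays bounded even when records arrive faster than they are matched.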
Further, the second device 12 is used for loading filtering rules, wherein each filtering rule comprises a rule domain identifier, a rule name, and a rule operation expression, and for establishing a first rule list of the filtering rules indexed by the domain identifier of the filtering rules.
Here, the rule domain identifier indicates the category of the filtering rule. Categories include, for example and without limitation, the CPU usage of a host or the access latency of a website. The rule domain identifier may be expressed with numbers or text; any other form of identifier that a computer can recognize may also serve as an embodiment of the rule domain identifier and is incorporated herein by reference. Preferably, the content of the rule domain identifier is identical or substantially identical to the content of the data domain identifier, so that the third device 13 can obtain from the first rule list, according to the data domain identifier, several filtering rules whose rule domain identifier corresponds to the data domain identifier. The rule name may be a globally unique name, which facilitates the management and maintenance of filtering rules. The rule operation expression may be a rule expression composed of numbers and strings, for example (by way of example only and without limitation): (instanceId='AY123456' || clusterId=hangzhou) && value>80. The rule operation expression may also include data collection types that are not primitive types such as numbers and strings, for example (by way of example only and without limitation) arrays and hash sets.
Further, the second device 12 establishes the first rule list of the filtering rules indexed by the domain identifier of the filtering rules; the first rule list supports the third device 13 in obtaining filtering rules.
Further, the third device 13 obtains the structured data to be filtered and, according to the data domain identifier, obtains from the first rule list several filtering rules whose rule domain identifier corresponds to the data domain identifier.
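The first rule list and the lookup by domain identifier can be sketched as a dictionary from domain identifier to rules. The names `FilterRule`, `build_first_rule_list`, and `rules_for`, as well as the sample rules and their expression syntax, are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FilterRule:
    domain_id: str    # rule domain identifier
    name: str         # globally unique rule name
    expression: str   # rule operation expression with variables


def build_first_rule_list(rules: List[FilterRule]) -> Dict[str, List[FilterRule]]:
    """Index the rules by their domain identifier."""
    index = defaultdict(list)
    for rule in rules:
        index[rule.domain_id].append(rule)
    return dict(index)


def rules_for(index: Dict[str, List[FilterRule]], data_domain_id: str) -> List[FilterRule]:
    """Return the rules whose domain identifier matches that of the record."""
    return index.get(data_domain_id, [])


rules = [
    FilterRule("cpu_usage", "cpu-high", "value > 80"),
    FilterRule("cpu_usage", "cpu-critical-hz", "clusterId == 'hangzhou' && value > 95"),
    FilterRule("site_latency", "latency-slow", "value > 500"),
]
first_rule_list = build_first_rule_list(rules)
candidates = rules_for(first_rule_list, "cpu_usage")
```

With this index, a record is only ever evaluated against the rules of its own category rather than against every loaded rule.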
Further, the fourth device 14 performs parallel matching computation on the structured data to be filtered using the obtained filtering rules.
Preferably, for each piece of structured data to be filtered, the third device 13 obtains, according to its data domain identifier, several filtering rules with the same rule domain identifier. The fourth device 14 then performs one matching computation on the structured data to be filtered with each obtained filtering rule, and carries out these matching computations in parallel in order to make full use of a multi-core central processing unit and improve filtering efficiency.
Specifically, the fourth device 14 comprises: a unit for compiling the obtained filtering rules to build runnable abstract syntax trees; and a unit for taking the data body of the structured data to be filtered as the input parameter, traversing the runnable abstract syntax trees, and performing parallel matching computation with them.
The fourth device 14 implements the abstract syntax tree functionality and can support arithmetic operations, string operations, relational operations, logical operations, regular expression operations, set operations, and so on; it also reserves an extension interface that can support user-defined operations.
Further, the fourth device 14 compiles the obtained filtering rules to build runnable abstract syntax trees (AST, Abstract Syntax Tree). Here, an abstract syntax tree is a tree representation of the abstract syntactic structure of a rule expression.
Specifically, the unit for compiling the obtained filtering rules to build runnable abstract syntax trees comprises: a module for analyzing the rule expression of each obtained filtering rule to convert it into an abstract syntax tree; and a module for performing precomputation on the abstract syntax tree to obtain the runnable abstract syntax tree.
Specifically, analyzing the rule expression of an obtained filtering rule to convert it into an abstract syntax tree can be implemented with Antlr (Another Tool for Language Recognition), which can convert a user-defined filtering rule expression into an abstract syntax tree. Lexical analysis of the rule expression yields the token stream of the AST. The token stream contains the operation symbols identified by analyzing the rule string, including but not limited to operators, numbers, strings, variables, and regular expressions.
The operation symbols are illustrated, for example, by the following sample code:
In a specific application scenario, the rule expression of a filtering rule is, for example, the following string:
CPU>90/100 and clusterId in ['hz','qd'] and instanceId like 'AK47\w+'
Figs. 8 to 10 show schematic diagrams of performing parallel matching computation on structured data to be filtered using the obtained filtering rules in a specific scenario of the present application. By writing Antlr lexical analysis rules, the AST token stream shown in Fig. 9 is obtained. In the system it is stored in postfix notation, which resolves operator precedence, as shown in Fig. 8. In the stored form, OP denotes an operator, Num a number, Var a variable, Regex a regular expression, and StrArray a string array.
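The conversion from rule string to a postfix token stream can be sketched without Antlr by combining a small regular-expression tokenizer with the shunting-yard algorithm. This substitutes a hand-written technique for the Antlr grammar mentioned in the text; the token pattern, the precedence table, the intermediate `Str` kind, and the function names are assumptions, and unary operators are not handled.

```python
import re

# Assumed token grammar; alternatives are tried in order.
TOKEN_RE = re.compile(
    r"""\s*(?:
        (?P<Num>\d+(?:\.\d+)?)
      | (?P<Str>'[^']*')
      | (?P<StrArray>\[[^\]]*\])
      | (?P<OP>>=|<=|==|!=|[><+\-*/()]|\b(?:and|or|in|like)\b)
      | (?P<Var>[A-Za-z_]\w*)
    )""",
    re.VERBOSE,
)

# Assumed precedence: arithmetic binds tighter than comparison, then and/or.
PRECEDENCE = {"or": 1, "and": 2, ">": 3, "<": 3, ">=": 3, "<=": 3, "==": 3,
              "!=": 3, "in": 3, "like": 3, "+": 4, "-": 4, "*": 5, "/": 5}


def tokenize(expr):
    """Split a rule string into (kind, text) tokens."""
    tokens, pos = [], 0
    expr = expr.rstrip()
    while pos < len(expr):
        m = TOKEN_RE.match(expr, pos)
        if not m:
            raise SyntaxError(f"unexpected input at position {pos}")
        kind, text = m.lastgroup, m.group(m.lastgroup)
        # A string literal that follows 'like' is a regular expression.
        if kind == "Str" and tokens and tokens[-1] == ("OP", "like"):
            kind = "Regex"
        tokens.append((kind, text))
        pos = m.end()
    return tokens


def to_postfix(tokens):
    """Shunting-yard: reorder infix tokens into postfix notation."""
    output, ops = [], []
    for kind, text in tokens:
        if kind != "OP":
            output.append((kind, text))
        elif text == "(":
            ops.append(text)
        elif text == ")":
            while ops[-1] != "(":
                output.append(("OP", ops.pop()))
            ops.pop()  # discard the "("
        else:
            # All operators here are treated as left-associative.
            while ops and ops[-1] != "(" and PRECEDENCE[ops[-1]] >= PRECEDENCE[text]:
                output.append(("OP", ops.pop()))
            ops.append(text)
    while ops:
        output.append(("OP", ops.pop()))
    return output


rule = r"CPU>90/100 and clusterId in ['hz','qd'] and instanceId like 'AK47\w+'"
postfix = to_postfix(tokenize(rule))
```

In postfix form each operator follows its operands, so a later evaluation needs only a stack and no knowledge of precedence or parentheses.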
Then, precomputation is performed on the abstract syntax tree to obtain the runnable abstract syntax tree. The precomputation evaluates the constant expressions in the AST token stream in advance, determining for each subexpression whether it can already be computed. The precomputation also checks whether each element of the abstract syntax tree is of a special type and converts elements of a special type into programming-language data structure elements; for example and without limitation, the operand of a Like operation is interpreted and converted into a regular expression, and the operand of an In operation is interpreted and converted into a set. Evaluating the constant expressions of the AST in advance speeds up processing at run time. An element of a special type is an element that is not a primitive type such as a number or string, for example and without limitation a data collection type such as an array, hash map, or hash set.
Continuing the example, in the specific scenario the fourth device 14 performs a precomputation pass on the abstract syntax tree shown in Fig. 9, and the result is a runnable abstract syntax tree (AST) whose token stream is shown in Fig. 10, wherein "0.9", "Java.util.HashSet ['hz', 'qd']", and "Java.util.regex.pattern 'AK47\w+'" are the results of the precomputation.
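The precomputation pass on the example can be sketched as a stack walk over the postfix token stream. This is an illustration under assumptions: the postfix input is written out by hand, Python's `frozenset` and compiled `re` patterns stand in for the Java `HashSet` and regex pattern named in the text, and the element kinds `Set` and `Pattern` and the function name `precompute` are hypothetical.

```python
import operator
import re

ARITH = {"+": operator.add, "-": operator.sub,
         "*": operator.mul, "/": operator.truediv}


def precompute(postfix):
    """Fold constant arithmetic and convert special elements.

    Each stack entry is a postfix fragment (a list of elements), so
    subexpressions that contain variables are carried along unchanged.
    """
    stack = []
    for kind, value in postfix:
        if kind == "Num":
            stack.append([("Num", float(value))])
        elif kind == "StrArray":
            members = re.findall(r"'([^']*)'", value)
            stack.append([("Set", frozenset(members))])
        elif kind == "Regex":
            stack.append([("Pattern", re.compile(value[1:-1]))])  # strip quotes
        elif kind == "Var":
            stack.append([(kind, value)])
        else:  # operator: pop its two operands
            right, left = stack.pop(), stack.pop()
            foldable = (value in ARITH
                        and len(left) == 1 and left[0][0] == "Num"
                        and len(right) == 1 and right[0][0] == "Num")
            if foldable:
                stack.append([("Num", ARITH[value](left[0][1], right[0][1]))])
            else:
                stack.append(left + right + [("OP", value)])
    (runnable,) = stack  # a well-formed expression leaves exactly one fragment
    return runnable


postfix = [
    ("Var", "CPU"), ("Num", "90"), ("Num", "100"), ("OP", "/"), ("OP", ">"),
    ("Var", "clusterId"), ("StrArray", "['hz','qd']"), ("OP", "in"), ("OP", "and"),
    ("Var", "instanceId"), ("Regex", r"'AK47\w+'"), ("OP", "like"), ("OP", "and"),
]
runnable = precompute(postfix)
```

After this pass the division `90/100` has been replaced by the constant `0.9`, so it is computed once at compile time instead of once per record.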
In an optional embodiment, example code for performing the precomputation is as follows:
Of course, those skilled in the art should understand that the above example code is only an example. Other forms of methods, code and the like for performing the precomputation that may appear in the future, if applicable to the present application, are also included within the scope of protection of the present application by reference.
Specifically, the module for performing precomputation on the abstract syntax tree to obtain the runnable abstract syntax tree is configured to: create a run stack according to the abstract syntax tree and push the elements of the abstract syntax tree onto the run stack; when an element is an operator, pop the operator's two operands off the run stack and compute them to obtain a result; and when an element is a special element, convert the special element into a programming-language data structure element and then push it onto the run stack.
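As an illustration only (this is not the application's own example code), the stack-based precomputation described above might be sketched in Python as follows. The (kind, value) token format, the kind names "Set" and "Pattern", and the function name precompute are assumptions made for this sketch; the token kinds OP, Num, Var, Regex and StrArray follow the saved form of Fig. 8.

```python
import re

# Arithmetic operators that can be folded when both operands are constants.
ARITH = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
}


def precompute(tokens):
    """Fold constant sub-expressions and convert special elements.

    `tokens` is a postfix token stream of (kind, value) pairs (an assumed
    format). The result is again a postfix token stream.
    """
    stack = []
    for kind, value in tokens:
        both_constant = (
            len(stack) >= 2
            and stack[-1][0] == "Num"
            and stack[-2][0] == "Num"
        )
        if kind == "OP" and value in ARITH and both_constant:
            # Pop the two operands and push the computed constant.
            (_, right), (_, left) = stack.pop(), stack.pop()
            stack.append(("Num", ARITH[value](left, right)))
        elif kind == "StrArray":
            # Parameter of an In operation: convert to a set.
            stack.append(("Set", frozenset(value)))
        elif kind == "Regex":
            # Parameter of a Like operation: compile to a pattern.
            stack.append(("Pattern", re.compile(value)))
        else:
            # Variables and operators that cannot be computed yet.
            stack.append((kind, value))
    return stack
```

Applied to the postfix form of the rule in the example scenario, this folds 90/100 into the constant 0.9 and leaves the comparison, which depends on a variable, to be computed at run time.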
Further, the fourth device 14 also includes a unit for taking the data body of the structured data to be filtered as an input parameter, traversing the several runnable abstract syntax trees, and performing parallel matching computation using the several runnable abstract syntax trees.
The process of performing parallel matching computation using the runnable abstract syntax trees is the same as that of one precomputation. Under normal circumstances, all expressions in the AST are computable at run time, so the final result is a definite value, namely the Boolean FALSE or TRUE. If the result is TRUE, the structured data is judged to satisfy the filtering rule.
Here, matching computation is performed on the data using the runnable abstract syntax trees. For example, when the device 1 has been assigned 1000 filtering rules, then for each item of structured data to be filtered, the matching operations for these 1000 filtering rules are executed concurrently in the thread pool of the device 1, so that the filtering rules are evaluated concurrently and the performance of the multi-core CPU is fully used.
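A minimal sketch of submitting one matching task per rule to a thread pool is given here in Python for illustration. The representation of a rule as a (name, predicate) pair and the function name match_concurrently are assumptions; in the application each predicate would correspond to evaluating one runnable abstract syntax tree. Note that in standard CPython, threads do not run CPU-bound code in parallel on several cores, so the sketch shows the structure of the concurrent dispatch rather than the performance of the original implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def match_concurrently(rules, body, pool):
    """Submit one matching task per rule and collect the names that match.

    `rules` is a list of (name, predicate) pairs (an assumed representation);
    `body` is the key-value data body of one structured record.
    """
    futures = [(name, pool.submit(predicate, body)) for name, predicate in rules]
    # result() waits for each task; the order of the rules is preserved.
    return [name for name, future in futures if future.result()]
```

The pool is created once and reused for every record, for example with `ThreadPoolExecutor(max_workers=4)`, rather than being created per record.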
Specifically, the unit for taking the data body of the structured data to be filtered as an input parameter, traversing the several runnable abstract syntax trees, and performing parallel matching computation using the several runnable abstract syntax trees includes: a module for replacing the variables of the runnable abstract syntax tree with the parameters in the data body; and a module for performing matching computation on the runnable abstract syntax tree using the run stack.
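The two modules just described (variable replacement and stack-based matching) might be sketched in Python as follows. This is an illustrative sketch under assumptions, not the application's example code: the (kind, value) token format, the operator table OPS, and the choice of a full-string match for the Like operation are all assumptions.

```python
import re

# Binary operators applied to (left, right) operands taken from the run stack.
OPS = {
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
    "and": lambda a, b: a and b,
    "or": lambda a, b: a or b,
    "in": lambda a, b: a in b,
    # Assumption: Like succeeds only if the whole string matches the pattern.
    "like": lambda a, b: b.fullmatch(a) is not None,
}


def evaluate(tokens, body):
    """Evaluate a precomputed postfix token stream against one data body."""
    stack = []
    for kind, value in tokens:
        if kind == "Var":
            # Replace the variable with the parameter from the data body.
            stack.append(body[value])
        elif kind == "OP":
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[value](left, right))
        else:
            # Constants, sets and compiled patterns are pushed unchanged.
            stack.append(value)
    return stack.pop()
```

A record is taken to satisfy the rule when the value left on the stack is True. In this sketch a variable that is missing from the data body raises KeyError; how the original handles that case is not stated.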
Example code for replacing the variables of the runnable abstract syntax tree with the parameters in the data body is as follows:
Matching computation is performed on the runnable abstract syntax tree using the run stack. Example code for processing each node of the AST and pushing it onto the run stack is as follows:
Example code for performing the corresponding operation on an operator node in the AST is as follows:
Example code for performing the matching computation is as follows:
Of course, those skilled in the art should understand that the above example code is only an example. Other forms of methods, code and the like that may appear in the future, if applicable to the present application, are also included within the scope of protection of the present application by reference.
Thereafter, the device 1 may further process the structured data to be filtered, for example by raising an alarm.
Fig. 2 is a schematic diagram of a device for filtering data provided according to a preferred embodiment of the present application. The device 1 includes: a first device 11', a second device 12', a third device 13', a fourth device 14' and a fifth device 15'.
The contents of the first device 11', the third device 13' and the fourth device 14' are identical or substantially identical to those of the first device 11, the third device 13 and the fourth device 14 of the device 1 shown in Fig. 1. For brevity, they are not repeated here and are incorporated herein by reference.
Preferably, in addition to the content of the second device 12 shown in Fig. 1, the second device 12' further includes: a unit for establishing a second rule list of the filtering rules indexed by the rule name of the filtering rule. The second device 12' establishes the first rule list and the second rule list using the rule field identifier and the rule name of the filtering rule as indexes of two dimensions. The first rule list, indexed by the rule field identifier, is used for lookup when filtering data; the second rule list, indexed by the rule name, is used for lookup when managing and maintaining the filtering rules. When the structured data to be filtered is obtained, the first rule list is searched by the data field identifier to find the list of corresponding filtering rules; that list is traversed, the data body of the structured data to be filtered is taken as the input parameter, and concurrent matching computation is performed for each rule in the list. The second rule list facilitates management of the filtering rules.
The fifth device 15' is used to add filtering rules, delete filtering rules, or modify and recompile existing filtering rules.
Specifically, the fifth device 15' includes: a unit for adding a newly added filtering rule to the second rule list; a unit for deleting the corresponding filtering rule from the second rule list; and a unit for looking up a filtering rule in the second rule list and modifying and recompiling the filtering rule found. The fifth device 15' can thus modify, add and delete filtering rules, which improves the flexibility of the filtering rules.
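The two rule lists and the add, delete and modify operations might be organized as in the following Python sketch. The class names Rule and RuleStore and their methods are hypothetical. Keeping both lists in sync on every change is a design choice of this sketch; the text only states that the management operations act on the second rule list.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    field_id: str    # rule field identifier
    name: str        # globally unique rule name
    expression: str  # rule operation expression


class RuleStore:
    def __init__(self):
        # First rule list: field identifier -> {rule name: rule}.
        self.by_field = defaultdict(dict)
        # Second rule list: rule name -> rule.
        self.by_name = {}

    def add(self, rule):
        """Add a rule, replacing any existing rule of the same name."""
        self.remove(rule.name)
        self.by_name[rule.name] = rule
        self.by_field[rule.field_id][rule.name] = rule

    def remove(self, name):
        """Delete a rule by name from both lists; return it, or None."""
        rule = self.by_name.pop(name, None)
        if rule is not None:
            del self.by_field[rule.field_id][name]
        return rule

    def rules_for(self, field_id):
        """Lookup used while filtering: all rules for one data field."""
        return list(self.by_field.get(field_id, {}).values())
```

Modifying a rule is expressed here as calling `add` again with the same name, which replaces the old entry in both lists.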
Fig. 3 is a schematic diagram of a device for filtering data provided according to another preferred embodiment of the present application, where the device 1 includes a first device 11", a second device 12", a third device 13", a fourth device 14", a fifth device 15" and a sixth device 16".
The contents of the first device 11", the second device 12", the third device 13", the fourth device 14" and the fifth device 15" are identical or substantially identical to those of the first device 11', the second device 12', the third device 13', the fourth device 14' and the fifth device 15' of the device 1 shown in Fig. 2. For brevity, they are not repeated here and are incorporated herein by reference.
Here, each filtering rule also includes information on the notifier bound to the filtering rule. The sixth device 16" is used to send the structured data to be filtered that satisfies the corresponding filtering rule to the notifier bound to that filtering rule, for transmission. The notifier is one of a group of implementations of a reserved notification interface, so that custom notification methods can be implemented, for example using different transport protocols, different compression algorithms and different serialization algorithms to transmit to different systems in a downstream system cluster. When a filtering rule is created, notifiers can be freely combined, assembled and bound to any filtering rule.
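One way to picture a notifier assembled from interchangeable serialization, compression and transport parts is the following Python sketch. The class name Notifier, its constructor parameters and the helper json_bytes are hypothetical; the text does not specify the notification interface.

```python
import json
import zlib


class Notifier:
    """Hypothetical notifier built from three interchangeable parts."""

    def __init__(self, serialize, compress, send):
        self._serialize = serialize  # record -> bytes
        self._compress = compress    # bytes -> bytes
        self._send = send            # bytes -> None (the transport)

    def notify(self, record):
        self._send(self._compress(self._serialize(record)))


def json_bytes(record):
    """One possible serialization: UTF-8 encoded JSON."""
    return json.dumps(record, sort_keys=True).encode("utf-8")
```

For instance, `Notifier(json_bytes, zlib.compress, sent.append)` would collect compressed JSON payloads in a list named `sent`; a real transport would take the place of `sent.append`.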
Fig. 4 is a flowchart of a method for filtering data provided according to one aspect of the present application, where the method includes: step S11, step S12, step S13 and step S14.
Specifically, step S11 includes: obtaining initial data to be filtered, and converting the initial data to be filtered into structured data to be filtered, where the structured data to be filtered includes a data field identifier and a data body in key-value pair form. Step S12 includes: loading filtering rules, where each filtering rule includes a rule field identifier, a rule name and a rule operation expression, and establishing a first rule list of the filtering rules indexed by the field identifier of the filtering rule. Step S13 includes: obtaining the structured data to be filtered, and obtaining from the first rule list, according to the data field identifier, several filtering rules having a rule field identifier corresponding to the data field identifier. Step S14 includes: performing a parallel matching operation on the structured data to be filtered using the acquired filtering rules.
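The four steps can be pictured end to end with the following simplified Python sketch. It rests on several assumptions: rules are plain predicates rather than compiled expressions, matching is done sequentially rather than in parallel, and all function names are hypothetical.

```python
from collections import defaultdict


def to_structured(field_id, text):
    """S11: turn 'k=v, k=v' text into a field identifier plus a data body."""
    body = dict(pair.strip().split("=", 1) for pair in text.split(","))
    return {"field_id": field_id, "body": body}


def build_first_rule_list(rules):
    """S12: index (field_id, name, predicate) rules by field identifier."""
    index = defaultdict(list)
    for field_id, name, predicate in rules:
        index[field_id].append((name, predicate))
    return index


def filter_record(record, first_rule_list):
    """S13 and S14: look up rules by field identifier, then match each one."""
    candidates = first_rule_list.get(record["field_id"], [])
    return [name for name, predicate in candidates if predicate(record["body"])]
```

Only the rules whose field identifier equals that of the record are evaluated, which is the purpose of the first rule list.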
Further, in step S11, initial data to be filtered is obtained and converted into structured data to be filtered, where the structured data to be filtered includes a data field identifier and a data body in key-value pair form.
The data field identifier indicates the category of the structured data to be filtered, for example and without limitation: the CPU usage of a host, the access latency of a website, and so on. The data field identifier may be expressed using numbers, text or the like; any form of identifier that can be recognized by a computer may serve as an embodiment of the data field identifier and is incorporated herein by reference. The data body records the details of the structured data to be filtered in key-value pair (Key-Value) form. An example of a data body (an example only, without limitation) is: instanceId=AY123456, clusterId=hangzhou, value=92, bizTime=1427041923825, unit=Percent, where the left side of each equals sign is the key (Key) and the right side is the value (Value), and the information on the two sides of each equals sign forms one key-value pair of the data body. The data body may include one or more key-value pairs; the number of key-value pairs is not limited.
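For illustration, a data body of the form shown in the example might be parsed as in the following Python sketch. The function names parse_body and coerce are hypothetical, and converting numeric-looking values to numbers is an assumption of this sketch; the text does not say how values are typed.

```python
def coerce(value):
    """Convert numeric-looking text to int or float; otherwise keep the string."""
    try:
        return int(value)
    except ValueError:
        pass
    try:
        return float(value)
    except ValueError:
        return value


def parse_body(text):
    """Parse 'key=value, key=value' text into a dictionary."""
    body = {}
    for pair in text.split(","):
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"not a key-value pair: {pair!r}")
        body[key.strip()] = coerce(value.strip())
    return body
```

With numeric values converted at parse time, a rule such as value>80 can compare numbers directly instead of strings.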
Preferably, the initial data to be filtered is obtained from mass data, and step S11 further includes: obtaining the initial data to be filtered from distributed message middleware. Preferably, the distributed message middleware is MetaQ, a distributed, queue-model message middleware with the following characteristics: it guarantees strict message ordering, and it provides rich message pull modes, efficient horizontal scaling of subscribers, a real-time message subscription mechanism and the capacity to accumulate messages on the order of hundreds of millions. Fig. 7 is a schematic diagram of a system that applies the device for filtering data according to a preferred embodiment of the present application. By using the data sharding characteristic of a MetaQ cluster, multiple devices 1 form a cluster of functionally identical peer nodes, and the cluster has load-balancing capability, which meets the scalability, high-availability and performance requirements that arise with mass data.
Preferably, step S11 further includes: sending the structured data to be filtered to a blocking queue. Correspondingly, step S13 includes: obtaining the structured data to be filtered from the blocking queue.
Here, when the queue is full, the blocking queue blocks further enqueue operations until the queue is no longer full. Specifically, step S11 sends the structured data to be filtered to the blocking queue, where it waits; step S13 obtains the structured data to be filtered from the blocking queue in the order in which it has been waiting, and each item is removed from the blocking queue once it has been obtained. When the structured data waiting in the blocking queue fills the queue, the blocking queue blocks the operation by which step S11 sends data to be filtered into the queue. This prevents excessive memory usage when processing capacity is insufficient, so that load peaks are smoothed out during mass-data filtering and processing failures are avoided.
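The back-pressure behaviour of a bounded blocking queue can be illustrated with Python's standard queue.Queue, whose put call blocks while the queue is full. The function name run_pipeline, the capacity of 2 and the sentinel object are assumptions of this sketch.

```python
import queue
import threading


def run_pipeline(items, capacity=2):
    """Pass items from a producer to a consumer through a bounded queue."""
    q = queue.Queue(maxsize=capacity)
    done = object()  # sentinel marking the end of the stream
    received = []

    def producer():
        for item in items:
            q.put(item)  # blocks while the queue is full
        q.put(done)

    def consumer():
        while True:
            item = q.get()  # blocks while the queue is empty
            if item is done:
                break
            received.append(item)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

Because at most `capacity` items are ever waiting, a slow consumer slows the producer down instead of letting memory usage grow.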
Further, in step S12, filtering rules are loaded, where each filtering rule includes a rule field identifier, a rule name and a rule operation expression, and a first rule list of the filtering rules indexed by the field identifier of the filtering rule is established.
Here, the rule field identifier indicates the category of the filtering rule, for example and without limitation: the CPU usage of a host, the access latency of a website, and so on. The rule field identifier may be expressed using numbers, text or the like; any form of identifier that can be recognized by a computer may serve as an embodiment of the rule field identifier and is incorporated herein by reference. Preferably, the content of the rule field identifier is identical or substantially identical to the content of the data field identifier, so that step S13 can obtain from the first rule list, according to the data field identifier, several filtering rules having a rule field identifier corresponding to the data field identifier. The rule name may be a globally unique identifier, which facilitates management and maintenance of the filtering rules. The rule operation expression may be a rule expression composed of numbers and strings, for example (an example only, without limitation): (instanceId='AY123456' || clusterId=hangzhou) && value>80. The rule operation expression may also include data collection types that are not composed of primitive types such as numbers and strings, for example (an example only, without limitation): arrays, hash sets, and so on.
Further, step S12 includes: establishing a first rule list of the filtering rules indexed by the field identifier of the filtering rule, where the first rule list supports the acquisition of filtering rules in step S13.
Further, in step S13, the structured data to be filtered is obtained, and several filtering rules having a rule field identifier corresponding to the data field identifier are obtained from the first rule list according to the data field identifier.
Further, in step S14, a parallel matching operation is performed on the structured data to be filtered using the acquired filtering rules.
Preferably, for each item of structured data to be filtered, step S13 obtains, according to its data field identifier, several filtering rules having the same corresponding rule field identifier. Step S14 then performs one matching operation on the structured data to be filtered with each acquired filtering rule, and performs the matching computation for the acquired filtering rules in parallel, so as to make full use of the performance of a multi-core central processing unit and improve filtering efficiency.
Specifically, step S14 includes: compiling the acquired filtering rules to establish runnable abstract syntax trees; and taking the data body of the structured data to be filtered as an input parameter, traversing the several runnable abstract syntax trees, and performing parallel matching computation using the several runnable abstract syntax trees.
Step S14 implements the functions of the abstract syntax tree. It supports arithmetic operations, string operations, relational operations, logical operations, regular-expression operations, set operations and so on, and reserves an extension interface so that user-defined operations can be supported.
Further, the acquired filtering rules are compiled to establish runnable abstract syntax trees (AST, Abstract Syntax Tree). Here, the abstract syntax tree is a tree-shaped representation of the abstract syntactic structure of the rule expression.
Compiling the acquired filtering rules to establish runnable abstract syntax trees includes: analyzing the rule expression of the acquired filtering rule and converting it into an abstract syntax tree. Specifically, this can be implemented with Antlr (Another Tool for Language Recognition), which can convert a user-defined filtering rule expression into an abstract syntax tree. The token stream of the AST is obtained by lexical analysis of the rule expression. The token stream includes the various operation tokens recognized by analyzing the rule string; the operation tokens include, for example and without limitation: operators, numbers, strings, variables, regular expressions, and so on.
The example code for the operation tokens is identical or substantially identical to the example code for the operation tokens of the abstract syntax tree converted by the fourth device 14 of the device 1 shown in Fig. 1. For brevity, it is not repeated here and is incorporated herein by reference.
In a specific application scenario, the rule expression of a filtering rule is, for example, the following string:
CPU > 90/100 and clusterId in ['hz','qd'] and instanceId like 'AK47\w+'
By writing Antlr lexical analysis rules, the AST token stream shown in Fig. 9 is obtained. The form in which it is saved in the system uses postfix notation for the operation expression, which resolves operator precedence. As shown in Fig. 8, the saved form is: OP: operator; Num: number; Var: variable; Regex: regular expression; StrArray: string array.
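To illustrate why a postfix form removes the need to consider precedence at run time, the following Python sketch converts an already tokenized infix rule into postfix order. It is a hand-written stand-in using the shunting-yard technique, not the Antlr-based analysis of the application. The PRECEDENCE table and the (kind, value) token format are assumptions, and parentheses are not handled in this sketch.

```python
# Assumed precedence levels: a higher number binds more tightly.
PRECEDENCE = {
    "or": 1,
    "and": 2,
    ">": 3, "<": 3, "in": 3, "like": 3,
    "+": 4, "-": 4,
    "*": 5, "/": 5,
}


def to_postfix(tokens):
    """Reorder infix (kind, value) tokens into postfix order."""
    output, operators = [], []
    for kind, value in tokens:
        if kind != "OP":
            output.append((kind, value))
            continue
        # Emit pending operators that bind at least as tightly (left-associative).
        while operators and PRECEDENCE[operators[-1]] >= PRECEDENCE[value]:
            output.append(("OP", operators.pop()))
        operators.append(value)
    while operators:
        output.append(("OP", operators.pop()))
    return output
```

For the rule in the example scenario, the resulting order is: CPU, 90, 100, /, >, clusterId, ['hz','qd'], in, and, instanceId, 'AK47\w+', like, and. Each operator then simply applies to the two values most recently placed on a stack.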
Then, precomputation is performed on the abstract syntax tree to obtain the runnable abstract syntax tree. The precomputation evaluates the constant expressions in the AST token stream in advance, determining whether each subexpression can be computed, and checks whether each element in the abstract syntax tree is of a special type; elements of a special type are converted into programming-language data structure elements. For example, and without limitation, the parameter element of a Like operation is interpreted and converted into a regular expression, and the parameter element of an In operation is interpreted and converted into a set. Precomputation thus evaluates the constant expressions in the AST ahead of time, which speeds up processing at run time, and converts special-type elements into programming-language data structure elements. Here, a special-type element is an element that is not composed of primitive types such as numbers and strings, for example and without limitation a data collection type such as an array, a hash map or a hash set.
Continuing the example, in the specific scenario one precomputation is performed on the abstract syntax tree shown in Fig. 9, and the result is a runnable abstract syntax tree (AST) whose token stream is shown in Fig. 10, where "0.9", "Java.util.HashSet ['hz', 'qd']" and "Java.util.regex.pattern 'AK47\w+'" are the results of the precomputation.
The example code for performing the precomputation is identical or substantially identical to the example code with which the fourth device 14 shown in Fig. 1 performs the precomputation. For brevity, it is not repeated here and is incorporated herein by reference.
Specifically, performing precomputation on the abstract syntax tree includes:
creating a run stack according to the abstract syntax tree and pushing the elements of the abstract syntax tree onto the run stack; when an element is an operator, popping the operator's two operands off the run stack and computing them to obtain a result; and when an element is a special element, converting the special element into a programming-language data structure element and then pushing it onto the run stack.
Further, the data body of the structured data to be filtered is taken as an input parameter, the several runnable abstract syntax trees are traversed, and parallel matching computation is performed using the several runnable abstract syntax trees. The process of this parallel matching computation is the same as that of one precomputation. Under normal circumstances, all expressions in the AST are computable at run time, so the final result is a definite value, namely the Boolean FALSE or TRUE. If the result is TRUE, the structured data is judged to satisfy the filtering rule.
Here, performing parallel matching computation using the several runnable abstract syntax trees means performing matching computation on the data using the runnable abstract syntax trees. For example, when the device 1 has been assigned 1000 filtering rules, then for each item of structured data to be filtered, the matching operations for these 1000 filtering rules are executed concurrently in the thread pool of the device 1, so that the filtering rules are evaluated concurrently and the performance of the multi-core CPU is fully used.
Specifically, performing parallel matching computation using the several runnable abstract syntax trees includes: replacing the variables of the runnable abstract syntax tree with the parameters in the data body; and performing matching computation on the runnable abstract syntax tree using the run stack.
The example code for replacing the variables of the runnable abstract syntax tree with the parameters in the data body is identical or substantially identical to the example code with which the fourth device 14 of the device 1 in Fig. 1 performs the replacement. For brevity, it is not repeated here and is incorporated herein by reference.
The example code for performing the corresponding operation on an operator node in the AST is identical or substantially identical to the example code with which the fourth device 14 of the device 1 in Fig. 1 performs the corresponding operation. For brevity, it is not repeated here and is incorporated herein by reference.
Similarly, the example code for performing the matching computation is identical or substantially identical to the example code with which the fourth device 14 of the device 1 in Fig. 1 performs the matching computation. For brevity, it is not repeated here and is incorporated herein by reference.
Thereafter, the method may further process the structured data to be filtered, for example by raising an alarm.
Fig. 5 is a schematic flowchart of a method for filtering data provided according to a preferred embodiment of the present application. The method includes: step S11', step S12', step S13', step S14' and step S15'.
The contents of step S11', step S13' and step S14' are identical or substantially identical to those of step S11, step S13 and step S14 shown in Fig. 4. For brevity, they are not repeated here and are incorporated herein by reference.
Preferably, in addition to the content of step S12 shown in Fig. 4, step S12' further includes: establishing a second rule list of the filtering rules indexed by the rule name of the filtering rule. Step S12' establishes the first rule list and the second rule list using the rule field identifier and the rule name of the filtering rule as indexes of two dimensions. The first rule list, indexed by the rule field identifier, is used for lookup when filtering data; the second rule list, indexed by the rule name, is used for lookup when managing and maintaining the filtering rules. When the structured data to be filtered is obtained, the first rule list is searched by the data field identifier to find the list of corresponding filtering rules; that list is traversed, the data body of the structured data to be filtered is taken as the input parameter, and concurrent matching computation is performed for each rule in the list. The second rule list facilitates management of the filtering rules.
In step S15', filtering rules are added, filtering rules are deleted, or existing filtering rules are modified and recompiled.
Specifically, step S15' includes at least one of the following: adding a newly added filtering rule to the second rule list; deleting the corresponding filtering rule from the second rule list; and looking up a filtering rule in the second rule list and modifying and recompiling the filtering rule found. Step S15' can thus modify, add and delete filtering rules, which improves the flexibility of the filtering rules.
Fig. 6 is a flowchart of a method for filtering data provided according to another preferred embodiment of the present application, where the method includes step S11", step S12", step S13", step S14", step S15" and step S16".
The contents of step S11", step S12", step S13", step S14" and step S15" are identical or substantially identical to those of step S11', step S12', step S13', step S14' and step S15' shown in Fig. 5. For brevity, they are not repeated here and are incorporated herein by reference.
Here, each filtering rule also includes information on the notifier bound to the filtering rule. In step S16", the structured data to be filtered that satisfies the corresponding filtering rule is sent to the notifier bound to that filtering rule, for transmission. The notifier is one of a group of implementations of a reserved notification interface, so that custom notification methods can be implemented, for example using different transport protocols, different compression algorithms and different serialization algorithms to transmit to different systems in a downstream system cluster. When a filtering rule is created, notifiers can be freely combined, assembled and bound to any filtering rule.
Compared with the prior art, the device and method for data filtering provided according to an embodiment of the present application use a streaming computation mode in which data is neither cached in memory nor persisted. That is, each time initial data to be filtered is obtained, it is converted into structured data to be filtered and matched in real time against the corresponding filtering rules, so that the filtering result is obtained immediately. This solves the real-time problem of filtering massive streaming data;
Further, the device and method for data filtering provided according to an embodiment of the present application support arithmetic operations, string operations, relational operations, logical operations, regular-expression operations and set operations, and reserve an extension interface. The filtering rule takes the form of a simple operation expression with variables. This solves the problems that filtering rules are complex to describe, hard to extend and difficult to manage;
In addition, the device and method for data filtering provided according to an embodiment of the present application are independently designed and developed, have a relatively low cost, and can be monitored and tuned along any code path.
In repeated performance tests, the performance figures obtained were roughly as follows: a single virtual machine configured with 4 cores and 8 GB of memory can support 500,000 filtering rules; the TPS for processing streaming data reaches 20,000; the TPS for the valid data filtered out reaches 2,000; the average system load is stable at a load of about 1 to 4; and CPU resources are used effectively.
In one typical configuration of the application, terminal, the equipment of service network and trusted party include One or more processors (CPU), input/output interface, network interface and internal memory.Internal memory may Including the volatile memory in computer-readable medium, random access memory (RAM) and/ Or the form, such as read-only storage (ROM) or flash memory (flash RAM) such as Nonvolatile memory.Internal memory It is the example of computer-readable medium.Computer-readable medium includes permanent and impermanency, can Mobile and non-removable media can be realized that information is stored by any method or technique.Information can be Computer-readable instruction, data structure, the module of program or other data.The storage medium of computer Example include, but are not limited to phase transition internal memory (PRAM), static RAM (SRAM), Dynamic random access memory (DRAM), other kinds of random access memory (RAM), only Read memory (ROM), Electrically Erasable Read Only Memory (EEPROM), fast flash memory bank Or other memory techniques, read-only optical disc read-only storage (CD-ROM), digital versatile disc (DVD) or other optical storages, magnetic cassette tape, the storage of tape magnetic rigid disk or other magnetic storages are set Standby or any other non-transmission medium, the information that can be accessed by a computing device available for storage.According to Herein defines, and computer-readable medium does not include non-temporary computer readable media (transitory Media), such as the data-signal and carrier wave of modulation.
It should be noted that the application can be carried out in the assembly of software and/or software and hardware, For example, can be using application specific integrated circuit (ASIC), general purpose computer or any other is similar hard Part equipment is realized.In one embodiment, the software program of the application can pass through computing device To realize steps described above or function.Similarly, the software program of the application (includes the number of correlation According to structure) it can be stored in computer readable recording medium storing program for performing, for example, RAM memory, magnetic Or CD-ROM driver or floppy disc and similar devices.In addition, some steps or function of the application can be used Hardware realizes, for example, as coordinating with processor so as to performing the circuit of each step or function.
In addition, the part of the application can be applied to computer program product, such as computer program Instruction, when it is computer-executed, by the operation of the computer, can call or provide basis The present processes and/or technical scheme.And the programmed instruction of the present processes is called, it may be deposited Store up in fixed or moveable recording medium, and/or by broadcast or other signal bearing medias Data flow and be transmitted, and/or be stored according to the computer equipment of described program instruction operation In working storage.Here, including a device, the device bag according to one embodiment of the application The memory for storing computer program instructions and the processor for execute program instructions are included, its In, when the computer program instructions are by the computing device, trigger the plant running and be based on foregoing According to the methods and/or techniques scheme of multiple embodiments of the application.
It is obvious to those skilled in the art that the application is not limited to the details of the above exemplary embodiments, and that the application can be implemented in other specific forms without departing from its spirit or essential characteristics. Therefore, the embodiments should be regarded in all respects as exemplary and non-restrictive. The scope of the application is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalency of the claims are therefore intended to be embraced in the application. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices recited in a device claim may also be implemented by a single unit or device through software or hardware. Words such as "first" and "second" are used to denote names and do not denote any particular order.

Claims (18)

1. A method for filtering data, wherein the method comprises:
obtaining initial data to be filtered, and converting the initial data to be filtered into structured data to be filtered, wherein the structured data to be filtered comprises a data domain identifier and a data body in key-value pair form;
loading filtering rules, wherein each filtering rule comprises a rule domain identifier, a rule name, and a rule operation expression, and establishing a first rule list of the filtering rules indexed by the domain identifier of the filtering rules;
obtaining the structured data to be filtered, and obtaining from the first rule list, according to the data domain identifier, a number of filtering rules whose rule domain identifier corresponds to the data domain identifier;
performing a parallel matching operation on the structured data to be filtered using the obtained filtering rules.
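The flow of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `StructuredData`, `FilterRule`, `build_first_rule_list`, and `match_parallel` are hypothetical, a plain Python callable stands in for the rule operation expression, and a thread pool is only one possible way to match in parallel.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class StructuredData:
    domain: str               # data domain identifier
    body: Dict[str, object]   # data body as key-value pairs


@dataclass
class FilterRule:
    domain: str               # rule domain identifier
    name: str                 # rule name
    # Assumption: a callable stands in for the rule operation expression.
    predicate: Callable[[Dict[str, object]], bool]


def build_first_rule_list(rules: List[FilterRule]) -> Dict[str, List[FilterRule]]:
    """Index the filtering rules by domain identifier (the "first rule list")."""
    index: Dict[str, List[FilterRule]] = defaultdict(list)
    for rule in rules:
        index[rule.domain].append(rule)
    return index


def match_parallel(data: StructuredData,
                   index: Dict[str, List[FilterRule]]) -> List[str]:
    """Evaluate all rules of the data's domain concurrently; return matching names."""
    candidates = index.get(data.domain, [])
    if not candidates:
        return []
    with ThreadPoolExecutor(max_workers=len(candidates)) as pool:
        results = list(pool.map(lambda rule: rule.predicate(data.body), candidates))
    return [rule.name for rule, hit in zip(candidates, results) if hit]


rules = [
    FilterRule("order", "large_order", lambda b: b["amount"] > 1000),
    FilterRule("order", "vip_order", lambda b: b["level"] == "vip"),
    FilterRule("login", "failed_login", lambda b: b["ok"] is False),
]
index = build_first_rule_list(rules)
hits = match_parallel(StructuredData("order", {"amount": 1500, "level": "basic"}), index)
```

Because the rules are indexed by domain identifier, only the two "order" rules are evaluated for this record; the "login" rule is never touched.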
2. The method according to claim 1, wherein obtaining the initial data to be filtered comprises:
obtaining the initial data to be filtered from distributed message middleware.
3. The method according to claim 1 or 2, wherein converting the initial data to be filtered into structured data to be filtered further comprises:
sending the structured data to be filtered to a blocking queue;
and obtaining the structured data to be filtered comprises:
obtaining the structured data to be filtered from the blocking queue.
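The hand-off through a blocking queue described in claim 3 might look like the following sketch. It assumes, purely for illustration, that the initial data arrives as JSON text with a `domain` field; the functions `convert`, `producer`, `consumer`, and `run_pipeline` are hypothetical names, and Python's standard `queue.Queue` plays the role of the blocking queue.

```python
import json
import queue
import threading

SENTINEL = None  # marks the end of input in this sketch


def convert(raw: str) -> dict:
    """Turn raw JSON text into structured data: domain identifier plus key-value body."""
    record = json.loads(raw)
    domain = record.pop("domain")
    return {"domain": domain, "body": record}


def producer(raw_messages, q: queue.Queue) -> None:
    for raw in raw_messages:
        q.put(convert(raw))   # blocks while the queue is full
    q.put(SENTINEL)


def consumer(q: queue.Queue, out: list) -> None:
    while True:
        item = q.get()        # blocks while the queue is empty
        if item is SENTINEL:
            break
        out.append(item)      # a real consumer would run the rule matching here


def run_pipeline(raw_messages, capacity: int = 2) -> list:
    q = queue.Queue(maxsize=capacity)
    out: list = []
    worker = threading.Thread(target=consumer, args=(q, out))
    worker.start()
    producer(raw_messages, q)
    worker.join()
    return out


structured = run_pipeline([
    '{"domain": "order", "amount": 5}',
    '{"domain": "login", "ok": false}',
    '{"domain": "order", "amount": 7}',
])
```

The bounded capacity means a slow matching stage applies back-pressure to the conversion stage instead of letting unprocessed data accumulate without limit.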
4. The method according to any one of claims 1 to 3, wherein performing a parallel matching operation on the structured data to be filtered using the obtained filtering rules comprises:
performing rule compilation on the obtained filtering rules to build runnable abstract syntax trees;
taking the data body of the structured data to be filtered as an input parameter, traversing the runnable abstract syntax trees, and performing parallel matching calculation using the runnable abstract syntax trees.
5. The method according to claim 4, wherein performing rule compilation on the obtained filtering rules to build runnable abstract syntax trees comprises:
analyzing the rule expression of each obtained filtering rule to convert it into an abstract syntax tree;
performing pre-calculation on the abstract syntax tree to obtain the runnable abstract syntax tree;
wherein performing pre-calculation on the abstract syntax tree comprises:
creating a run stack according to the abstract syntax tree, and pushing the elements of the abstract syntax tree onto the run stack;
when an element is an operator, popping the two operands corresponding to the operator from the run stack and calculating them to obtain a calculation result;
when an element is a special element, converting the special element into an element composed of programming-language information and then pushing it onto the run stack.
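One way to read the pre-calculation step is as constant folding driven by a stack. The sketch below follows that reading under stated assumptions: `Var`, `Node`, `Op`, `postorder`, and `precompute` are hypothetical names, the abstract syntax tree is built by hand rather than parsed from a rule expression, and the handling of special elements is left out.

```python
import operator
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    name: str            # a variable to be filled from the data body later


@dataclass(frozen=True)
class Node:
    op: str              # binary operator symbol
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Op:
    symbol: str          # operator marker emitted during traversal


Expr = Union[Var, Node, int, float, str, bool]

OPS = {
    "+": operator.add, "-": operator.sub, "*": operator.mul,
    ">": operator.gt, "<": operator.lt, "==": operator.eq,
}


def postorder(expr):
    """Yield the elements of the tree in post-order (operands before operator)."""
    if isinstance(expr, Node):
        yield from postorder(expr.left)
        yield from postorder(expr.right)
        yield Op(expr.op)
    else:
        yield expr


def precompute(expr):
    """Fold constant subexpressions with a run stack; keep variable parts symbolic."""
    stack = []
    for element in postorder(expr):
        if isinstance(element, Op):
            right = stack.pop()
            left = stack.pop()
            if isinstance(left, (Var, Node)) or isinstance(right, (Var, Node)):
                stack.append(Node(element.symbol, left, right))   # depends on data
            else:
                stack.append(OPS[element.symbol](left, right))    # computed once
        else:
            stack.append(element)
    return stack.pop()


# amount > 10 * 100  becomes  amount > 1000 after pre-calculation
rule_tree = Node(">", Var("amount"), Node("*", 10, 100))
runnable = precompute(rule_tree)
```

Folding the constant part once at compile time means the per-record matching only has to evaluate the comparison that actually depends on the data body.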
6. The method according to claim 4 or 5, wherein performing parallel matching calculation using the runnable abstract syntax trees comprises:
replacing the variables of the runnable abstract syntax tree with the parameters in the data body;
performing matching calculation on the runnable abstract syntax tree using the run stack.
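The two steps of claim 6 can be sketched as a small stack evaluator. This is an illustrative assumption rather than the patent's code: the runnable rule is represented as a post-order list of `(kind, value)` pairs, and `evaluate` and `OPS` are hypothetical names.

```python
import operator

OPS = {
    ">": operator.gt, "<": operator.lt, "==": operator.eq,
    "+": operator.add, "-": operator.sub,
    "and": lambda a, b: bool(a) and bool(b),
    "or": lambda a, b: bool(a) or bool(b),
}


def evaluate(postfix, body):
    """Match one data body against a rule given as post-order (kind, value) pairs."""
    stack = []
    for kind, value in postfix:
        if kind == "var":
            stack.append(body[value])     # replace the variable with the body parameter
        elif kind == "const":
            stack.append(value)
        elif kind == "op":
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[value](left, right))
        else:
            raise ValueError(f"unknown element kind: {kind!r}")
    if len(stack) != 1:
        raise ValueError("malformed rule expression")
    return bool(stack[0])


# amount > 1000 and level == "vip", written in post-order
rule = [
    ("var", "amount"), ("const", 1000), ("op", ">"),
    ("var", "level"), ("const", "vip"), ("op", "=="),
    ("op", "and"),
]
matched = evaluate(rule, {"amount": 1500, "level": "vip"})
not_matched = evaluate(rule, {"amount": 1500, "level": "basic"})
```

Since `evaluate` keeps all state in a local stack, the same compiled rule can be evaluated against many data bodies concurrently without locking.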
7. The method according to any one of claims 1 to 6, wherein the method further comprises:
adding a filtering rule, deleting a filtering rule, or modifying and recompiling an existing filtering rule.
8. The method according to claim 7, wherein establishing the first rule list of the filtering rules indexed by the domain identifier of the filtering rules further comprises:
establishing a second rule list of the filtering rules indexed by the rule name of the filtering rules;
and adding a filtering rule, deleting a filtering rule, or modifying and recompiling an existing filtering rule comprises at least any one of the following:
adding a new filtering rule to the second rule list;
deleting the corresponding filtering rule from the second rule list;
looking up a filtering rule in the second rule list, and modifying and recompiling the filtering rule found.
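The role of the second rule list can be illustrated with a small registry that keeps both indexes. The sketch rests on assumptions: `RuleRegistry` and `compile_rule` are hypothetical, `compile_rule` merely splits the expression into tokens as a stand-in for real rule compilation, and keeping the first rule list in step with the second is a design choice of this sketch that the claim does not spell out.

```python
def compile_rule(expression: str) -> list:
    # Stand-in for real rule compilation: split into whitespace-separated tokens.
    return expression.split()


class RuleRegistry:
    def __init__(self):
        self.by_domain = {}   # first rule list: domain identifier -> {name: rule}
        self.by_name = {}     # second rule list: rule name -> rule

    def add(self, domain: str, name: str, expression: str) -> None:
        if name in self.by_name:
            raise KeyError(f"rule {name!r} already exists")
        rule = {
            "domain": domain,
            "name": name,
            "expression": expression,
            "compiled": compile_rule(expression),
        }
        self.by_name[name] = rule
        self.by_domain.setdefault(domain, {})[name] = rule

    def delete(self, name: str) -> None:
        rule = self.by_name.pop(name)
        del self.by_domain[rule["domain"]][name]

    def modify(self, name: str, expression: str) -> None:
        rule = self.by_name[name]          # lookup by name, no scan over domains
        rule["expression"] = expression
        rule["compiled"] = compile_rule(expression)


registry = RuleRegistry()
registry.add("order", "large_order", "amount > 1000")
registry.modify("large_order", "amount > 5000")
```

Both indexes hold the same rule object, so a rule modified through the name index is immediately seen by matching code that reads the domain index.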
9. The method according to any one of claims 1 to 8, wherein each filtering rule further comprises: information on the notifier bound to the filtering rule;
and the method further comprises:
sending the structured data to be filtered that satisfies a filtering rule to the notifier bound to that filtering rule, for transmission.
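The binding between a filtering rule and its notifier might be sketched as follows. Everything here is illustrative: `ListNotifier` is a hypothetical notifier that only buffers data for later transmission, rules are plain dictionaries with a callable predicate, and `dispatch` is an assumed helper name.

```python
class ListNotifier:
    """Hypothetical notifier that buffers matched data for later transmission."""

    def __init__(self):
        self.pending = []

    def notify(self, rule_name: str, data: dict) -> None:
        self.pending.append((rule_name, data))


def dispatch(data: dict, rules: list, notifiers: dict) -> int:
    """Send data to the notifier bound to each rule it satisfies; return the hit count."""
    hits = 0
    for rule in rules:
        if rule["predicate"](data["body"]):
            notifiers[rule["notifier"]].notify(rule["name"], data)
            hits += 1
    return hits


notifiers = {"mail": ListNotifier(), "sms": ListNotifier()}
rules = [
    {"name": "large_order", "notifier": "mail",
     "predicate": lambda body: body["amount"] > 1000},
    {"name": "tiny_order", "notifier": "sms",
     "predicate": lambda body: body["amount"] < 10},
]
record = {"domain": "order", "body": {"amount": 1500}}
hit_count = dispatch(record, rules, notifiers)
```

Storing only the notifier's key in the rule keeps rules serializable and lets several rules share one delivery channel.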
10. A device for filtering data, wherein the device comprises:
a first apparatus, configured to obtain initial data to be filtered and convert the initial data to be filtered into structured data to be filtered, wherein the structured data to be filtered comprises a data domain identifier and a data body in key-value pair form;
a second apparatus, configured to load filtering rules, wherein each filtering rule comprises a rule domain identifier, a rule name, and a rule operation expression, and to establish a first rule list of the filtering rules indexed by the domain identifier of the filtering rules;
a third apparatus, configured to obtain the structured data to be filtered and to obtain from the first rule list, according to the data domain identifier, a number of filtering rules whose rule domain identifier corresponds to the data domain identifier;
a fourth apparatus, configured to perform a parallel matching operation on the structured data to be filtered using the obtained filtering rules.
11. The device according to claim 10, wherein the first apparatus comprises:
a unit for obtaining the initial data to be filtered from distributed message middleware.
12. The device according to claim 10 or 11, wherein the first apparatus comprises:
a unit for sending the structured data to be filtered to a blocking queue;
and the third apparatus comprises:
a unit for obtaining the structured data to be filtered from the blocking queue.
13. The device according to any one of claims 10 to 12, wherein the fourth apparatus comprises:
a unit for performing rule compilation on the obtained filtering rules to build runnable abstract syntax trees;
a unit for taking the data body of the structured data to be filtered as an input parameter, traversing the runnable abstract syntax trees, and performing parallel matching calculation using the runnable abstract syntax trees.
14. The device according to claim 13, wherein the unit for performing rule compilation on the obtained filtering rules to build runnable abstract syntax trees comprises:
a module for analyzing the rule expression of each obtained filtering rule to convert it into an abstract syntax tree;
a module for performing pre-calculation on the abstract syntax tree to obtain the runnable abstract syntax tree, wherein the module is configured to:
create a run stack according to the abstract syntax tree, and push the elements of the abstract syntax tree onto the run stack,
when an element is an operator, pop the two operands corresponding to the operator from the run stack and calculate them to obtain a calculation result,
when an element is a special element, convert the special element into an element composed of programming-language information and then push it onto the run stack.
15. The device according to claim 13 or 14, wherein the unit for taking the data body of the structured data to be filtered as an input parameter, traversing the runnable abstract syntax trees, and performing parallel matching calculation using the runnable abstract syntax trees comprises:
a module for replacing the variables of the runnable abstract syntax tree with the parameters in the data body;
a module for performing matching calculation on the runnable abstract syntax tree using the run stack.
16. The device according to any one of claims 10 to 15, wherein the device further comprises:
a fifth apparatus, configured to add a filtering rule, delete a filtering rule, or modify and recompile an existing filtering rule.
17. The device according to claim 16, wherein the second apparatus further comprises:
a unit for establishing a second rule list of the filtering rules indexed by the rule name of the filtering rules;
and the fifth apparatus comprises:
a unit for adding a new filtering rule to the second rule list;
a unit for deleting the corresponding filtering rule from the second rule list;
a unit for looking up a filtering rule in the second rule list, and modifying and recompiling the filtering rule found.
18. The device according to any one of claims 10 to 17, wherein each filtering rule further comprises: information on the notifier bound to the filtering rule;
and the device further comprises:
a sixth apparatus, configured to send the structured data to be filtered that satisfies a filtering rule to the notifier bound to that filtering rule, for transmission.
CN201510408180.1A 2015-07-13 2015-07-13 Equipment and method for filtering data Active CN107038161B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201510408180.1A CN107038161B (en) 2015-07-13 2015-07-13 Equipment and method for filtering data
PCT/CN2016/088302 WO2017008650A1 (en) 2015-07-13 2016-07-04 Device and method for filtering data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510408180.1A CN107038161B (en) 2015-07-13 2015-07-13 Equipment and method for filtering data

Publications (2)

Publication Number Publication Date
CN107038161A true CN107038161A (en) 2017-08-11
CN107038161B CN107038161B (en) 2021-03-26

Family

ID=57757755

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510408180.1A Active CN107038161B (en) 2015-07-13 2015-07-13 Equipment and method for filtering data

Country Status (2)

Country Link
CN (1) CN107038161B (en)
WO (1) WO2017008650A1 (en)


Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112565338B (en) * 2020-11-10 2023-06-20 中国人民解放军战略支援部队信息工程大学 Ethernet message capturing, filtering, storing and real-time analyzing method and system
CN115047835B (en) * 2022-06-27 2024-06-04 中国核动力研究设计院 DCS-based periodic test data acquisition method, device, equipment and medium
CN116383290B (en) * 2023-03-22 2023-10-31 中国华能集团有限公司北京招标分公司 Data generalization and analysis method

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1953373A (en) * 2006-09-19 2007-04-25 清华大学 A method to filter and verify open real IPv6 source address
CN101127774A (en) * 2007-09-19 2008-02-20 中兴通讯股份有限公司 Priority processing method for initial filtering rule
CN101158948A (en) * 2006-10-08 2008-04-09 中国科学院软件研究所 Text content filtering method and system
CN101282332A (en) * 2008-05-22 2008-10-08 上海交通大学 System for generating assaulting chart facing network safety alarm incident
CN101304589A (en) * 2008-04-14 2008-11-12 中国联合通信有限公司 Method and system for monitoring and filtering garbage short message transmitted by short message gateway
CN101414929A (en) * 2008-11-18 2009-04-22 华为技术有限公司 Method, device and system for acquiring information
CN101860531A (en) * 2010-04-21 2010-10-13 北京星网锐捷网络技术有限公司 Filtering rule matching method of data packet and device thereof
CN102082728A (en) * 2010-12-28 2011-06-01 北京锐安科技有限公司 Dynamic loading method for filtering rules of network audit system
CN102231134A (en) * 2011-07-29 2011-11-02 哈尔滨工业大学 Method for detecting redundant code defects based on static analysis
CN102654864A (en) * 2011-03-02 2012-09-05 华北计算机系统工程研究所 Independent transparent security audit protection method facing real-time database
CN103116620A (en) * 2013-01-29 2013-05-22 中国电力科学研究院 Unstructured data safe filtering method based on strategy
CN103338155A (en) * 2013-07-01 2013-10-02 安徽中新软件有限公司 High-efficiency filtering method for data packets
CN103631966A (en) * 2013-12-18 2014-03-12 用友软件股份有限公司 Configurable multiple-valued matching field analysis method
CN103780460A (en) * 2014-01-15 2014-05-07 珠海市佳讯实业有限公司 System for realizing hardware filtering of TAP device through FPGA
US20140282949A1 (en) * 2013-03-15 2014-09-18 Kaarya Llc System and Method for Account Access
CN104331278A (en) * 2014-10-15 2015-02-04 南京航空航天大学 Instruction filtering method and device for specifications of ARINC661

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102467561A (en) * 2010-11-19 2012-05-23 金蝶软件(中国)有限公司 Form data filtering method and device
US8949371B1 (en) * 2011-09-29 2015-02-03 Symantec Corporation Time and space efficient method and system for detecting structured data in free text
CN103034700B (en) * 2012-12-05 2016-06-29 北京奇虎科技有限公司 The processing method of rich text content and system
CN103618733B (en) * 2013-12-06 2017-06-27 北京中创腾锐技术有限公司 A kind of data filtering system and method for being applied to mobile Internet
CN104317947B (en) * 2014-11-07 2017-12-12 南京烽火星空通信发展有限公司 A kind of real-time architecture comparing system based on mass data

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019061913A1 (en) * 2017-09-29 2019-04-04 上海望友信息科技有限公司 Data type identification method and system, computer readable storage medium and device
CN109672704A (en) * 2017-10-16 2019-04-23 阿里巴巴集团控股有限公司 Processing method, device and the electronic equipment of message
CN109672704B (en) * 2017-10-16 2022-02-25 阿里巴巴集团控股有限公司 Message processing method and device and electronic equipment
CN107766538A (en) * 2017-10-28 2018-03-06 杭州安恒信息技术有限公司 Data filtering processing module and synchronous, asynchronous filter method based on java
CN109189807A (en) * 2018-09-13 2019-01-11 北京奇虎科技有限公司 A kind of filter method and device of alert data
CN110287174A (en) * 2019-05-09 2019-09-27 北京善义善美科技有限公司 A kind of data filtering engine and system and filter method
CN110427754A (en) * 2019-08-12 2019-11-08 腾讯科技(深圳)有限公司 Network application attack detection method, device, equipment and storage medium
CN110427754B (en) * 2019-08-12 2024-02-13 腾讯科技(深圳)有限公司 Network application attack detection method, device, equipment and storage medium
CN111427915A (en) * 2020-03-25 2020-07-17 京东数字科技控股有限公司 Information processing method and device, storage medium and electronic equipment
CN112068933A (en) * 2020-09-02 2020-12-11 成都鱼泡科技有限公司 Real-time distributed data monitoring method

Also Published As

Publication number Publication date
CN107038161B (en) 2021-03-26
WO2017008650A1 (en) 2017-01-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant