CN107038161A - Device and method for filtering data
- Publication number
- CN107038161A (application CN201510408180.1A)
- Authority
- CN
- China
- Prior art keywords
- data
- filtering rule
- filtered
- rule
- abstract syntax
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/242—Query formulation
- G06F16/245—Query processing
- G06F16/24578—Query processing with adaptation to user needs using ranking
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Mathematical Physics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The purpose of the present application is to provide a device and method for filtering data. Each time initial to-be-filtered data is acquired, it is converted into structured to-be-filtered data and matched in real time against the corresponding filtering rules, so that the filtering result is obtained immediately, which solves the real-time problem. Arithmetic operations, string operations, relational operations, logical operations, regular-expression operations, and set operations are supported, and an extension interface is reserved. The filtering rules take the form of simple operation expressions with variables, which addresses the problems that filtering rules are complicated to describe, hard to extend, and hard to manage.
Description
Technical field
The present application relates to the field of computers, and in particular to a technique for filtering out of massive data, in real time and according to configured filtering rules, the data that satisfies those rules.
Background
With the explosive growth of information technology, data volumes grow day by day, and many fields place ever-increasing demands on the processing of massive data.
For filtering out of massive data the data that satisfies configured filtering rules, the prior art offers the following methods:
Valid data is filtered with SQL (Structured Query Language) statements on a relational in-memory database. However, this method has to cache the massive data in logical tables of the in-memory database, which occupies a large amount of memory, and executing SQL statements periodically can hardly meet real-time requirements;
In a massive-data storage scheme based on HBase (a distributed, column-oriented open-source database), valid data is filtered with Map-Reduce (a programming model for parallel computation over large data sets). However, Map-Reduce jobs follow a batch-like, after-the-fact computation model: matching results can only be computed periodically over the massive data already stored in HBase, so real-time behaviour is hard to guarantee. Complex Map-Reduce jobs also have to be written as extensions, which makes it hard to serve a large number of filtering rules that change in real time and require many kinds of computation;
With a CEP (complex event processing) engine, valid data is filtered using pattern-matching algorithms, an approach better suited to monitoring and decision control of enterprise application systems. However, mature CEP engines are mostly commercial software with high usage costs. Each CEP engine has its own way of describing pattern rules (for example, Drools uses an XML format and Esper uses the EPL format), so a large amount of adaptation code has to be written to meet the needs of different systems. Non-standard matching algorithms have to be written as extensions, which makes implementation harder. In addition, CEP engine implementations differ from one another, so performance monitoring and tuning of a CEP engine is difficult.
Summary of the invention
The technical problem to be solved by the present application is how to filter out of massive data, in real time and without occupying a large amount of memory, the data that satisfies configured filtering rules, while serving a large number of filtering rules that change in real time and require many kinds of computation.
To achieve the above object, the present application provides a method for filtering data, wherein the method comprises:
Acquiring initial to-be-filtered data and converting it into structured to-be-filtered data, wherein the structured to-be-filtered data comprises a data domain identifier and a data body in key-value pair form;
Loading filtering rules, wherein each filtering rule comprises a rule domain identifier, a rule name, and a rule operation expression, and building a first rule list of the filtering rules indexed by the domain identifier of the filtering rules;
Acquiring the structured to-be-filtered data and obtaining from the first rule list, according to the data domain identifier, the several filtering rules whose rule domain identifier corresponds to the data domain identifier;
Performing a parallel matching operation on the structured to-be-filtered data using the obtained filtering rules.
Further, acquiring the initial to-be-filtered data comprises:
Acquiring the initial to-be-filtered data from distributed message middleware.
Further, converting the initial to-be-filtered data into structured to-be-filtered data further comprises:
Sending the structured to-be-filtered data to a blocking queue;
Acquiring the structured to-be-filtered data comprises:
Acquiring the structured to-be-filtered data from the blocking queue.
Further, performing the parallel matching operation on the structured to-be-filtered data using the obtained filtering rules comprises:
Compiling the obtained filtering rules to build runnable abstract syntax trees;
Traversing the several runnable abstract syntax trees with the data body of the structured to-be-filtered data as the input parameter, and performing the parallel matching computation using the several runnable abstract syntax trees.
Further, compiling the obtained filtering rules to build runnable abstract syntax trees comprises:
Analyzing the rule expression of each obtained filtering rule to convert it into an abstract syntax tree;
Performing a precomputation on the abstract syntax tree to obtain the runnable abstract syntax tree;
Wherein performing the precomputation on the abstract syntax tree comprises:
Creating a run stack according to the abstract syntax tree, and pushing the elements of the abstract syntax tree onto the run stack;
When the element is an operator, popping the two operands corresponding to the operator from the run stack and computing them to obtain a calculation result;
When the element is a special element, converting the special element into a programming-language data structure element and then pushing it onto the run stack.
Further, performing the parallel matching computation using the several runnable abstract syntax trees comprises:
Replacing the variables of the runnable abstract syntax tree with the parameters in the data body;
Performing the matching computation on the runnable abstract syntax tree using the run stack.
Further, the method further comprises:
Adding a filtering rule, deleting a filtering rule, or modifying and recompiling an existing filtering rule.
Further, building the first rule list of the filtering rules indexed by the domain identifier of the filtering rules further comprises:
Building a second rule list of the filtering rules indexed by the rule name of the filtering rules;
Adding a filtering rule, deleting a filtering rule, or modifying and recompiling an existing filtering rule comprises at least any one of the following:
Adding the new filtering rule to the second rule list;
Deleting the corresponding filtering rule from the second rule list;
Looking up a filtering rule in the second rule list, and modifying and recompiling the filtering rule found.
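The relationship between the two rule lists can be sketched as follows. This is a minimal Python illustration, not the patent's code: the class name `RuleRegistry`, its method names, and the identifier strings are hypothetical, and rule expressions are kept as plain text rather than compiled.

```python
class RuleRegistry:
    """Hypothetical registry holding both rule lists described above."""

    def __init__(self):
        self.by_domain = {}  # first rule list: domain_id -> {rule name: expression}
        self.by_name = {}    # second rule list: rule name -> (domain_id, expression)

    def add(self, domain_id, name, expression):
        # Register the rule under both indexes so they stay consistent.
        self.by_name[name] = (domain_id, expression)
        self.by_domain.setdefault(domain_id, {})[name] = expression

    def delete(self, name):
        # The name index tells us which domain bucket holds the rule.
        domain_id, _ = self.by_name.pop(name)
        del self.by_domain[domain_id][name]

    def modify(self, name, expression):
        # Look the rule up by name, then overwrite it; a real system
        # would recompile the expression at this point.
        domain_id, _ = self.by_name[name]
        self.add(domain_id, name, expression)


registry = RuleRegistry()
registry.add("host.cpu_usage", "cpu-high", "value > 80")
registry.add("host.cpu_usage", "cpu-critical", "value > 95")
registry.modify("cpu-high", "value > 85")
registry.delete("cpu-critical")
```

Keeping a name-keyed index alongside the domain-keyed one means rule maintenance never has to scan the domain buckets, while matching still looks rules up by domain.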
Further, each filtering rule further comprises: information on the notifier bound to the filtering rule;
The method further comprises:
Sending the structured to-be-filtered data that satisfies a filtering rule to the notifier bound to that filtering rule, for sending.
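One way to picture the rule-to-notifier binding is a callback stored with each rule. The sketch below is an assumption-laden illustration in Python: the channel names, the `make_notifier` helper, and the predicate form of the rules are all hypothetical, and the "notifier" merely queues records instead of sending them.

```python
from collections import defaultdict

# Hypothetical outbox: notifier channel -> records queued for sending.
outbox = defaultdict(list)


def make_notifier(channel):
    """Return a callback that queues matched records for the given channel."""
    def notify(rule_name, record):
        outbox[channel].append((rule_name, record))
    return notify


# Each rule carries its own bound notifier (predicates stand in for compiled rules).
rules = [
    {"name": "cpu-high", "predicate": lambda b: float(b["value"]) > 80,
     "notifier": make_notifier("sms")},
    {"name": "cpu-low", "predicate": lambda b: float(b["value"]) < 5,
     "notifier": make_notifier("email")},
]

record = {"instanceId": "AY123456", "value": "92"}
for rule in rules:
    if rule["predicate"](record):
        # Only records that satisfy the rule reach the rule's notifier.
        rule["notifier"](rule["name"], record)
```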
According to another aspect of the present application, a device for filtering data is also provided, wherein the device comprises:
A first apparatus for acquiring initial to-be-filtered data and converting it into structured to-be-filtered data, wherein the structured to-be-filtered data comprises a data domain identifier and a data body in key-value pair form;
A second apparatus for loading filtering rules, wherein each filtering rule comprises a rule domain identifier, a rule name, and a rule operation expression, and for building a first rule list of the filtering rules indexed by the domain identifier of the filtering rules;
A third apparatus for acquiring the structured to-be-filtered data and obtaining from the first rule list, according to the data domain identifier, the several filtering rules whose rule domain identifier corresponds to the data domain identifier;
A fourth apparatus for performing a parallel matching operation on the structured to-be-filtered data using the obtained filtering rules.
Further, the first apparatus comprises:
A unit for acquiring the initial to-be-filtered data from distributed message middleware.
Further, the first apparatus comprises:
A unit for sending the structured to-be-filtered data to a blocking queue;
The third apparatus comprises:
A unit for acquiring the structured to-be-filtered data from the blocking queue.
Further, the fourth apparatus comprises:
A unit for compiling the obtained filtering rules to build runnable abstract syntax trees;
A unit for traversing the several runnable abstract syntax trees with the data body of the structured to-be-filtered data as the input parameter, and performing the parallel matching computation using the several runnable abstract syntax trees.
Further, the unit for compiling the obtained filtering rules to build runnable abstract syntax trees comprises:
A module for analyzing the rule expression of each obtained filtering rule to convert it into an abstract syntax tree;
A module for performing a precomputation on the abstract syntax tree to obtain the runnable abstract syntax tree, wherein the module is configured to:
Create a run stack according to the abstract syntax tree, and push the elements of the abstract syntax tree onto the run stack,
When the element is an operator, pop the two operands corresponding to the operator from the run stack and compute them to obtain a calculation result,
When the element is a special element, convert the special element into a programming-language data structure element and then push it onto the run stack.
Further, the unit for traversing the several runnable abstract syntax trees with the data body of the structured to-be-filtered data as the input parameter, and performing the parallel matching computation using the several runnable abstract syntax trees, comprises:
A module for replacing the variables of the runnable abstract syntax tree with the parameters in the data body;
A module for performing the matching computation on the runnable abstract syntax tree using the run stack.
Further, the device further comprises:
A fifth apparatus for adding a filtering rule, deleting a filtering rule, or modifying and recompiling an existing filtering rule.
Further, the second apparatus further comprises:
A unit for building a second rule list of the filtering rules indexed by the rule name of the filtering rules;
The fifth apparatus comprises:
A unit for adding a new filtering rule to the second rule list;
A unit for deleting the corresponding filtering rule from the second rule list;
A unit for looking up a filtering rule in the second rule list, and modifying and recompiling the filtering rule found.
Further, each filtering rule further comprises: information on the notifier bound to the filtering rule;
The device further comprises:
A sixth apparatus for sending the structured to-be-filtered data that satisfies a filtering rule to the notifier bound to that filtering rule, for sending.
Compared with the prior art, the device and method for filtering data provided according to an embodiment of the present application use stream computing: data is neither cached in memory nor persisted. Each time initial to-be-filtered data is acquired, it is converted into structured to-be-filtered data and matched in real time against the corresponding filtering rules, and the filtering result is obtained immediately, which solves the real-time problem of filtering massive streaming data;
Further, the device and method for filtering data provided according to an embodiment of the present application support arithmetic operations, string operations, relational operations, logical operations, regular-expression operations, and set operations, and reserve an extension interface. The filtering rules take the form of simple operation expressions with variables, which addresses the problems that filtering rules are complicated to describe, hard to extend, and hard to manage;
In addition, the device and method for filtering data provided according to an embodiment of the present application are independently designed and developed, so the cost is relatively low, and any code path can be monitored and tuned.
Brief description of the drawings
Other features, objects, and advantages of the present application will become more apparent from the detailed description of non-limiting embodiments given with reference to the following drawings:
Fig. 1 shows a schematic diagram of a device for filtering data provided according to one aspect of the present application;
Fig. 2 shows a schematic diagram of a device for filtering data provided according to a preferred embodiment of the present application;
Fig. 3 shows a schematic diagram of a device for filtering data provided according to another preferred embodiment of the present application;
Fig. 4 shows a flowchart of a method for filtering data provided according to one aspect of the present application;
Fig. 5 shows a flowchart of a method for filtering data provided according to a preferred embodiment of the present application;
Fig. 6 shows a flowchart of a method for filtering data provided according to another preferred embodiment of the present application;
Fig. 7 shows a schematic diagram of a system that includes the device for filtering data, provided according to a preferred embodiment of the present application;
Fig. 8 to Fig. 10 show schematic diagrams of performing the parallel matching operation on the structured to-be-filtered data using the obtained filtering rules in a specific scenario of the present application.
The same or similar reference numerals in the drawings denote the same or similar components.
Detailed description
The present application is described in further detail below with reference to the drawings.
Fig. 1 shows a schematic diagram of a device for filtering data provided according to one aspect of the present application, wherein the device 1 comprises: a first apparatus 11, a second apparatus 12, a third apparatus 13, and a fourth apparatus 14.
Specifically, the first apparatus 11 is used to acquire initial to-be-filtered data and convert it into structured to-be-filtered data, wherein the structured to-be-filtered data comprises a data domain identifier and a data body in key-value pair form; the second apparatus 12 is used to load filtering rules, wherein each filtering rule comprises a rule domain identifier, a rule name, and a rule operation expression, and to build a first rule list of the filtering rules indexed by the domain identifier of the filtering rules; the third apparatus 13 is used to acquire the structured to-be-filtered data and to obtain from the first rule list, according to the data domain identifier, the several filtering rules whose rule domain identifier corresponds to the data domain identifier; the fourth apparatus 14 is used to perform a parallel matching operation on the structured to-be-filtered data using the obtained filtering rules.
Further, the structured to-be-filtered data produced by the first apparatus 11 comprises a data domain identifier and a data body in key-value pair form.
The data domain identifier indicates the category of the structured to-be-filtered data, for example and without limitation: the CPU usage of a host, the access latency of a website, and so on. The data domain identifier may be a number, text, or the like; any form of identifier that a computer can recognize may serve as an embodiment of the data domain identifier and is incorporated herein by reference. The data body in key-value pair form records the details of the structured to-be-filtered data as key-value pairs (Key-Value format). An example data body (for illustration only, not limiting) is: instanceId=AY123456, clusterId=Hangzhou, value=92, bizTime=1427041923825, unit=Percent, where the left side of each equals sign is the key (Key), the right side is the value (Value), and the two sides together form a key-value pair. The data body may contain one or more key-value pairs; their number is not limited.
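A structured record of this shape can be sketched in a few lines. The following Python snippet is only an illustration under stated assumptions: the `StructuredRecord` class, the `parse_body` helper, and the identifier string `"host.cpu_usage"` are hypothetical, and the comma-separated `key=value` text is simply the example data body from the paragraph above.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class StructuredRecord:
    """Hypothetical container: a data domain identifier plus a key-value data body."""
    domain_id: str
    body: Dict[str, str]


def parse_body(raw: str) -> Dict[str, str]:
    """Split 'k1=v1, k2=v2' text into a dict, stripping surrounding whitespace."""
    body = {}
    for pair in raw.split(","):
        key, _, value = pair.partition("=")
        body[key.strip()] = value.strip()
    return body


record = StructuredRecord(
    domain_id="host.cpu_usage",  # assumed identifier format
    body=parse_body(
        "instanceId=AY123456, clusterId=Hangzhou, value=92, "
        "bizTime=1427041923825, unit=Percent"
    ),
)
```

Values are kept as strings here; converting them to numbers is left to the matching step, since the text does not say where typing happens.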
Preferably, the initial to-be-filtered data is acquired from massive data, and the first apparatus 11 further comprises a unit for acquiring the initial to-be-filtered data from distributed message middleware. Preferably, the distributed message middleware is MetaQ, a distributed message middleware based on a queue model. MetaQ has the following characteristics: it guarantees strict message ordering and provides rich message pull modes, efficient horizontal scaling of subscribers, a real-time message subscription mechanism, and the ability to accumulate messages on the order of hundreds of millions. By using the data sharding of a MetaQ cluster, multiple devices 1 can form a cluster of functionally identical peer nodes with load balancing, which meets the scalability, high-availability, and performance requirements of a massive-data setting.
Preferably, the first apparatus 11 may further comprise a unit for sending the structured to-be-filtered data to a blocking queue; correspondingly, the third apparatus 13 comprises a unit for acquiring the structured to-be-filtered data from the blocking queue.
Here, the blocking queue blocks further enqueue operations when the queue is full, until the queue is no longer full. Specifically, the first apparatus 11 sends the structured to-be-filtered data to the blocking queue, where it waits; the third apparatus 13 takes the structured to-be-filtered data from the blocking queue one item at a time in waiting order, and each item is removed from the blocking queue once it has been taken. When the waiting structured to-be-filtered data fills the blocking queue, the queue blocks the first apparatus 11 from enqueuing more data. This avoids excessive memory use when processing capacity is insufficient, smooths load peaks during massive-data filtering, and avoids processing failures.
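The back-pressure behaviour described above can be demonstrated with a bounded queue from the Python standard library. This is a sketch only: the capacity of 2, the `None` sentinel, and the function names are choices made for the demo, not details from the patent.

```python
import queue
import threading

# A bounded queue: put() blocks while the queue is full, which throttles the producer.
pending = queue.Queue(maxsize=2)  # capacity of 2 is an arbitrary choice for the demo
consumed = []


def producer(records):
    """Stands in for the first apparatus: enqueue structured records."""
    for rec in records:
        pending.put(rec)   # blocks here whenever the consumer falls behind
    pending.put(None)      # sentinel: no more data


def consumer():
    """Stands in for the third apparatus: dequeue records in arrival order."""
    while True:
        rec = pending.get()  # get() removes the record from the queue
        if rec is None:
            break
        consumed.append(rec)


records = [{"value": str(v)} for v in range(5)]
threads = [
    threading.Thread(target=producer, args=(records,)),
    threading.Thread(target=consumer),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because the queue never holds more than two records, memory use stays bounded no matter how fast the producer runs.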
Further, the second apparatus 12 is used to load filtering rules, wherein each filtering rule comprises a rule domain identifier, a rule name, and a rule operation expression, and to build the first rule list of the filtering rules indexed by the domain identifier of the filtering rules.
Here, the rule domain identifier indicates the category of the filtering rule, for example and without limitation: the CPU usage of a host, the access latency of a website, and so on. The rule domain identifier may be a number, text, or the like; any form of identifier that a computer can recognize may serve as an embodiment of the rule domain identifier and is incorporated herein by reference. Preferably, the content of the rule domain identifier is identical or substantially identical to that of the data domain identifier, so that the third apparatus 13 can obtain from the first rule list, according to the data domain identifier, the several filtering rules whose rule domain identifier corresponds to it. The rule name may be a globally unique name, which eases the management and maintenance of filtering rules. The rule operation expression may be a rule expression composed of numbers and strings, for example (for illustration only, not limiting): (instanceId='AY123456' || clusterId=hangzhou) && value>80. The rule operation expression may also include data collection types beyond primitive types such as numbers and strings, for example (for illustration only, not limiting): arrays, hash sets, and so on.
Further, the second apparatus 12 builds the first rule list of the filtering rules indexed by the domain identifier of the filtering rules; the first rule list supports the third apparatus 13 in obtaining filtering rules.
Further, the third apparatus 13 acquires the structured to-be-filtered data and, according to its data domain identifier, obtains from the first rule list the several filtering rules whose rule domain identifier corresponds to that data domain identifier.
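A domain-indexed rule list amounts to a map from identifier to a bucket of rules. The Python sketch below illustrates that lookup under stated assumptions: the `FilterRule` class, the `build_first_rule_list` helper, and the sample identifiers and expressions are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FilterRule:
    domain_id: str   # rule domain identifier
    name: str        # globally unique rule name
    expression: str  # rule operation expression, kept as text in this sketch


def build_first_rule_list(rules):
    """Index rules by domain identifier: domain_id -> list of rules."""
    index = defaultdict(list)
    for rule in rules:
        index[rule.domain_id].append(rule)
    return index


rules = [
    FilterRule("host.cpu_usage", "cpu-high", "value > 80"),
    FilterRule("host.cpu_usage", "cpu-critical", "value > 95"),
    FilterRule("site.latency", "slow-site", "value > 500"),
]
first_rule_list = build_first_rule_list(rules)

# Given a record's data domain identifier, fetch only the rules for that domain.
matching = first_rule_list.get("host.cpu_usage", [])
```

The lookup cost is independent of the total number of rules, so a record is only ever matched against the rules of its own domain.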
Further, the fourth apparatus 14 performs a parallel matching operation on the structured to-be-filtered data using the obtained filtering rules.
Preferably, for each piece of structured to-be-filtered data, the third apparatus 13 obtains, according to its data domain identifier, the several filtering rules with the same rule domain identifier. The fourth apparatus 14 then performs one matching operation on the data with each obtained filtering rule, and runs the matching for the obtained rules in parallel, so as to make full use of a multi-core CPU and improve filtering efficiency.
Specifically, the fourth apparatus 14 comprises: a unit for compiling the obtained filtering rules to build runnable abstract syntax trees; and a unit for traversing the several runnable abstract syntax trees with the data body of the structured to-be-filtered data as the input parameter, and performing the parallel matching computation using the several runnable abstract syntax trees.
The fourth apparatus 14 implements the abstract syntax tree functionality and can support arithmetic operations, string operations, relational operations, logical operations, regular-expression operations, set operations, and so on. It also reserves an extension interface that can support user-defined operations.
Further, the fourth apparatus 14 compiles the obtained filtering rules to build runnable abstract syntax trees (AST, Abstract Syntax Tree). Here, the abstract syntax tree is a tree representation of the abstract syntactic structure of the rule expression.
Specifically, the unit for compiling the obtained filtering rules to build runnable abstract syntax trees comprises: a module for analyzing the rule expression of each obtained filtering rule to convert it into an abstract syntax tree; and a module for performing a precomputation on the abstract syntax tree to obtain the runnable abstract syntax tree.
Specifically, analyzing the rule expression of an obtained filtering rule to convert it into an abstract syntax tree can be implemented with Antlr (Another Tool for Language Recognition), which can convert a user-defined filtering rule expression into an abstract syntax tree. Lexical analysis of the rule expression yields the AST token stream. The token stream consists of the operation tokens identified in the rule string, including for example and without limitation: operators, numbers, strings, variables, regular expressions, and so on.
The operation tokens are, for example, as in the following example code:
In a specific application scenario, the rule expression of a filtering rule is, for example, the following string:
CPU>90/100 and clusterId in ['hz','qd'] and instanceId like 'AK47\w+'
Fig. 8 to Fig. 10 show schematic diagrams of performing the parallel matching operation on the structured to-be-filtered data using the obtained filtering rules in a specific scenario of the present application. By writing Antlr lexical analysis rules, the AST token stream shown in Fig. 9 is obtained. In the system it is stored in postfix notation, which resolves operator precedence. As shown in Fig. 8, the token types are: OP: operator, Num: number, Var: variable, Regex: regular expression, StrArray: string array.
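To make the token stream and its postfix form concrete, the sketch below tokenizes the example rule string and reorders it with the shunting-yard algorithm. It is an illustration in plain Python rather than Antlr; the regular expression, the precedence table, and the function names are assumptions, and the resulting order is derived here, not copied from Fig. 8 or Fig. 9.

```python
import re

# Hypothetical lexer; token kinds follow the legend above (OP, Num, Var, StrArray).
TOKEN_RE = re.compile(r"""
    \s*(?:
        (?P<num>\d+(?:\.\d+)?)
      | (?P<str>'[^']*')
      | (?P<arr>\[[^\]]*\])
      | (?P<op>>=|<=|==|!=|[><+\-*/()])
      | (?P<word>[A-Za-z_]\w*)
    )""", re.VERBOSE)
WORD_OPS = {"and", "or", "in", "like"}
# Assumed precedence: arithmetic binds tighter than comparison, then and, then or.
PRECEDENCE = {"or": 1, "and": 2, ">": 3, "<": 3, ">=": 3, "<=": 3, "==": 3,
              "!=": 3, "in": 3, "like": 3, "+": 4, "-": 4, "*": 5, "/": 5}


def tokenize(text):
    tokens = []
    for m in TOKEN_RE.finditer(text):
        kind, value = m.lastgroup, m.group(m.lastgroup)
        if kind == "num":
            tokens.append(("Num", float(value)))
        elif kind == "str":
            tokens.append(("Str", value[1:-1]))
        elif kind == "arr":
            items = [s.strip().strip("'") for s in value[1:-1].split(",")]
            tokens.append(("StrArray", items))
        elif kind == "word" and value.lower() not in WORD_OPS:
            tokens.append(("Var", value))
        else:
            tokens.append(("OP", value.lower()))
    return tokens


def to_postfix(tokens):
    """Shunting-yard: operands go straight to the output, operators wait on a stack."""
    output, stack = [], []
    for kind, value in tokens:
        if kind != "OP":
            output.append((kind, value))
        elif value == "(":
            stack.append(value)
        elif value == ")":
            while stack[-1] != "(":
                output.append(("OP", stack.pop()))
            stack.pop()  # discard the "("
        else:
            while (stack and stack[-1] != "("
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[value]):
                output.append(("OP", stack.pop()))
            stack.append(value)
    while stack:
        output.append(("OP", stack.pop()))
    return output


rule = r"CPU>90/100 and clusterId in ['hz','qd'] and instanceId like 'AK47\w+'"
postfix = to_postfix(tokenize(rule))
```

In postfix form no parentheses or precedence checks are needed at run time, which is the benefit the text attributes to this storage format.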
Then, a precomputation is performed on the abstract syntax tree to obtain the runnable abstract syntax tree. The precomputation evaluates in advance the constant expressions in the AST token stream, determining for each subexpression whether it can already be computed. It also checks whether each element in the abstract syntax tree is of a special type and converts such elements into programming-language data structure elements; for example and without limitation, the parameter of a Like operation is interpreted and converted into a regular expression, and the parameter of an In operation is interpreted and converted into a set. Precomputing the constant expressions in the AST speeds up processing at run time. Here, an element of a special type is an element that is not of a primitive type such as a number or a string, for example and without limitation a data collection type such as an array, a hash map, or a hash set.
Continuing the example, in the specific scenario the fourth apparatus 14 performs one precomputation on the abstract syntax tree shown in Fig. 9, and the result is the runnable abstract syntax tree (AST), whose token stream is shown in Fig. 10, where "0.9", "java.util.HashSet ['hz', 'qd']", and "java.util.regex.Pattern 'AK47\w+'" are results of the precomputation.
In an optional embodiment, example code for performing the precomputation is as follows:
Of course, those skilled in the art should understand that the above example code is only an example; other methods, code, or other forms of precomputation that may appear in the future, if applicable to the present application, are also included within its scope of protection by reference.
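A precomputation pass of the kind described above can be sketched as a single walk over the postfix token stream with a run stack. The Python below is an illustrative reconstruction, not the example code referred to in the text: the token tuples, the `precompute` function, and the use of `frozenset` and `re.compile` in place of the Java collection and pattern types are all assumptions.

```python
import re

ARITH = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
         "*": lambda a, b: a * b, "/": lambda a, b: a / b}


def precompute(postfix):
    """Fold constant arithmetic and convert special operands, using a run stack."""
    stack = []
    for kind, value in postfix:
        if kind == "StrArray":
            # Special element: turn the array into a set (operand of 'in').
            stack.append(("Set", frozenset(value)))
        elif kind != "OP":
            stack.append((kind, value))
        elif (value in ARITH and len(stack) >= 2
              and stack[-1][0] == stack[-2][0] == "Num"):
            # Both operands are constants: pop them and push the folded result.
            right, left = stack.pop()[1], stack.pop()[1]
            stack.append(("Num", ARITH[value](left, right)))
        else:
            if value == "like" and stack[-1][0] == "Str":
                # Special element: compile the 'like' pattern once, ahead of time.
                stack.append(("Regex", re.compile(stack.pop()[1])))
            stack.append(("OP", value))  # depends on variables; left for run time
    return stack


# Postfix form of: CPU>90/100 and clusterId in ['hz','qd'] and instanceId like 'AK47\w+'
postfix = [("Var", "CPU"), ("Num", 90.0), ("Num", 100.0), ("OP", "/"), ("OP", ">"),
           ("Var", "clusterId"), ("StrArray", ["hz", "qd"]), ("OP", "in"), ("OP", "and"),
           ("Var", "instanceId"), ("Str", r"AK47\w+"), ("OP", "like"), ("OP", "and")]
runnable = precompute(postfix)
```

After the pass, `90/100` has collapsed to `0.9`, the string array has become a set, and the `like` pattern is a compiled regular expression, which mirrors the three precomputed values the text lists for Fig. 10.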
Specifically, the module for precomputing the abstract syntax tree to obtain the runnable abstract syntax tree is configured to: create a run-time stack according to the abstract syntax tree and push the elements of the abstract syntax tree onto the run-time stack; when an element is an operator, pop the two operands of that operator from the run-time stack and compute the result; and when an element is a special element, convert the special element into a programming-language data structure element and then push it onto the run-time stack.
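The stack-based precomputation described above can be illustrated with a minimal Python sketch. It is not the listing from the application (which is not reproduced in this text and appears to target Java); the `(kind, value)` token layout, the `precompute` function and the choice of `frozenset` and `re.compile` as target data structures are assumptions made for illustration. The token kinds OP, Num, Var, Regex and StrArray follow the names used in the description.

```python
import operator
import re

# Arithmetic operators that can be folded when both operands are constants.
ARITH = {"+": operator.add, "-": operator.sub,
         "*": operator.mul, "/": operator.truediv}

def precompute(tokens):
    """Fold constant sub-expressions and convert special elements
    in a postfix token list of (kind, value) pairs."""
    out = []  # doubles as the run-time stack
    for kind, value in tokens:
        if (kind == "OP" and value in ARITH and len(out) >= 2
                and out[-1][0] == "Num" and out[-2][0] == "Num"):
            right = out.pop()[1]
            left = out.pop()[1]
            out.append(("Num", ARITH[value](left, right)))  # e.g. 90/100 -> 0.9
        elif kind == "StrArray":
            out.append((kind, frozenset(value)))   # parameter of an In operation
        elif kind == "Regex":
            out.append((kind, re.compile(value)))  # parameter of a Like operation
        else:
            out.append((kind, value))              # variables and other operators
    return out
```

In postfix order, two adjacent Num entries on top of the stack are exactly the operands of the next operator, so the fold is safe; anything involving a variable is left untouched for run time.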
Further, the fourth device 14 also includes a unit for taking the data body of the structured data to be filtered as an input parameter, traversing the several runnable abstract syntax trees, and performing parallel matching computation using the several runnable abstract syntax trees.
The process of performing parallel matching computation using the several runnable abstract syntax trees is the same as that of a single precomputation. Under normal circumstances, every expression in the AST is computable at run time, so the final result is a definite value, namely the Boolean value FALSE or TRUE. If the result is TRUE, the structured data is judged to satisfy the filtering rule.
Here, matching computation is performed on the data using the runnable abstract syntax trees. For example, when the equipment 1 of the present application has been assigned 1000 filtering rules, then for each piece of structured data to be filtered, the matching operations of these 1000 filtering rules are executed concurrently in the thread pool of the equipment 1, so that the filtering rules run concurrently and the performance of the multi-core CPU is fully used.
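As a rough illustration of running many rules against one piece of data in a thread pool, the following Python sketch submits one task per rule. The function `match_all` and the use of plain callables as stand-ins for runnable abstract syntax trees are assumptions for illustration only. Note that in CPython, threads show the structure but do not give multi-core speed-up for CPU-bound work; the multi-core benefit described above presumes a runtime whose threads execute in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

def match_all(rules, body, pool):
    """Evaluate every rule against one data body concurrently.

    rules: dict mapping rule name -> predicate taking the data body
           (a stand-in for a runnable abstract syntax tree).
    Returns the sorted names of the rules the data satisfies.
    """
    futures = {name: pool.submit(pred, body) for name, pred in rules.items()}
    return sorted(name for name, fut in futures.items() if fut.result())

if __name__ == "__main__":
    rules = {
        "cpu_high": lambda b: b["value"] > 80,
        "in_hangzhou": lambda b: b["clusterId"] == "hangzhou",
    }
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(match_all(rules, {"value": 92, "clusterId": "hangzhou"}, pool))
```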
Specifically, the unit for taking the data body of the structured data to be filtered as an input parameter, traversing the several runnable abstract syntax trees, and performing parallel matching computation using the several runnable abstract syntax trees includes: a module for replacing the variables of the runnable abstract syntax tree with the parameters in the data body; and a module for performing matching computation on the runnable abstract syntax tree using the run-time stack.
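The two modules above (variable replacement, then stack-based matching) can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the `(kind, value)` postfix token layout, the `bind` and `evaluate` helpers, the `Const` kind, and the reading of Like as a full regular-expression match are all hypothetical and are not taken from the application's own listings.

```python
import operator
import re

# Binary operators applied to (left, right) operands popped from the stack.
BINARY = {
    ">": operator.gt,
    "<": operator.lt,
    "and": lambda a, b: bool(a) and bool(b),
    "or": lambda a, b: bool(a) or bool(b),
    "in": lambda a, b: a in b,                        # b is a set
    "like": lambda a, b: b.fullmatch(a) is not None,  # b is a compiled pattern
}

def bind(tokens, body):
    """Replace each variable token with the value of that key in the data body."""
    return [("Const", body[v]) if k == "Var" else (k, v) for k, v in tokens]

def evaluate(tokens):
    """Evaluate a bound postfix token list on a run-time stack; returns a bool."""
    stack = []
    for kind, value in tokens:
        if kind == "OP":
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY[value](left, right))
        else:
            stack.append(value)
    return bool(stack.pop())

if __name__ == "__main__":
    runnable = [("Var", "CPU"), ("Num", 0.9), ("OP", ">"),
                ("Var", "instanceId"), ("Regex", re.compile(r"AK47\w+")),
                ("OP", "like"), ("OP", "and")]
    body = {"CPU": 0.95, "instanceId": "AK47abc"}
    print(evaluate(bind(runnable, body)))
```

A key missing from the data body raises `KeyError` in this sketch; how the application handles missing parameters is not stated in the text.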
Example code for replacing the variables of the runnable abstract syntax tree with the parameters in the data body is as follows:
Matching computation is performed on the runnable abstract syntax tree using the run-time stack, where example code for processing each AST node and pushing it onto the run-time stack is as follows:
Example code for performing the corresponding operation on an operator node in the AST is as follows:
Example code for performing the matching computation is as follows:
Of course, those skilled in the art should understand that the above example code is only an example; other existing or future methods or code, if applicable to the present application, also fall within the scope of protection of the present application and are incorporated herein by reference.
Thereafter, the equipment 1 may further process the structured data to be filtered, for example by raising an alarm.
Fig. 2 shows a schematic diagram of equipment for filtering data according to a preferred embodiment of the present application. The equipment 1 includes: a first device 11', a second device 12', a third device 13', a fourth device 14' and a fifth device 15'.
The contents of the first device 11', the third device 13' and the fourth device 14' are identical or substantially identical to those of the first device 11, the third device 13 and the fourth device 14 of the equipment 1 shown in Fig. 1; for brevity, they are not repeated here and are incorporated herein by reference.
Preferably, on the basis of the content of the second device 12 shown in Fig. 1, the second device 12' further includes a unit for establishing a second rule list of the filtering rules indexed by the rule names of the filtering rules. The second device 12' establishes the first rule list and the second rule list using the rule domain identifier and the rule name of the filtering rules as indexes of two dimensions, wherein the first rule list, indexed by the rule domain identifier of the filtering rules, is used for lookup when filtering data, and the second rule list, indexed by the rule name of the filtering rules, is used for lookup when managing and maintaining the filtering rules. When the structured data to be filtered is obtained, the first rule list is searched by the data domain identifier to find the list of corresponding filtering rules; that list is then traversed, and, with the data body of the formatted data to be filtered as the input parameter, concurrent matching computation is performed for each rule in the list. The second rule list makes the filtering rules easier to manage.
The fifth device 15' is used to add a filtering rule, delete a filtering rule, or modify and recompile an existing filtering rule.
Specifically, the fifth device 15' includes: a unit for adding a newly added filtering rule to the second rule list; a unit for deleting the corresponding filtering rule from the second rule list; and a unit for looking up a filtering rule in the second rule list and modifying and compiling the filtering rule found. The fifth device 15' can modify, add and delete filtering rules, which improves the flexibility of the filtering rules.
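The two rule lists and the add, delete and modify operations described above can be sketched in Python as a registry that keeps two indexes. The `Rule` and `RuleRegistry` names, their fields, and the decision to update both indexes on every change are assumptions for illustration; the text itself only states that management operations go through the second rule list.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Rule:
    name: str        # globally unique rule name
    domain: str      # rule domain identifier
    expression: str  # rule operation expression (uncompiled here)

class RuleRegistry:
    def __init__(self):
        self.by_domain = defaultdict(dict)  # first rule list: domain -> {name: Rule}
        self.by_name = {}                   # second rule list: name -> Rule

    def add(self, rule):
        self.remove(rule.name)  # replacing keeps both indexes consistent
        self.by_name[rule.name] = rule
        self.by_domain[rule.domain][rule.name] = rule

    def remove(self, name):
        rule = self.by_name.pop(name, None)
        if rule is not None:
            del self.by_domain[rule.domain][name]
        return rule

    def modify(self, name, expression):
        old = self.by_name[name]  # lookup by name, as used for management
        self.add(Rule(name, old.domain, expression))

    def rules_for(self, domain):
        """Lookup by domain identifier, as used when filtering data."""
        return list(self.by_domain.get(domain, {}).values())
```

In a real system the modify step would also recompile the rule into a runnable abstract syntax tree; that step is omitted here.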
Fig. 3 shows a schematic diagram of equipment for filtering data according to another preferred embodiment of the present application, wherein the equipment 1 includes a first device 11", a second device 12", a third device 13", a fourth device 14", a fifth device 15" and a sixth device 16".
The contents of the first device 11", the second device 12", the third device 13", the fourth device 14" and the fifth device 15" are identical or substantially identical to those of the first device 11', the second device 12', the third device 13', the fourth device 14' and the fifth device 15' of the equipment 1 shown in Fig. 2; for brevity, they are not repeated here and are incorporated herein by reference.
Here, each filtering rule further includes information on the notifier bound to the filtering rule. The sixth device 16" is used to send the structured data to be filtered that satisfies the corresponding filtering rule to the notifier bound to that filtering rule, for transmission. Here, the notifiers are a set of implementations of a reserved notification interface, through which customized notification methods can be implemented, for example transmitting to different systems in a downstream system cluster using different transport protocols, different compression algorithms and different serialization algorithms. When a filtering rule is created, notifiers can be freely combined, assembled and bound to any filtering rule.
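One way to picture a notifier assembled from a serialization algorithm, a compression algorithm and a transport is the following Python sketch. The `Notifier` protocol, the `ComposedNotifier` class and the use of JSON and zlib are assumptions chosen for illustration; the application does not name specific algorithms or an interface signature.

```python
import json
import zlib
from typing import Callable, Protocol

class Notifier(Protocol):
    """Hypothetical shape of the reserved notification interface."""
    def notify(self, body: dict) -> None: ...

class ComposedNotifier:
    """A notifier assembled from three interchangeable parts."""
    def __init__(self,
                 serialize: Callable[[dict], bytes],
                 compress: Callable[[bytes], bytes],
                 send: Callable[[bytes], None]):
        self.serialize = serialize
        self.compress = compress
        self.send = send

    def notify(self, body: dict) -> None:
        self.send(self.compress(self.serialize(body)))

if __name__ == "__main__":
    outbox = []  # stands in for a downstream system
    notifier = ComposedNotifier(
        serialize=lambda b: json.dumps(b, sort_keys=True).encode("utf-8"),
        compress=zlib.compress,
        send=outbox.append,
    )
    notifier.notify({"instanceId": "AY123456", "value": 92})
    print(json.loads(zlib.decompress(outbox[0])))
```

Binding a notifier to a rule then amounts to storing a reference to such an object alongside the rule when the rule is created.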
Fig. 4 shows a flowchart of a method for filtering data according to one aspect of the present application, wherein the method includes step S11, step S12, step S13 and step S14.
Specifically, step S11 includes: obtaining initial data to be filtered, and converting the initial data to be filtered into structured data to be filtered, wherein the structured data to be filtered includes a data domain identifier and a data body in key-value pair form. Step S12 includes: loading filtering rules, wherein each filtering rule includes a rule domain identifier, a rule name and a rule operation expression, and establishing a first rule list of the filtering rules indexed by the domain identifier of the filtering rules. Step S13 includes: obtaining the structured data to be filtered, and obtaining from the first rule list, according to the data domain identifier, several filtering rules whose rule domain identifier corresponds to the data domain identifier. Step S14 includes: performing parallel matching operations on the structured data to be filtered using the several filtering rules obtained.
Further, in step S11, initial data to be filtered is obtained and converted into structured data to be filtered, wherein the structured data to be filtered includes a data domain identifier and a data body in key-value pair form.
The data domain identifier indicates the category of the structured data to be filtered, where the category is, for example and without limitation, the CPU usage of a host or the access latency of a website. The data domain identifier may be expressed as a number or as text; in addition, any form of identifier that a computer can recognize may serve as an implementation of the data domain identifier and is incorporated herein by reference. The data body in key-value pair form records the details of the structured data to be filtered in key-value (Key-Value) form. The data body is, for example (this is only an example, and it is not limited to this): instanceId=AY123456, clusterId=hangzhou, value=92, bizTime=1427041923825, unit=Percent, where the left side of each equal sign is a key (Key) and the right side is a value (Value), and the information on both sides of the equal signs makes up the data body in key-value pair form. The data body may include one or more key-value pairs; their number is not limited.
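To make the structured form concrete, the following Python sketch parses the example data body above into key-value pairs and pairs it with a domain identifier. The `StructuredData` class, the `parse_body` helper, the comma-separated input format and the domain name "cpu_usage" are assumptions for illustration; the text does not specify how the initial data is encoded, and values are kept as strings here because no type conversion is described.

```python
from dataclasses import dataclass

@dataclass
class StructuredData:
    domain: str  # data domain identifier, e.g. a category such as CPU usage
    body: dict   # data body in key-value pair form

def parse_body(text):
    """Split 'k1=v1, k2=v2, ...' into a dict of string keys and string values."""
    body = {}
    for pair in text.split(","):
        key, _, value = pair.partition("=")
        body[key.strip()] = value.strip()
    return body

if __name__ == "__main__":
    raw = ("instanceId=AY123456, clusterId=hangzhou, value=92, "
           "bizTime=1427041923825, unit=Percent")
    print(StructuredData(domain="cpu_usage", body=parse_body(raw)))
```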
Preferably, the initial data to be filtered is obtained from massive data, and step S11 further includes: obtaining the initial data to be filtered from distributed message middleware. Preferably, the distributed message middleware is MetaQ, a distributed, queue-model message middleware with the following characteristics: it guarantees strict message ordering, and it provides rich message pull modes, efficient horizontal scaling of subscribers, a real-time message subscription mechanism, and the capacity to accumulate hundreds of millions of messages. Fig. 7 shows a schematic diagram of a system applying the equipment for filtering data according to a preferred embodiment of the present application: by using the data sharding characteristic of a MetaQ cluster, multiple pieces of equipment 1 form a cluster of functionally identical peer nodes, and the cluster has load-balancing capability, meeting the scalability, high availability and performance requirements of a massive-data setting.
Preferably, step S11 further includes: sending the structured data to be filtered to a blocking queue. Correspondingly, step S13 includes: obtaining the structured data to be filtered from the blocking queue.
Here, when its queue is full, the blocking queue blocks further enqueue operations until the queue is no longer full. Specifically, step S11 sends the structured data to be filtered to the blocking queue, where it waits; step S13 obtains the structured data to be filtered from the blocking queue in the order in which it has been waiting, and once a piece of structured data to be filtered has been obtained it is removed from the blocking queue. When the structured data to be filtered waiting in the blocking queue fills the queue, the blocking queue blocks the operation by which step S11 sends data to be filtered into the queue. This avoids excessive memory usage when processing capacity is insufficient, so that the queue smooths out load peaks during massive-data filtering and processing failures are avoided.
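The back-pressure behaviour of a bounded blocking queue can be demonstrated with Python's standard `queue.Queue`. This is a minimal sketch: `run_pipeline`, the `None` end marker and the capacity value are assumptions for illustration, with the producer standing in for step S11 and the consumer for step S13.

```python
import queue
import threading

def run_pipeline(items, capacity):
    """Pass items through a bounded queue from a producer to a consumer thread."""
    q = queue.Queue(maxsize=capacity)  # put() blocks while the queue is full
    results = []

    def consumer():
        while True:
            item = q.get()        # blocks while the queue is empty
            if item is None:      # end marker (an assumption of this sketch)
                break
            results.append(item)  # matching would happen here

    worker = threading.Thread(target=consumer)
    worker.start()
    for item in items:
        q.put(item)  # producer is held back when the consumer falls behind
    q.put(None)
    worker.join()
    return results

if __name__ == "__main__":
    print(len(run_pipeline(range(1000), capacity=8)))
```

Because at most `capacity` items are ever held in the queue, memory use stays bounded even when data arrives faster than it can be matched.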
Further, in step S12, filtering rules are loaded, wherein each filtering rule includes a rule domain identifier, a rule name and a rule operation expression, and a first rule list of the filtering rules indexed by the domain identifier of the filtering rules is established.
Here, the rule domain identifier indicates the category of the filtering rule, where the category is, for example and without limitation, the CPU usage of a host or the access latency of a website. The rule domain identifier may be expressed as a number or as text; in addition, any form of identifier that a computer can recognize may serve as an implementation of the identifier and is incorporated herein by reference. Preferably, the content of the rule domain identifier is identical or substantially identical to the content of the data domain identifier, so that step S13 can obtain from the first rule list, according to the data domain identifier, several filtering rules whose rule domain identifier corresponds to the data domain identifier. The rule name may be a globally unique name, which facilitates the management and maintenance of the filtering rules. The rule operation expression may be a rule expression composed of numbers and strings, for example (this is only an example, and it is not limited to this): (instanceId='AY123456' || clusterId=hangzhou) && value>80. The rule operation expression may also include data collection types made up of types other than primitive types such as numbers and strings, for example (this is only an example, and it is not limited to this) arrays and hash sets.
Further, step S12 includes: establishing a first rule list of the filtering rules indexed by the domain identifier of the filtering rules, wherein the first rule list supports step S13 in obtaining filtering rules.
Further, in step S13, the structured data to be filtered is obtained, and several filtering rules whose rule domain identifier corresponds to the data domain identifier are obtained from the first rule list according to the data domain identifier.
Further, in step S14, parallel matching operations are performed on the structured data to be filtered using the several filtering rules obtained.
Preferably, for each piece of structured data to be filtered, step S13 obtains, according to its data domain identifier, several filtering rules having the same corresponding rule domain identifier, and step S14 then performs one matching operation on the structured data to be filtered with each filtering rule obtained. Step S14 performs the matching computation for the several filtering rules obtained in parallel, so as to make full use of the performance of the multi-core central processing unit and improve filtering efficiency.
Specifically, step S14 includes: performing rule compilation on the filtering rules obtained, to establish runnable abstract syntax trees; and taking the data body of the structured data to be filtered as an input parameter, traversing the several runnable abstract syntax trees, and performing parallel matching computation using the several runnable abstract syntax trees.
Step S14 implements the functions of the abstract syntax tree and can support arithmetic operations, string operations, relational operations, logical operations, regular expression operations, set operations and so on; an extension interface is also reserved, so that user-defined operations and the like can be supported.
Further, rule compilation is performed on the filtering rules obtained, to establish a runnable abstract syntax tree (AST, Abstract Syntax Tree); here, the abstract syntax tree is the tree-shaped representation of the abstract syntactic structure of the rule expression.
Performing rule compilation on the filtering rules obtained, to establish a runnable abstract syntax tree, includes: analyzing the rule expression of each filtering rule obtained, to convert it into an abstract syntax tree. Specifically, this can be implemented using Antlr (Another Tool for Language Recognition), which can convert a user-defined filtering rule expression into an abstract syntax tree. Lexical analysis of the rule expression yields the token stream of the AST. The token stream contains the operation symbols identified by analyzing the rule string; the operation symbols include, for example and without limitation, operators, numbers, strings, variables and regular expressions.
The example code for converting the operation symbols is identical or substantially identical to the example code with which the fourth device 14 of the equipment 1 shown in Fig. 1 converts the operation symbols of the abstract syntax tree; for brevity, it is not repeated here and is incorporated herein by reference.
In a specific application scenario, the rule expression of a filtering rule is, for example, the following string:
CPU>90/100 and clusterId in ['hz','qd'] and instanceId like 'AK47\w+'
By writing Antlr lexical analysis rules, the AST token stream shown in Fig. 9 is obtained. It is stored in the system in postfix notation, which resolves the operator precedence problem, as shown in Fig. 8. The stored token types are: OP: operator, Num: number, Var: variable, Regex: regular expression, StrArray: string array.
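To show how postfix storage resolves operator precedence, the following Python sketch converts an already tokenized infix rule into postfix order. It uses a hand-written shunting-yard conversion in place of Antlr, so it is a swapped-in technique for illustration rather than the approach of the application; the precedence table, the `LP`/`RP` parenthesis kinds and the `to_postfix` name are assumptions.

```python
# Higher number binds tighter (assumed precedence levels).
PRECEDENCE = {"or": 1, "and": 2,
              ">": 3, "<": 3, "in": 3, "like": 3,
              "+": 4, "-": 4, "*": 5, "/": 5}

def to_postfix(tokens):
    """Convert infix (kind, value) tokens to postfix order (shunting-yard)."""
    out, ops = [], []
    for kind, value in tokens:
        if kind == "OP":
            # Pop operators that bind at least as tightly (left-associative).
            while (ops and ops[-1][0] == "OP"
                   and PRECEDENCE[ops[-1][1]] >= PRECEDENCE[value]):
                out.append(ops.pop())
            ops.append((kind, value))
        elif kind == "LP":
            ops.append((kind, value))
        elif kind == "RP":
            while ops[-1][0] != "LP":
                out.append(ops.pop())
            ops.pop()  # discard the matching left parenthesis
        else:
            out.append((kind, value))  # Num, Var, Regex, StrArray
    while ops:
        out.append(ops.pop())
    return out

if __name__ == "__main__":
    infix = [("Var", "CPU"), ("OP", ">"), ("Num", 90), ("OP", "/"), ("Num", 100),
             ("OP", "and"), ("Var", "clusterId"), ("OP", "in"),
             ("StrArray", ["hz", "qd"])]
    print([value for _, value in to_postfix(infix)])
```

In the postfix result the division appears before the comparison, so an evaluator can process tokens strictly left to right on a stack without consulting precedence again.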
Then, precomputation is performed on the abstract syntax tree to obtain the runnable abstract syntax tree. The precomputation evaluates in advance the constant expressions in the AST token stream, judging whether each subexpression is computable, and checks whether each element in the abstract syntax tree is of a special type, converting elements of special types into programming-language data structure elements; for example, but not limited to, the parameter of a Like operation is interpreted and converted into a regular expression, and the parameter of an In operation is interpreted and converted into a set. Through precomputation, the constant expressions in the AST can be evaluated in advance, which speeds up processing at run time, and elements of special types are converted into programming-language data structure elements. Here, an element of a special type is an element that is not of a primitive type such as a number or a string, for example, but not limited to, a data collection type such as an array, a hash map or a hash set.
Continuing the example, in a specific scenario one precomputation is performed on the abstract syntax tree shown in Fig. 9, and the result is a runnable abstract syntax tree (AST) whose token stream is shown in Fig. 10, where "0.9", "java.util.HashSet ['hz', 'qd']" and "java.util.regex.Pattern 'AK47\w+'" are the results produced by the precomputation.
The example code for performing the precomputation is identical or substantially identical to the example code with which the fourth device 14 shown in Fig. 1 performs precomputation; for brevity, it is not repeated here and is incorporated herein by reference.
Specifically, performing precomputation on the abstract syntax tree includes: creating a run-time stack according to the abstract syntax tree and pushing the elements of the abstract syntax tree onto the run-time stack; when an element is an operator, popping the two operands of that operator from the run-time stack and computing the result; and when an element is a special element, converting the special element into a programming-language data structure element and then pushing it onto the run-time stack.
Further, the data body of the structured data to be filtered is taken as an input parameter, the several runnable abstract syntax trees are traversed, and parallel matching computation is performed using the several runnable abstract syntax trees. The process of performing parallel matching computation using the several runnable abstract syntax trees is the same as that of a single precomputation. Under normal circumstances, every expression in the AST is computable at run time, so the final result is a definite value, namely the Boolean value FALSE or TRUE. If the result is TRUE, the structured data is judged to satisfy the filtering rule.
Here, the parallel matching computation using the several runnable abstract syntax trees performs matching computation on the data using the runnable abstract syntax trees. For example, when the equipment 1 of the present application has been assigned 1000 filtering rules, then for each piece of structured data to be filtered, the matching operations of these 1000 filtering rules are executed concurrently in the thread pool of the equipment 1, so that the filtering rules run concurrently and the performance of the multi-core CPU is fully used.
Specifically, performing parallel matching computation using the several runnable abstract syntax trees includes: replacing the variables of the runnable abstract syntax tree with the parameters in the data body; and performing matching computation on the runnable abstract syntax tree using the run-time stack.
The example code for replacing the variables of the runnable abstract syntax tree with the parameters in the data body is identical or substantially identical to the replacement example code of the fourth device 14 of the equipment 1 in Fig. 1; for brevity, it is not repeated here and is incorporated herein by reference.
The example code for performing the corresponding operation on an operator node in the AST is identical or substantially identical to the example code with which the fourth device 14 of the equipment 1 in Fig. 1 performs the corresponding operation; for brevity, it is not repeated here and is incorporated herein by reference.
Similarly, the example code for performing the matching computation is identical or substantially identical to the example code with which the fourth device 14 of the equipment 1 in Fig. 1 performs the matching computation; for brevity, it is not repeated here and is incorporated herein by reference.
Thereafter, the method may further process the structured data to be filtered, for example by raising an alarm.
Fig. 5 shows a schematic flowchart of a method for filtering data according to a preferred embodiment of the present application. The method includes: step S11', step S12', step S13', step S14' and step S15'.
The contents of step S11', step S13' and step S14' are identical or substantially identical to those of step S11, step S13 and step S14 shown in Fig. 4; for brevity, they are not repeated here and are incorporated herein by reference.
Preferably, on the basis of the content of step S12 shown in Fig. 4, step S12' further includes: establishing a second rule list of the filtering rules indexed by the rule names of the filtering rules. Step S12' establishes the first rule list and the second rule list using the rule domain identifier and the rule name of the filtering rules as indexes of two dimensions, wherein the first rule list, indexed by the rule domain identifier of the filtering rules, is used for lookup when filtering data, and the second rule list, indexed by the rule name of the filtering rules, is used for lookup when managing and maintaining the filtering rules. When the structured data to be filtered is obtained, the first rule list is searched by the data domain identifier to find the list of corresponding filtering rules; that list is then traversed, and, with the data body of the formatted data to be filtered as the input parameter, concurrent matching computation is performed for each rule in the list. The second rule list makes the filtering rules easier to manage.
In step S15', a filtering rule is added, a filtering rule is deleted, or an existing filtering rule is modified and recompiled.
Specifically, step S15' includes at least any one of the following: adding a newly added filtering rule to the second rule list; deleting the corresponding filtering rule from the second rule list; and looking up a filtering rule in the second rule list and modifying and compiling the filtering rule found. Step S15' can modify, add and delete filtering rules, which improves the flexibility of the filtering rules.
Fig. 6 shows a flowchart of a method for filtering data according to another preferred embodiment of the present application, wherein the method includes step S11", step S12", step S13", step S14", step S15" and step S16".
The contents of step S11", step S12", step S13", step S14" and step S15" are identical or substantially identical to those of step S11', step S12', step S13', step S14' and step S15' shown in Fig. 5; for brevity, they are not repeated here and are incorporated herein by reference.
Here, each filtering rule further includes information on the notifier bound to the filtering rule. In step S16", the structured data to be filtered that satisfies the corresponding filtering rule is sent to the notifier bound to that filtering rule, for transmission. Here, the notifiers are a set of implementations of a reserved notification interface, through which customized notification methods can be implemented, for example transmitting to different systems in a downstream system cluster using different transport protocols, different compression algorithms and different serialization algorithms. When a filtering rule is created, notifiers can be freely combined, assembled and bound to any filtering rule.
Compared with the prior art, the equipment and method for filtering data provided according to an embodiment of the present application use a streaming computation approach in which data is neither cached in memory nor persisted: each time initial data to be filtered is obtained, it is converted into structured data to be filtered and matched in real time against the corresponding filtering rules, so the filtering result is obtained immediately. This solves the real-time problem of filtering massive streaming data;
Further, the equipment and method for filtering data provided according to an embodiment of the present application support arithmetic operations, string operations, relational operations, logical operations, regular expression operations and set operations, and reserve an extension interface; the filtering rules take the form of simple operation expressions with variables, which solves the problems that filtering rules are complex to describe, hard to extend and difficult to manage;
In addition, the equipment and method for filtering data provided according to an embodiment of the present application are independently designed and developed, have a relatively low cost, and allow any code path to be monitored and tuned.
Over multiple performance tests, the performance figures obtained are roughly as follows: a single virtual machine configured with 4 cores and 8 GB of memory can support 500,000 filtering rules, the TPS for processing streaming data reaches 20,000, the TPS of valid data filtered out reaches 2,000, the average system load is stable at around load 1-4, and CPU resources are used effectively.
In a typical configuration of the present application, the terminal, the device of the service network and the trusted party each include one or more processors (CPUs), an input/output interface, a network interface and memory. The memory may include volatile memory in computer-readable media, in the form of random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium. Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should be noted that the present application can be implemented in software and/or in a combination of software and hardware, for example using an application-specific integrated circuit (ASIC), a general-purpose computer or any other similar hardware device. In one embodiment, the software program of the present application can be executed by a processor to implement the steps or functions described above. Similarly, the software program of the present application (including related data structures) can be stored in a computer-readable recording medium, for example RAM memory, a magnetic or optical drive, a floppy disk or a similar device. In addition, some steps or functions of the present application can be implemented in hardware, for example as a circuit that cooperates with a processor to perform each step or function.
In addition, part of the present application can be applied as a computer program product, for example computer program instructions which, when executed by a computer, can invoke or provide the method and/or technical solution according to the present application through the operation of that computer. The program instructions that invoke the method of the present application may be stored in a fixed or removable recording medium, and/or transmitted via a data stream in broadcast or other signal-bearing media, and/or stored in the working memory of a computer device that runs according to the program instructions. Here, one embodiment of the present application includes a device that includes a memory for storing computer program instructions and a processor for executing the program instructions, wherein, when the computer program instructions are executed by the processor, the device is triggered to run the methods and/or technical solutions based on the foregoing embodiments of the present application.
It is obvious to those skilled in the art that the present application is not limited to the details of the above exemplary embodiments, and that it can be implemented in other specific forms without departing from its spirit or essential characteristics. Therefore, the embodiments should be regarded in every respect as exemplary and non-restrictive; the scope of the present application is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalents of the claims are therefore intended to be embraced by the present application. No reference sign in a claim should be construed as limiting the claim concerned. Furthermore, the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices recited in a device claim may also be implemented by a single unit or device through software or hardware. Words such as "first" and "second" are used to denote names and do not denote any particular order.
Claims (18)
1. A method for filtering data, wherein the method comprises:
obtaining original data to be filtered, and converting the original data to be filtered into structured data to be filtered, wherein the structured data to be filtered comprises a data domain identifier and a data body in key-value pair form;
loading filtering rules, wherein each filtering rule comprises a rule domain identifier, a rule name, and a rule operation expression, and establishing a first rule list of the filtering rules indexed by the domain identifiers of the filtering rules;
obtaining the structured data to be filtered, and obtaining from the first rule list, according to the data domain identifier, several filtering rules whose rule domain identifier corresponds to the data domain identifier;
performing a parallel matching operation on the structured data to be filtered using the obtained filtering rules.
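By way of non-limiting illustration, the four steps of claim 1 might be sketched in Python as follows. The JSON input format, the `domain` key, the `FilterRule` class, the use of a plain callable in place of a rule operation expression, and the thread pool are all assumptions made for this sketch and are not taken from the application.

```python
import json
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass
class FilterRule:
    domain: str                          # rule domain identifier
    name: str                            # rule name
    expression: Callable[[dict], bool]   # stand-in for the rule operation expression


def to_structured(raw: str) -> dict:
    """Convert raw JSON text (assumed format) into a domain identifier plus key-value body."""
    record = json.loads(raw)
    return {"domain": record.pop("domain"), "body": record}


def build_first_rule_list(rules):
    """Index the loaded rules by their domain identifier."""
    index = defaultdict(list)
    for rule in rules:
        index[rule.domain].append(rule)
    return index


def match_parallel(structured, index):
    """Evaluate every rule of the matching domain concurrently; return names of rules that hit."""
    candidates = index.get(structured["domain"], [])
    with ThreadPoolExecutor() as pool:
        hits = list(pool.map(lambda rule: rule.expression(structured["body"]), candidates))
    return [rule.name for rule, hit in zip(candidates, hits) if hit]
```

Because the first rule list is keyed by domain identifier, only the rules of the record's own domain are evaluated, and rules of other domains are never touched.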
2. The method according to claim 1, wherein obtaining the original data to be filtered comprises:
obtaining the original data to be filtered from distributed message middleware.
3. The method according to claim 1 or 2, wherein converting the original data to be filtered into structured data to be filtered further comprises:
sending the structured data to be filtered to a blocking queue;
and obtaining the structured data to be filtered comprises:
obtaining the structured data to be filtered from the blocking queue.
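As a hedged illustration of the blocking-queue hand-off in claim 3, the sketch below uses Python's standard `queue.Queue`, whose `put` blocks when the queue is full and whose `get` blocks when it is empty. The function names, the sentinel value, and the queue capacity are assumptions of this sketch.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker


def producer(raw_items, q):
    """Convert each (domain, body) pair to structured data and enqueue it."""
    for domain, body in raw_items:
        q.put({"domain": domain, "body": body})  # blocks while the queue is full
    q.put(SENTINEL)


def consumer(q, out):
    """Dequeue structured data until the sentinel arrives."""
    while True:
        item = q.get()  # blocks while the queue is empty
        if item is SENTINEL:
            break
        out.append(item)


def run_pipeline(raw_items, capacity=2):
    """Run one producer and one consumer thread around a bounded queue."""
    q = queue.Queue(maxsize=capacity)
    out = []
    workers = [
        threading.Thread(target=producer, args=(raw_items, q)),
        threading.Thread(target=consumer, args=(q, out)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return out
```

The bounded capacity means a slow matching stage applies back-pressure to the conversion stage instead of letting unprocessed records accumulate without limit.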
4. The method according to any one of claims 1 to 3, wherein performing a parallel matching operation on the structured data to be filtered using the obtained filtering rules comprises:
performing rule compilation on the obtained filtering rules to establish runnable abstract syntax trees;
taking the data body of the structured data to be filtered as an input parameter, traversing the runnable abstract syntax trees, and performing parallel matching calculation using the runnable abstract syntax trees.
5. The method according to claim 4, wherein performing rule compilation on the obtained filtering rules to establish runnable abstract syntax trees comprises:
analyzing the rule expression of the obtained filtering rule to convert it into an abstract syntax tree;
performing a pre-computation on the abstract syntax tree to obtain the runnable abstract syntax tree;
wherein performing a pre-computation on the abstract syntax tree comprises:
creating a run stack according to the abstract syntax tree, and pushing the elements of the abstract syntax tree onto the run stack;
when an element is an operator, popping the two operands corresponding to the operator from the run stack and computing to obtain a calculation result;
when an element is a special element, converting the special element into an element composed of programming language information and then pushing it onto the run stack.
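One possible reading of the pre-computation in claim 5 is constant folding over a postfix traversal of the syntax tree. The sketch below follows that reading and should be taken as an assumption, not as the application's actual procedure. It parses the rule expression with Python's standard `ast` module, supports only a handful of binary operators chosen for illustration, and does not model the "special element" branch, since the claim does not define what a special element is.

```python
import ast
import operator

# Operators supported by this sketch (an assumption; the claim lists none).
SYMBOLS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*",
           ast.Gt: ">", ast.Lt: "<", ast.Eq: "=="}
FUNCS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
         ">": operator.gt, "<": operator.lt, "==": operator.eq}


def to_postfix(node):
    """Flatten a parsed expression into postfix order (operands first, operator last)."""
    if isinstance(node, ast.Expression):
        return to_postfix(node.body)
    if isinstance(node, ast.Constant):
        return [("const", node.value)]
    if isinstance(node, ast.Name):
        return [("var", node.id)]
    if isinstance(node, ast.BinOp):
        return (to_postfix(node.left) + to_postfix(node.right)
                + [("op", SYMBOLS[type(node.op)])])
    if isinstance(node, ast.Compare) and len(node.ops) == 1:
        return (to_postfix(node.left) + to_postfix(node.comparators[0])
                + [("op", SYMBOLS[type(node.ops[0])])])
    raise ValueError(f"unsupported syntax: {ast.dump(node)}")


def precompute(postfix):
    """Fold constant sub-expressions with a run stack; keep everything else in postfix form."""
    stack = []  # each entry is a postfix fragment (a list of elements)
    for kind, value in postfix:
        if kind != "op":
            stack.append([(kind, value)])
            continue
        right = stack.pop()
        left = stack.pop()
        both_constant = (len(left) == 1 and len(right) == 1
                         and left[0][0] == "const" and right[0][0] == "const")
        if both_constant:
            stack.append([("const", FUNCS[value](left[0][1], right[0][1]))])
        else:
            stack.append(left + right + [("op", value)])
    return stack.pop()


def compile_rule(expression):
    """Parse a rule expression and return its pre-computed postfix form."""
    return precompute(to_postfix(ast.parse(expression, mode="eval")))
```

Under these assumptions, `compile_rule("amount > 100 * 10")` folds `100 * 10` once at compile time, so each later match only compares `amount` against `1000`.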
6. The method according to claim 4 or 5, wherein performing parallel matching calculation using the runnable abstract syntax trees comprises:
replacing the variables of the runnable abstract syntax tree with the parameters in the data body;
performing matching calculation on the runnable abstract syntax tree using the run stack.
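The two steps of claim 6 can be illustrated, under stated assumptions, with a rule that has already been compiled to a postfix list of `("var", name)`, `("const", value)`, and `("op", symbol)` elements. That element encoding, the operator table, and the function names are hypothetical choices for this sketch.

```python
import operator

# Hypothetical operator table for the sketch.
FUNCS = {
    ">": operator.gt,
    "<": operator.lt,
    "==": operator.eq,
    "and": lambda a, b: a and b,
    "or": lambda a, b: a or b,
}


def bind_variables(runnable, body):
    """Replace every variable element with the value of the same key in the data body."""
    return [("const", body[value]) if kind == "var" else (kind, value)
            for kind, value in runnable]


def evaluate(bound):
    """Evaluate a fully bound postfix rule with a run stack."""
    stack = []
    for kind, value in bound:
        if kind == "const":
            stack.append(value)
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(FUNCS[value](left, right))
    return stack.pop()


def matches(runnable, body):
    """Return True when the data body satisfies the rule."""
    return bool(evaluate(bind_variables(runnable, body)))
```

In this sketch a data body that lacks a key referenced by the rule raises `KeyError`; a production system would need to decide whether such a record counts as a non-match.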
7. The method according to any one of claims 1 to 6, wherein the method further comprises:
adding a new filtering rule, deleting a filtering rule, or modifying and compiling an existing filtering rule.
8. The method according to claim 7, wherein establishing the first rule list of the filtering rules indexed by the domain identifiers of the filtering rules further comprises:
establishing a second rule list of the filtering rules indexed by the rule names of the filtering rules;
and the adding of a new filtering rule, deleting of a filtering rule, or modifying and compiling of an existing filtering rule comprises at least any one of the following:
adding the new filtering rule to the second rule list;
deleting the corresponding filtering rule from the second rule list;
looking up a filtering rule in the second rule list, and modifying and compiling the found filtering rule.
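A minimal sketch of the two rule lists in claim 8 is given below. The `RuleRegistry` class and its method names are hypothetical, and Python's built-in `compile` merely stands in for whatever rule compilation the application performs.

```python
from collections import defaultdict


class RuleRegistry:
    """Keeps rules reachable both by domain identifier and by rule name."""

    def __init__(self):
        self.by_domain = defaultdict(dict)  # first rule list: domain -> {name: rule}
        self.by_name = {}                   # second rule list: name -> rule

    def add(self, name, domain, expression):
        """Register a new rule in both lists and compile its expression."""
        if name in self.by_name:
            raise KeyError(f"rule {name!r} already exists")
        rule = {
            "name": name,
            "domain": domain,
            "expression": expression,
            "compiled": compile(expression, f"<rule {name}>", "eval"),
        }
        self.by_name[name] = rule
        self.by_domain[domain][name] = rule

    def delete(self, name):
        """Remove a rule from both lists, locating it by name."""
        rule = self.by_name.pop(name)
        del self.by_domain[rule["domain"]][name]

    def modify(self, name, expression):
        """Look the rule up by name, replace its expression, and recompile it."""
        rule = self.by_name[name]
        rule["expression"] = expression
        rule["compiled"] = compile(expression, f"<rule {name}>", "eval")

    def rules_for(self, domain):
        """Return the rules registered for one domain identifier."""
        return list(self.by_domain.get(domain, {}).values())
```

Both lists hold references to the same rule object, so a modification made through the name index is immediately visible to matching through the domain index.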
9. The method according to any one of claims 1 to 8, wherein each filtering rule further comprises: information on the notifier bound to the filtering rule;
and the method further comprises:
sending the structured data to be filtered that satisfies the corresponding filtering rule to the notifier bound to that filtering rule, for sending.
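The binding between a rule and its notifier in claim 9 might look like the following sketch. The `Rule` and `Notifier` classes, the string key used as notifier information, and the in-memory outbox are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    predicate: Callable[[dict], bool]  # stand-in for the compiled rule
    notifier: str                      # information identifying the bound notifier


class Notifier:
    """Collects matched records until they are sent onward."""

    def __init__(self):
        self.outbox = []

    def accept(self, rule_name, data):
        self.outbox.append((rule_name, data))


def dispatch(data, rules, notifiers):
    """Hand the record to the notifier bound to every rule it satisfies."""
    for rule in rules:
        if rule.predicate(data["body"]):
            notifiers[rule.notifier].accept(rule.name, data)
```

Keeping only a notifier key on the rule, rather than the notifier object itself, lets several rules share one delivery channel.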
10. A device for filtering data, wherein the device comprises:
a first apparatus for obtaining original data to be filtered and converting the original data to be filtered into structured data to be filtered, wherein the structured data to be filtered comprises a data domain identifier and a data body in key-value pair form;
a second apparatus for loading filtering rules, wherein each filtering rule comprises a rule domain identifier, a rule name, and a rule operation expression, and for establishing a first rule list of the filtering rules indexed by the domain identifiers of the filtering rules;
a third apparatus for obtaining the structured data to be filtered and for obtaining from the first rule list, according to the data domain identifier, several filtering rules whose rule domain identifier corresponds to the data domain identifier;
a fourth apparatus for performing a parallel matching operation on the structured data to be filtered using the obtained filtering rules.
11. The device according to claim 10, wherein the first apparatus comprises:
a unit for obtaining the original data to be filtered from distributed message middleware.
12. The device according to claim 10 or 11, wherein the first apparatus comprises:
a unit for sending the structured data to be filtered to a blocking queue;
and the third apparatus comprises:
a unit for obtaining the structured data to be filtered from the blocking queue.
13. The device according to any one of claims 10 to 12, wherein the fourth apparatus comprises:
a unit for performing rule compilation on the obtained filtering rules to establish runnable abstract syntax trees;
a unit for taking the data body of the structured data to be filtered as an input parameter, traversing the runnable abstract syntax trees, and performing parallel matching calculation using the runnable abstract syntax trees.
14. The device according to claim 13, wherein the unit for performing rule compilation on the obtained filtering rules to establish runnable abstract syntax trees comprises:
a module for analyzing the rule expression of the obtained filtering rule to convert it into an abstract syntax tree;
a module for performing a pre-computation on the abstract syntax tree to obtain the runnable abstract syntax tree, wherein the module is configured to:
create a run stack according to the abstract syntax tree, and push the elements of the abstract syntax tree onto the run stack,
when an element is an operator, pop the two operands corresponding to the operator from the run stack and compute to obtain a calculation result,
when an element is a special element, convert the special element into an element composed of programming language information and then push it onto the run stack.
15. The device according to claim 13 or 14, wherein the unit for taking the data body of the structured data to be filtered as an input parameter, traversing the runnable abstract syntax trees, and performing parallel matching calculation using the runnable abstract syntax trees comprises:
a module for replacing the variables of the runnable abstract syntax tree with the parameters in the data body;
a module for performing matching calculation on the runnable abstract syntax tree using the run stack.
16. The device according to any one of claims 10 to 15, wherein the device further comprises:
a fifth apparatus for adding a new filtering rule, deleting a filtering rule, or modifying and compiling an existing filtering rule.
17. The device according to claim 16, wherein the second apparatus further comprises:
a unit for establishing a second rule list of the filtering rules indexed by the rule names of the filtering rules;
and the fifth apparatus comprises:
a unit for adding the new filtering rule to the second rule list;
a unit for deleting the corresponding filtering rule from the second rule list;
a unit for looking up a filtering rule in the second rule list, and for modifying and compiling the found filtering rule.
18. The device according to any one of claims 10 to 17, wherein each filtering rule further comprises: information on the notifier bound to the filtering rule;
and the device further comprises:
a sixth apparatus for sending the structured data to be filtered that satisfies the corresponding filtering rule to the notifier bound to that filtering rule, for sending.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510408180.1A CN107038161B (en) | 2015-07-13 | 2015-07-13 | Equipment and method for filtering data |
PCT/CN2016/088302 WO2017008650A1 (en) | 2015-07-13 | 2016-07-04 | Device and method for filtering data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510408180.1A CN107038161B (en) | 2015-07-13 | 2015-07-13 | Equipment and method for filtering data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107038161A CN107038161A (en) | 2017-08-11 |
CN107038161B CN107038161B (en) | 2021-03-26 |
Family
ID=57757755
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510408180.1A Active CN107038161B (en) | 2015-07-13 | 2015-07-13 | Equipment and method for filtering data |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN107038161B (en) |
WO (1) | WO2017008650A1 (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112565338B (en) * | 2020-11-10 | 2023-06-20 | 中国人民解放军战略支援部队信息工程大学 | Ethernet message capturing, filtering, storing and real-time analyzing method and system |
CN115047835B (en) * | 2022-06-27 | 2024-06-04 | 中国核动力研究设计院 | DCS-based periodic test data acquisition method, device, equipment and medium |
CN116383290B (en) * | 2023-03-22 | 2023-10-31 | 中国华能集团有限公司北京招标分公司 | Data generalization and analysis method |
Citations (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1953373A (en) * | 2006-09-19 | 2007-04-25 | 清华大学 | A method to filter and verify open real IPv6 source address |
CN101127774A (en) * | 2007-09-19 | 2008-02-20 | 中兴通讯股份有限公司 | Priority processing method for initial filtering rule |
CN101158948A (en) * | 2006-10-08 | 2008-04-09 | 中国科学院软件研究所 | Text content filtering method and system |
CN101282332A (en) * | 2008-05-22 | 2008-10-08 | 上海交通大学 | System for generating assaulting chart facing network safety alarm incident |
CN101304589A (en) * | 2008-04-14 | 2008-11-12 | 中国联合通信有限公司 | Method and system for monitoring and filtering garbage short message transmitted by short message gateway |
CN101414929A (en) * | 2008-11-18 | 2009-04-22 | 华为技术有限公司 | Method, device and system for acquiring information |
CN101860531A (en) * | 2010-04-21 | 2010-10-13 | 北京星网锐捷网络技术有限公司 | Filtering rule matching method of data packet and device thereof |
CN102082728A (en) * | 2010-12-28 | 2011-06-01 | 北京锐安科技有限公司 | Dynamic loading method for filtering rules of network audit system |
CN102231134A (en) * | 2011-07-29 | 2011-11-02 | 哈尔滨工业大学 | Method for detecting redundant code defects based on static analysis |
CN102654864A (en) * | 2011-03-02 | 2012-09-05 | 华北计算机系统工程研究所 | Independent transparent security audit protection method facing real-time database |
CN103116620A (en) * | 2013-01-29 | 2013-05-22 | 中国电力科学研究院 | Unstructured data safe filtering method based on strategy |
CN103338155A (en) * | 2013-07-01 | 2013-10-02 | 安徽中新软件有限公司 | High-efficiency filtering method for data packets |
CN103631966A (en) * | 2013-12-18 | 2014-03-12 | 用友软件股份有限公司 | Configurable multiple-valued matching field analysis method |
CN103780460A (en) * | 2014-01-15 | 2014-05-07 | 珠海市佳讯实业有限公司 | System for realizing hardware filtering of TAP device through FPGA |
US20140282949A1 (en) * | 2013-03-15 | 2014-09-18 | Kaarya Llc | System and Method for Account Access |
CN104331278A (en) * | 2014-10-15 | 2015-02-04 | 南京航空航天大学 | Instruction filtering method and device for specifications of ARINC661 |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102467561A (en) * | 2010-11-19 | 2012-05-23 | 金蝶软件(中国)有限公司 | Form data filtering method and device |
US8949371B1 (en) * | 2011-09-29 | 2015-02-03 | Symantec Corporation | Time and space efficient method and system for detecting structured data in free text |
CN103034700B (en) * | 2012-12-05 | 2016-06-29 | 北京奇虎科技有限公司 | The processing method of rich text content and system |
CN103618733B (en) * | 2013-12-06 | 2017-06-27 | 北京中创腾锐技术有限公司 | A kind of data filtering system and method for being applied to mobile Internet |
CN104317947B (en) * | 2014-11-07 | 2017-12-12 | 南京烽火星空通信发展有限公司 | A kind of real-time architecture comparing system based on mass data |
- 2015-07-13: CN application CN201510408180.1A, patent CN107038161B (en), status: Active
- 2016-07-04: WO application PCT/CN2016/088302, publication WO2017008650A1 (en), status: Application Filing
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019061913A1 (en) * | 2017-09-29 | 2019-04-04 | 上海望友信息科技有限公司 | Data type identification method and system, computer readable storage medium and device |
CN109672704A (en) * | 2017-10-16 | 2019-04-23 | 阿里巴巴集团控股有限公司 | Processing method, device and the electronic equipment of message |
CN109672704B (en) * | 2017-10-16 | 2022-02-25 | 阿里巴巴集团控股有限公司 | Message processing method and device and electronic equipment |
CN107766538A (en) * | 2017-10-28 | 2018-03-06 | 杭州安恒信息技术有限公司 | Data filtering processing module and synchronous, asynchronous filter method based on java |
CN109189807A (en) * | 2018-09-13 | 2019-01-11 | 北京奇虎科技有限公司 | A kind of filter method and device of alert data |
CN110287174A (en) * | 2019-05-09 | 2019-09-27 | 北京善义善美科技有限公司 | A kind of data filtering engine and system and filter method |
CN110427754A (en) * | 2019-08-12 | 2019-11-08 | 腾讯科技(深圳)有限公司 | Network application attack detection method, device, equipment and storage medium |
CN110427754B (en) * | 2019-08-12 | 2024-02-13 | 腾讯科技(深圳)有限公司 | Network application attack detection method, device, equipment and storage medium |
CN111427915A (en) * | 2020-03-25 | 2020-07-17 | 京东数字科技控股有限公司 | Information processing method and device, storage medium and electronic equipment |
CN112068933A (en) * | 2020-09-02 | 2020-12-11 | 成都鱼泡科技有限公司 | Real-time distributed data monitoring method |
Also Published As
Publication number | Publication date |
---|---|
CN107038161B (en) | 2021-03-26 |
WO2017008650A1 (en) | 2017-01-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107038161A (en) | A kind of device for filtering data and method | |
US11379755B2 (en) | Feature processing tradeoff management | |
US10318882B2 (en) | Optimized training of linear machine learning models | |
CN103678520B (en) | A kind of multi-dimensional interval query method and its system based on cloud computing | |
US8959519B2 (en) | Processing hierarchical data in a map-reduce framework | |
CN103003813B (en) | Columnar storage representations of records | |
CN106960020B (en) | A kind of method and apparatus creating concordance list | |
CN103793493A (en) | Method and system for processing car-mounted terminal mass data | |
Gao et al. | Handling data skew in MapReduce cluster by using partition tuning | |
EP3069271B1 (en) | Dynamic stream computing topology | |
CN110175184A (en) | A kind of lower drill method, system and the electronic equipment of data dimension | |
Dayarathna et al. | Automatic optimization of stream programs via source program operator graph transformations | |
CN108062384A (en) | The method and apparatus of data retrieval | |
WO2014059836A1 (en) | Method and system for blog content search | |
KR101955376B1 (en) | Processing method for a relational query in distributed stream processing engine based on shared-nothing architecture, recording medium and device for performing the method | |
CN109033173A (en) | It is a kind of for generating the data processing method and device of multidimensional index data | |
Theeten et al. | Chive: Bandwidth optimized continuous querying in distributed clouds | |
Gherissi et al. | Object-centric predictive process monitoring | |
CN103324762A (en) | Hadoop-based index creation method and indexing method thereof | |
CN107276912B (en) | Memory, message processing method and distributed storage system | |
Sawyer et al. | Understanding query performance in Accumulo | |
US11017031B2 (en) | System and method of data transformation | |
CN113806466A (en) | Path time query method and device, electronic equipment and readable storage medium | |
CN106940715A (en) | A kind of method and apparatus of the inquiry based on concordance list | |
CN116975052A (en) | Data processing method and related equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||