CN107391532A - Method and apparatus for data filtering - Google Patents

Method and apparatus for data filtering

Info

Publication number
CN107391532A
Authority
CN
China
Prior art keywords
mark
data
row
screening
filtering rule
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710245934.5A
Other languages
Chinese (zh)
Other versions
CN107391532B (en)
Inventor
张明坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201710245934.5A
Publication of CN107391532A
Application granted
Publication of CN107391532B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24564 Applying rules; Deductive queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a method of data filtering in which a filtering rule is expressed by one or more identifiers and a screening condition corresponding to each identifier, each identifier including at least one column of the raw data. The method includes: extracting from the raw data the columns specified by all identifiers involved in the filtering rule, to obtain source data; and, according to the expression of the filtering rule, filtering the row records of the source data using the screening condition corresponding to each identifier in turn, to obtain target data. With the embodiments of the application, changing the columns included in a label is enough to adapt to raw data with a different data structure, which greatly simplifies developers' work. During data filtering, each screening condition is applied only to the columns specified by its label, which simplifies the computation, reduces the amount of data processed, and greatly speeds up data filtering.

Description

Method and apparatus for data filtering
Technical field
The application relates to the technical field of data processing, and in particular to a method and apparatus for data filtering.
Background
Nowadays the operation of almost every company depends on data. Essentially all types of data generated on the network during a company's daily operation are captured and stored, and the conclusions drawn from mining and analyzing these data become an important basis for business decisions. When performing data analysis, a company of even modest scale usually needs to filter, out of raw data at the TB (terabyte) level, the data records related to a specific analysis target (for example, certain metrics of a certain business), and then perform statistics and analysis on the filtered data.
As company operations rely more and more on data, the timeliness requirements for data analysis keep rising. For continuously generated real-time data, the data records to be analyzed need to be filtered out in as short a time as possible. In the prior art, raw data from different business systems is first stored in a relational database, and SQL (Structured Query Language) statements are then used to filter the required data records out of the relational database. Because the data of different business systems usually have different structures, this implementation requires either converting the data structure when storing into the relational database or using different SQL statements for different data structures; either way, developers must put in a considerable amount of work. In addition, when an SQL statement is executed, the entire data record in the relational database is read into memory and each screening condition is then applied to that record before it can be determined whether the record is target data. This approach runs slowly and can hardly meet the requirement of filtering massive data in a short time.
Summary of the invention
In view of this, the application provides a method of data filtering in which a filtering rule is expressed by one or more identifiers and a screening condition corresponding to each identifier, each identifier including at least one column of the raw data. The method includes:
extracting from the raw data the columns specified by all identifiers involved in the filtering rule, to obtain source data;
according to the expression of the filtering rule, filtering the row records of the source data using the screening condition corresponding to each identifier in turn, to obtain target data.
The application also provides an apparatus for data filtering in which a filtering rule is expressed by one or more identifiers and a screening condition corresponding to each identifier, each identifier including at least one column of the raw data. The apparatus includes:
a source data extraction unit, configured to extract from the raw data the columns specified by all identifiers involved in the filtering rule, to obtain source data;
a source data filtering unit, configured to filter, according to the expression of the filtering rule, the row records of the source data using the screening condition corresponding to each identifier in turn, to obtain target data.
As the above technical solution shows, in the embodiments of the application a filtering rule is expressed using labels that include one or more raw-data columns and the screening conditions corresponding to the labels. After the columns specified by the labels are extracted from the raw data, the identifiers and their corresponding screening conditions are applied one by one to the extracted columns to filter out the target data. With the embodiments of the application, changing the columns included in a label is enough to adapt to raw data with a different data structure, which greatly simplifies developers' work. During data filtering, each screening condition is applied only to the columns specified by its label, which simplifies the computation, reduces the amount of data processed, and greatly speeds up data filtering.
Brief description of the drawings
Fig. 1 is a flowchart of a method of data filtering in an embodiment of the application;
Fig. 2 is a schematic diagram of the relationship among label combinations, labels, and identifiers in an application example of the application;
Fig. 3 is a configuration example of filtering direction in an application example of the application;
Fig. 4 is a hardware structure diagram of a device that runs an embodiment of the application;
Fig. 5 is a logical structure diagram of an apparatus for data filtering in an embodiment of the application.
Detailed description
The embodiments of the application propose a new method of data filtering. An identifier represents one or more columns of the raw data, and a filtering rule is built from one or more identifiers and the screening conditions corresponding to them. When filtering data, the screening conditions can be applied one by one, each operating only on the raw-data columns specified by its identifier, so that the computation is simplified and the amount of data involved in it is reduced. At the same time, developers can match the data structure of the raw data by specifying columns for the identifiers, without modifying statements or converting data structures, which solves the problems in the prior art.
The embodiments of the application may run on any device with computing and storage capability, such as a mobile phone, tablet computer, PC (Personal Computer), notebook, or server. The functions in the embodiments may also be realized by logical nodes running on two or more devices.
In the embodiments of the application, the data comes from the raw data of one or more systems, and the raw data of different systems may have the same or different data structures. The filtering rule is set on the columns of the raw data: for a row record of the raw data, when the values of one or more of its columns satisfy the set conditions, the row record passes the screening and becomes target data.
A filtering rule can be expressed with one or more identifiers and their corresponding screening conditions. Each identifier includes at least one column, and the set of columns included in all identifiers is the set of all columns used by the filtering rule. The screening condition corresponding to an identifier is defined on the values of the identifier's specified columns (that is, the columns the identifier includes) and is one component of the filtering rule. It can be seen that identifiers, their corresponding screening conditions, and (for filtering rules with two or more identifiers) the operation relations between the screening conditions of different identifiers can express any filtering rule. Note that multiple identifiers used in the same filtering rule may include the same column, and a filtering rule may use the same identifier more than once; neither is restricted.
For two kinds of raw data with different data structures, developers can specify, for the same identifier, the columns that match each data structure, so that the filtering rule applies to both kinds of raw data without changing the program that implements the filtering rule and without converting data structures.
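Binding one identifier to different columns per data structure can be sketched as follows. This is only a minimal illustration, not the patent's implementation: the source names `system_x` and `system_y`, the column names, the identifier name `bank_channel`, and the prefix test used as a screening condition are all assumptions made for the example.

```python
# Hypothetical bindings: the same identifier points at different columns
# depending on which system produced the raw data.
COLUMN_BINDINGS = {
    "system_x": {"bank_channel": ["org_code"]},
    "system_y": {"bank_channel": ["dimension_2"]},
}


def identifier_values(row, source, identifier):
    """Return the values of the columns bound to `identifier` for `source`."""
    return tuple(row[col] for col in COLUMN_BINDINGS[source][identifier])


def is_icbc(values):
    """Screening condition written once, against identifier values only.

    The case-insensitive "icbc" prefix test is an assumption of this sketch.
    """
    return values[0].lower().startswith("icbc")


row_x = {"org_code": "Icbc701", "amount": 101}
row_y = {"dimension_1": "Fininflux", "dimension_2": "CCb701"}

passes_x = is_icbc(identifier_values(row_x, "system_x", "bank_channel"))
passes_y = is_icbc(identifier_values(row_y, "system_y", "bank_channel"))
```

Only `COLUMN_BINDINGS` differs between the two sources; the screening condition itself is unchanged, which is the point the paragraph above makes.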
In the embodiments of the application, the flow of the data filtering method is shown in Fig. 1.
Step 110: extract from the raw data the columns specified by all identifiers involved in the filtering rule, to obtain source data.
For the raw data from a given source, the columns specified by each identifier used in the filtering rule, as matched to that source's data structure, are extracted to form the source data used for row-record filtering in step 120. The source data is usually a part of the raw data (though it may also be the raw data itself). In other words, each row record of the source data is part or all of a row record of the raw data, and each row record of the source data contains the values of the columns included in every identifier used to express the filtering rule.
For example, a filtering rule uses identifier 1 and identifier 2, where identifier 1 consists of column A and column B of a raw data table and identifier 2 consists of column B and column C of the same table. The columns specified by all identifiers involved in the filtering rule are then columns A, B, and C; extracting these columns from every row record of the raw data table yields source data consisting of columns A, B, and C.
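The column extraction in this example can be sketched in a few lines. The dictionary layout, the names `identifier_1` and `identifier_2`, and the extra column `D` are assumptions chosen for illustration; the source does not prescribe a data representation.

```python
def extract_source(raw_rows, identifiers):
    """Keep only the columns named by any identifier used in the rule."""
    needed = []
    for columns in identifiers.values():
        for col in columns:
            if col not in needed:  # a column shared by two identifiers is kept once
                needed.append(col)
    return [{col: row[col] for col in needed} for row in raw_rows]


# Identifier 1 uses columns A and B; identifier 2 uses columns B and C.
identifiers = {"identifier_1": ["A", "B"], "identifier_2": ["B", "C"]}

# Column D is not referenced by any identifier, so it is left out.
raw = [
    {"A": 1, "B": 2, "C": 3, "D": 4},
    {"A": 5, "B": 6, "C": 7, "D": 8},
]

source = extract_source(raw, identifiers)
```

Each row of `source` now carries only columns A, B, and C, so later screening steps never touch column D.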
Step 120: according to the filtering rule, filter the row records of the source data using the screening condition corresponding to each identifier in turn, to obtain target data.
After the source data is extracted according to the columns specified by the identifiers, each row record of the source data is filtered with the screening condition of each identifier in turn, according to the screening conditions of the identifiers used in the filtering rule and the operation relations between the screening conditions of different identifiers. The row records that remain after screening by all identifiers are taken as the target data.
In a practical application scenario, the specific procedure for filtering the row records of the source data can be determined by the number of identifiers involved in the filtering rule and, when two or more identifiers are involved, by the operation relations between their screening conditions; the embodiments of the application impose no limitation. Examples follow.
In a first application scenario, the filtering rule involves one identifier. For each row record in the source data, the screening result is determined by whether the values of the identifier's specified columns satisfy the screening condition corresponding to the identifier. If the row record passes the identifier's screening, it is taken as target data; otherwise it is discarded.
In a second application scenario, the filtering rule involves two or more identifiers, and no identifier is reused in the filtering rule (that is, each identifier is used only once when expressing the filtering rule). Suppose the filtering rule involves N identifiers (N is a natural number greater than 1), denoted identifier 1, identifier 2, and so on up to identifier N. The target data can then be obtained as follows:
Take the source data as the pending data of identifier 1. For each row record in the pending data, determine from the values of identifier 1's specified columns in that record and from the screening condition corresponding to identifier 1 whether the record passes the screening of identifier 1. Depending on whether it passes and on the filtering rule, the row record is taken as target data, taken as processed data, or discarded. Whether a record that passes the screening of identifier 1 becomes target data, becomes processed data, or is discarded depends on the operation relation in the filtering rule between the screening condition of identifier 1 and those of the other identifiers;
Take the processed data of identifier 1 as the pending data of identifier 2, and repeat the above screening process with identifier 2 (analogous to the processing with identifier 1) to obtain the processed data of identifier 2;
Continue in this way until the processed data of identifier (N-1) is taken as the pending data of identifier N; after the screening by identifier N, the target data is obtained.
The following two examples in the second application scenario illustrate how the operation relation between screening conditions affects what is done with a row record after it is screened by an identifier.
In example one, the filtering rule involves a first identifier and a second identifier, and the rule is: pass the screening of the first identifier and pass the screening of the second identifier. During data filtering, the source data is first taken as the pending data of the first identifier; row records in the pending data that pass the first identifier's screening are saved as processed data, and those that do not are discarded. Next, the processed data of the first identifier is taken as the pending data of the second identifier; row records that pass the second identifier's screening are saved as target data, and those that do not are discarded.
In example two, the filtering rule involves a third identifier and a fourth identifier, and the rule is: pass the screening of the third identifier or pass the screening of the fourth identifier. During data filtering, the source data is first taken as the pending data of the third identifier; row records in the pending data that pass the third identifier's screening are saved as target data, and those that do not are saved as the processed data of the third identifier. Next, the processed data of the third identifier is taken as the pending data of the fourth identifier; row records that pass the fourth identifier's screening are saved as target data, and those that do not are discarded.
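The sequential process and the two examples can be sketched with one small loop in which each identifier's step states what happens to rows that pass and rows that fail. The step encoding (`"target"`, `"processed"`, `"drop"`), the column names, and the sample conditions are assumptions introduced for this sketch only.

```python
def run_steps(source, steps):
    """Apply identifiers in order; each step is (condition, on_pass, on_fail).

    on_pass / on_fail is one of "target", "processed", or "drop".
    """
    target = []
    pending = list(source)
    for condition, on_pass, on_fail in steps:
        processed = []
        for row in pending:
            action = on_pass if condition(row) else on_fail
            if action == "target":
                target.append(row)
            elif action == "processed":
                processed.append(row)
            # "drop": the row is discarded
        pending = processed  # processed data becomes the next pending data
    return target


rows = [
    {"amount": 5, "bank": "icbc"},
    {"amount": 50, "bank": "ccb"},
    {"amount": 500, "bank": "icbc"},
]


def is_large(row):
    return row["amount"] >= 50


def is_icbc(row):
    return row["bank"] == "icbc"


# Example one (AND): passing rows move on, failing rows are dropped.
and_result = run_steps(rows, [(is_large, "processed", "drop"),
                              (is_icbc, "target", "drop")])

# Example two (OR): passing rows are final, failing rows get a second chance.
or_result = run_steps(rows, [(is_large, "target", "processed"),
                             (is_icbc, "target", "drop")])
```

In both configurations the second identifier only sees the rows left over by the first, which is where the reduction in processed data comes from. Note that in the OR case the target data is not in source order, because rows accepted by the first identifier are written before rows accepted by the second.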
A filtering direction can be applied to an identifier in the filtering rule, to specify how the screening condition of each identifier involved is applied. The filtering direction is either forward or reverse. When the filtering direction of an identifier is forward, a row record passes the identifier's screening if the values of the identifier's specified columns in that record satisfy the identifier's screening condition. When the filtering direction is reverse, a row record passes the identifier's screening if those values do not satisfy the screening condition.
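A filtering direction amounts to optionally negating the result of the screening condition. A minimal sketch, assuming the direction is stored as the string `"forward"` or `"reverse"` and that a condition receives a tuple of the identifier's column values (both are choices made for this example):

```python
def passes_screening(row, columns, condition, direction="forward"):
    """Return True if `row` passes the identifier's screening."""
    values = tuple(row[col] for col in columns)
    satisfied = condition(values)
    if direction == "forward":
        return satisfied
    if direction == "reverse":
        return not satisfied
    raise ValueError(f"unknown filtering direction: {direction}")


def is_inflow(values):
    return values[0] == "inflow"


row = {"channel": "netbank", "flow": "inflow"}

forward_result = passes_screening(row, ["flow"], is_inflow, "forward")
reverse_result = passes_screening(row, ["flow"], is_inflow, "reverse")
```

Keeping the direction outside the condition lets the same screening condition serve both "include these rows" and "exclude these rows" rules.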
To make it easier to build and use filtering rules from a business perspective, a filtering rule can be expressed with one or more labels. A label describes a business that has a certain feature and consists of one or more identifiers and the screening condition corresponding to each identifier, where at least one specified column of each identifier is related to that feature of the business described by the label.
When a label includes two or more identifiers, the label also includes the operation relations between the screening conditions of these identifiers. Similarly, when a filtering rule uses two or more labels, the operation relations between these labels can also be specified. It can be seen that the filtering rule is still essentially determined by the identifiers, their corresponding screening conditions, and the operation relations between the screening conditions; labels are merely a way of organizing identifiers from a business perspective and do not affect the substantive filtering of the source data described above.
In application scenarios where labels store identifiers and their corresponding screening conditions, when an identifier is to be used during data filtering, its screening condition can be looked up in the label to which the identifier belongs, and the row records of the source data are filtered according to the screening condition found. The target data is obtained after all identifiers have been traversed according to the filtering rule.
In application scenarios where the business has different levels, a label can describe a subdivided business that has a certain feature, and a label combination can describe the higher-level business that includes this subdivided business and other subdivided businesses of the same type (that is, other businesses at the same level that share a certain business commonality with it). A label combination includes at least two labels, each of which is a component of the combination; in other words, the labels belonging to the same label combination are related by OR. For example, the Industrial and Commercial Bank of China (ICBC) online banking channel, the China Construction Bank (CCB) online banking channel, and the Agricultural Bank of China (ABC) online banking channel are three labels, and the online banking channel is the label combination that includes these three labels.
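The OR relation between labels in a combination can be sketched as below. For brevity each label is reduced to a mapping from column to required value, and conditions inside one label are combined with AND; both simplifications, as well as the column names `bank` and `channel`, are assumptions of this sketch rather than details from the source.

```python
def label_passes(label, row):
    # Assumption for this sketch: conditions inside one label are ANDed.
    return all(row[col] == wanted for col, wanted in label.items())


def combination_passes(labels, row):
    # Labels in the same label combination are related by OR.
    return any(label_passes(label, row) for label in labels)


icbc_netbank = {"bank": "ICBC", "channel": "netbank"}
ccb_netbank = {"bank": "CCB", "channel": "netbank"}
abc_netbank = {"bank": "ABC", "channel": "netbank"}

# The "online banking channel" combination groups the three labels.
netbank_combination = [icbc_netbank, ccb_netbank, abc_netbank]

rows = [
    {"bank": "CCB", "channel": "netbank"},
    {"bank": "ICBC", "channel": "quick"},
]

results = [combination_passes(netbank_combination, row) for row in rows]
```

The first row matches the CCB label and so passes the combination; the second row matches no label because its channel is not online banking.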
In application scenarios that use label combinations, a filtering rule can be expressed with at least one label and/or at least one label combination. When an identifier is to be used during data filtering, its screening condition can be looked up in the label or label combination to which it belongs, and the row records of the source data are filtered according to the screening condition found. The target data is obtained after all identifiers have been traversed according to the filtering rule.
It can be seen that, in the embodiments of the application, an identifier represents one or more columns of the raw data, and a filtering rule is expressed using labels and the screening conditions corresponding to the labels. After the columns specified by the labels are extracted from the raw data, the screening conditions can be applied one by one, each operating only on the raw-data columns specified by its identifier, which simplifies the computation, reduces the amount of data processed, and speeds up data filtering. At the same time, developers can match the data structure of the raw data by specifying columns for the identifiers, without modifying statements or converting data structures, which reduces their workload.
In an application example of the application, on a third-party payment platform, payments are made among multiple merchants, among multiple users, and between merchants and users through several channels provided by the third-party payment platform and multiple banks. The multiple business systems responsible for processing different payment services generate logs for each service. These logs serve as raw data: they are classified by bank, channel (such as quick payment or online banking), and fund flow direction (such as inflow or outflow), the corresponding data is dynamically filtered out of the logs, and it is presented in monitoring views such as tables, pie charts, and bar charts.
On the third-party payment platform, according to the statistical requirements on the log data, identifiers are used to describe the columns in the raw data that are relevant to the data filtering and statistics of one or more businesses; that is, one column or several columns of the raw data are defined as the specified columns of an identifier. When an identifier is applied to a particular data filtering process, a screening condition that meets the needs of that process is set for the identifier, and several identifiers associated with a characteristic of a business, together with their screening conditions, are combined into a label. In application scenarios where the target data is the union of the data obtained by filtering with two or more labels separately, the set of these labels can also be made into a label combination. One relationship among label combinations, labels, and identifiers is shown in Fig. 2.
The third-party payment platform uses labels and/or label combinations to express the filtering rule for data filtering. When expressing the filtering rule, the filtering direction of a label or label combination can be specified: either row records that satisfy the screening conditions of all identifiers in the label or label combination are treated as passing the screening (forward filtering), or row records that do not satisfy the screening conditions of all identifiers in the label or label combination are treated as passing the screening (reverse filtering). A configuration example of filtering direction is shown in Fig. 3.
After receiving raw data from a business system, the server responsible for data filtering on the third-party payment platform (hereinafter the filtering server) writes the configuration of the filtering rule used to filter that raw data (including the labels and label combinations used, the identifiers and their corresponding screening conditions, the operation relations between screening conditions, and so on) into a cache.
The filtering server reads from the cache the specified columns of all identifiers used by the filtering rule and extracts these columns from the raw data to form the source data for filtering. Following the operation relations between the identifiers in the filtering rule, the filtering server starts from the first identifier: it takes the source data as the pending data of that identifier, looks up in the cache the identifier's screening condition and filtering direction in the label or label combination to which the identifier belongs, judges row by row, based on the screening condition and filtering direction, whether each row record of the pending data passes this identifier's screening, and then, according to the operation relations with the other identifiers, decides whether the row records that pass and those that do not are written to the target data, written to the processed data, or discarded. After all row records of the pending data have been traversed with the first identifier, the filtering server takes the second identifier, uses the processed data of the first identifier as the pending data of the second identifier, and repeats the above process. After the processing for the last identifier, the target data is obtained.
For example, the processing for an identifier may be: form a virtual identifier from the identifier's specified columns and search the cache for an identifier consistent with the virtual identifier. If there is none, the filtering rule in the cache may have been updated, and the filtering process exits. If there is one, determine which label or which composite label the identifier belongs to, and process the row records of the pending data with the screening condition that corresponds to the identifier in that label or composite label.
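The lookup described here might be sketched as follows. The cache layout (three plain dictionaries), the names `identifier_1` and `label_A`, and the prefix-based screening condition are hypothetical; the source does not specify how the cache is organized.

```python
def starts_with_icbc(values):
    # Assumed screening condition: the second column marks an ICBC channel.
    return values[1].lower().startswith("icbc")


# Hypothetical cache contents written by the filtering server.
cache = {
    # specified columns -> identifier name
    "identifiers": {("dimension_1", "dimension_2"): "identifier_1"},
    # identifier name -> owning label (or label combination)
    "owners": {"identifier_1": "label_A"},
    # (owner, identifier name) -> screening condition
    "conditions": {("label_A", "identifier_1"): starts_with_icbc},
}


def find_condition(columns, cache):
    """Build a virtual identifier from columns and resolve its condition.

    Returns None when no consistent identifier is cached, in which case the
    caller should stop filtering because the rule may have been updated.
    """
    virtual_identifier = tuple(columns)
    name = cache["identifiers"].get(virtual_identifier)
    if name is None:
        return None
    owner = cache["owners"][name]
    return cache["conditions"][(owner, name)]


condition = find_condition(["dimension_1", "dimension_2"], cache)
missing = find_condition(["dimension_3"], cache)
```

Returning `None` rather than raising keeps the "exit the filtering process" decision with the caller, which is one possible design among several.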
For example, the example that the trading volume data of all channels are obtained from data source is as shown in table 1, wherein dimension 2 is hair The related organization of these raw transaction:
Dimension 1 Dimension 2 Trading volume
Fininflux Icbc9011 100
Fininflux Icbc701 101
supergw ICBC51_ICBCSH010120 102
Fininflux CCb701 103
Table 1
If the data whose associated institution is ICBC is to be filtered out, dimension 1 and dimension 2 are formed into identifier 1, label A is set to include identifier 1, and the screening condition corresponding to identifier 1 is that dimension 2 is an ICBC channel. When a row record in Table 1 is processed, dimension 1 and dimension 2 are assembled into a virtual identifier according to the definition of identifier 1, and the cache is searched for a consistent identifier. After the consistent identifier 1 is found, the screening condition corresponding to identifier 1 is looked up in label A, to which identifier 1 belongs. Applying that screening condition, if dimension 2 of the row record is an ICBC channel, the row record is taken as target data.
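Applied to the rows of Table 1, the screening could look like the sketch below. How "dimension 2 is an ICBC channel" is decided is not stated in the source; the case-insensitive `icbc` prefix test is an assumption. For readability the sketch also keeps the transaction volume on each row instead of projecting down to the identifier's columns.

```python
TABLE_1 = [
    {"dimension_1": "Fininflux", "dimension_2": "Icbc9011", "volume": 100},
    {"dimension_1": "Fininflux", "dimension_2": "Icbc701", "volume": 101},
    {"dimension_1": "supergw", "dimension_2": "ICBC51_ICBCSH010120", "volume": 102},
    {"dimension_1": "Fininflux", "dimension_2": "CCb701", "volume": 103},
]

# Identifier 1 is made up of dimension 1 and dimension 2.
IDENTIFIER_1 = ("dimension_1", "dimension_2")


def is_icbc_channel(values):
    # Assumed rule: dimension 2 names an ICBC channel if it starts with "icbc".
    return values[1].lower().startswith("icbc")


target = [
    row for row in TABLE_1
    if is_icbc_channel(tuple(row[col] for col in IDENTIFIER_1))
]
volumes = [row["volume"] for row in target]
```

Under this assumed rule the first three rows become target data and the `CCb701` row is discarded.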
Corresponding to the above flow, the embodiments of the application also provide an apparatus for data filtering. The apparatus may be implemented in software, in hardware, or in a combination of software and hardware. Taking a software implementation as an example, as an apparatus in the logical sense it is formed by the CPU (Central Processing Unit) of the host device reading the corresponding computer program instructions into memory and running them. At the hardware level, in addition to the CPU, memory, and non-volatile storage shown in Fig. 4, the device hosting the apparatus usually also includes other hardware such as a chip for wireless signal transmission and reception, and/or other hardware such as a board for network communication.
Fig. 5 shows an apparatus for data filtering provided by an embodiment of the application. A filtering rule is expressed by one or more identifiers and a screening condition corresponding to each identifier, each identifier including at least one column of the raw data. The apparatus includes a source data extraction unit and a source data filtering unit, where: the source data extraction unit is configured to extract from the raw data the columns specified by all identifiers involved in the filtering rule, to obtain source data; and the source data filtering unit is configured to filter, according to the expression of the filtering rule, the row records of the source data using the screening condition corresponding to each identifier in turn, to obtain target data.
In a kind of implementation, the filtering rule be related to it is all be identified as it is N number of and each identify expression filter Only using once when regular, N is natural number;The source data filter element is specifically used for:Wait to locate using source data as mark 1 Data are managed, the value of row is specified according to mark 1 in the often row record of pending data, determines institute with 1 corresponding screening conditions of mark State row record whether by mark 1 screening, according to whether by result and filtering rule the row is recorded or is used as Target data or as reduced data or being dropped;Using the reduced data of upper one mark as next mark Pending data, above-mentioned screening process is repeated using next mark;After the screening by identifying N, number of targets is obtained According to.
In above-mentioned implementation, the filtering rule includes:By the first screening identified and pass through the second mark Screening;The source data filter element is specifically used for:The row of the first mark screening will be passed through in the pending data of first mark Record saves as reduced data;Pending data using the reduced data of the first mark as the second mark, will be by the The row record of two mark screenings saves as target data.
In the above implementation, the filtering rule includes: passing the screening of a third identifier or passing the screening of a fourth identifier. The source data filtering unit is specifically configured to: save the row records in the pending data of the third identifier that pass the screening of the third identifier as target data, and save the row records that do not pass the screening of the third identifier as the processed data of the third identifier; take the processed data of the third identifier as the pending data of the fourth identifier, and save the row records that pass the screening of the fourth identifier as target data.
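The "third or fourth" case can be sketched in the same style. Again this is an assumption-laden illustration: `filter_or`, the predicates, and the sample columns are hypothetical.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Condition = Callable[[Row], bool]


def filter_or(pending: List[Row], third: Condition, fourth: Condition) -> List[Row]:
    target: List[Row] = []
    processed: List[Row] = []
    for row in pending:
        if third(row):
            target.append(row)     # passes the third identifier: already target data
        else:
            processed.append(row)  # still undecided: processed data of the third identifier
    # The fourth identifier only screens rows the third identifier rejected.
    target.extend(row for row in processed if fourth(row))
    return target


rows = [
    {"id": 1, "vip": False, "spend": 150},
    {"id": 2, "vip": True, "spend": 20},
    {"id": 3, "vip": False, "spend": 30},
]
or_target = filter_or(rows, lambda r: r["vip"], lambda r: r["spend"] > 100)
```

Note that in this sketch the target data is grouped by the identifier that accepted each row, so its order can differ from the order of the source data.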
In the above implementation, the filtering rule further includes a filtering direction. When the filtering direction is forward, a row record passes the screening of an identifier if the values of the columns specified by that identifier in the row record satisfy the screening condition corresponding to the identifier; when the filtering direction is reverse, a row record passes the screening of an identifier if the values of the columns specified by that identifier in the row record do not satisfy the screening condition corresponding to the identifier.
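The filtering direction amounts to optionally negating the outcome of the screening condition. A hedged sketch, in which the function name `passes` and the direction strings `"forward"` and `"reverse"` are assumptions:

```python
from typing import Any, Callable, Dict, Sequence

Row = Dict[str, Any]


def passes(
    row: Row,
    columns: Sequence[str],
    condition: Callable[..., bool],
    direction: str = "forward",
) -> bool:
    """Decide whether a row passes an identifier under the given filtering direction."""
    if direction not in ("forward", "reverse"):
        raise ValueError(f"unknown filtering direction: {direction!r}")
    satisfied = condition(*(row[c] for c in columns))
    # Forward: pass when the condition holds. Reverse: pass when it does not.
    return satisfied if direction == "forward" else not satisfied


rows = [{"age": 25}, {"age": 17}]


def is_adult(age: int) -> bool:
    return age >= 18


forward_rows = [r for r in rows if passes(r, ("age",), is_adult, "forward")]
reverse_rows = [r for r in rows if passes(r, ("age",), is_adult, "reverse")]
```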
In one example, the filtering rule is expressed by at least one label. A label is used to describe a business having certain features, and includes one or more identifiers and the screening condition corresponding to each identifier; the at least one column specified by each identifier is related to those features. The source data filtering unit is specifically configured to: for each identifier, look up the screening condition corresponding to the identifier in the label to which the identifier belongs, and filter the row records of the source data according to that screening condition; the target data is obtained after all identifiers have been traversed according to the filtering rule.
In the above example, the filtering rule is expressed by at least one label and/or at least one label combination, and a label combination includes at least two labels. The source data filtering unit is specifically configured to: for each identifier, look up the screening condition corresponding to the identifier in the label or label combination to which the identifier belongs, and filter the row records of the source data according to that screening condition; the target data is obtained after all identifiers have been traversed according to the filtering rule.
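One possible way to organise labels and label combinations is as nested dictionaries, with the rule referring to identifiers by the name of their owning label or combination. The layout below, the label names (`high_value`, `active`, `core_user`), the column names, and the thresholds are all hypothetical and chosen only to make the lookup step concrete; identifiers are combined with a simple "all must pass" chain.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
Identifier = Tuple[Tuple[str, ...], Callable[..., bool]]  # (columns, screening condition)
Label = Dict[str, Identifier]                             # identifier name -> identifier

# Hypothetical labels, each describing one business feature.
LABELS: Dict[str, Label] = {
    "high_value": {"spend": (("total_spend",), lambda s: s >= 1000)},
    "active": {"recent_login": (("days_since_login",), lambda d: d <= 7)},
}
# A label combination groups at least two labels under one name.
COMBINATIONS: Dict[str, List[str]] = {"core_user": ["high_value", "active"]}


def find_identifier(owner: str, name: str) -> Identifier:
    """Look up an identifier in the label or label combination it belongs to."""
    for label_name in COMBINATIONS.get(owner, [owner]):
        if name in LABELS[label_name]:
            return LABELS[label_name][name]
    raise KeyError(f"{name!r} not found in {owner!r}")


def filter_by_rule(source: List[Row], rule: List[Tuple[str, str]]) -> List[Row]:
    """Traverse the identifiers of the rule; every identifier must pass."""
    rows = source
    for owner, name in rule:
        columns, condition = find_identifier(owner, name)
        rows = [r for r in rows if condition(*(r[c] for c in columns))]
    return rows


rows = [
    {"total_spend": 1500, "days_since_login": 3},
    {"total_spend": 1500, "days_since_login": 30},
    {"total_spend": 200, "days_since_login": 1},
]
core_users = filter_by_rule(rows, [("core_user", "spend"), ("core_user", "recent_login")])
active_users = filter_by_rule(rows, [("active", "recent_login")])
```

Keeping the screening conditions inside the labels means a rule only has to name identifiers; changing a threshold in a label changes every rule that refers to it.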
The foregoing describes only preferred embodiments of the present application and is not intended to limit the application. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present application shall fall within the scope of protection of the present application.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
Memory may include volatile memory in computer-readable media, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape and magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprise", "include", and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, commodity, or device that includes the element.
Those skilled in the art will understand that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

Claims (14)

  1. A data filtering method, characterized in that a filtering rule is expressed by one or more identifiers and a screening condition corresponding to each identifier, each identifier including at least one column of the original data, the method comprising:
    extracting from the original data the columns specified by all identifiers involved in the filtering rule, to obtain source data;
    filtering the row records of the source data by applying, in turn, the screening condition corresponding to each identifier according to the expression of the filtering rule, to obtain target data.
  2. The method according to claim 1, characterized in that the filtering rule involves N identifiers in total, each identifier is used only once in expressing the filtering rule, and N is a natural number;
    the filtering of the row records of the source data by applying, in turn, the screening condition corresponding to each identifier according to the expression of the filtering rule, to obtain target data, comprises: taking the source data as the pending data of identifier 1; for each row record in the pending data, determining, from the values of the columns specified by identifier 1 and the screening condition corresponding to identifier 1, whether the row record passes the screening of identifier 1; according to the result and the filtering rule, keeping the row record as target data, keeping it as processed data, or discarding it; taking the processed data of the previous identifier as the pending data of the next identifier, and repeating the above screening process with the next identifier; and obtaining the target data after the screening of identifier N.
  3. The method according to claim 2, characterized in that the filtering rule includes: passing the screening of a first identifier and passing the screening of a second identifier;
    the filtering of the row records of the source data by applying, in turn, the screening condition corresponding to each identifier according to the filtering rule, to obtain target data, comprises: saving the row records in the pending data of the first identifier that pass the screening of the first identifier as processed data; taking the processed data of the first identifier as the pending data of the second identifier, and saving the row records that pass the screening of the second identifier as target data.
  4. The method according to claim 2, characterized in that the filtering rule includes: passing the screening of a third identifier or passing the screening of a fourth identifier;
    the filtering of the row records of the source data by applying, in turn, the screening condition corresponding to each identifier according to the filtering rule, to obtain target data, comprises: saving the row records in the pending data of the third identifier that pass the screening of the third identifier as target data, and saving the row records that do not pass the screening of the third identifier as the processed data of the third identifier; taking the processed data of the third identifier as the pending data of the fourth identifier, and saving the row records that pass the screening of the fourth identifier as target data.
  5. The method according to claim 2, characterized in that the filtering rule further includes a filtering direction; when the filtering direction is forward, a row record passes the screening of an identifier if the values of the columns specified by that identifier in the row record satisfy the screening condition corresponding to the identifier; when the filtering direction is reverse, a row record passes the screening of an identifier if the values of the columns specified by that identifier in the row record do not satisfy the screening condition corresponding to the identifier.
  6. The method according to claim 1, characterized in that the filtering rule is expressed by at least one label; the label is used to describe a business having certain features, and includes one or more identifiers and the screening condition corresponding to each identifier, the at least one column specified by each identifier being related to those features;
    the filtering of the row records of the source data by applying, in turn, the screening condition corresponding to each identifier according to the filtering rule, to obtain target data, comprises: for each identifier, looking up the screening condition corresponding to the identifier in the label to which the identifier belongs, and filtering the row records of the source data according to that screening condition; and obtaining the target data after all identifiers have been traversed according to the filtering rule.
  7. The method according to claim 6, characterized in that the filtering rule is expressed by at least one label and/or at least one label combination, the label combination including at least two labels;
    the filtering of the row records of the source data by applying, in turn, the screening condition corresponding to each identifier according to the filtering rule, to obtain target data, comprises: for each identifier, looking up the screening condition corresponding to the identifier in the label or label combination to which the identifier belongs, and filtering the row records of the source data according to that screening condition; and obtaining the target data after all identifiers have been traversed according to the filtering rule.
  8. A data filtering device, characterized in that a filtering rule is expressed by one or more identifiers and a screening condition corresponding to each identifier, each identifier including at least one column of the original data, the device comprising:
    a source data extraction unit, configured to extract from the original data the columns specified by all identifiers involved in the filtering rule, to obtain source data;
    a source data filtering unit, configured to filter the row records of the source data by applying, in turn, the screening condition corresponding to each identifier according to the expression of the filtering rule, to obtain target data.
  9. The device according to claim 8, characterized in that the filtering rule involves N identifiers in total, each identifier is used only once in expressing the filtering rule, and N is a natural number;
    the source data filtering unit is specifically configured to: take the source data as the pending data of identifier 1; for each row record in the pending data, determine, from the values of the columns specified by identifier 1 and the screening condition corresponding to identifier 1, whether the row record passes the screening of identifier 1; according to the result and the filtering rule, keep the row record as target data, keep it as processed data, or discard it; take the processed data of the previous identifier as the pending data of the next identifier, and repeat the above screening process with the next identifier; and obtain the target data after the screening of identifier N.
  10. The device according to claim 9, characterized in that the filtering rule includes: passing the screening of a first identifier and passing the screening of a second identifier;
    the source data filtering unit is specifically configured to: save the row records in the pending data of the first identifier that pass the screening of the first identifier as processed data; take the processed data of the first identifier as the pending data of the second identifier, and save the row records that pass the screening of the second identifier as target data.
  11. The device according to claim 9, characterized in that the filtering rule includes: passing the screening of a third identifier or passing the screening of a fourth identifier;
    the source data filtering unit is specifically configured to: save the row records in the pending data of the third identifier that pass the screening of the third identifier as target data, and save the row records that do not pass the screening of the third identifier as the processed data of the third identifier; take the processed data of the third identifier as the pending data of the fourth identifier, and save the row records that pass the screening of the fourth identifier as target data.
  12. The device according to claim 9, characterized in that the filtering rule further includes a filtering direction; when the filtering direction is forward, a row record passes the screening of an identifier if the values of the columns specified by that identifier in the row record satisfy the screening condition corresponding to the identifier; when the filtering direction is reverse, a row record passes the screening of an identifier if the values of the columns specified by that identifier in the row record do not satisfy the screening condition corresponding to the identifier.
  13. The device according to claim 8, characterized in that the filtering rule is expressed by at least one label; the label is used to describe a business having certain features, and includes one or more identifiers and the screening condition corresponding to each identifier, the at least one column specified by each identifier being related to those features;
    the source data filtering unit is specifically configured to: for each identifier, look up the screening condition corresponding to the identifier in the label to which the identifier belongs, and filter the row records of the source data according to that screening condition; and obtain the target data after all identifiers have been traversed according to the filtering rule.
  14. The device according to claim 13, characterized in that the filtering rule is expressed by at least one label and/or at least one label combination, the label combination including at least two labels;
    the source data filtering unit is specifically configured to: for each identifier, look up the screening condition corresponding to the identifier in the label or label combination to which the identifier belongs, and filter the row records of the source data according to that screening condition; and obtain the target data after all identifiers have been traversed according to the filtering rule.
CN201710245934.5A 2017-04-14 2017-04-14 Data filtering method and device Active CN107391532B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710245934.5A CN107391532B (en) 2017-04-14 2017-04-14 Data filtering method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710245934.5A CN107391532B (en) 2017-04-14 2017-04-14 Data filtering method and device

Publications (2)

Publication Number Publication Date
CN107391532A true CN107391532A (en) 2017-11-24
CN107391532B CN107391532B (en) 2020-08-04

Family

ID=60338289

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710245934.5A Active CN107391532B (en) 2017-04-14 2017-04-14 Data filtering method and device

Country Status (1)

Country Link
CN (1) CN107391532B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110728113A (en) * 2018-06-28 2020-01-24 北京金山办公软件股份有限公司 Information screening method and device of electronic forms and terminal equipment
CN110909149A (en) * 2018-09-17 2020-03-24 北京国双科技有限公司 Data filtering method and device
CN111177700A (en) * 2019-12-31 2020-05-19 北京明略软件系统有限公司 Method and device for controlling row-level authority
CN112311559A (en) * 2019-07-24 2021-02-02 中兴通讯股份有限公司 Counter user-defined filtering method and device and computer readable storage medium
CN112749200A (en) * 2020-12-21 2021-05-04 北京百分点科技集团股份有限公司 Crowd screening method and device
CN113778502A (en) * 2020-06-29 2021-12-10 北京沃东天骏信息技术有限公司 Data processing method, device, system and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1725219A (en) * 2004-07-22 2006-01-25 国际商业机器公司 A method and system for managing access by multiple users to persistently stored queries
CN101739453A (en) * 2009-12-17 2010-06-16 中国电力科学研究院 Method and device for carrying out condition query on database table
CN102236659A (en) * 2010-04-27 2011-11-09 中国银联股份有限公司 Method and system for filtering data from data source by using complex conditions
CN102541811A (en) * 2010-12-27 2012-07-04 中国银联股份有限公司 On-demand computing-based data analysis device and method for analysis factors
CN102567413A (en) * 2010-12-31 2012-07-11 中国银联股份有限公司 System and method for data filtering
CN102567481A (en) * 2011-12-19 2012-07-11 华为技术有限公司 Data query filter method and device
CN104765731A (en) * 2014-01-02 2015-07-08 国际商业机器公司 Database query optimization method and equipment
US20170060967A1 (en) * 2015-08-31 2017-03-02 International Business Machines Corporation Bloom filter utilization for join processing

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110728113A (en) * 2018-06-28 2020-01-24 北京金山办公软件股份有限公司 Information screening method and device of electronic forms and terminal equipment
CN110909149A (en) * 2018-09-17 2020-03-24 北京国双科技有限公司 Data filtering method and device
CN110909149B (en) * 2018-09-17 2022-06-03 北京国双科技有限公司 Data filtering method and device
CN112311559A (en) * 2019-07-24 2021-02-02 中兴通讯股份有限公司 Counter user-defined filtering method and device and computer readable storage medium
US11777828B2 (en) 2019-07-24 2023-10-03 Zte Corporation Self-definable counter-based filtering method and device, and computer readable storage medium
CN111177700A (en) * 2019-12-31 2020-05-19 北京明略软件系统有限公司 Method and device for controlling row-level authority
CN113778502A (en) * 2020-06-29 2021-12-10 北京沃东天骏信息技术有限公司 Data processing method, device, system and storage medium
CN112749200A (en) * 2020-12-21 2021-05-04 北京百分点科技集团股份有限公司 Crowd screening method and device

Also Published As

Publication number Publication date
CN107391532B (en) 2020-08-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20200925

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20200925

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee after: Advanced innovation technology Co.,Ltd.

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Patentee before: Alibaba Group Holding Ltd.