CN110909149B - Data filtering method and device - Google Patents


Info

Publication number
CN110909149B
Authority
CN
China
Prior art keywords
filtering
data
hash table
filtering condition
conditional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811082516.XA
Other languages
Chinese (zh)
Other versions
CN110909149A (en)
Inventor
左思图
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Application filed by Beijing Gridsum Technology Co Ltd
Priority to CN201811082516.XA
Publication of CN110909149A
Application granted
Publication of CN110909149B

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data filtering method and device. A conditional hash table and a full hash table are constructed by converting the data format of a filtering rule, and each data content in a row record of the original data is then filtered with O(1) lookup complexity, trading space for time. For a row record with M columns of data items and a filtering rule with N filtering conditions, the time complexity is reduced from the worst-case O(M × N) to O(2M + N), and the final filtering speed is not affected by an increase in the number of filtering conditions, which ensures the timeliness of data analysis.

Description

Data filtering method and device
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data filtering method and apparatus.
Background
As enterprise operations depend increasingly on data, the timeliness requirements for data analysis keep rising. For real-time data that is generated continuously, it is desirable to filter out the data records needed for analysis in as short a time as possible.
In the prior art, when original data is filtered, each row record must be checked against the filtering rule in turn. Filtering efficiency decreases as the number of filtering conditions in the rule increases, and in the worst case the data content of every data item must be compared with all the filtering conditions. Assuming a row record has M columns of data items and the filtering rule has N filtering conditions, the worst-case time complexity of filtering the row record is O(M × N), which seriously affects the timeliness of data analysis.
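As a rough illustration of the prior-art approach described above, the following sketch compares every data item of a row record with every filtering condition. The rule layout (a dictionary keyed by condition number), the item names, and the function name `naive_matches` are assumptions made for illustration only; they do not come from the patent.

```python
# Hypothetical rule layout: condition number -> {filter item: required content}.
# An item that is absent from a condition means "no restriction" on that item.
RULES = {
    1: {"name": "Zhang San", "gender": "Male"},
    2: {"name": "Zhang San", "ethnicity": "Han"},
}


def naive_matches(row, rules):
    """Return the condition numbers a row satisfies.

    Every data item is compared against every condition, so the worst case
    costs M x N comparisons for M data items and N conditions.
    """
    matched = set()
    for number, condition in rules.items():      # N filtering conditions
        ok = True
        for item, content in row.items():        # M data items
            required = condition.get(item)
            if required is not None and required != content:
                ok = False
        if ok:
            matched.add(number)
    return matched


row = {"name": "Zhang San", "ethnicity": "Han", "gender": "Male"}
print(naive_matches(row, RULES))
```

The nested loops are the source of the O(M × N) cost that the method aims to avoid.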
Disclosure of Invention
In view of the above problems, the present invention is proposed to provide a data filtering method and apparatus that overcomes or at least partially solves the above problems, and the technical solution is as follows:
a method of data filtering, comprising:
acquiring a filtering rule; the filtering rule is composed of a filtering condition number and a filtering condition corresponding to the filtering condition number, and the filtering condition is composed of a filtering item and filtering contents under the filtering item;
performing data format conversion on the filtering rule to obtain a conditional hash table and a full hash table; the key of the conditional hash table is non-empty filtering content, the value of the conditional hash table is a filtering condition number corresponding to the non-empty filtering content, the key of the full hash table is a filtering item corresponding to the empty filtering content, and the value of the full hash table is a filtering condition number corresponding to the empty filtering content;
counting a target filtering condition number which is met by a row record of the original data based on the conditional hash table and the full hash table;
and if the statistical quantity of the target filtering condition numbers is equal to the maximum filtering condition number in the filtering rule, outputting the line record.
Preferably, the counting, based on the conditional hash table and the full hash table, a target filtering condition number that a row record of the raw data conforms to includes:
acquiring data content under a data item from the line record of the original data;
taking the data content as a first key of the conditional hash table, and searching a first filtering condition number corresponding to the first key in the conditional hash table;
taking the data item as a second key of the full hash table, and searching a second filtering condition number corresponding to the second key in the full hash table;
determining the first filtering condition number and the second filtering condition number as filtering condition numbers of the data contents;
and determining the intersection of the filtering condition numbers of the data contents in the row record as the target filtering condition number which is met by the row record.
Preferably, the acquiring data content under data items from the line record of the original data includes:
and determining a data item corresponding to the filtering item from the line record of the original data, and acquiring the data content under the determined data item.
Preferably, the method further comprises:
generating a numbered hash table; and the key of the numbered hash table is the number of the target filtering condition, and the value of the numbered hash table is the statistical number of the target filtering condition.
A data filtering device, comprising:
the rule obtaining module is used for obtaining a filtering rule; the filtering rule is composed of a filtering condition number and a filtering condition corresponding to the filtering condition number, and the filtering condition is composed of a filtering item and filtering contents under the filtering item;
the format conversion module is used for performing data format conversion on the filtering rule to obtain a conditional hash table and a full hash table; the key of the conditional hash table is non-empty filtering content, the value of the conditional hash table is a filtering condition number corresponding to the non-empty filtering content, the key of the full hash table is a filtering item corresponding to the empty filtering content, and the value of the full hash table is a filtering condition number corresponding to the empty filtering content;
the number counting module is used for counting the target filtering condition numbers which are met by the row records of the original data based on the conditional hash table and the full hash table;
and the output module is used for outputting the line record if the statistical quantity of the target filtering condition numbers is equal to the maximum filtering condition number in the filtering rule.
Preferably, the number statistics module is specifically configured to:
acquiring data content under a data item from the line record of the original data; taking the data content as a first key of the conditional hash table, and searching a first filtering condition number corresponding to the first key in the conditional hash table; taking the data item as a second key of the full hash table, and searching a second filtering condition number corresponding to the second key in the full hash table; determining the first filtering condition number and the second filtering condition number as filtering condition numbers of the data contents; and determining the intersection of the filtering condition numbers of the data contents in the row record as the target filtering condition number which is met by the row record.
Preferably, the number statistics module, configured to obtain data content under a data item from the line record of the original data, is specifically configured to:
and determining a data item corresponding to the filtering item from the line record of the original data, and acquiring the data content under the determined data item.
Preferably, the apparatus further comprises:
the generation module is used for generating a numbered hash table; and the key of the numbered hash table is the number of the target filtering condition, and the value of the numbered hash table is the statistical number of the target filtering condition.
A storage medium comprising a stored program, wherein the program performs the data filtering method of any one of the preceding claims.
A processor for running a program, wherein the program when running performs the data filtering method of any preceding claim.
By means of the above technical solution, the data filtering method and device provided by the invention construct a conditional hash table and a full hash table by converting the data format of the filtering rule, and filter each data content in a row record of the original data with O(1) lookup complexity, trading space for time. For a row record with M columns of data items and a filtering rule with N filtering conditions, the time complexity is reduced from the worst-case O(M × N) to O(2M + N), and the final filtering speed is not affected by an increase in the number of filtering conditions, which ensures the timeliness of data analysis.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 illustrates a method flow diagram of a data filtering method;
FIG. 2 illustrates a partial method flow diagram of a data filtering method;
FIG. 3 illustrates another method flow diagram of a data filtering method;
FIG. 4 shows a schematic of the structure of a data filtering device;
FIG. 5 shows another schematic diagram of the data filtering device.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Hash table: a data structure that is accessed directly by key. That is, it computes a mapping function on the key to map the queried data to a location in the table and accesses the record there, which speeds up the lookup.
As a simple example, to find a person's number in a phone book, a table may be created that is arranged in the order of the initials of the surnames (i.e., a functional relationship f(x) is established from the person's name x to the initial). Looking up the phone number of someone with the surname "Wang" in the section for the initial W is obviously much faster than searching the whole book directly. Here, "taking the initial" is the rule f() of the mapping function that uses the person's name as the key, and the table organized by initials corresponds to a hash table.
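The phone-book analogy can be sketched in a few lines. This is only an illustration: the names, the phone numbers, and the helper names `initial` and `lookup` are made up, and the mapping function is the "first letter" rule from the analogy rather than a real hash function.

```python
from collections import defaultdict


def initial(name):
    """Mapping function f(x): a person's name -> its first letter."""
    return name[0].upper()


# Made-up phone book entries, grouped into buckets by initial.
entries = [
    ("Wang Wu", "555-0100"),
    ("Zhang San", "555-0101"),
    ("Wang Fang", "555-0102"),
]
table = defaultdict(list)
for name, number in entries:
    table[initial(name)].append((name, number))


def lookup(name):
    """Search only the bucket for the name's initial, not the whole book."""
    for candidate, number in table.get(initial(name), []):
        if candidate == name:
            return number
    return None


print(lookup("Wang Fang"))
```

A built-in Python `dict` applies the same idea with a proper hash function, which is why the later sketches can treat a single key lookup as O(1) on average.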
Based on the related content of the hash table, an embodiment of the present invention provides a data filtering method, where a method flowchart of the method is shown in fig. 1, and the method includes the following steps:
S10, acquiring a filtering rule; the filtering rule is composed of a filtering condition number and a filtering condition corresponding to the filtering condition number, and the filtering condition is composed of a filtering item and filtering contents under the filtering item.
Take the filtering rule shown in Table 1 as an example:
Filter condition number | Name      | Ethnicity | Gender
1                       | Zhang San | (empty)   | Male
2                       | Zhang San | Han       | (empty)
TABLE 1
The filtering rule consists of two filtering conditions: filter condition number 1 means "name is Zhang San, gender is male, ethnicity unrestricted", and filter condition number 2 means "name is Zhang San, ethnicity is Han, gender unrestricted".
S20, performing data format conversion on the filtering rule to obtain a conditional hash table and a full hash table; the key of the conditional hash table is non-empty filtering content, the value of the conditional hash table is a filtering condition number corresponding to the non-empty filtering content, the key of the full hash table is a filtering item corresponding to the empty filtering content, and the value of the full hash table is a filtering condition number corresponding to the empty filtering content.
Continuing with the filtering rule shown in step S10 as an example: to handle non-empty conditions, the conditional hash table is constructed by taking the non-empty filtering contents "Zhang San", "Male", and "Han" in turn as keys and the filtering condition numbers corresponding to those contents as values, which yields the following conditional hash table: Zhang San -> 1, 2; Han -> 2; Male -> 1. To handle empty conditions, the full hash table is constructed by taking the filtering items "ethnicity" and "gender", whose filtering contents are empty, as keys and the filtering condition numbers corresponding to the empty filtering contents as values, which yields the following full hash table: ethnicity -> 1; gender -> 2.
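The construction in step S20 can be sketched as follows. This is a minimal sketch, assuming the rule of Table 1 is held as a dictionary in which an empty filtering content is represented by `None`; the names `RULES`, `FILTER_ITEMS`, and `build_tables` are hypothetical and not taken from the patent.

```python
from collections import defaultdict

# Table 1 as a hypothetical in-memory structure; None marks empty content.
FILTER_ITEMS = ["name", "ethnicity", "gender"]
RULES = {
    1: {"name": "Zhang San", "ethnicity": None, "gender": "Male"},
    2: {"name": "Zhang San", "ethnicity": "Han", "gender": None},
}


def build_tables(rules, filter_items):
    """Convert the rule into a conditional hash table and a full hash table."""
    conditional = defaultdict(set)   # non-empty content -> condition numbers
    full = defaultdict(set)          # item with empty content -> condition numbers
    for number, condition in rules.items():
        for item in filter_items:
            content = condition.get(item)
            if content:
                conditional[content].add(number)
            else:
                full[item].add(number)
    return dict(conditional), dict(full)


conditional, full = build_tables(RULES, FILTER_ITEMS)
print(conditional)
print(full)
```

Note that, following the text, the conditional hash table is keyed by the filtering content alone, so the same content appearing under two different filtering items would share one key.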
S30, counting the target filtering condition numbers met by the row record of the original data based on the conditional hash table and the full hash table.
Take the raw data shown in Table 2 as an example:
[Table 2 is provided as an image in the original publication.]
TABLE 2
Continuing with the filtering rule shown in step S10 as an example, the raw data is filtered row record by row record, where a row record represents one complete piece of information in the database. The first row record, "Zhang San, Han, Male", is taken as an example:
"Zhang San", "Han", and "Male" are used in turn as keys of the conditional hash table to determine the filter condition numbers "1, 2" corresponding to "Zhang San", the filter condition number "2" corresponding to "Han", and the filter condition number "1" corresponding to "Male".
"Name", "ethnicity", and "gender" are used in turn as keys of the full hash table to determine the filter condition number "1" corresponding to "ethnicity" and the filter condition number "2" corresponding to "gender".
Combining the above results for each data content gives the filter condition numbers "1, 2" for "Zhang San", "1, 2" for "Han", and "1, 2" for "Male". Therefore, for the first row record, the target filtering condition numbers met are "1, 2".
In a specific implementation process, in step S30, "counting the target filtering condition number met by the row record of the original data based on the conditional hash table and the full hash table" may specifically adopt the following steps, and a part of the method flowchart is shown in fig. 2:
S301, acquiring data content under the data item from the line record of the original data.
In the process of executing step S301, if the filter item contains a plurality of data items, data contents under the plurality of data items corresponding to the filter item are acquired.
S302, the data content is used as a first key of the conditional hash table, and a first filtering condition number corresponding to the first key is searched in the conditional hash table.
S303, the data item is used as a second key of the full hash table, and a second filtering condition number corresponding to the second key is searched in the full hash table.
S304, the first filter condition number and the second filter condition number are determined as the filter condition number of the data content.
In the process of executing step S304, for a certain data content, the union of the first filtering condition number obtained in step S302 and the second filtering condition number obtained in step S303 is used as the filtering condition number of the data content.
S305, determining an intersection of the filter condition numbers of the data contents in the line record as a target filter condition number met by the line record.
In the process of executing step S305, if the row record contains a plurality of data contents, the intersection of the filter condition numbers of the data contents obtained in step S304 is used as the target filter condition number to which the row record conforms.
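Steps S301 to S305 can be sketched as a single pass over the data items of a row record. This is a minimal sketch, assuming the two hash tables from the worked example are already available as Python dictionaries of sets; the function name `target_condition_numbers` and the lowercase item names are assumptions for illustration.

```python
# Hash tables from the worked example, written out by hand.
CONDITIONAL = {"Zhang San": {1, 2}, "Han": {2}, "Male": {1}}
FULL = {"ethnicity": {1}, "gender": {2}}


def target_condition_numbers(row, conditional, full):
    """Return the filtering condition numbers that the row record meets."""
    target = None
    for item, content in row.items():               # S301: data content per item
        first = conditional.get(content, set())     # S302: lookup by content
        second = full.get(item, set())              # S303: lookup by data item
        numbers = first | second                    # S304: union for this content
        # S305: intersect across all data contents in the row record
        target = numbers if target is None else target & numbers
    return target if target is not None else set()


row = {"name": "Zhang San", "ethnicity": "Han", "gender": "Male"}
print(target_condition_numbers(row, CONDITIONAL, FULL))
```

Each data item costs two dictionary lookups, which corresponds to the 2M term in the stated complexity.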
In some other embodiments, to obtain valid data in the original data, step S301 shown in fig. 2 specifically includes:
and determining the data items corresponding to the filter items from the line records of the original data, and acquiring the data contents under the determined data items.
For example, a filter item 1 and a filter item 2 are used in the filter rule, where the filter item 1 is composed of a data item a and a data item B of original data, and the filter item 2 is composed of a data item C and a data item D of the original data, and then data contents under the data item a, the data item B, the data item C and the data item D in the line record of the original data are extracted and combined as valid data.
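The extraction of valid data in this example can be sketched as follows. The text does not say how the extracted data contents are combined, so grouping them into one tuple per filter item is an assumption, as are the mapping `FILTER_ITEM_TO_DATA_ITEMS`, the function name `extract_valid_data`, and the sample values.

```python
# Hypothetical mapping from each filter item to the data items that compose it.
FILTER_ITEM_TO_DATA_ITEMS = {
    "filter item 1": ["A", "B"],
    "filter item 2": ["C", "D"],
}


def extract_valid_data(row, mapping):
    """Keep only the data contents under data items used by some filter item.

    Assumption: the contents for one filter item are combined into a tuple.
    """
    return {
        filter_item: tuple(row[data_item] for data_item in data_items)
        for filter_item, data_items in mapping.items()
    }


# Made-up row record; data item E is not used by any filter item.
row = {"A": "a1", "B": "b1", "C": "c1", "D": "d1", "E": "e1"}
print(extract_valid_data(row, FILTER_ITEM_TO_DATA_ITEMS))
```

Data items that no filter item refers to, such as E here, are dropped before the hash table lookups, so they cannot empty the intersection computed in step S305.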
S40, if the statistical number of the target filter condition numbers is equal to the maximum filter condition number in the filter rule, a line record is output.
In the process of executing step S40, if the statistical number of the target filtering condition numbers is equal to the maximum filtering condition number in the filtering rule, the line record conforms to all the filtering conditions in the filtering rule, and the line record is output.
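The output decision of step S40 can be sketched as follows. The sketch assumes that the filtering conditions are numbered consecutively from 1, so the maximum filtering condition number equals the number of conditions. The row records in `ROWS` are made up for illustration, and the names `filter_rows` and `MAX_CONDITION_NUMBER` are hypothetical.

```python
CONDITIONAL = {"Zhang San": {1, 2}, "Han": {2}, "Male": {1}}
FULL = {"ethnicity": {1}, "gender": {2}}
MAX_CONDITION_NUMBER = 2  # assumes conditions are numbered 1..N

# Made-up row records for illustration.
ROWS = [
    {"name": "Zhang San", "ethnicity": "Han", "gender": "Male"},
    {"name": "Zhang San", "ethnicity": "Han", "gender": "Female"},
    {"name": "Li Si", "ethnicity": "Han", "gender": "Male"},
]


def target_condition_numbers(row):
    """Union per data content, then intersection across the row (S301-S305)."""
    per_content = [
        CONDITIONAL.get(content, set()) | FULL.get(item, set())
        for item, content in row.items()
    ]
    return set.intersection(*per_content) if per_content else set()


def filter_rows(rows):
    """S40: yield a row only if it meets every filtering condition."""
    for row in rows:
        if len(target_condition_numbers(row)) == MAX_CONDITION_NUMBER:
            yield row


print(list(filter_rows(ROWS)))
```

With these sample rows only the first record is output, because the second meets only condition 2 and the third meets neither condition.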
In some other embodiments, in order to facilitate the user to trace the filtering result later, on the basis of the data filtering method shown in fig. 1, the following steps are further included, and a flowchart of the method is shown in fig. 3:
S50, generating a numbered hash table; and the key of the numbered hash table is the target filtering condition number, and the value of the numbered hash table is the statistical number of the target filtering condition number.
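The text does not spell out how the statistical number in step S50 is accumulated. One plausible reading, assumed here, is that the numbered hash table counts how many row records met each filtering condition number, so a user can trace later which conditions rejected most rows. The function name `build_numbered_table` and the sample input are hypothetical.

```python
from collections import Counter


def build_numbered_table(targets_per_row):
    """Map each target filtering condition number to how often it was met.

    Assumption: the count is taken over the row records processed so far.
    """
    numbered = Counter()
    for targets in targets_per_row:
        numbered.update(targets)   # each number in the set is counted once
    return dict(numbered)


# Made-up per-row results: the sets of condition numbers met by three rows.
targets_per_row = [{1, 2}, {2}, set()]
print(build_numbered_table(targets_per_row))
```

Under this reading, the sample input gives a count of 1 for condition number 1 and 2 for condition number 2.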
According to the data filtering method provided by the embodiment of the invention, the conditional hash table and the full hash table are constructed by converting the data format of the filtering rule, and each data content in a row record of the original data is filtered with O(1) lookup complexity, trading space for time. For a row record with M columns of data items and a filtering rule with N filtering conditions, the time complexity is reduced from the worst-case O(M × N) to O(2M + N), and the final filtering speed is not affected by an increase in the number of filtering conditions, which ensures the timeliness of data analysis.
Based on the data filtering method provided in the foregoing embodiment, an embodiment of the present invention correspondingly provides an apparatus for executing the data filtering method, where a schematic structural diagram of the apparatus is shown in fig. 4, and the apparatus includes:
a rule obtaining module 10, configured to obtain a filtering rule; the filtering rule is composed of a filtering condition number and a filtering condition corresponding to the filtering condition number, and the filtering condition is composed of a filtering item and filtering contents under the filtering item.
The format conversion module 20 is used for performing data format conversion on the filtering rule to obtain a conditional hash table and a full hash table; the key of the conditional hash table is non-empty filtering content, the value of the conditional hash table is a filtering condition number corresponding to the non-empty filtering content, the key of the full hash table is a filtering item corresponding to the empty filtering content, and the value of the full hash table is a filtering condition number corresponding to the empty filtering content.
And a number counting module 30, configured to count, based on the conditional hash table and the full hash table, a target filtering condition number that the row record of the original data conforms to.
And the output module 40 is used for outputting the line record if the statistical number of the target filtering condition numbers is equal to the maximum filtering condition number in the filtering rule.
Optionally, the number statistics module 30 is specifically configured to:
acquiring data content under the data items from the line records of the original data; taking the data content as a first key of a conditional hash table, and searching a first filtering condition number corresponding to the first key in the conditional hash table; taking the data item as a second key of the full hash table, and searching a second filtering condition number corresponding to the second key in the full hash table; determining the first filtering condition number and the second filtering condition number as the filtering condition number of the data content; and determining the intersection of the filtering condition numbers of the data contents in the line records as the target filtering condition number which is met by the line records.
Optionally, the number statistics module 30 is configured to obtain data content under the data item from the line record of the original data, and specifically configured to:
and determining the data items corresponding to the filter items from the line records of the original data, and acquiring the data contents under the determined data items.
Optionally, on the basis of the data filtering apparatus shown in fig. 4, the apparatus further includes the following modules, and a schematic structural diagram is shown in fig. 5:
a generating module 50, configured to generate a numbered hash table; and the key of the numbered hash table is the target filtering condition number, and the value of the numbered hash table is the statistical number of the target filtering condition number.
The data filtering device provided by the embodiment of the invention constructs the conditional hash table and the full hash table by converting the data format of the filtering rule, and filters each data content in a row record of the original data with O(1) lookup complexity, trading space for time. For a row record with M columns of data items and a filtering rule with N filtering conditions, the time complexity is reduced from the worst-case O(M × N) to O(2M + N), and the final filtering speed is not affected by an increase in the number of filtering conditions, which ensures the timeliness of data analysis.
The data filtering device comprises a processor and a memory, wherein the rule acquisition module, the format conversion module, the number counting module, the output module and the like are stored in the memory as program units, and the processor executes the program units stored in the memory to realize corresponding functions.
The processor comprises a kernel, and the kernel calls the corresponding program unit from the memory. One or more than one kernel can be set, and the timeliness of data analysis is ensured by adjusting kernel parameters.
The memory may include volatile memory in a computer readable medium, Random Access Memory (RAM) and/or nonvolatile memory such as Read Only Memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip.
An embodiment of the present invention provides a storage medium having a program stored thereon, which when executed by a processor implements the data filtering method.
The embodiment of the invention provides a processor, which is used for running a program, wherein the data filtering method is executed when the program runs.
The embodiment of the invention provides equipment, which comprises a processor, a memory and a program which is stored on the memory and can run on the processor, wherein the processor executes the program and realizes the following steps:
acquiring a filtering rule; the filtering rule is composed of a filtering condition number and a filtering condition corresponding to the filtering condition number, and the filtering condition is composed of a filtering item and filtering contents under the filtering item;
performing data format conversion on the filtering rule to obtain a conditional hash table and a full hash table; the key of the conditional hash table is non-empty filtering content, the value of the conditional hash table is a filtering condition number corresponding to the non-empty filtering content, the key of the full hash table is a filtering item corresponding to the empty filtering content, and the value of the full hash table is a filtering condition number corresponding to the empty filtering content;
counting a target filtering condition number which is met by a row record of the original data based on the conditional hash table and the full hash table;
and if the statistical quantity of the target filtering condition numbers is equal to the maximum filtering condition number in the filtering rule, outputting the line record.
Wherein, the counting of the target filtering condition number met by the row record of the original data based on the conditional hash table and the full hash table includes:
acquiring data content under a data item from the line record of the original data;
taking the data content as a first key of the conditional hash table, and searching a first filtering condition number corresponding to the first key in the conditional hash table;
taking the data item as a second key of the full hash table, and searching a second filtering condition number corresponding to the second key in the full hash table;
determining the first filtering condition number and the second filtering condition number as filtering condition numbers of the data contents;
and determining the intersection of the filtering condition numbers of the data contents in the row record as the target filtering condition number which is met by the row record.
Wherein, the acquiring the data content under the data item from the line record of the original data comprises:
and determining a data item corresponding to the filter item from the line record of the original data, and acquiring the determined data content under the data item.
Further, the method further comprises:
generating a numbered hash table; and the key of the numbered hash table is the number of the target filtering condition, and the value of the numbered hash table is the statistical number of the target filtering condition.
The device herein may be a server, a PC, a PAD, a mobile phone, etc.
The present application further provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the following method steps:
acquiring a filtering rule; the filtering rule is composed of a filtering condition number and a filtering condition corresponding to the filtering condition number, and the filtering condition is composed of a filtering item and filtering contents under the filtering item;
performing data format conversion on the filtering rule to obtain a conditional hash table and a full hash table; the key of the conditional hash table is non-empty filtering content, the value of the conditional hash table is a filtering condition number corresponding to the non-empty filtering content, the key of the full hash table is a filtering item corresponding to the empty filtering content, and the value of the full hash table is a filtering condition number corresponding to the empty filtering content;
counting a target filtering condition number which is met by a row record of the original data based on the conditional hash table and the full hash table;
and if the statistical quantity of the target filtering condition numbers is equal to the maximum filtering condition number in the filtering rule, outputting the line record.
Wherein, the counting of the target filtering condition number met by the row record of the original data based on the conditional hash table and the full hash table includes:
acquiring data content under a data item from the line record of the original data;
taking the data content as a first key of the conditional hash table, and searching a first filtering condition number corresponding to the first key in the conditional hash table;
taking the data item as a second key of the full hash table, and searching a second filtering condition number corresponding to the second key in the full hash table;
determining the first filtering condition number and the second filtering condition number as filtering condition numbers of the data contents;
and determining the intersection of the filtering condition numbers of the data contents in the line record as a target filtering condition number which is met by the line record.
Wherein the acquiring of the data content under the data item from the line record of the original data comprises:
and determining a data item corresponding to the filtering item from the line record of the original data, and acquiring the data content under the determined data item.
Further, the method further comprises:
generating a numbered hash table; and the key of the numbered hash table is the number of the target filtering condition, and the value of the numbered hash table is the statistical number of the target filtering condition.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (8)

1. A method of filtering data, comprising:
acquiring a filtering rule; the filtering rule is composed of a filtering condition number and a filtering condition corresponding to the filtering condition number, and the filtering condition is composed of a filtering item and filtering contents under the filtering item;
performing data format conversion on the filtering rule to obtain a conditional hash table and a full hash table; the key of the conditional hash table is non-empty filtering content, the value of the conditional hash table is a filtering condition number corresponding to the non-empty filtering content, the key of the full hash table is a filtering item corresponding to the empty filtering content, and the value of the full hash table is a filtering condition number corresponding to the empty filtering content;
counting, based on the conditional hash table and the full hash table, the target filtering condition numbers that a row record of the original data satisfies;
if the statistical number of the target filtering condition numbers is equal to the maximum filtering condition number in the filtering rule, outputting the row record;
wherein the counting, based on the conditional hash table and the full hash table, of the target filtering condition numbers that the row record of the original data satisfies comprises:
acquiring data content under a data item from the row record of the original data;
taking the data content as a first key of the conditional hash table, and searching the conditional hash table for a first filtering condition number corresponding to the first key;
taking the data item as a second key of the full hash table, and searching the full hash table for a second filtering condition number corresponding to the second key;
determining the first filtering condition number and the second filtering condition number as the filtering condition numbers of the data content;
and determining the intersection of the filtering condition numbers of the data contents in the row record as the target filtering condition numbers satisfied by the row record.
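The steps of claim 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation. The rule layout (`Rule`), the function names, and the sample data are hypothetical. The sketch also departs from the claim wording in two places: the conditional table is keyed by the pair (filtering item, filtering content) rather than by the content alone, to avoid collisions between items that share a value; and the "intersection" is interpreted as the set of distinct condition numbers hit by the data contents of a row.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical rule layout: condition number -> (filtering item, filtering contents).
# An empty content list means "any value of this item is accepted".
Rule = Dict[int, Tuple[str, List[str]]]
Row = Dict[str, str]


def build_tables(rule: Rule):
    """Convert the rule into the conditional and the full hash table."""
    conditional: Dict[Tuple[str, str], Set[int]] = {}
    full: Dict[str, Set[int]] = {}
    for number, (item, contents) in rule.items():
        if contents:
            # Non-empty filtering content goes into the conditional table.
            for content in contents:
                conditional.setdefault((item, content), set()).add(number)
        else:
            # Empty filtering content: the item itself is the key.
            full.setdefault(item, set()).add(number)
    return conditional, full


def matched_numbers(row: Row, conditional, full) -> Set[int]:
    """Collect the condition numbers hit by the data contents of one row."""
    hit: Set[int] = set()
    for item, content in row.items():
        hit |= conditional.get((item, content), set())  # first lookup
        hit |= full.get(item, set())                    # second lookup
    return hit


def filter_rows(rows: List[Row], rule: Rule) -> List[Row]:
    """Output rows whose count of matched numbers equals the maximum number."""
    conditional, full = build_tables(rule)
    max_number = max(rule)  # assumes condition numbers run 1..K without gaps
    return [r for r in rows
            if len(matched_numbers(r, conditional, full)) == max_number]


rule: Rule = {1: ("city", ["Beijing", "Shanghai"]), 2: ("device", [])}
rows: List[Row] = [
    {"city": "Beijing", "device": "phone"},   # hits conditions 1 and 2
    {"city": "Tianjin", "device": "pc"},      # hits condition 2 only
    {"city": "Shanghai"},                     # hits condition 1 only
]
kept = filter_rows(rows, rule)
print(kept)
```

Each data content costs two dictionary lookups, which are O(1) on average, so the per-row cost does not grow with the number of filtering conditions once the two tables are built.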
2. The method of claim 1, wherein acquiring the data content under the data item from the row record of the original data comprises:
determining, in the row record of the original data, the data item corresponding to the filtering item, and acquiring the data content under the determined data item.
3. The method of claim 1, further comprising:
generating a numbered hash table; and the key of the numbered hash table is the number of the target filtering condition, and the value of the numbered hash table is the statistical number of the target filtering condition.
4. A data filtering device, comprising:
the rule obtaining module is used for obtaining a filtering rule; the filtering rule is composed of a filtering condition number and a filtering condition corresponding to the filtering condition number, and the filtering condition is composed of a filtering item and filtering contents under the filtering item;
the format conversion module is used for performing data format conversion on the filtering rule to obtain a conditional hash table and a full hash table; the key of the conditional hash table is non-empty filtering content, the value of the conditional hash table is a filtering condition number corresponding to the non-empty filtering content, the key of the full hash table is a filtering item corresponding to the empty filtering content, and the value of the full hash table is a filtering condition number corresponding to the empty filtering content;
the number counting module is used for counting the target filtering condition numbers which are met by the row records of the original data based on the conditional hash table and the full hash table;
an output module, configured to output the row record if the statistical number of the target filtering condition numbers is equal to the maximum filtering condition number in the filtering rule;
the number counting module is specifically configured to:
acquiring data content under a data item from the row record of the original data; taking the data content as a first key of the conditional hash table, and searching the conditional hash table for a first filtering condition number corresponding to the first key; taking the data item as a second key of the full hash table, and searching the full hash table for a second filtering condition number corresponding to the second key; determining the first filtering condition number and the second filtering condition number as the filtering condition numbers of the data content; and determining the intersection of the filtering condition numbers of the data contents in the row record as the target filtering condition numbers satisfied by the row record.
5. The apparatus according to claim 4, wherein the number counting module, when acquiring data content under a data item from the row record of the original data, is specifically configured to:
determine, in the row record of the original data, the data item corresponding to the filtering item, and acquire the data content under the determined data item.
6. The apparatus of claim 4, further comprising:
the generation module is used for generating a numbered hash table; and the key of the numbered hash table is the number of the target filtering condition, and the value of the numbered hash table is the statistical number of the target filtering condition.
7. A storage medium characterized by comprising a stored program, wherein the program executes the data filtering method of any one of claims 1 to 3.
8. A processor, characterized in that the processor is configured to run a program, wherein the program is configured to execute the data filtering method according to any one of claims 1 to 3 when running.
CN201811082516.XA 2018-09-17 2018-09-17 Data filtering method and device Active CN110909149B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811082516.XA CN110909149B (en) 2018-09-17 2018-09-17 Data filtering method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811082516.XA CN110909149B (en) 2018-09-17 2018-09-17 Data filtering method and device

Publications (2)

Publication Number Publication Date
CN110909149A CN110909149A (en) 2020-03-24
CN110909149B CN110909149B (en) 2022-06-03

Family

ID=69813106

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811082516.XA Active CN110909149B (en) 2018-09-17 2018-09-17 Data filtering method and device

Country Status (1)

Country Link
CN (1) CN110909149B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101976316A (en) * 2010-10-27 2011-02-16 杭州新中大软件股份有限公司 Information access authority control method
CN102857493A (en) * 2012-06-30 2013-01-02 华为技术有限公司 Content filtering method and device
CN104331278A (en) * 2014-10-15 2015-02-04 南京航空航天大学 Instruction filtering method and device for specifications of ARINC661
CN106202235A (en) * 2016-06-28 2016-12-07 微梦创科网络科技(中国)有限公司 A kind of data processing method and device
CN106790170A (en) * 2016-12-29 2017-05-31 杭州迪普科技股份有限公司 A kind of packet filtering method and device
CN107391532A (en) * 2017-04-14 2017-11-24 阿里巴巴集团控股有限公司 The method and apparatus of data filtering
CN108111885A (en) * 2017-12-25 2018-06-01 北京奇艺世纪科技有限公司 A kind of cooperation data determination method, device and electronic equipment

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2013214801B2 (en) * 2012-02-02 2018-06-21 Visa International Service Association Multi-source, multi-dimensional, cross-entity, multimedia database platform apparatuses, methods and systems
US20140180904A1 (en) * 2012-03-27 2014-06-26 Ip Reservoir, Llc Offload Processing of Data Packets Containing Financial Market Data
US10963810B2 (en) * 2014-06-30 2021-03-30 Amazon Technologies, Inc. Efficient duplicate detection for machine learning data sets
US20160005128A1 (en) * 2014-07-03 2016-01-07 Elsen, Inc. Systems and methods of applying high performance computational techniques to analysis and execution of financial strategies

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
An Algorithm of Managing the TCP Stream Based on Two-Level Hash Tables;Shichang Xuan et al.;《2015 International Conference on Network and Information Systems for Computers》;20160114;90-96 *
On-line popularity monitoring method based on bloom filters and hash tables for differentiated traffic;Guo Zhang et al.;《China Communications》;20160907;Vol. 13;72-86 *
Network Alarm Correlation Based on Association Rule Mining;朱秋艳;《China Master's Theses Full-text Database, Information Science and Technology》;20081015(No. 10 (2008));I138-488 *
Research on Automaton-Based XML Data Filtering;沈洁;《China Doctoral Dissertations Full-text Database, Information Science and Technology》;20120515(No. 05 (2012));I138-24 *

Also Published As

Publication number Publication date
CN110909149A (en) 2020-03-24

Similar Documents

Publication Publication Date Title
CN105630972A (en) Data processing method and device
CN111680063B (en) Method and device for paging query data by elastic search
CN106462633B (en) Efficiently storing related sparse data in a search index
CN112214472B (en) Meteorological lattice data storage and query method, device and storage medium
CN106933897B (en) Data query method and device
CN114564620A (en) Graph data storage method and system and computer equipment
CN115905630A (en) Graph database query method, device, equipment and storage medium
CN108241620B (en) Query script generation method and device
US10007692B2 (en) Partition filtering using smart index in memory
CN110909149B (en) Data filtering method and device
CN110083731B (en) Image retrieval method, device, computer equipment and storage medium
CN111125157B (en) Query data processing method and device, storage medium and processor
CN107391533A (en) Generate the method and device of graphic data base Query Result
CN110019544B (en) Data query method and system
CN111159192A (en) Data storage method and device based on big data, storage medium and processor
CN111125087A (en) Data storage method and device
CN107273430B (en) Data storage method and device
CN108241622B (en) Query script generation method and device
US9230022B1 (en) Customizable result sets for application program interfaces
CN111382220A (en) POI data dividing method and device
CN113821514A (en) Data splitting method and device, electronic equipment and readable storage medium
CN110020227B (en) Data sorting method and device
CN110019198B (en) Data query method and device
CN110019507B (en) Data synchronization method and device
CN108073596B (en) Data deletion method and device for OLAP database

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant