CN115062002A - Streaming data processing method and device - Google Patents
- Publication number
- CN115062002A (application number CN202210524751.8A)
- Authority
- CN
- China
- Prior art keywords
- analysis result
- global
- flow analysis
- target
- data processing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24568—Data stream processing; Continuous queries
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Quality & Reliability (AREA)
- Computational Linguistics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
Embodiments of the invention provide a streaming data processing method and device. The method comprises: determining corresponding configuration parameters according to the current target log; in response to receiving at least one rule selected by a user, determining a target model according to the at least one rule and the configuration parameters, wherein the data processing capacity of the target model matches the size of the target log; processing the target log through a correlation analysis engine according to the target model to obtain and store a corresponding traffic analysis result; and reading the traffic analysis results for a preset time period, performing global duplicate checking and global statistics on them through a bloom filter to obtain a global traffic analysis result, and storing the global traffic analysis result in a database. The method and device thereby support duplicate checking, merging, counting, whitelist filtering and grouping of real-time streaming data.
Description
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a streaming data processing method and apparatus.
Background
A bloom filter is a compact probabilistic data structure: in essence a binary vector in which each position stores either 0 or 1.
Currently, bloom filters are mostly used for deduplication over a limited volume of data. For example, Google's distributed database Bigtable uses bloom filters to detect non-existent rows or columns; the Google Chrome browser has used a bloom filter to speed up its Safe Browsing service; the SPIN model checker uses a bloom filter to track the reachable state space during large-scale verification; and the Venti storage system uses a bloom filter to detect previously stored data. In these uses, the amount of data a bloom filter can hold is limited and its scalability is insufficient.
Disclosure of Invention
To solve the problems in the prior art, embodiments of the present invention provide a streaming data processing method and apparatus.
Specifically, the embodiment of the invention provides the following technical scheme:
In a first aspect, an embodiment of the present invention provides a streaming data processing method, including: determining corresponding configuration parameters according to the current target log; in response to receiving at least one rule selected by a user, determining a target model according to the at least one rule and the configuration parameters, wherein the data processing capacity of the target model matches the size of the target log; processing the target log through a correlation analysis engine according to the target model to obtain and store a corresponding traffic analysis result; and reading the traffic analysis results for a preset time period, performing global duplicate checking and global statistics on them through a bloom filter to obtain a global traffic analysis result, and storing the global traffic analysis result in a database.
Further, the association analysis engine includes a Sabre engine.
Further, the configuration parameters include at least one of: the traffic volume corresponding to the target log, the blacklist corresponding to the target log, the storage address corresponding to the target log, the target field corresponding to the target log and the merge field corresponding to the target log.
Further, before determining the corresponding configuration parameters according to the current target log, the method further includes: presetting at least one initial rule, the at least one initial rule being available for selection by the user.
Further, reading the traffic analysis results for a preset time period, performing global duplicate checking and global statistics on them through a bloom filter to obtain a global traffic analysis result, and storing the global traffic analysis result in a database includes: reading the traffic analysis results for the preset time period, and performing global duplicate checking and global statistics on them through the bloom filter to obtain the global traffic analysis result; converting the global traffic analysis result into a corresponding binary vector and storing the binary vector in the bloom filter; and storing the global traffic analysis result in the database.
Further, the method further comprises: setting a timed deletion task, clearing data in the database according to the timed deletion task, and setting the binary vector corresponding to that data in the bloom filter to zero.
Further, processing the target log through a correlation analysis engine according to the target model includes: performing duplicate checking, merging, counting, whitelist filtering and grouping on the target log through the correlation analysis engine according to the target model.
In a second aspect, an embodiment of the present invention further provides a streaming data processing apparatus, including: a first processing module, configured to determine corresponding configuration parameters according to the current target log; a second processing module, configured to determine, in response to receiving at least one rule selected by a user, a target model according to the at least one rule and the configuration parameters, wherein the data processing capacity of the target model matches the size of the target log; a third processing module, configured to process the target log through a correlation analysis engine according to the target model to obtain and store a corresponding traffic analysis result; and a fourth processing module, configured to read the traffic analysis results for a preset time period, perform global duplicate checking and global statistics on them through a bloom filter to obtain a global traffic analysis result, and store the global traffic analysis result in a database.
In a third aspect, an embodiment of the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the streaming data processing method according to the first aspect when executing the program.
In a fourth aspect, the present invention further provides a non-transitory computer readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the streaming data processing method according to the first aspect.
In a fifth aspect, the present invention further provides a computer program product, on which executable instructions are stored, and when executed by a processor, the instructions cause the processor to implement the steps of the streaming data processing method according to the first aspect.
According to the streaming data processing method and device provided by the embodiments of the invention, the correlation analysis engine performs real-time calculation and real-time statistics on the streaming data according to the different target models that have been constructed, thereby screening and filtering the data set. The results are then processed through the bloom filter, which extends the amount of data the bloom filter can handle. The bloom filter marks data efficiently, which enables global statistics and global duplicate checking of the streaming data over the preset time period. Because the bloom filter is used only for global duplicate checking and global statistics on the traffic analysis results, while the data corresponding to the global traffic analysis result is stored in the database, the limit on how much data a bloom filter alone can hold is overcome.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flow chart of an embodiment of a streaming data processing method of the present invention;
FIG. 2 is a block diagram of a streaming data processing method;
FIG. 3 is a schematic diagram of a business module of a streaming data processing method;
FIG. 4 is a schematic structural diagram of an embodiment of a streaming data processing apparatus according to the present invention;
FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flow chart of an embodiment of a streaming data processing method according to the present invention. As shown in fig. 1, the streaming data processing method according to the embodiment of the present invention includes:
S101, determining corresponding configuration parameters according to the current target log.
The target log is a traffic log, that is, a log that records traffic, including the inbound and outbound traffic volumes, clickstream data, and the like. The larger the traffic, the larger the corresponding traffic log.
By way of example, the traffic log may include timestamps, source IP, destination IP, source port, destination port, inbound and outbound traffic, and the like. Usually, each flow is aggregated into a single record and sent to the log server. A flow here refers to traffic sharing the same source IP, destination IP and destination port.
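The aggregation of log records into flows can be illustrated with a minimal sketch. The `LogRecord` fields and the `aggregate_flows` helper are hypothetical names chosen for illustration; the source only states which fields a traffic log may contain and that a flow shares the same source IP, destination IP and destination port.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    """One traffic log entry (field names are assumptions)."""
    timestamp: int   # seconds since epoch
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    bytes_in: int
    bytes_out: int


def aggregate_flows(records):
    """Combine records that share (src_ip, dst_ip, dst_port) into one flow summary."""
    flows = defaultdict(lambda: {"bytes_in": 0, "bytes_out": 0, "records": 0})
    for r in records:
        key = (r.src_ip, r.dst_ip, r.dst_port)
        flows[key]["bytes_in"] += r.bytes_in
        flows[key]["bytes_out"] += r.bytes_out
        flows[key]["records"] += 1
    return dict(flows)


records = [
    LogRecord(1000, "10.0.0.1", "10.0.0.2", 50001, 443, 100, 20),
    LogRecord(1000, "10.0.0.1", "10.0.0.2", 50002, 443, 200, 30),
    LogRecord(1000, "10.0.0.3", "10.0.0.2", 50003, 80, 50, 5),
]
flows = aggregate_flows(records)
```

The source port is deliberately left out of the key, so two connections from the same client to the same service are counted as one flow.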
As an example, the current target log may be the traffic log of the previous second.
As an example, if the traffic log only includes a timestamp and the inbound and outbound traffic sizes, and the current target log is the traffic log of the previous second, the following configuration parameters may be obtained from the traffic log: the target field, the time, and the inbound and outbound traffic sizes.
S102, in response to receiving at least one rule selected by a user, determining a target model according to the at least one rule and the configuration parameters, wherein the data processing capacity of the target model matches the size of the target log.
The configuration parameters are adjusted dynamically as the traffic changes, and the at least one rule is a combination of rules selected by the user, so the target model is determined by two dynamic factors: the at least one rule and the configuration parameters.
As an example, the human-computer interaction interface may display the names of a plurality of scenes, and the user may select the at least one rule by selecting at least one scene through the interface. The interface may also present the names and purposes of the preset rules (e.g., a rule for duplicate checking, merging, counting, whitelist filtering or grouping) for the user to select.
As an example, the user may select scene 1 (rule 1, corresponding to scene 1, checks the traffic data for duplicates) and scene 2 (rule 2, corresponding to scene 2, merges the traffic data) through the human-computer interaction interface. The configuration parameter may be the inbound and outbound traffic size (for example, 10 MB) obtained from the traffic log, and the target model determined according to rule 1, rule 2 and the configuration parameter may be a model capable of duplicate checking and merging data with a traffic size of 10 MB.
As an example, the target model may be determined based on at least one rule selected by the user, the configuration parameters, and authentication information. The authentication information is used for security authentication during data interaction.
Determining the target model according to at least one rule and the configuration parameters allows rules to be packaged into different models, so that different data structures can be screened out and different scenes accommodated.
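One way to picture how selected rules and configuration parameters combine into a target model is the sketch below. All names (`ConfigParams`, `TargetModel`, `build_target_model`, the rule registry) are hypothetical, and the rule semantics are simplified assumptions; the source does not specify how the Sabre engine represents a model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

Record = Dict[str, object]


@dataclass
class ConfigParams:
    """Configuration derived from the target log (fields are assumptions)."""
    traffic_bytes: int                      # size of the target log's traffic
    target_field: str                       # field the rules operate on
    whitelist: Set[str] = field(default_factory=set)


Rule = Callable[[List[Record], ConfigParams], List[Record]]


def dedup_rule(records: List[Record], cfg: ConfigParams) -> List[Record]:
    """Keep the first record for each distinct value of the target field."""
    seen, unique = set(), []
    for r in records:
        value = r[cfg.target_field]
        if value not in seen:
            seen.add(value)
            unique.append(r)
    return unique


def whitelist_rule(records: List[Record], cfg: ConfigParams) -> List[Record]:
    """Drop records whose target field value is on the whitelist."""
    return [r for r in records if r[cfg.target_field] not in cfg.whitelist]


# Preset rules offered to the user, keyed by a display name.
RULES: Dict[str, Rule] = {"dedup": dedup_rule, "whitelist": whitelist_rule}


@dataclass
class TargetModel:
    rules: List[Rule]
    cfg: ConfigParams
    capacity_bytes: int   # processing capacity sized to the target log

    def process(self, records: List[Record]) -> List[Record]:
        for rule in self.rules:
            records = rule(records, self.cfg)
        return records


def build_target_model(selected: List[str], cfg: ConfigParams) -> TargetModel:
    """Combine the user's selected rules with the current configuration."""
    return TargetModel([RULES[name] for name in selected], cfg, cfg.traffic_bytes)


records = [
    {"src_ip": "10.0.0.1", "bytes": 100},
    {"src_ip": "10.0.0.9", "bytes": 40},
    {"src_ip": "10.0.0.1", "bytes": 100},
    {"src_ip": "10.0.0.2", "bytes": 60},
]
cfg = ConfigParams(traffic_bytes=300, target_field="src_ip", whitelist={"10.0.0.9"})
model = build_target_model(["whitelist", "dedup"], cfg)
result = model.process(records)
```

Because the model is rebuilt from the rule names and the current `ConfigParams`, a change in either factor yields a new model without touching the rule implementations.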
S103, processing the target log through the correlation analysis engine according to the target model to obtain and store a corresponding traffic analysis result.
The correlation analysis engine (e.g., the Sabre engine) is a distributed correlation analysis engine for big-data streams that supports real-time calculation and real-time statistics.
As an example, a target model corresponding to the current target log is determined; the correlation analysis engine then performs duplicate checking, merging, counting, whitelist filtering and grouping on the current target log according to the target model, completing the real-time analysis of the log, and the resulting traffic analysis result is stored. The storage mode is not limited in the present application.
S104, reading the traffic analysis results for a preset time period, performing global duplicate checking and global statistics on them through the bloom filter to obtain a global traffic analysis result, and storing the global traffic analysis result in a database.
In some examples, the traffic analysis results are stored in real time, for example once per second, and once enough results have accumulated, the results for a preset time period can be read. As an example, if the traffic analysis results of the previous two hours have been stored and the preset time period is two hours, the results for those two hours are read, global duplicate checking and global statistics are performed on them through the bloom filter (yielding the duplicate-checking and statistical results for the previous two hours) to obtain the global traffic analysis result, and the global traffic analysis result is stored in the database. The type of database and the storage mode of the global traffic analysis result are not limited.
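Reading the stored results for a preset time period might look like the following sketch. It assumes, purely for illustration, that per-second results are kept in an SQLite table named `flow_results`; the source leaves the storage mode and database type open, and `read_period` is a hypothetical helper.

```python
import sqlite3

# In-memory database standing in for wherever per-second results are stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flow_results (ts INTEGER, flow_key TEXT, bytes INTEGER)")
conn.executemany(
    "INSERT INTO flow_results VALUES (?, ?, ?)",
    [(1000, "a", 10), (4000, "a", 5), (8000, "b", 7), (9000, "a", 1)],
)


def read_period(conn, end_ts, period_seconds):
    """Return (flow_key, record_count, total_bytes) for the window ending at end_ts."""
    cur = conn.execute(
        "SELECT flow_key, COUNT(*), SUM(bytes) FROM flow_results "
        "WHERE ts >= ? AND ts < ? GROUP BY flow_key ORDER BY flow_key",
        (end_ts - period_seconds, end_ts),
    )
    return cur.fetchall()


# Two-hour window (7200 s) ending just after the latest record.
window = read_period(conn, end_ts=9001, period_seconds=7200)
```

The record at `ts=1000` falls outside the two-hour window and is excluded from the statistics.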
A bloom filter is essentially a binary data structure used to determine whether an element (key) is in a set. The bloom algorithm it uses is a duplicate-checking algorithm based on a binary vector. Global duplicate checking and global statistics on the streaming data are realized through the bloom filter.
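The membership test described above can be sketched with a small bloom filter. This is a generic textbook construction under stated assumptions (SHA-256 with double hashing, 65536 bits, 4 hash positions), not the implementation used in the embodiment; `BloomFilter` and `global_dedup` are hypothetical names. A bloom filter can report a false positive but never a false negative, so a rare unique item may be counted as a duplicate.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)   # the binary vector

    def _positions(self, key: str):
        # Derive k bit positions from one digest via double hashing.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1   # force odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p >> 3] |= 1 << (p & 7)

    def __contains__(self, key: str) -> bool:
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(key))


def global_dedup(keys, bloom):
    """Return (keys not seen before, number of keys judged to be duplicates)."""
    unique, duplicates = [], 0
    for key in keys:
        if key in bloom:
            duplicates += 1
        else:
            bloom.add(key)
            unique.append(key)
    return unique, duplicates


bloom = BloomFilter()
unique, duplicates = global_dedup(["a", "b", "a", "c", "b"], bloom)
```

Keeping the same `bloom` instance across successive time periods is what makes the duplicate check global rather than per batch.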
According to the streaming data processing method and device provided by the embodiments of the invention, the correlation analysis engine performs real-time calculation and real-time statistics on the streaming data according to the different target models that have been constructed, thereby screening and filtering the data set. The results are then processed through the bloom filter, which extends the amount of data the bloom filter can handle. The bloom filter marks data efficiently, which enables global statistics and global duplicate checking of the streaming data over the preset time period. Because the bloom filter is used only for global duplicate checking and global statistics on the traffic analysis results, while the data corresponding to the global traffic analysis result is stored in the database, the limit on how much data a bloom filter alone can hold is overcome.
On the basis of the above embodiment, the association analysis engine may be a Sabre engine.
The Sabre engine is a correlation analysis engine for big-data streams that supports real-time calculation and real-time statistics.
The streaming data processing method provided by the embodiment of the invention takes advantage of the Sabre engine's real-time calculation and real-time big-data statistics, which facilitates real-time calculation and real-time statistics on the current target log.
On the basis of the above embodiment, the configuration parameters may include at least one of: the traffic volume of the target log, the blacklist for the target log, the storage address of the target log, the target field of the target log and the merge field of the target log.
The configuration parameters may be obtained from the current target log. For example, the traffic size, the storage address and the target field may be read directly from the target log, and the blacklist may be determined by analyzing the target field. Different types of target logs correspond to different blacklists. Information is extracted according to the target field in the target log and then integrated according to the merge field. Likewise, the configuration parameters may also include a whitelist for the target log. In this way, the configuration parameters that govern data processing are adjusted in a continuous loop, so that target models suited to different data volumes can be produced.
On the basis of the foregoing embodiment, before determining the corresponding configuration parameters according to the current target log, the method may further include: presetting at least one initial rule, the at least one initial rule being available for selection by the user.
The preset at least one initial rule may be a duplicate-checking, merging, counting, whitelist-filtering or grouping rule for different types of target logs. New rules can be formed by combining the initial rules the user selects, so the set of selectable rules keeps growing dynamically.
On the basis of the foregoing embodiment, reading the traffic analysis results for a preset time period, performing global duplicate checking and global statistics on them through a bloom filter to obtain a global traffic analysis result, and storing the global traffic analysis result in a database may include: reading the traffic analysis results for the preset time period, and performing global duplicate checking and global statistics on them through the bloom filter to obtain the global traffic analysis result; converting the global traffic analysis result into a corresponding binary vector and storing the binary vector in the bloom filter; and storing the global traffic analysis result in the database.
The binary vectors stored in the bloom filter correspond to the global traffic analysis results stored in the database.
On the basis of the above embodiment, the method may further include: setting a timed deletion task, clearing data in the database according to the timed deletion task, and setting the binary vector of the corresponding data in the bloom filter to zero.
The timed deletion task may be a deletion task scheduled by time. For example, the bloom filter and the database may be configured to retain data for only one week; to store data for a further week, the data with the earliest storage time must be deleted.
The timed deletion task may also cap the amount of data stored in the bloom filter and the database: once the cap is reached, the data that has been stored longest is deleted when new data needs to be stored.
The delete (reset) function of the bloom filter is achieved by zeroing out the binary vector of the corresponding data in the bloom filter.
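The source does not say how zeroing is kept from disturbing other entries. In a plain bit-vector bloom filter, clearing a bit that another element also maps to would make that element appear absent. The sketch below therefore swaps in a counting bloom filter, a common variant that replaces each bit with a small counter so that entries can be removed; this is an assumption for illustration, not the mechanism described in the embodiment. `CountingBloomFilter` and `purge_older_than` are hypothetical names, and a plain dictionary stands in for the database.

```python
import hashlib


class CountingBloomFilter:
    def __init__(self, num_slots=1 << 16, num_hashes=4):
        self.num_slots = num_slots
        self.num_hashes = num_hashes
        self.counts = [0] * num_slots   # counters instead of single bits

    def _positions(self, key: str):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_slots for i in range(self.num_hashes)]

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.counts[p] += 1

    def remove(self, key: str) -> None:
        # Only call for keys known to have been added, e.g. rows being purged.
        positions = self._positions(key)
        if all(self.counts[p] > 0 for p in positions):
            for p in positions:
                self.counts[p] -= 1

    def __contains__(self, key: str) -> bool:
        return all(self.counts[p] > 0 for p in self._positions(key))


def purge_older_than(store, cbf, cutoff):
    """Delete entries stored before cutoff from both the store and the filter."""
    expired = [key for key, stored_at in store.items() if stored_at < cutoff]
    for key in expired:
        del store[key]
        cbf.remove(key)
    return expired


# store maps each key to the time it was stored.
store = {"flow-a": 100, "flow-b": 200}
cbf = CountingBloomFilter()
for key in store:
    cbf.add(key)

expired = purge_older_than(store, cbf, cutoff=150)
```

Driving the filter cleanup from the rows actually deleted keeps the database and the filter consistent, which is the behavior the timed deletion task requires.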
On the basis of the above embodiment, processing the target log through a correlation analysis engine according to the target model includes: performing duplicate checking, merging, counting, whitelist filtering and grouping on the target log through the correlation analysis engine according to the target model.
Duplicate checking may be determining whether data in the log is repeated; as an example, the contents of the target field in the log are compared and repeated contents are eliminated.
Merging may be integrating the data in the log that meets given conditions.
Counting may be counting how many data entries the log contains, totaling the duplicate data in the log, or the like. Data with a given feature may be counted as needed.
Whitelist filtering may be directly filtering out data that is on a preset whitelist, or retaining data that is on a preset blacklist. The log traffic may also be screened using both the whitelist and the blacklist, which is not limited in the present invention. The contents of the whitelist and the blacklist can be set according to specific needs.
Grouping may be grouping the log traffic data according to the target field in the log. For example, suppose the target field is id, and each log traffic record includes an id and the data content corresponding to that id. Records whose id is even can be placed in one group and records whose id is odd in another.
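The five operations can be made concrete with a short sketch over in-memory records. The function names, the record fields (`id`, `src`, `bytes`) and the sample values are assumptions for illustration; the whitelist variant shown is the one that drops whitelisted entries.

```python
from collections import Counter, defaultdict

records = [
    {"id": 1, "src": "10.0.0.1", "bytes": 100},
    {"id": 2, "src": "10.0.0.2", "bytes": 50},
    {"id": 2, "src": "10.0.0.2", "bytes": 50},   # repeated entry
    {"id": 3, "src": "10.0.0.3", "bytes": 70},
    {"id": 4, "src": "10.0.0.1", "bytes": 30},
]


def dedup(records, field):
    """Duplicate checking: keep the first record for each value of field."""
    seen, unique = set(), []
    for r in records:
        if r[field] not in seen:
            seen.add(r[field])
            unique.append(r)
    return unique


def merge(records, key_field, value_field):
    """Merging: sum value_field over records that share key_field."""
    totals = defaultdict(int)
    for r in records:
        totals[r[key_field]] += r[value_field]
    return dict(totals)


def count(records, field):
    """Counting: how many records carry each value of field."""
    return Counter(r[field] for r in records)


def whitelist_filter(records, field, whitelist):
    """Whitelist filtering: drop records whose field value is whitelisted."""
    return [r for r in records if r[field] not in whitelist]


def group(records, key_fn):
    """Grouping: bucket records by the result of key_fn."""
    groups = defaultdict(list)
    for r in records:
        groups[key_fn(r)].append(r)
    return dict(groups)


unique = dedup(records, "id")
totals = merge(unique, "src", "bytes")
id_counts = count(records, "id")
kept = whitelist_filter(unique, "src", {"10.0.0.3"})
by_parity = group(unique, lambda r: "even" if r["id"] % 2 == 0 else "odd")
```

Here `by_parity` reproduces the even/odd id grouping from the example above, and `id_counts[2]` shows the total for the repeated entry.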
Fig. 2 shows the framework of the solution of the foregoing embodiment, in which the bloom filter is combined with a traffic analysis engine (the Sabre engine) to filter traffic data, perform global duplicate checking and perform global statistics. The system initializes a number of (configurable) rules, which can be assembled to serve as a data source for other rules. The configuration parameters are adjusted automatically as the streaming data volume changes and are synchronized into the model at regular intervals, and the model is updated whenever the rules or the configuration change. In response to changes in the streaming data volume, the model is adjusted automatically, with its data processing capacity and storage capacity adjusted accordingly, until the incoming data volume and the processing capacity are in balance.
The design of the service module in the solution of the above embodiment is shown in fig. 3. The content of the Request in fig. 3 may include a subject attribute (e.g., the request format), an object resource attribute (e.g., the id of the selected rule), a rule name, and the enable/disable information of the rule. The content of the Response may be allow, deny or error.
The workflow is as follows:
1. the system presets the rules supported by default, and the front end can directly select different scenes to combine;
2. the user selects different scenes to combine different rules, and the Sabre engine receives the rules and constructs different data processing models;
3. the engine processes data (duplicate checking, merging, counting, whitelist filtering and grouping) according to the constructed model, and different models output different data sets;
4. the output data is converted into corresponding binary vectors according to the bloom algorithm, and the binary vectors are stored in the bloom filter;
5. the warehousing object persists the filtered and deduplicated data in a database and synchronously records it in the bloom filter as a marker;
6. the service module queries the database to obtain the latest data set;
7. the timed task object cleans the database and synchronously deletes (sets to zero) the corresponding data in the bloom filter according to the amount of data stored in the database and its storage time;
8. changes in the traffic data volume are monitored and the corresponding configuration parameters are adjusted adaptively; the Sabre engine applies the latest model and the latest configuration parameters to dynamically regulate the output of data processing, so that the traffic data volume and the data processing capacity are kept in balance.
In conclusion, constructing multiple data processing models from the preset rules provides a more efficient and convenient way to screen the required data out of the streaming data and to adapt to different scenes. The model is also adjusted automatically in response to changes in the streaming data volume, so that streaming data can be processed quickly.
Fig. 4 is a schematic structural diagram of a streaming data processing apparatus according to an embodiment of the present invention. As shown in fig. 4, the streaming data processing apparatus includes:
a first processing module 401, configured to determine a corresponding configuration parameter according to a current target log;
a second processing module 402, configured to, in response to receiving at least one rule selected by a user, determine a target model according to the at least one rule and a configuration parameter, where a data processing capability of the target model matches a size of a target log;
a third processing module 403, configured to process the target log through the association analysis engine according to the target model, to obtain and store a corresponding traffic analysis result;
a fourth processing module 404, configured to read the traffic analysis results for a preset time period, perform global duplicate checking and global statistics on them through the bloom filter to obtain a global traffic analysis result, and store the global traffic analysis result in the database.
Optionally, the association analysis engine comprises a Sabre engine.
Optionally, the configuration parameters include at least one of: the traffic volume of the target log, the blacklist for the target log, the storage address of the target log, the target field of the target log and the merge field of the target log.
Optionally, the apparatus further comprises:
a fifth processing module 405, configured to preset at least one initial rule, where the at least one initial rule is used to be selected by a user.
Optionally, the fourth processing module 404 is configured to:
reading the traffic analysis results for a preset time period, and performing global duplicate checking and global statistics on them through a bloom filter to obtain a global traffic analysis result;
converting the global flow analysis result into a corresponding binary vector and storing the binary vector in a bloom filter;
and storing the global flow analysis result in a database.
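The steps above may be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the `BloomFilter` class, its bit-array size, the SHA-256 based hashing, the function `global_dedup_and_count` and the key field `src_ip` are all hypothetical. Setting the hashed bit positions corresponds to storing a result in the bloom filter as a binary vector. A bloom filter may report a false positive, so a new result is occasionally treated as a duplicate; it does not report false negatives.

```python
import hashlib
from collections import Counter


class BloomFilter:
    """Small illustrative bloom filter backed by a packed bit array."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key: str):
        # Derive num_hashes bit positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def global_dedup_and_count(results, bloom, key_field="src_ip"):
    """Return (results not seen before, occurrence count per key)."""
    unique = []
    counts = Counter()
    for result in results:
        key = str(result[key_field])
        counts[key] += 1            # global statistics
        if key not in bloom:        # global duplication judgment
            bloom.add(key)          # record the key as bits in the filter
            unique.append(result)   # candidate for storage in the database
    return unique, counts
```

In such a sketch, `unique` would be written to the database and `counts` would supply the global statistics for the preset time period.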
Optionally, the apparatus further comprises:
a sixth processing module 406, configured to set a timed deletion task, clean data in the database according to the timed deletion task, and set the binary vector of the corresponding data in the bloom filter to zero.
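A cleanup of this kind may be sketched as follows. The names `ExpiringStore`, `bit_positions`, `NUM_BITS` and `NUM_HASHES` are hypothetical, and a dictionary stands in for the database. One point is an assumption that departs from a literal reading: in a plain bloom filter, different keys can share bit positions, so zeroing only the bits of deleted data could also clear bits that surviving data still relies on. The sketch therefore zeroes the whole bit vector and re-adds the surviving keys. Scheduling is omitted; a timer or cron-style job is assumed to call `purge` periodically.

```python
import hashlib

NUM_BITS = 1 << 12   # illustrative size
NUM_HASHES = 3       # illustrative number of hash functions


def bit_positions(key: str):
    """Bit positions that a key occupies in the filter."""
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()[:8], "big") % NUM_BITS
        for i in range(NUM_HASHES)
    ]


class ExpiringStore:
    def __init__(self):
        self.rows = {}                    # key -> stored_at timestamp (stands in for the database)
        self.bits = bytearray(NUM_BITS)   # one byte per bit, for readability

    def add(self, key: str, stored_at: float) -> None:
        self.rows[key] = stored_at
        for pos in bit_positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        return all(self.bits[pos] for pos in bit_positions(key))

    def purge(self, now: float, retention_seconds: float) -> int:
        """Delete expired rows, zero the bit vector, and rebuild it from surviving rows."""
        expired = [k for k, t in self.rows.items() if now - t >= retention_seconds]
        for key in expired:
            del self.rows[key]
        self.bits = bytearray(NUM_BITS)
        for key in self.rows:
            for pos in bit_positions(key):
                self.bits[pos] = 1
        return len(expired)
```

Rebuilding costs one pass over the surviving rows per cleanup, which keeps the filter free of false negatives after deletion.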
Optionally, the third processing module 403 is further configured to: perform judging, merging, counting, whitening and grouping on the target log through the correlation analysis engine according to the target model.
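The five operations may be sketched as a single pass over the log records. This is an illustration only and does not reproduce the correlation analysis engine: the function `analyze`, its parameters and the record fields are hypothetical, and "whitening" is interpreted here, as an assumption, as exempting records that match a whitelist.

```python
from collections import defaultdict


def analyze(logs, rule, whitelist, merge_fields, group_field, whitelist_field="src_ip"):
    """Judge, whiten, merge, count and group log records (illustrative sketch)."""
    merged = {}
    for log in logs:
        if not rule(log):                           # judging: keep records matching the rule
            continue
        if log.get(whitelist_field) in whitelist:   # whitening: assumed to mean whitelist exemption
            continue
        key = tuple(log.get(f) for f in merge_fields)
        if key not in merged:
            merged[key] = {**log, "count": 0}       # merging: one record per merge key
        merged[key]["count"] += 1                   # counting: occurrences per merge key
    groups = defaultdict(list)
    for record in merged.values():
        groups[record.get(group_field)].append(record)   # grouping
    return dict(groups)
```

For example, calling `analyze` with a rule that keeps denied connections, merge fields `["src_ip", "dst_port"]` and group field `"dst_port"` would yield one merged record per source and port, grouped by port.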
An example is as follows:
Fig. 5 illustrates the physical structure of an electronic device. As shown in Fig. 5, the electronic device may include: a processor 501, a communications interface 502, a memory 503 and a communication bus 504, wherein the processor 501, the communications interface 502 and the memory 503 communicate with each other through the communication bus 504. The processor 501 may call logic instructions in the memory 503 to perform the following method: determining corresponding configuration parameters according to the current target log; in response to receiving at least one rule selected by a user, determining a target model according to the at least one rule and the configuration parameters, wherein the data processing capacity of the target model is matched with the size of the target log; processing the target log through a correlation analysis engine according to the target model to obtain and store a corresponding flow analysis result; and reading the flow analysis result of a preset time period, performing global duplication judgment and global statistics on the flow analysis result of the preset time period through a bloom filter to obtain a global flow analysis result, and storing the global flow analysis result in a database.
In addition, when the logic instructions in the memory 503 are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.
In another aspect, an embodiment of the present invention further provides a computer program product, where the computer program product includes a computer program stored on a non-transitory computer-readable storage medium, and the computer program includes program instructions; when the program instructions are executed by a computer, the computer can execute the streaming data processing method provided in the foregoing embodiments, for example, including: determining corresponding configuration parameters according to the current target log; in response to receiving at least one rule selected by a user, determining a target model according to the at least one rule and the configuration parameters, wherein the data processing capacity of the target model is matched with the size of the target log; processing the target log through a correlation analysis engine according to the target model to obtain and store a corresponding flow analysis result; and reading the flow analysis result of a preset time period, performing global duplication judgment and global statistics on the flow analysis result of the preset time period through a bloom filter to obtain a global flow analysis result, and storing the global flow analysis result in a database.
In yet another aspect, the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the streaming data processing method provided in the foregoing embodiments, for example, including: determining corresponding configuration parameters according to the current target log; in response to receiving at least one rule selected by a user, determining a target model according to the at least one rule and the configuration parameters, wherein the data processing capacity of the target model is matched with the size of the target log; processing the target log through a correlation analysis engine according to the target model to obtain and store a corresponding flow analysis result; and reading the flow analysis result of a preset time period, performing global duplication judgment and global statistics on the flow analysis result of the preset time period through a bloom filter to obtain a global flow analysis result, and storing the global flow analysis result in a database.
The above-described embodiments of the apparatus are merely illustrative, and the modules described as separate parts may or may not be physically separate, and the parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment may be implemented by software plus a necessary general hardware platform, and may also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods of the various embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (11)
1. A method of streaming data processing, the method comprising:
determining corresponding configuration parameters according to the current target log;
in response to receiving at least one rule selected by a user, determining a target model according to the at least one rule and the configuration parameters, wherein the data processing capacity of the target model is matched with the size of the target log;
processing the target log through a correlation analysis engine according to the target model to obtain and store a corresponding flow analysis result;
reading a flow analysis result of a preset time period, carrying out global duplication judgment and global statistics on the flow analysis result of the preset time period through a bloom filter to obtain a global flow analysis result, and storing the global flow analysis result in a database.
2. The streaming data processing method of claim 1, wherein the correlation analysis engine comprises a Sabre engine.
3. The streaming data processing method according to any of claims 1 to 2, wherein the configuration parameters include at least one of: the flow corresponding to the target log, the blacklist corresponding to the target log, the storage address corresponding to the target log, the target field corresponding to the target log and the merging field corresponding to the target log.
4. The streaming data processing method according to any one of claims 1 to 2, wherein before determining the corresponding configuration parameter according to the current target log, the method further comprises:
presetting at least one initial rule, the at least one initial rule being for selection by a user.
5. The streaming data processing method according to any one of claims 1 to 2, wherein the reading of the flow analysis result of the preset time period, performing global duplication judgment and global statistics on the flow analysis result of the preset time period through a bloom filter to obtain a global flow analysis result, and storing the global flow analysis result in a database includes:
reading a flow analysis result of a preset time period, and performing global duplication judgment and global statistics on the flow analysis result of the preset time period through a bloom filter to obtain a global flow analysis result;
converting the global flow analysis result into a corresponding binary vector and storing the binary vector in the bloom filter;
and storing the global flow analysis result in a database.
6. The streaming data processing method of claim 5, wherein the method further comprises:
setting a timed deletion task, cleaning the data in the database according to the timed deletion task, and setting the binary vector corresponding to the data in the bloom filter to zero.
7. The streaming data processing method of claim 1, wherein the processing the target log by a correlation analysis engine according to the target model comprises:
performing judging, merging, counting, whitening and grouping on the target log through the correlation analysis engine according to the target model.
8. A streaming data processing apparatus, characterized in that the apparatus comprises:
a first processing module, configured to determine corresponding configuration parameters according to the current target log;
a second processing module, configured to, in response to receiving at least one rule selected by a user, determine a target model according to the at least one rule and the configuration parameters, wherein the data processing capacity of the target model is matched with the size of the target log;
a third processing module, configured to process the target log through a correlation analysis engine according to the target model to obtain and store a corresponding flow analysis result;
a fourth processing module, configured to read the flow analysis result of a preset time period, perform global duplication judgment and global statistics on the flow analysis result of the preset time period through a bloom filter to obtain a global flow analysis result, and store the global flow analysis result in a database.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the streaming data processing method according to any of claims 1 to 6 are implemented when the processor executes the program.
10. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the streaming data processing method according to any one of claims 1 to 6.
11. A computer program product having stored thereon executable instructions, characterized in that the instructions, when executed by a processor, cause the processor to carry out the steps of the streaming data processing method according to any of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210524751.8A CN115062002A (en) | 2022-05-13 | 2022-05-13 | Streaming data processing method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210524751.8A CN115062002A (en) | 2022-05-13 | 2022-05-13 | Streaming data processing method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115062002A (en) | 2022-09-16 |
Family
ID=83198879
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210524751.8A Pending CN115062002A (en) | 2022-05-13 | 2022-05-13 | Streaming data processing method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115062002A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116016361A (en) * | 2022-12-12 | 2023-04-25 | 深圳依时货拉拉科技有限公司 | A/B experiment shunting method and device, storage medium and computer equipment |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN106570465B (en) | A kind of people flow rate statistical method and device based on image recognition | |
CN104091276B (en) | The method of on-line analysis clickstream data and relevant apparatus and system | |
CN110839016A (en) | Abnormal flow monitoring method, device, equipment and storage medium | |
CN104731816A (en) | Method and device for processing abnormal business data | |
CN113765881A (en) | Method and device for detecting abnormal network security behavior, electronic equipment and storage medium | |
CN107832333B (en) | Method and system for constructing user network data fingerprint based on distributed processing and DPI data | |
CN115062002A (en) | Streaming data processing method and device | |
CN108876644B (en) | Similar account calculation method and device based on social network | |
CN111740868A (en) | Alarm data processing method and device and storage medium | |
CN112256734A (en) | Big data processing method, device, system, equipment and storage medium | |
CN110677269B (en) | Method and device for determining communication user relationship and computer readable storage medium | |
CN109344243A (en) | A kind of real-time stream calculation alarm analysis method and system | |
CN104778177A (en) | Data processing method and device | |
CN108923967B (en) | Duplication-removing flow recording method, duplication-removing flow recording device, server and storage medium | |
CN110019152A (en) | A kind of big data cleaning method | |
CN106155913A (en) | The method and apparatus that cache hit rate is analyzed | |
CN108228752B (en) | Data total export method, data export task allocation device and data export node device | |
CN113572721B (en) | Abnormal access detection method and device, electronic equipment and storage medium | |
CN108351940B (en) | System and method for high frequency heuristic data acquisition and analysis of information security events | |
CN115952398B (en) | Traditional calculation method, system and storage medium based on data of Internet of things | |
CN110324588B (en) | Video analysis warning event information storage method based on dictionary structure | |
WO2023071367A1 (en) | Processing method and apparatus for communication service data, and computer storage medium | |
CN115994830A (en) | Method for constructing fetch model, method for collecting data and related device | |
CN115269519A (en) | Log detection method and device and electronic equipment | |
CN115499514A (en) | Data storage service access method, computing device and computer storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||