CN106997394B - Method and system for processing out-of-order data arrival - Google Patents
- Publication number: CN106997394B
- Application number: CN201710236101.2A
- Authority: CN (China)
- Prior art keywords: data, field, time window, user, time
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
Abstract
The invention discloses a method and system for processing out-of-order data arrival. The method comprises the following steps: extracting the actual time window field and converting the time window field into date-type data; judging whether the user stream data has a specified time window field, and handling it accordingly; marking the time window field in which the time-slice field falls, and extracting from the Redis repository the data set preceding that time window field; judging whether the time-slice data of the marked time window field is in the extracted data set, and handling it accordingly; storing the user stream data into the Redis repository and updating the Redis repository. The system comprises a data processing module, a first judgment module, a marking module, a second judgment module, and a Redis repository module. The invention solves the problem of out-of-order arrival of real-time stream data, is particularly suited to scenarios where the data source is not serialized, and improves the validity and temporal ordering of the data.
Description
Technical field
The invention relates to the technical field of big data analysis and processing, and specifically to a method and system for processing out-of-order data arrival.
Background
At present, in the big data industry, real-time streaming is a data processing technique that pushes serialized data into an analyzer in batches, in an orderly, uniform, and fixed manner. Because the analyzer imposes strict requirements on data format, in most cases the data format is uniform and the serialization requirements are strict. However, real-time stream data sources often do not come from highly serialized scenarios. When data arrives out of order, the data cleansing results are often inconsistent with the original data, the temporal ordering of the data is poor, and the data quality is low.
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art and provide a method and system for processing out-of-order data arrival. Based on the principle of allocating real-time stream data to time slices in batches, the invention has the Spark processing nodes perform logical allocation during storage into Redis, thereby solving the problem of out-of-order arrival of real-time stream data and improving the validity and temporal ordering of the data.
The object of the invention is achieved through the following technical solution: a method for processing out-of-order data arrival, comprising the following steps:
S103: extract the actual time window field and convert the time window field into date-type data;
S104: judge whether the user stream data has a specified time window field:
(1) if the user stream data has no specified time window field, use the current time as the time-slice field;
(2) if the user stream data has a specified time window field and the specified time window field is not within the legal window, discard the field and record a log entry, using the log as the time-slice field;
(3) if the user stream data has a specified time window field and the specified time window field is within the legal window, use that field as the time-slice field of this user stream data;
S105: mark the time window field in which the time-slice field obtained in step S104 falls, and extract from the Redis repository the data set preceding that time window field;
S106: judge whether the time-slice data of the marked time window field is in the data set extracted in step S105:
(1) if the time-slice data is in the data set, merge the time-slice data with the data set and store the merged data back into the Redis repository;
(2) if the time-slice data is not in the data set, create a new data list for the time-slice data and add the new data list to the Redis repository;
S107: store this user stream data into the Redis repository and update the Redis repository.
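The three cases of step S104 can be sketched as a small function. This is only an illustration under stated assumptions, not the patented implementation: the field name `event_time`, the constants `MAX_LATENESS` and `MAX_SKEW`, and the reading of "legal window" as a tolerance around the current time are all hypothetical. For case (2) the sketch logs the discarded value and returns `None`, because the text does not say how the log would serve as a time-slice field.

```python
from datetime import datetime, timedelta
import logging

log = logging.getLogger("out_of_order")

# Hypothetical definition of the "legal window": how far a record's own
# timestamp may lag behind or run ahead of the current time.
MAX_LATENESS = timedelta(hours=1)
MAX_SKEW = timedelta(minutes=5)


def choose_time_slice_field(record, now):
    """Return the datetime to use as the time-slice field, or None when the
    specified time falls outside the legal window (case 2)."""
    specified = record.get("event_time")  # hypothetical field name
    if specified is None:
        # Case (1): no specified time window field, fall back to current time.
        return now
    if not (now - MAX_LATENESS <= specified <= now + MAX_SKEW):
        # Case (2): outside the legal window, discard the field and log it.
        log.warning("discarding out-of-window time %s", specified.isoformat())
        return None
    # Case (3): the specified time is usable as the time-slice field.
    return specified
```

Passing `now` in as a parameter, rather than reading the clock inside the function, keeps the decision deterministic and easy to test.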
Further, before step S103, the method also comprises the following steps:
S101: receive user stream data;
S102: preprocess the received user stream data, judge the validity of the data, and convert the data types of the valid data.
The user stream data includes sensing data, operation system data, and server logs.
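As a rough illustration of steps S102 and S103, the sketch below validates a raw record and converts its time window field from text into a date-type value. The required field names, the `event_time` key, and the timestamp format are assumptions made for the example; the patent does not specify a record schema.

```python
from datetime import datetime

REQUIRED_FIELDS = ("source", "payload")  # hypothetical record schema
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"        # hypothetical timestamp format


def preprocess(raw):
    """Validate a raw record (S102) and convert its time window field to a
    datetime (S103). Returns None for records judged invalid."""
    if not isinstance(raw, dict):
        return None
    if any(field not in raw for field in REQUIRED_FIELDS):
        return None
    record = dict(raw)  # copy so the caller's record is left untouched
    text = record.get("event_time")
    if text is not None:
        try:
            record["event_time"] = datetime.strptime(text, TIME_FORMAT)
        except (TypeError, ValueError):
            # Unparseable time window field: treat the record as invalid.
            return None
    return record
```

A record without `event_time` passes through unchanged, which leaves the fallback to the current time for the later judgment step.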
A system for processing out-of-order data arrival, comprising:
a data processing module, for converting the time window field into date-type data;
a first judgment module, for judging whether the user stream data has a specified time window field;
a marking module, for marking the time window field in which the time-slice field falls;
a second judgment module, for judging whether the time-slice data of the marked time window field is in the extracted data set;
a Redis repository module, for storing time-slice data and user stream data.
Further, the system also includes a data receiving module, for receiving user stream data.
Further, the system also includes a preprocessing module, for preprocessing the received user stream data, judging the validity of the data, and converting the data types of the valid data.
Further, the system also includes a configuration module, for configuring the data cleansing process.
The beneficial effects of the invention are:
(1) The invention uses the distributed processing strengths of a Spark cluster environment, combined with the fast NoSQL storage of the Redis repository, to quickly construct storage addresses for data slices, so that out-of-order time slices are effectively partitioned and distinguished;
(2) Based on the principle of allocating real-time stream data to time slices in batches, the invention has the Spark processing nodes perform logical allocation during storage into Redis, which solves the problem of out-of-order arrival of real-time stream data, is particularly suited to scenarios where the data source is not serialized, and significantly improves the validity and temporal ordering of the data;
(3) The invention resolves the bottleneck that existing stream data processing cannot handle messages that arrive out of order, and it can be integrated into existing serialized data processing systems to improve their capacity for processing real-time stream data;
(4) The invention resolves the problem that, in real-time stream data sources, out-of-order arrival causes data cleansing results to be inconsistent with the original data; with the invention, data that arrives out of order can be effectively allocated and processed in batches according to time slices.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the invention;
Fig. 2 is a functional module structure diagram of the system of the invention.
Detailed description of the embodiments
The technical solution of the invention is described in further detail below with reference to the drawings, but the scope of protection of the invention is not limited to the following description.
As shown in Fig. 1, a method for processing out-of-order data arrival comprises the following steps:
S103: extract the actual time window field and convert the time window field into date-type data;
S104: judge whether the user stream data has a specified time window field:
(1) if the user stream data has no specified time window field, use the current time as the time-slice field;
(2) if the user stream data has a specified time window field and the specified time window field is not within the legal window, discard the field and record a log entry, using the log as the time-slice field;
(3) if the user stream data has a specified time window field and the specified time window field is within the legal window, use that field as the time-slice field of this user stream data;
S105: mark the time window field in which the time-slice field obtained in step S104 falls, and extract from the Redis repository the data set preceding that time window field;
S106: judge whether the time-slice data of the marked time window field is in the data set extracted in step S105:
(1) if the time-slice data is in the data set, merge the time-slice data with the data set and store the merged data back into the Redis repository;
(2) if the time-slice data is not in the data set, create a new data list for the time-slice data and add the new data list to the Redis repository;
S107: store this user stream data into the Redis repository and update the Redis repository.
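Steps S105 to S107 amount to a merge-or-create decision against the slice store. The sketch below illustrates that decision with a plain dictionary standing in for the Redis repository. The five-minute `WINDOW`, the helper names, and the reading of "the data set preceding that time window field" as "all slices at or before the marked window" are assumptions of this example rather than details given in the text.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)  # hypothetical time window length


def window_start(ts):
    """Floor a timestamp to the start of its time window (the mark in S105).
    Assumes the window length divides a day evenly."""
    seconds = int(WINDOW.total_seconds())
    offset = (ts.hour * 3600 + ts.minute * 60 + ts.second) % seconds
    return ts.replace(microsecond=0) - timedelta(seconds=offset)


def store_record(store, slice_time, payload):
    """Merge the record into an existing slice or create a new list (S106),
    then write the result back to the store (S107)."""
    mark = window_start(slice_time)
    # S105: extract the slices at or before the marked window.
    earlier = {key: items for key, items in store.items() if key <= mark}
    if mark in earlier:
        # S106 (1): the slice exists, so merge and keep it ordered by time.
        merged = sorted(earlier[mark] + [(slice_time, payload)],
                        key=lambda item: item[0])
    else:
        # S106 (2): no such slice yet, so create a new data list.
        merged = [(slice_time, payload)]
    store[mark] = merged  # S107: update the repository
    return mark
```

Because each slice is re-sorted on merge, a record that arrives late still lands in the correct position within its time window.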
Further, before step S103, the method also comprises the following steps:
S101: receive user stream data;
S102: preprocess the received user stream data, judge the validity of the data, and convert the data types of the valid data.
The user stream data includes sensing data, operation system data, and server logs.
As shown in Fig. 2, a system for processing out-of-order data arrival comprises:
a data processing module, for converting the time window field into date-type data;
a first judgment module, for judging whether the user stream data has a specified time window field;
a marking module, for marking the time window field in which the time-slice field falls;
a second judgment module, for judging whether the time-slice data of the marked time window field is in the extracted data set;
a Redis repository module, for storing time-slice data and user stream data.
Further, the system also includes a data receiving module, for receiving user stream data.
Further, the system also includes a preprocessing module, for preprocessing the received user stream data, judging the validity of the data, and converting the data types of the valid data.
Further, the system also includes a configuration module, for configuring the data cleansing process.
Embodiment:
The invention stores data in batches according to the Redis storage principle and provides an external interface that can be called for queries. At present, SDC Stream, a big data real-time streaming product, uses the invention to handle out-of-order data arrival, and practical testing and use have shown good results. The storage scheme of the invention is described in further detail below with reference to the drawings:
As shown in Fig. 1, the storage flow of the invention mainly comprises stream data reception, data conversion, data time-slice judgment, and data storage into Redis. In this embodiment, the data storage logic is implemented in Java code in combination with Spark scheduling and Redis storage:
Step 1: receive the stream data and pass it to the out-of-order arrival processing module;
Step 2: preprocess the stream data, for example by preliminary cleansing, and judge and confirm the validity of the data;
Step 3: select the value set for the actual time window field;
Step 4: perform the system logic judgment of whether the user has specified a time field:
1. if the user has not specified a time window field, the system uses the current time as the effective time-slice field;
2. if the time window field specified by the user is not within the legal window, the system cannot slice on the field, so it discards the data and records a log entry;
3. if the time window field specified by the user is usable and valid, the field is used as the time-slice data of this stream data;
Step 5: mark the time window field obtained in step 4, and take from the Redis store the data set processed before that time window field;
Step 6: compare this time-slice data with the data set just taken out:
1. if this time-slice data exists in the data set, merge it and write the result back to Redis;
2. if this time-slice data is not in the data set, create a new data list and add the new data list to Redis;
Step 7: store this stream data slice according to the assembly function, then refresh Redis and write it in.
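The seven steps above can be tried end to end on a handful of records that arrive out of order. The sketch below is an illustrative stand-in rather than the Java, Spark, and Redis implementation the embodiment mentions: the slice store is a dictionary, the list `dropped` plays the role of the log, and the 300-second window, the `max_lateness` tolerance, and the field names `event_time` and `value` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW_SECONDS = 300  # hypothetical 5-minute time slices
EPOCH = datetime(1970, 1, 1)


def run_pipeline(arrivals, now, max_lateness=timedelta(hours=1)):
    """Process records in arrival order and return (slices, dropped)."""
    slices = defaultdict(list)  # stand-in for the Redis slice store
    dropped = []                # stand-in for the log of discarded records
    for record in arrivals:
        ts = record.get("event_time")
        if ts is None:
            ts = now                    # step 4, case 1: use current time
        elif not (now - max_lateness <= ts <= now):
            dropped.append(record)      # step 4, case 2: discard and log
            continue
        # Steps 5 to 7: locate the slice, merge or create, write back.
        seconds = int((ts - EPOCH).total_seconds())
        key = seconds - seconds % WINDOW_SECONDS
        slices[key].append((ts, record["value"]))
        slices[key].sort(key=lambda item: item[0])
    return slices, dropped


def replay(slices):
    """Read slices back in key order to recover a time-ordered stream."""
    return [value for key in sorted(slices) for _, value in slices[key]]
```

Reading the slices back in key order returns the records in time order regardless of the order in which they arrived, which is the effect the time-slice allocation is meant to achieve.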
In the technical architecture of the invention, big data real-time stream data is put in place first. Then, in the presentation layer of the SDC Stream system, a visual process configurator is used to configure the overall cleansing process for the data. When data enters the processing layer of the overall system, it first reaches the adapter resource; the system fetches the data resource according to the resource content and then instantiates a message parser, which splits the real-time stream data source to generate the corresponding data model. Once a data model that Spark can recognize has been obtained, the Spark scheduler is called and, based on the core library coupled with the component library, performs data analysis and processes the current stream data. The current time-slice data then undergoes logical judgment: the time window field is compared against the Redis slice storage, the current stream data is stored into the corresponding Redis storage, and the result data is output.
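One way to picture the "corresponding Redis storage" for a time slice is a key derived from the floored timestamp. The sketch below is a guess at such a naming scheme, not something the patent specifies: the `stream:slice` prefix, the five-minute default, and the key layout are all hypothetical.

```python
from datetime import datetime


def slice_key(ts, window_minutes=5, prefix="stream:slice"):
    """Build a storage key for the time slice that contains ts.
    Assumes window_minutes divides 60 evenly."""
    floored = ts.replace(minute=ts.minute - ts.minute % window_minutes,
                         second=0, microsecond=0)
    # Zero-padded year-to-minute digits sort lexicographically in time order.
    return f"{prefix}:{floored:%Y%m%d%H%M}"
```

With fixed-width digits in the key, sorting keys as strings gives the same order as sorting the slices by time, which keeps lookups of earlier slices simple.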
The invention uses the distributed processing strengths of a Spark cluster environment, combined with the fast NoSQL storage of the Redis repository, to quickly construct storage addresses for data slices, so that out-of-order time slices are effectively partitioned and distinguished. Based on the principle of allocating real-time stream data to time slices in batches, the Spark processing nodes perform logical allocation during storage into Redis, which solves the problem of out-of-order arrival of real-time stream data, is particularly suited to scenarios where the data source is not serialized, and significantly improves the validity and temporal ordering of the data.
The invention resolves the bottleneck that existing stream data processing cannot handle messages that arrive out of order, and it can be integrated into existing serialized data processing systems to improve their capacity for processing real-time stream data. With the invention, data that arrives out of order can be effectively allocated and processed in batches according to time slices, which avoids the problem that, in real-time stream data sources, out-of-order arrival causes data cleansing results to be inconsistent with the original data.
Those skilled in the art will appreciate that the method steps and modules described in this disclosure can be implemented as a combination of computer software and electronic hardware. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the invention.
It will be clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the method, system, and modules described above may refer to the corresponding processes in the foregoing method embodiment and are not repeated here.
The methods, modules, and systems disclosed herein may be implemented in other ways. For example, the embodiments described above are merely illustrative. The division into modules is only a division by logical function, and other divisions are possible in an actual implementation: multiple modules or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connections shown or discussed may be through some interfaces, and the indirect coupling or communication connections between systems or modules may be electrical, mechanical, or of other forms.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed over multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the invention may be integrated into one processing module, each module may exist physically on its own, or two or more modules may be integrated into one module.
If the functions are implemented as software functional modules and sold or used as an independent product, they may be stored in a computer-readable storage medium. On this understanding, the technical solution of the invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the invention. The storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
The above is only a preferred embodiment of the invention. It should be understood that the invention is not limited to the form disclosed herein, which should not be regarded as excluding other embodiments; it can be used in various other combinations, modifications, and environments, and can be modified within the scope contemplated herein through the above teachings or the skill or knowledge of the related art. Modifications and changes made by those skilled in the art that do not depart from the spirit and scope of the invention shall all fall within the scope of protection of the appended claims.
Claims (7)
1. A method for processing out-of-order data arrival, characterized in that it comprises the following steps:
S103: extract the actual time window field and convert the time window field into date-type data;
S104: judge whether the user stream data has a specified time window field:
(1) if the user stream data has no specified time window field, use the current time as the time-slice field;
(2) if the user stream data has a specified time window field and the specified time window field is not within the legal window, discard the field and record a log entry, using the log as the time-slice field;
(3) if the user stream data has a specified time window field and the specified time window field is within the legal window, use that field as the time-slice field of this user stream data;
S105: mark the time window field in which the time-slice field obtained in step S104 falls, and extract from the Redis repository the data set preceding that time window field;
S106: judge whether the time-slice data of the marked time window field is in the data set extracted in step S105:
(1) if the time-slice data is in the data set, merge the time-slice data with the data set and store the merged data back into the Redis repository;
(2) if the time-slice data is not in the data set, create a new data list for the time-slice data and add the new data list to the Redis repository;
S107: store this user stream data into the Redis repository and update the Redis repository.
2. The method for processing out-of-order data arrival according to claim 1, characterized in that, before step S103, it further comprises the following steps:
S101: receive user stream data;
S102: preprocess the received user stream data, judge the validity of the data, and convert the data types of the valid data.
3. The method for processing out-of-order data arrival according to claim 1, characterized in that the user stream data includes sensing data, operation system data, and server logs.
4. A system for processing out-of-order data arrival, characterized in that it comprises:
a data processing module, for converting the time window field into date-type data;
a first judgment module, for judging whether the user stream data has a specified time window field:
(1) if the user stream data has no specified time window field, use the current time as the time-slice field;
(2) if the user stream data has a specified time window field and the specified time window field is not within the legal window, discard the field and record a log entry, using the log as the time-slice field;
(3) if the user stream data has a specified time window field and the specified time window field is within the legal window, use that field as the time-slice field of this user stream data;
a marking module, for marking the time window field in which the time-slice field falls, and extracting from the Redis repository the data set preceding that time window field;
a second judgment module, for judging whether the time-slice data of the marked time window field is in the extracted data set:
(1) if the time-slice data is in the data set, merge the time-slice data with the data set and store the merged data back into the Redis repository;
(2) if the time-slice data is not in the data set, create a new data list for the time-slice data and add the new data list to the Redis repository;
a Redis repository module, for storing time-slice data and user stream data.
5. The system for processing out-of-order data arrival according to claim 4, characterized in that it further comprises a data receiving module, for receiving user stream data.
6. The system for processing out-of-order data arrival according to claim 4, characterized in that it further comprises a preprocessing module, for preprocessing the received user stream data, judging the validity of the data, and converting the data types of the valid data.
7. The system for processing out-of-order data arrival according to claim 4, characterized in that it further comprises a configuration module, for configuring the data cleansing process.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710236101.2A CN106997394B (en) | 2017-04-12 | 2017-04-12 | Method and system for processing out-of-order data arrival |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710236101.2A CN106997394B (en) | 2017-04-12 | 2017-04-12 | Method and system for processing out-of-order data arrival |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106997394A CN106997394A (en) | 2017-08-01 |
CN106997394B true CN106997394B (en) | 2019-06-14 |
Family
ID=59433959
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710236101.2A Active CN106997394B (en) | 2017-04-12 | 2017-04-12 | Method and system for processing out-of-order data arrival |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106997394B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110019386B (en) * | 2017-09-05 | 2021-01-15 | 中国移动通信有限公司研究院 | Stream data processing method and device |
CN108959480B (en) * | 2018-06-21 | 2020-07-14 | 江苏赛睿信息科技股份有限公司 | Method and device for realizing data visualization of stream data |
CN110362600B (en) * | 2019-07-22 | 2022-03-11 | 广西大学 | Out-of-order data stream distributed aggregation query method, system and medium |
CN112861195A (en) * | 2021-03-13 | 2021-05-28 | 张曼 | Data storage protection system for out-of-order storage and storage method thereof |
CN116193511B (en) * | 2023-04-21 | 2023-07-21 | 广东南方电信规划咨询设计院有限公司 | 5G data traffic out-of-order processing method and system |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102156893A (en) * | 2011-03-24 | 2011-08-17 | 大连海事大学 | Cleaning system and method thereof for data acquired by RFID device under network |
CN104090889A (en) * | 2013-12-12 | 2014-10-08 | 深圳市腾讯计算机系统有限公司 | Method and system for data processing |
CN104091276A (en) * | 2013-12-10 | 2014-10-08 | 深圳市腾讯计算机系统有限公司 | Click stream data online analyzing method and related device and system |
CN104468722A (en) * | 2014-11-10 | 2015-03-25 | 四川川大智胜软件股份有限公司 | Method for classified storage of training data in navigation management training system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5337447B2 (en) * | 2008-10-28 | 2013-11-06 | 株式会社日立製作所 | Stream data processing method and system |
- 2017-04-12: CN201710236101.2A filed; patent CN106997394B, status Active
Also Published As
Publication number | Publication date |
---|---|
CN106997394A (en) | 2017-08-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106997394B (en) | Method and system for processing out-of-order data arrival | |
CN108062367B (en) | Data list uploading method and terminal thereof | |
CN107153535B (en) | Method and device for operating elastic search | |
CN106021315A (en) | Log management method and system for application program | |
CN103139157A (en) | Network communication method based on socket, device and system | |
CN104679596A (en) | Message processing method and system for improving concurrence performance of server-side | |
CN111679886A (en) | Heterogeneous computing resource scheduling method, system, electronic device and storage medium | |
CN105704177A (en) | UA identification method and device | |
CN110928681A (en) | Data processing method and device, storage medium and electronic device | |
CN108989365B (en) | Information processing method, server, terminal equipment and storage medium | |
WO2016101446A1 (en) | Data analysis method, apparatus, system, and terminal, and server | |
CN110874301B (en) | Method and device for acquiring program pause information | |
CN109327499B (en) | Service interface management method and device, storage medium and terminal | |
CN110990350A (en) | Log analysis method and device | |
CN112671878B (en) | Block chain information subscription method, device, server and storage medium | |
CN107870921B (en) | Log data processing method and device | |
CN110516220B (en) | Report data input method, system and related equipment | |
CN106294457B (en) | Network information pushing method and device | |
CN108509255A (en) | The treating method and apparatus of hardware interrupts | |
CN109788034B (en) | Configuration method for gateway access equipment, electronic equipment and storage medium | |
CN111859127A (en) | Subscription method and device of consumption data and storage medium | |
CN111046077A (en) | Data acquisition method and device, storage medium and terminal | |
CN105302557A (en) | Thread establishing and processing method and apparatus | |
CN116204428A (en) | Test case generation method and device | |
CN110308901A (en) | Handle data variable method, apparatus, equipment and storage medium in front end page |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||