CN106997394B - Method and system for processing out-of-order data arrival

Info

Publication number
CN106997394B
Authority
CN
China
Prior art keywords
data
field
time window
user
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710236101.2A
Other languages
Chinese (zh)
Other versions
CN106997394A (en)
Inventor
李广
王纯斌
曹洹太
覃进学
刘旻哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Sefon Software Co Ltd
Original Assignee
Chengdu Sefon Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2017-04-12
Filing date
2017-04-12
Publication date
2019-06-14
Application filed by Chengdu Sefon Software Co Ltd
Priority to CN201710236101.2A
Publication of CN106997394A
Application granted
Publication of CN106997394B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures

Abstract

The invention discloses a method and a system for processing out-of-order data arrival. The method comprises the following steps: extracting the current time window field and converting it into date-type data; judging whether the user stream data contains a specified time window field and handling it accordingly; marking the time window field in which the time-slice field lies, and extracting from the Redis store the data set preceding that time window field; judging whether the time-slice data of the marked time window field is in the extracted data set and handling it accordingly; and storing the user stream data into the Redis store and updating the Redis store. The system comprises a data processing module, a first judgment module, a marking module, a second judgment module and a Redis store module. The invention solves the problem of out-of-order arrival of real-time stream data, is particularly suited to scenarios in which the data source is not serialized, and improves the validity and temporal ordering of the data.

Description

Method and system for processing out-of-order data arrival
Technical field
The present invention relates to the technical field of big data analysis and processing, and specifically to a method and system for processing out-of-order data arrival.
Background
In the current big data industry, real-time streaming is a data processing technique in which serialized data is pushed to an analyzer in batches, in an orderly, regular and fixed manner. Because the analyzer places strict requirements on data format, in most cases the data format is uniform and the serialization requirements are strict. However, the data in real-time stream data sources often does not come from highly serialized scenarios. Out-of-order arrival therefore frequently causes the data-cleaning result to be inconsistent with the original data, with poor temporal ordering and low data quality.
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art and provide a method and system for processing out-of-order data arrival. Based on the principle of allocating real-time stream data to time slices in batches, the invention has the Spark processing nodes perform logical allocation while storing to Redis, thereby solving the problem of out-of-order arrival of real-time stream data and improving the validity and temporal ordering of the data.
The object of the invention is achieved through the following technical solution: a method for processing out-of-order data arrival, comprising the following steps:
S103: extract the current time window field and convert the time window field into date-type data;
S104: judge whether the user stream data contains a specified time window field:
(1) if the user stream data has no specified time window field, use the current time as the time-slice field;
(2) if the user stream data has a specified time window field but the field is not within the valid window, discard the field and record a log entry, using the log as the time-slice field;
(3) if the user stream data has a specified time window field and the field is within the valid window, use the field as the time-slice field of this user stream data;
S105: mark the time window field in which the time-slice field obtained in step S104 lies, and extract from the Redis store the data set preceding that time window field;
S106: judge whether the time-slice data of the marked time window field is in the data set extracted in step S105:
(1) if the time-slice data is in the data set, merge the time-slice data with the data set and store the merged data back into the Redis store;
(2) if the time-slice data is not in the data set, create a new data list for the time-slice data and add the new data list to the Redis store;
S107: store this user stream data into the Redis store and update the Redis store.
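The three-way decision in step S104 can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the field name `event_time`, the ISO 8601 timestamp format, the ten-minute tolerance used as the "legal" (valid) window, and the choice to return `None` after logging a discarded field are all assumptions, since the text does not define them.

```python
import logging
from datetime import datetime, timedelta
from typing import Optional, Tuple

logger = logging.getLogger("out_of_order")

# Hypothetical policy: a timestamp is valid if it lies within this
# tolerance of the current time. The patent does not define the window.
VALID_WINDOW = timedelta(minutes=10)


def choose_time_slice(record: dict, now: datetime,
                      field: str = "event_time") -> Tuple[Optional[datetime], str]:
    """Return (time_slice, case) for one stream record, following step S104."""
    value = record.get(field)
    if value is None:
        # Case (1): no specified time window field, so use the current time.
        return now, "current_time"

    # S103: convert the field into a date-type value (ISO 8601 assumed).
    event_time = datetime.fromisoformat(value)

    if abs(now - event_time) > VALID_WINDOW:
        # Case (2): outside the valid window, so discard the field and log it.
        logger.warning("discarded time field %s (outside valid window)", value)
        return None, "discarded"

    # Case (3): inside the valid window, so the field itself is the time slice.
    return event_time, "event_time"
```

Returning the case label alongside the time slice keeps the decision visible to the caller, which then decides how to store or drop the record.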
Further, the following steps are performed before step S103:
S101: receive the user stream data;
S102: pre-process the received user stream data, judge the validity of the data, and convert the data type of valid data.
The user stream data includes sensor data, business system data and server logs.
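Step S102 (validity check and type conversion) might look like the sketch below. The JSON encoding, the required field names `source` and `value`, and the conversion of `value` to a float are hypothetical choices made for illustration; the text does not specify a record schema.

```python
import json
from typing import Optional

# Hypothetical schema: every valid record carries these two fields.
REQUIRED_FIELDS = ("source", "value")


def preprocess(raw: str) -> Optional[dict]:
    """Return a cleaned record, or None if the raw message is invalid."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError:
        return None  # not parseable, so treated as invalid data
    if not isinstance(record, dict):
        return None  # only JSON objects are accepted in this sketch
    if any(name not in record for name in REQUIRED_FIELDS):
        return None  # a required field is missing
    try:
        # Type conversion for valid data: the reading becomes a float.
        record["value"] = float(record["value"])
    except (TypeError, ValueError):
        return None
    return record
```

Invalid messages are reported as `None` rather than raised as exceptions, so a streaming loop can skip them without interrupting the batch.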
A system for processing out-of-order data arrival, comprising:
a data processing module for converting the time window field into date-type data;
a first judgment module for judging whether the user stream data contains a specified time window field;
a marking module for marking the time window field in which the time-slice field lies;
a second judgment module for judging whether the time-slice data of the marked time window field is in the extracted data set;
a Redis store module for storing the time-slice data and the user stream data.
Further, the system also includes a data receiving module for receiving the user stream data.
Further, the system also includes a pre-processing module for pre-processing the received user stream data, judging the validity of the data, and converting the data type of valid data.
Further, the system also includes a configuration module for configuring the data-cleaning process.
The beneficial effects of the invention are:
(1) the invention exploits the distributed processing of a Spark cluster environment together with the fast storage of NoSQL in the Redis store, and can quickly construct the storage addresses of the data slices, so that out-of-order time slices are effectively partitioned and distinguished;
(2) based on the principle of allocating real-time stream data to time slices in batches, the invention has the Spark processing nodes perform logical allocation while storing to Redis, which solves the problem of out-of-order arrival of real-time stream data, is particularly suited to scenarios in which the data source is not serialized, and markedly improves the validity and temporal ordering of the data;
(3) the invention removes the bottleneck whereby existing stream data processing cannot handle messages that arrive out of order, and can be integrated into existing serialized data processing systems to improve their ability to process real-time stream data;
(4) the invention solves the problem that, in real-time stream data sources, out-of-order arrival makes the data-cleaning result inconsistent with the original data; with the invention, data that arrives out of order can be allocated and processed in batches according to its time slice.
Description of the drawings
Fig. 1 is a flow chart of the method of the invention;
Fig. 2 is a functional module structure diagram of the system of the invention.
Detailed description
The technical solution of the invention is described in further detail below with reference to the drawings, but the scope of protection of the invention is not limited to the following.
As shown in Fig. 1, a method for processing out-of-order data arrival comprises the following steps:
S103: extract the current time window field and convert the time window field into date-type data;
S104: judge whether the user stream data contains a specified time window field:
(1) if the user stream data has no specified time window field, use the current time as the time-slice field;
(2) if the user stream data has a specified time window field but the field is not within the valid window, discard the field and record a log entry, using the log as the time-slice field;
(3) if the user stream data has a specified time window field and the field is within the valid window, use the field as the time-slice field of this user stream data;
S105: mark the time window field in which the time-slice field obtained in step S104 lies, and extract from the Redis store the data set preceding that time window field;
S106: judge whether the time-slice data of the marked time window field is in the data set extracted in step S105:
(1) if the time-slice data is in the data set, merge the time-slice data with the data set and store the merged data back into the Redis store;
(2) if the time-slice data is not in the data set, create a new data list for the time-slice data and add the new data list to the Redis store;
S107: store this user stream data into the Redis store and update the Redis store.
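The merge-or-create logic of steps S105 to S107 can be illustrated with a plain dictionary standing in for the Redis store. This is a sketch under assumptions: the one-minute window length, the `window:YYYYMMDDHHMM` key format and the function names are hypothetical, and a real deployment would replace the dictionary with Redis reads and writes.

```python
from datetime import datetime
from typing import Dict, List, Tuple

# In-memory stand-in for the Redis store: window key -> (time slice, record) pairs.
Store = Dict[str, List[Tuple[datetime, dict]]]


def window_key(time_slice: datetime) -> str:
    """S105: mark the time window (one minute here, by assumption) of a time slice."""
    return time_slice.strftime("window:%Y%m%d%H%M")


def store_slice(store: Store, time_slice: datetime, record: dict) -> str:
    """S105-S107: merge into an existing window or create a new data list."""
    key = window_key(time_slice)
    existing = store.get(key)  # S105: fetch the data set already held for the window
    if existing is not None:
        # S106 (1): merge with the existing data set and store it again,
        # kept in time order so late arrivals land in the right position.
        merged = existing + [(time_slice, record)]
        merged.sort(key=lambda item: item[0])
        store[key] = merged
        return "merged"
    # S106 (2): no data for this window yet, so create a new data list.
    store[key] = [(time_slice, record)]
    return "created"
```

Because each window is re-sorted on merge, a record that arrives late still ends up before the records that overtook it in transit.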
Further, the following steps are performed before step S103:
S101: receive the user stream data;
S102: pre-process the received user stream data, judge the validity of the data, and convert the data type of valid data.
The user stream data includes sensor data, business system data and server logs.
As shown in Fig. 2, a system for processing out-of-order data arrival comprises:
a data processing module for converting the time window field into date-type data;
a first judgment module for judging whether the user stream data contains a specified time window field;
a marking module for marking the time window field in which the time-slice field lies;
a second judgment module for judging whether the time-slice data of the marked time window field is in the extracted data set;
a Redis store module for storing the time-slice data and the user stream data.
Further, the system also includes a data receiving module for receiving the user stream data.
Further, the system also includes a pre-processing module for pre-processing the received user stream data, judging the validity of the data, and converting the data type of valid data.
Further, the system also includes a configuration module for configuring the data-cleaning process.
Embodiment:
The invention stores data in batches according to the Redis storage principle and provides an external interface for queries. At present, the SDC Stream big-data real-time streaming product uses the invention to handle out-of-order data arrival, and in practical testing and use the results have been satisfactory. The storage scheme of the invention is described in further detail below with reference to the drawings:
As shown in Fig. 1, the storage flow of the invention mainly comprises receiving the stream data, converting the data, judging the data time slice, and storing the data to Redis. In this embodiment, the data storage logic is implemented in Java code in combination with Spark scheduling and Redis storage:
Step 1: receive the stream data and pass it to the out-of-order arrival processing module;
Step 2: pre-process the stream data, for example by preliminary cleaning and by checking the validity of the data;
Step 3: select the value set for the current time window field;
Step 4: perform the system logic judgment of whether the user has specified a time field:
1. if the user has not specified a time window field, the system uses the current time as the effective time-slice field;
2. if the time window field specified by the user is not within the valid window, the system cannot slice on the field, so the data is discarded and a log entry is recorded;
3. if the time window field specified by the user is usable and valid, the field is used as the time-slice data of this stream data;
Step 5: mark the time window field obtained in step 4, and fetch from the Redis store the data set processed before that time window field;
Step 6: compare this time-slice data with the data set just fetched:
1. if this time-slice data is present in the data set, merge it and write the result back to Redis;
2. if this time-slice data is not in the data set, create a new data list and add the newly created data list to Redis;
Step 7: store this stream data slice according to the assembly function, and refresh the Redis store.
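To show how the seven steps combine to restore order, the following sketch runs a small batch that arrives out of order through the whole flow and reads it back by window. The embodiment itself uses Java with Spark and Redis; this is a single-process Python illustration in which a dictionary stands in for Redis, and the field names `payload` and `event_time`, the five-minute valid window and the one-minute window length are all assumptions.

```python
from datetime import datetime, timedelta
from typing import Dict, List

VALID_WINDOW = timedelta(minutes=5)  # hypothetical tolerance around "now"


def process_stream(records: List[dict], now: datetime) -> Dict[str, List[dict]]:
    """Steps 1-7 over one batch; the returned dict stands in for Redis."""
    store: Dict[str, List[dict]] = {}
    for record in records:                                # step 1: receive
        if not isinstance(record, dict) or "payload" not in record:
            continue                                      # step 2: validity check
        raw_time = record.get("event_time")               # step 3: read the time field
        if raw_time is None:
            slice_time = now                              # step 4, case 1
        else:
            slice_time = datetime.fromisoformat(raw_time)
            if abs(now - slice_time) > VALID_WINDOW:
                continue                                  # step 4, case 2: discard
        key = slice_time.strftime("%Y-%m-%dT%H:%M")       # step 5: mark the window
        bucket = store.get(key, [])                       # step 5: fetch existing data
        bucket.append({**record, "slice_time": slice_time})  # step 6: merge or create
        bucket.sort(key=lambda r: r["slice_time"])
        store[key] = bucket                               # step 7: write back
    return store


def read_in_order(store: Dict[str, List[dict]]) -> List[str]:
    """Read payloads back window by window, in time order."""
    return [r["payload"] for key in sorted(store) for r in store[key]]
```

The minute-resolution key sorts lexicographically in time order, so reading the windows in sorted key order yields the records in time order regardless of arrival order.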
In the technical architecture of the invention, the big-data real-time stream data is first provisioned. Then, in the presentation layer of the SDC Stream system, a visual flow configurator is used to configure the overall data-cleaning flow. When data enters the processing layer of the overall system, it first reaches the adapter resource; the system fetches the data resource according to the resource content and then instantiates a message parser, which splits the real-time stream data source to generate the corresponding data model. Once a data model that Spark can recognize has been obtained, the Spark scheduler is invoked and, based on the core library coupled with the component libraries, performs data analysis and processes the current stream data. The current time-slice data is subjected to logical judgment, its time window field is compared with the Redis slice store, the current stream data is stored into the corresponding Redis store, and the result data is output.
The invention exploits the distributed processing of a Spark cluster environment together with the fast storage of NoSQL in the Redis store, and can quickly construct the storage addresses of the data slices, so that out-of-order time slices are effectively partitioned and distinguished. Based on the principle of allocating real-time stream data to time slices in batches, the Spark processing nodes perform logical allocation while storing to Redis, which solves the problem of out-of-order arrival of real-time stream data, is particularly suited to scenarios in which the data source is not serialized, and markedly improves the validity and temporal ordering of the data.
The invention removes the bottleneck whereby existing stream data processing cannot handle messages that arrive out of order, and can be integrated into existing serialized data processing systems to improve their ability to process real-time stream data. With the invention, data that arrives out of order can be allocated and processed in batches according to its time slice, which avoids the problem that out-of-order arrival in real-time stream data sources makes the data-cleaning result inconsistent with the original data.
Those skilled in the art will appreciate that the method steps and modules described in the embodiments disclosed herein can be implemented by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the invention.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the method, system and modules described above may refer to the corresponding processes in the foregoing method embodiment, and are not repeated here.
The method, modules and system disclosed herein may be implemented in other ways. For example, the embodiments described above are merely illustrative: the division into modules is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection between systems or modules may be electrical, mechanical or of other forms.
Modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules; they may be located in one place or distributed over multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the invention may be integrated into one processing module, each module may exist physically on its own, or two or more modules may be integrated into one module.
If the functions are implemented as software functional modules and sold or used as an independent product, they may be stored in a computer-readable storage medium. On this understanding, the technical solution of the invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to perform all or part of the steps of the methods described in the embodiments of the invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk or an optical disk.
The above is only a preferred embodiment of the invention. It should be understood that the invention is not limited to the form disclosed herein, which should not be regarded as excluding other embodiments; the invention may be used in various other combinations, modifications and environments, and may be modified within the scope contemplated herein through the above teachings or the skill or knowledge of the related art. Modifications and changes made by those skilled in the art that do not depart from the spirit and scope of the invention shall all fall within the scope of protection of the appended claims.

Claims (7)

1. A method for processing out-of-order data arrival, characterized in that it comprises the following steps:
S103: extract the current time window field and convert the time window field into date-type data;
S104: judge whether the user stream data contains a specified time window field:
(1) if the user stream data has no specified time window field, use the current time as the time-slice field;
(2) if the user stream data has a specified time window field but the field is not within the valid window, discard the field and record a log entry, using the log as the time-slice field;
(3) if the user stream data has a specified time window field and the field is within the valid window, use the field as the time-slice field of this user stream data;
S105: mark the time window field in which the time-slice field obtained in step S104 lies, and extract from the Redis store the data set preceding that time window field;
S106: judge whether the time-slice data of the marked time window field is in the data set extracted in step S105:
(1) if the time-slice data is in the data set, merge the time-slice data with the data set and store the merged data back into the Redis store;
(2) if the time-slice data is not in the data set, create a new data list for the time-slice data and add the new data list to the Redis store;
S107: store this user stream data into the Redis store and update the Redis store.
2. The method for processing out-of-order data arrival according to claim 1, characterized in that the following steps are performed before step S103:
S101: receive the user stream data;
S102: pre-process the received user stream data, judge the validity of the data, and convert the data type of valid data.
3. The method for processing out-of-order data arrival according to claim 1, characterized in that the user stream data includes sensor data, business system data and server logs.
4. A system for processing out-of-order data arrival, characterized in that it comprises:
a data processing module for converting the time window field into date-type data;
a first judgment module for judging whether the user stream data contains a specified time window field:
(1) if the user stream data has no specified time window field, use the current time as the time-slice field;
(2) if the user stream data has a specified time window field but the field is not within the valid window, discard the field and record a log entry, using the log as the time-slice field;
(3) if the user stream data has a specified time window field and the field is within the valid window, use the field as the time-slice field of this user stream data;
a marking module for marking the time window field in which the time-slice field lies, and for extracting from the Redis store the data set preceding that time window field;
a second judgment module for judging whether the time-slice data of the marked time window field is in the extracted data set:
(1) if the time-slice data is in the data set, merge the time-slice data with the data set and store the merged data back into the Redis store;
(2) if the time-slice data is not in the data set, create a new data list for the time-slice data and add the new data list to the Redis store;
a Redis store module for storing the time-slice data and the user stream data.
5. The system for processing out-of-order data arrival according to claim 4, characterized in that it further comprises a data receiving module for receiving the user stream data.
6. The system for processing out-of-order data arrival according to claim 4, characterized in that it further comprises a pre-processing module for pre-processing the received user stream data, judging the validity of the data, and converting the data type of valid data.
7. The system for processing out-of-order data arrival according to claim 4, characterized in that it further comprises a configuration module for configuring the data-cleaning process.
CN201710236101.2A, priority date 2017-04-12, filing date 2017-04-12: Method and system for processing out-of-order data arrival. Status: Active. Publication: CN106997394B (en)

Priority Applications (1)

Application number: CN201710236101.2A
Priority date: 2017-04-12
Filing date: 2017-04-12
Title: Method and system for processing out-of-order data arrival

Publications (2)

Publication Number Publication Date
CN106997394A 2017-08-01
CN106997394B 2019-06-14

Family

ID=59433959

Family Applications (1)

Application number: CN201710236101.2A
Status: Active
Publication: CN106997394B (en)
Priority date: 2017-04-12
Filing date: 2017-04-12
Title: Method and system for processing out-of-order data arrival

Country Status (1)

Country: CN (1)
Link: CN106997394B (en)

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant