CN113535643A - Data processing method and device and server - Google Patents

Data processing method and device and server

Info

Publication number
CN113535643A
Authority
CN
China
Legal status
Pending
Application number
CN202110823981.XA
Other languages
Chinese (zh)
Inventor
李虎
Current Assignee
Beijing Kingsoft Cloud Network Technology Co Ltd
Original Assignee
Beijing Kingsoft Cloud Network Technology Co Ltd
Application filed by Beijing Kingsoft Cloud Network Technology Co Ltd
Priority to CN202110823981.XA
Publication of CN113535643A
Current legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/18 File system types
    • G06F16/1805 Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815 Journaling file systems
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Probability & Statistics with Applications (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Library & Information Science (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention provides a data processing method, a data processing apparatus, and a server. Delay data are determined from acquired data based on the time information of the acquired data. The delay data are then processed in a preset data processing mode to obtain an intermediate result corresponding to the delay data. Finally, this intermediate result is fused with the processing result of the normal data corresponding to the delay data to obtain a final processing result covering both the delay data and the normal data. Because only the delay data are processed to produce the intermediate result, which is then fused with the result already obtained from the normal data, the normal data need not be recomputed; this improves data processing efficiency and saves computing resources.

Description

Data processing method and device and server
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data processing method, an apparatus, and a server.
Background
In the related art, abnormal data acquisition at an edge node or abnormal jitter in the communication network can cause part of the data to reach the data center late. The data center generally treats the data sent normally by the edge node and the delayed data as a single whole and recalculates the corresponding service index according to a preset service requirement. This repeatedly recalculates the much larger volume of normally sent service data, which wastes computing resources and raises cost.
Disclosure of Invention
In view of this, the present invention provides a data processing method, a data processing apparatus, and a server that avoid repeated calculation on normal data, improve data processing efficiency, and reduce processing cost.
In a first aspect, an embodiment of the present invention provides a data processing method, including: determining delay data from acquired data based on time information of the acquired data, where delay data are data that arrive at the current device more than a specified time period after being generated; processing the delay data in a preset data processing mode to obtain an intermediate result corresponding to the delay data; and fusing the intermediate result corresponding to the delay data with the processing result of the normal data corresponding to the delay data to obtain a final processing result corresponding to the delay data and the normal data, where normal data are data that arrive at the current device within the specified time period after being generated.
With reference to the first aspect, an embodiment of the present invention provides a first possible implementation of the first aspect, in which the time information includes the generation time of the data and the time at which the data arrive at the current device. The step of determining delay data from the acquired data based on the time information includes: judging whether the difference between the time at which the acquired data arrive at the current device and their generation time is greater than the specified time period; and, if it is, determining the acquired data to be delay data.
With reference to the first possible implementation of the first aspect, an embodiment of the present invention provides a second possible implementation of the first aspect, in which the time information of the acquired data is determined by: receiving a data log sent by a preset edge node device; recording the time at which the data log arrives at the current device; parsing the data log to obtain the generation time of the data in it; and taking the arrival time of the data log as the time at which the data in the log arrived at the current device.
With reference to the first aspect, an embodiment of the present invention provides a third possible implementation of the first aspect, in which the time information includes the generation time of the data and the time at which the data arrive at the current device, and the processing result of the normal data corresponding to the delay data is obtained by: judging whether the difference between the time at which the acquired data arrive at the current device and their generation time is less than or equal to the specified time period; if it is, determining the acquired data to be normal data; and processing the normal data in the preset data processing mode to obtain the processing result corresponding to the normal data.
With reference to the first aspect, an embodiment of the present invention provides a fourth possible implementation of the first aspect, in which the time information includes the generation time of the data. Before the step of fusing the intermediate result corresponding to the delay data with the processing result of the normal data corresponding to the delay data to obtain the final processing result, the method further includes: determining, as the normal data corresponding to the delay data, the normal data that have the same data type as the delay data and whose generation time falls in the same time period as that of the delay data.
With reference to the first aspect, an embodiment of the present invention provides a fifth possible implementation of the first aspect, in which, before the step of fusing the intermediate result corresponding to the delay data with the processing result of the normal data corresponding to the delay data to obtain the final processing result, the method further includes: if new delay data are received, updating the intermediate result corresponding to the delay data based on the new delay data.
With reference to the fifth possible implementation of the first aspect, an embodiment of the present invention provides a sixth possible implementation of the first aspect, in which the step of updating the intermediate result corresponding to the delay data based on the new delay data includes: merging the new delay data into the existing delay data to obtain updated delay data; and processing the updated delay data in the preset data processing mode to obtain an intermediate result corresponding to the updated delay data.
With reference to the first aspect, an embodiment of the present invention provides a seventh possible implementation of the first aspect, in which there are multiple data processing modes, each corresponding to one service requirement, and the step of processing the delay data in the preset data processing mode to obtain the intermediate result corresponding to the delay data includes: for each service requirement, processing the delay data in the data processing mode corresponding to that requirement to obtain the intermediate result corresponding to the delay data.
With reference to the first aspect, an embodiment of the present invention provides an eighth possible implementation of the first aspect, in which each piece of acquired data contains multiple items of information and the data processing mode is summation of specified information. The step of processing the delay data in the preset data processing mode to obtain the intermediate result corresponding to the delay data includes: extracting the specified information from the items of information in the delay data; summing the extracted specified information to obtain a summation result; and determining the summation result as the intermediate result corresponding to the delay data.
In a second aspect, an embodiment of the present invention further provides a data processing apparatus, including: a delay data determining module, configured to determine delay data from acquired data based on time information of the acquired data, where delay data are data that arrive at the current device more than a specified time period after being generated; a delay data processing module, configured to process the delay data in a preset data processing mode to obtain an intermediate result corresponding to the delay data; and a data fusion module, configured to fuse the intermediate result corresponding to the delay data with the processing result of the normal data corresponding to the delay data to obtain a final processing result corresponding to the delay data and the normal data, where normal data are data that arrive at the current device within the specified time period after being generated.
In a third aspect, an embodiment of the present invention further provides a server including a processor and a memory, where the memory stores machine-executable instructions executable by the processor, and the processor executes the machine-executable instructions to implement the data processing method described above.
In a fourth aspect, embodiments of the present invention also provide a machine-readable storage medium storing machine-executable instructions which, when invoked and executed by a processor, cause the processor to implement the data processing method described above.
In the data processing method, apparatus, and server provided by the embodiments of the present invention, delay data are first determined from the acquired data based on the time information of the acquired data; the delay data are then processed in a preset data processing mode to obtain an intermediate result corresponding to the delay data; and this intermediate result is fused with the processing result of the normal data corresponding to the delay data to obtain the final processing result corresponding to the delay data and the normal data. Because only the delay data are processed to produce the intermediate result, which is then fused with the result already obtained from the normal data, the normal data need not be recomputed; this improves data processing efficiency and saves computing resources.
Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 shows an application scenario of a data processing method according to an embodiment of the present invention;
Fig. 2 is a flowchart of a data processing method according to an embodiment of the present invention;
Fig. 3 is a flowchart of another data processing method according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
In the related art, after receiving data sent by an edge node, a data center calculates the service index corresponding to the data according to a preset service requirement. For example, after a user visits a given website from a personal computer, the personal computer sends an access record to the data center, and the data center counts the website's visits within a given time period. Fig. 1 illustrates this application scenario, which includes a data center and multiple edge nodes connected to it to form a content delivery network (CDN). The data center may also be called a central server, and an edge node may also be called an edge node device; an edge node device may be a personal computer, an industrial device, or the like.
However, because of abnormal data acquisition at an edge node or abnormal jitter in the communication network, a small amount of data may reach the data center late. For example, an access record generated when a user visits the website at 10:00 a.m. may, for these reasons, be sent to the data center together with the access records generated at 12:00. In this situation, after receiving the delayed data, the data center generally treats the data sent normally by the edge node and the delayed data as a whole and recalculates the corresponding service index. This repeatedly recalculates the much larger volume of normally sent data, wasting computing resources and raising cost.
On this basis, the data processing method, apparatus, and server provided by the embodiments of the present invention can be applied to data processing in various service systems, such as sales service systems and website service systems.
To facilitate understanding of this embodiment, the data processing method disclosed in the embodiments of the present invention is first described in detail.
Referring first to the flowchart of a data processing method shown in Fig. 2, the method includes the following steps:
Step S200: determine delay data from the acquired data based on the time information of the acquired data, where delay data are data that arrive at the current device more than a specified time period after being generated.
The data may be sent to the data center by an edge node, in which case the current device is the data center. The time information of the data may include the generation time of the data and the time at which the data center receives the data. Generally, so that the data center can learn the working state or service status of an edge node in real time, the edge node sends data to the data center at a set frequency or at predetermined times.
The specified time period may be bounded by two adjacent transmission times of the edge node, or by two adjacent reception times at the data center; alternatively, only its length may be fixed, with no constraint on its endpoints. The length of the specified time period may equal the interval between two adjacent transmissions by the edge node. Because network transmission is fast, the time at which the data center receives data can be taken as approximately the time at which the edge node sent them. Data whose arrival time at the data center exceeds their generation time by more than the specified time period can be treated as delay data.
Normally, data generated at an edge node are sent to the data center within the specified time period after generation. For example, suppose the edge node sends data to the data center at 10:00 and next at 10:30. The data generated at the edge node between 10:00 and 10:30 should then be sent to the data center at 10:30, and the specified time period for the data received at that moment is 10:00 to 10:30. However, because of abnormal transmission or network problems, the data received at 10:30 may include data generated at 9:30; such data reach the data center after the specified time period has elapsed and can be determined to be delay data.
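The delay test in steps S200 and S308 reduces to comparing the lag between generation and arrival with the specified time period. A minimal Python sketch, assuming timestamps are `datetime` objects and the specified time period is a `timedelta`; the function name `is_delayed` and the calendar date are illustrative assumptions, while the 10:00/10:30 schedule and the 9:30 record come from the example above:

```python
from datetime import datetime, timedelta


def is_delayed(generated_at: datetime, arrived_at: datetime, period: timedelta) -> bool:
    """Hypothetical helper: True if the record arrived more than `period` after generation."""
    return arrived_at - generated_at > period


# Example from the text: the edge node reports every 30 minutes (10:00, then 10:30).
period = timedelta(minutes=30)
arrival = datetime(2021, 1, 1, 10, 30)   # batch received at 10:30 (date is illustrative)
on_time = datetime(2021, 1, 1, 10, 12)   # generated inside the 10:00-10:30 window
late = datetime(2021, 1, 1, 9, 30)       # generated at 9:30, received at 10:30

print(is_delayed(on_time, arrival, period))  # False: lag of 18 minutes
print(is_delayed(late, arrival, period))     # True: lag of 60 minutes
```

A record generated exactly at 10:00 has a lag of exactly 30 minutes and is treated as normal, matching the "less than or equal to" branch of step S308.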
In some cases, a maximum delay threshold is also set: if data arrive at the current device after the specified time period and the difference between their arrival time and generation time also exceeds the maximum delay threshold, the data can be discarded without further processing. For example, suppose the website visit records from 10:00 a.m. to 12:00 noon on a certain day are to be counted, but several visit records from that period arrive two weeks later. If the maximum delay threshold is one week, the difference between the arrival time and the generation time of these records exceeds the threshold, and they can be discarded.
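With a maximum delay threshold, each record falls into one of three classes. The sketch below assumes the threshold is at least as long as the specified time period; the `Arrival` enum and the `classify` function are hypothetical names, and the one-week threshold and two-week lag follow the example above:

```python
from datetime import datetime, timedelta
from enum import Enum


class Arrival(Enum):
    NORMAL = "normal"        # arrived within the specified time period
    DELAYED = "delayed"      # late, but still processed as delay data
    DISCARDED = "discarded"  # later than the maximum delay threshold


def classify(generated_at: datetime, arrived_at: datetime,
             period: timedelta, max_delay: timedelta) -> Arrival:
    """Hypothetical classifier; assumes max_delay >= period."""
    lag = arrived_at - generated_at
    if lag > max_delay:
        return Arrival.DISCARDED
    if lag > period:
        return Arrival.DELAYED
    return Arrival.NORMAL


# A 10:15 visit record that arrives two weeks later, with a one-week threshold.
generated = datetime(2021, 1, 1, 10, 15)
result = classify(generated, generated + timedelta(weeks=2),
                  timedelta(minutes=30), timedelta(weeks=1))
print(result.value)  # discarded
```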
Step S202: process the delay data in a preset data processing mode to obtain an intermediate result corresponding to the delay data.
The data processing mode is usually preset in the data center by the relevant personnel and depends on the type of data received and on what the personnel want to know about the edge nodes. For example, when the edge nodes are personal computers, the personnel may want to know, in real time, the traffic usage of personal computers in a certain area during a certain time period, such as the total traffic or the average traffic per personal computer; in this case, the data received by the data center are the traffic data of multiple personal computers in that area during that time period. As another example, when the data center wants to know the sales of a certain mobile phone, the received data may be the mobile phone sales records of various retail outlets.
In the traffic case above, the data processing mode may be summation: the traffic values in the normally sent traffic data received for the time period are first summed to obtain a preliminary total traffic; when delay data are received, the traffic values in the delay data are likewise summed, and the sum is used as the intermediate result corresponding to the delay data. The data processing mode may also be averaging: after several pieces of delay data are received, their traffic values are averaged, and the average together with the number of pieces of delay data is used as the intermediate result corresponding to the delay data. There may be several data processing modes, and the same delay data may be processed separately in each of them to obtain one intermediate result per data processing mode.
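The two processing modes described above can be sketched as follows. The sketch assumes each traffic record is a dict with a numeric `traffic` field (a hypothetical layout) and uses hypothetical function names. The average is returned together with the record count because, as the text notes, the count is kept as part of the intermediate result; this lets averages from different batches be combined later.

```python
from typing import Dict, List, NamedTuple


class MeanResult(NamedTuple):
    mean: float  # average traffic of the batch
    count: int   # number of records the average was computed from


def sum_traffic(records: List[Dict[str, float]]) -> float:
    """Summation mode: total traffic of the batch."""
    return sum(r["traffic"] for r in records)


def mean_traffic(records: List[Dict[str, float]]) -> MeanResult:
    """Averaging mode: average traffic plus the record count."""
    n = len(records)
    return MeanResult(sum_traffic(records) / n if n else 0.0, n)


delayed = [{"traffic": 120}, {"traffic": 80}, {"traffic": 100}]
# One intermediate result per data processing mode, for the same delay data.
intermediate = {"sum": sum_traffic(delayed), "mean": mean_traffic(delayed)}
print(intermediate)
```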
When delay data are received again, the newly received delay data and the previously received delay data can be processed together in the preset processing mode to generate an intermediate result covering all the delay data. Alternatively, only the newly received delay data can be processed in the preset mode to obtain their own intermediate result, which is then fused with the intermediate result previously obtained for the earlier delay data to give an intermediate result covering all the delay data. Both approaches are feasible; because it keeps the processing flow simpler, the first approach is usually adopted when delay data are received again.
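For the summation mode, the two update approaches give the same intermediate result. A minimal sketch, assuming delay data are plain numeric values; `recompute_all` and `merge_incremental` are hypothetical names for the first and second approaches respectively:

```python
from typing import List


def recompute_all(stored_delayed: List[float], new_delayed: List[float]) -> float:
    """First approach: merge the new delay data into the stored delay data, then reprocess everything."""
    all_delayed = stored_delayed + new_delayed
    return sum(all_delayed)


def merge_incremental(previous_intermediate: float, new_delayed: List[float]) -> float:
    """Second approach: process only the new delay data, then fuse with the previous intermediate result."""
    return previous_intermediate + sum(new_delayed)


stored = [5.0, 7.0]
new = [3.0]
print(recompute_all(stored, new))           # 15.0
print(merge_incremental(sum(stored), new))  # 15.0
```

The first approach requires the raw delay data to be kept, while the second needs only the previous intermediate result but works only for processing modes whose results can be merged, as summation results can.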
Step S204: fuse the intermediate result corresponding to the delay data with the processing result of the normal data corresponding to the delay data to obtain the final processing result corresponding to the delay data and the normal data, where normal data are data that arrive at the current device within the specified time period after being generated.
The normal data generally refer to data of the same data type as the delay data that were transmitted to the current device on time, that is, data that arrived at the current device within the specified time period. "Same data type" may mean data serving the same service requirement. For example, when the delay data are traffic data of edge node devices, the normal data are also traffic data of edge node devices, and the generation times of both fall within the time range of the traffic usage that the data center wants to know (that is, the same service requirement).
To simplify the fusion, the processing result of the normal data is usually obtained by processing the normal data in the same preset data processing mode, and the fusion mode corresponds to that data processing mode. For example, when the preset data processing mode is summation, the processing result of the normal data is the sum of the normal data and the intermediate result of the delay data is the sum of the delay data; the fusion then adds these two sums to obtain the final processing result corresponding to the delay data and the normal data.
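Fusion in the summation mode is a single addition; for the averaging mode described under step S202, keeping the record counts makes a count-weighted combination possible. A sketch with hypothetical names, assuming an intermediate result for averaging is stored as an (average, count) pair:

```python
from typing import NamedTuple


class MeanResult(NamedTuple):
    mean: float
    count: int


def fuse_sum(normal_result: float, delayed_intermediate: float) -> float:
    """Summation mode: final result = sum over normal data + sum over delay data."""
    return normal_result + delayed_intermediate


def fuse_mean(normal: MeanResult, delayed: MeanResult) -> MeanResult:
    """Averaging mode: count-weighted average of the two partial averages."""
    total = normal.count + delayed.count
    if total == 0:
        return MeanResult(0.0, 0)
    return MeanResult((normal.mean * normal.count + delayed.mean * delayed.count) / total, total)


normal_values = [100.0, 200.0]  # processed on time
delayed_values = [30.0]         # arrived late
print(fuse_sum(sum(normal_values), sum(delayed_values)))    # 330.0
print(fuse_mean(MeanResult(150.0, 2), MeanResult(30.0, 1)))  # MeanResult(mean=110.0, count=3)
```

Neither fusion touches the individual normal records again, which is where the computing saving comes from.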
In this data processing method, delay data are first determined from the acquired data based on the time information of the acquired data; the delay data are then processed in a preset data processing mode to obtain an intermediate result corresponding to the delay data; and this intermediate result is fused with the processing result of the normal data corresponding to the delay data to obtain the final processing result corresponding to the delay data and the normal data. Because only the delay data are processed to produce the intermediate result, which is then fused with the result already obtained from the normal data, data processing efficiency is improved and computing resources are saved.
An embodiment of the present invention further provides another data processing method, implemented on the basis of the method in the foregoing embodiment. It mainly describes: the specific process of determining the time information of the acquired data (steps S300 to S306); when the time information includes the generation time of the data and the time at which the data arrive at the current device, the specific process of determining normal data and delay data from the acquired data based on that time information and processing each of them (steps S308 to S316); and the specific process of determining the normal data corresponding to the delay data (step S318). As shown in Fig. 3, the method includes the following steps:
Step S300: receive a data log sent by a preset edge node device.
The data log may be a log file, which typically contains multiple logs. Each piece of data is usually saved as one log in the form of a string. When sending data to the data center, the edge node device typically sends them as a log file.
Step S302: record the time at which the data log arrives at the current device. The arrival time may be recorded as a timestamp, which is typically a character sequence that uniquely identifies a moment in time and unifies the time reference of the edge node devices and the data center.
Step S304: parse the data log to obtain the generation time of the data in it. When a piece of data is generated, its generation time becomes part of the data; the generation time and the other information of the data are turned into a log, usually a single character string, by a preset algorithm. Parsing the log is essentially the inverse of generating it and recovers the generation time and other information of the data stored in the log. For example, for traffic data, parsing yields the time and amount of traffic usage and may also yield information such as the traffic flow. The generation time of the data may also be stored in the log as a timestamp.
Step S306: take the arrival time of the data log as the time at which the data in the log arrived at the current device. Because the data are stored in the data log as logs, the data log and the data in it arrive at the current device at the same time.
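Steps S300 to S306 can be sketched as follows. The patent does not specify the log format, so this assumes a hypothetical layout of one record per line, `<ISO generation time>|<node id>|key=value|...`, and a hypothetical function name `parse_log_file`. In practice the arrival time would be captured when the file is received (for example with `datetime.now()`); it is passed in here so the example is deterministic.

```python
from datetime import datetime
from typing import Dict, List


def parse_log_file(log_text: str, arrived_at: datetime) -> List[Dict[str, object]]:
    """Parse a (hypothetical) log file into records carrying both timestamps."""
    records: List[Dict[str, object]] = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        generated, node, *fields = line.split("|")
        record: Dict[str, object] = {
            "generated_at": datetime.fromisoformat(generated),  # S304: generation time from the log
            "arrived_at": arrived_at,                           # S306: file arrival time for every record
            "node": node,
        }
        for field in fields:
            key, value = field.split("=", 1)
            record[key] = value
        records.append(record)
    return records


log = "2021-01-01T09:30:00|node-1|traffic=120\n2021-01-01T10:10:00|node-1|traffic=80\n"
received = datetime(2021, 1, 1, 10, 30)  # S302: arrival time recorded on receipt
for rec in parse_log_file(log, received):
    print(rec["generated_at"], rec["arrived_at"], rec["traffic"])
```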
Step S308: judge whether the difference between the time at which the acquired data arrived at the current device and their generation time is greater than the specified time period; if it is less than or equal to the specified time period, perform step S310; if it is greater, perform step S314.
In a specific implementation, when the specified time period is given by its time nodes (endpoints), its length can be derived from those nodes, and the calculated difference between the arrival time and the generation time of the data is compared with that length. When the specified time period is given only as a length, without time nodes, the calculated difference can be compared with it directly to judge whether the data arrived at the current device on time.
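Both ways of specifying the period reduce to the same comparison once a length is known. A sketch with hypothetical names, assuming time nodes are given as a `(start, end)` pair of `datetime` values and a bare length as a `timedelta`:

```python
from datetime import datetime, timedelta
from typing import Tuple, Union

# The specified time period, given either as a length or as (start, end) time nodes.
Period = Union[timedelta, Tuple[datetime, datetime]]


def period_length(period: Period) -> timedelta:
    """Derive the length of the specified time period."""
    if isinstance(period, timedelta):
        return period
    start, end = period
    return end - start


def arrived_in_time(generated_at: datetime, arrived_at: datetime, period: Period) -> bool:
    """Step S308: True -> normal data (S310), False -> delay data (S314)."""
    return arrived_at - generated_at <= period_length(period)


window = (datetime(2021, 1, 1, 10, 0), datetime(2021, 1, 1, 10, 30))
arrival = datetime(2021, 1, 1, 10, 30)
print(arrived_in_time(datetime(2021, 1, 1, 10, 5), arrival, window))                 # True
print(arrived_in_time(datetime(2021, 1, 1, 9, 30), arrival, timedelta(minutes=30)))  # False
```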
Step S310: determine the acquired data to be normal data. Specifically, data that arrive at the current device within the specified time period are normal data.
Step S312: process the normal data in the preset data processing mode to obtain the processing result corresponding to the normal data, and then return to step S300. Although the normal data may not include all the data to be processed, most data are sent to the current device normally; to process them promptly, the received normal data can be processed in the preset data processing mode to obtain a processing result, which serves as a preliminary result for the service requirement corresponding to that data processing mode. This processing result is then stored in a preset storage location for normal-data processing results and, if needed, can be presented to the relevant personnel as a preliminary result for the corresponding service requirement.
Step S314: determine the acquired data to be delay data. Specifically, data that arrive at the current device after the specified time period has elapsed are delay data.
Step S316: process the delay data in the preset data processing mode to obtain an intermediate result corresponding to the delay data.
In a specific implementation, there may be multiple data processing modes, each corresponding to one service requirement. For example, for mobile phone sales data, the service requirement may be counting the sales of a certain brand of mobile phone, counting the sales of all brands, counting the sales of a certain brand in a certain area, and so on. For each service requirement, the delay data can be processed in the data processing mode corresponding to that requirement to obtain the corresponding intermediate result.
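One way to realise "one data processing mode per service requirement" is a table mapping each requirement to a function, applied to the same delay data. The record fields (`brand`, `region`, `quantity`), the brand and region values, the requirement names, and the helper names below are all hypothetical:

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


def sales(brand: str = "", region: str = "") -> Callable[[List[Record]], int]:
    """Build a processing mode that sums quantities, optionally filtered by brand and region."""
    def process(records: List[Record]) -> int:
        return sum(
            int(r["quantity"])
            for r in records
            if (not brand or r["brand"] == brand) and (not region or r["region"] == region)
        )
    return process


# One processing mode per service requirement (names are illustrative).
MODES: Dict[str, Callable[[List[Record]], int]] = {
    "brand_a_sales": sales(brand="BrandA"),
    "all_brand_sales": sales(),
    "brand_a_sales_north": sales(brand="BrandA", region="North"),
}


def intermediate_results(delayed: List[Record]) -> Dict[str, int]:
    """Process the same delay data once per service requirement."""
    return {name: mode(delayed) for name, mode in MODES.items()}


delayed = [
    {"brand": "BrandA", "region": "North", "quantity": 3},
    {"brand": "BrandB", "region": "North", "quantity": 2},
    {"brand": "BrandA", "region": "South", "quantity": 4},
]
print(intermediate_results(delayed))
# {'brand_a_sales': 7, 'all_brand_sales': 9, 'brand_a_sales_north': 3}
```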
The acquired data usually contains multiple pieces of information; mobile phone sales data, for example, may include the brand, the number of units sold, the sales region, the sales time, and so on. When the service requirement is the sales volume of a certain brand of mobile phone, the corresponding data processing mode sums specified information, and step S316 may be implemented as follows:
(1) Extract the specified information from the multiple pieces of information in the delay data. For this service requirement, the specified information is the sales volume of that brand of mobile phone in each piece of data.
(2) Sum the extracted specified information to obtain a summation result.
(3) Determine the summation result as the intermediate result corresponding to the delay data.
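Steps (1) to (3), combined with the idea that each service requirement has its own data processing mode, can be illustrated with the minimal Python sketch below. The record fields (`brand`, `sales_count`, `region`), the requirement names, and the `PROCESSING_MODES` table are hypothetical; the patent does not specify a data layout.

```python
from typing import Callable, Dict, List, Mapping

Record = Mapping[str, object]


def sum_specified(records: List[Record], matches: Callable[[Record], bool]) -> int:
    """(1) Extract the specified information (the sales count) from every record
    that matches the requirement, and (2) sum the extracted values."""
    return sum(int(r["sales_count"]) for r in records if matches(r))


# Hypothetical table: one data processing mode per service requirement.
PROCESSING_MODES: Dict[str, Callable[[List[Record]], int]] = {
    "brand_A_sales": lambda rs: sum_specified(rs, lambda r: r["brand"] == "A"),
    "all_brands_sales": lambda rs: sum_specified(rs, lambda r: True),
    "brand_A_north_sales": lambda rs: sum_specified(
        rs, lambda r: r["brand"] == "A" and r["region"] == "north"
    ),
}


def intermediate_results(delay_records: List[Record]) -> Dict[str, int]:
    """(3) Each mode's summation result is the intermediate result of the
    delay data for the corresponding service requirement."""
    return {name: mode(delay_records) for name, mode in PROCESSING_MODES.items()}


delayed = [
    {"brand": "A", "sales_count": 4, "region": "north"},
    {"brand": "B", "sales_count": 7, "region": "north"},
    {"brand": "A", "sales_count": 2, "region": "south"},
]
print(intermediate_results(delayed))
# {'brand_A_sales': 6, 'all_brands_sales': 13, 'brand_A_north_sales': 4}
```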
After the delay data is received and processed according to the preset data processing mode as described above, the delay data is usually stored in a preset delay data storage location, and its intermediate result is stored in a preset intermediate result storage location. If new delay data is subsequently received, the intermediate result can be updated based on the new delay data. To do so, the new delay data is first merged into the existing delay data to obtain updated delay data.
Specifically, the delay data that has the same data type as the new delay data and was generated in the same time period is read from the preset delay data storage location; the read delay data together with the new delay data is taken as the updated delay data, which is then processed based on the preset data processing mode to obtain the corresponding intermediate result. The intermediate result can thus reflect delay data of the same type and time period received over multiple deliveries.
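A minimal Python sketch of this read-merge-reprocess update follows. It assumes, hypothetically, that "the same time period" means fixed-width periods aligned to the Unix epoch on naive timestamps, and that the storage locations are in-memory dictionaries; `DelayStore` and `period_start` are illustrative names, not part of the patent.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Tuple

Key = Tuple[str, datetime]  # (data type, start of the generation-time period)
EPOCH = datetime(1970, 1, 1)


def period_start(ts: datetime, width: timedelta) -> datetime:
    """Align a (naive) generation time to the start of its fixed-width period."""
    return EPOCH + ((ts - EPOCH) // width) * width


class DelayStore:
    """Hypothetical in-memory stand-in for the preset delay data storage
    location and the preset intermediate result storage location."""

    def __init__(self, width: timedelta, process: Callable[[List[dict]], int]):
        self.width = width
        self.process = process  # the preset data processing mode
        self.delay_data: Dict[Key, List[dict]] = defaultdict(list)
        self.intermediate: Dict[Key, int] = {}

    def add(self, record: dict) -> Key:
        key = (record["data_type"], period_start(record["generated_at"], self.width))
        # Read stored delay data of the same type and period, then add the new record.
        updated = self.delay_data[key] + [record]
        self.delay_data[key] = updated
        # Reprocess the updated delay data to refresh the intermediate result.
        self.intermediate[key] = self.process(updated)
        return key


store = DelayStore(timedelta(hours=1), lambda rs: sum(r["sales_count"] for r in rs))
t = datetime(2021, 7, 21, 10, 15)
k = store.add({"data_type": "phone_sales", "generated_at": t, "sales_count": 3})
store.add({"data_type": "phone_sales", "generated_at": t + timedelta(minutes=20), "sales_count": 5})
print(store.intermediate[k])  # 8
```

Reprocessing the whole group works for any data processing mode; for a pure sum the new value could be added incrementally instead, but the sketch follows the order described in the text.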
Step S318: normal data that has the same data type as the delay data, and whose generation time falls in the same time period as that of the delay data, is determined as the normal data corresponding to the delay data.
Whenever the current device receives normal data, it processes that data according to the preset data processing mode and stores both the normal data and its processing result in a preset storage location. When delay data is received, the device uses the data type and generation time of the delay data to look up, in that storage location, the normal data of the same type whose generation time falls in the same time period, and uses it as the normal data corresponding to the delay data.
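The storage and lookup described above can be sketched in Python as follows. This is a hypothetical in-memory model: it assumes fixed-width, epoch-aligned time periods on naive timestamps, and the names `NormalStore`, `ingest_batch`, and `lookup` are illustrative rather than taken from the patent.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Optional, Tuple

Key = Tuple[str, datetime]  # (data type, start of the generation-time period)
EPOCH = datetime(1970, 1, 1)


def period_start(ts: datetime, width: timedelta) -> datetime:
    """Align a (naive) generation time to the start of its fixed-width period."""
    return EPOCH + ((ts - EPOCH) // width) * width


class NormalStore:
    """Hypothetical stand-in for the preset storage location that keeps normal
    data and its processing results, keyed by data type and time period."""

    def __init__(self, width: timedelta, process: Callable[[List[dict]], int]):
        self.width = width
        self.process = process
        self.data: Dict[Key, List[dict]] = defaultdict(list)
        self.results: Dict[Key, int] = {}

    def key_for(self, data_type: str, generated_at: datetime) -> Key:
        return (data_type, period_start(generated_at, self.width))

    def ingest_batch(self, records: List[dict]) -> None:
        """Store a batch of normal data and process each affected group."""
        touched = set()
        for r in records:
            key = self.key_for(r["data_type"], r["generated_at"])
            self.data[key].append(r)
            touched.add(key)
        for key in touched:
            self.results[key] = self.process(self.data[key])

    def lookup(self, data_type: str, generated_at: datetime) -> Tuple[List[dict], Optional[int]]:
        """Return the normal data (and its result) corresponding to a delay record:
        same data type, generation time in the same period."""
        key = self.key_for(data_type, generated_at)
        return self.data.get(key, []), self.results.get(key)


ns = NormalStore(timedelta(hours=1), lambda rs: sum(r["sales_count"] for r in rs))
t = datetime(2021, 7, 21, 10, 5)
ns.ingest_batch([
    {"data_type": "phone_sales", "generated_at": t, "sales_count": 10},
    {"data_type": "phone_sales", "generated_at": t + timedelta(minutes=40), "sales_count": 2},
    {"data_type": "tablet_sales", "generated_at": t, "sales_count": 9},
])
# A delayed phone_sales record generated at 10:50 belongs to the same period.
matched, result = ns.lookup("phone_sales", datetime(2021, 7, 21, 10, 50))
print(len(matched), result)  # 2 12
```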
Step S320: the intermediate result corresponding to the delay data is fused with the processing result of the normal data corresponding to the delay data to obtain the final processing result for the delay data and the normal data.
This final processing result covers the delay data and normal data received so far. If new delay data arrives and the intermediate result is updated accordingly, step S320 is performed again: the updated intermediate result is fused with the processing result of the normal data to obtain an updated final processing result.
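For an additive measure such as a sales count, the fusion in step S320 can be as simple as adding the two results per key. The Python sketch below assumes exactly that; the key layout and the names `fuse` and `fuse_all` are hypothetical. Non-additive measures (a distinct-user count, for example) cannot be fused by addition and would need a different merge.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str]  # (data type, time period label); illustrative only


def fuse(normal_result: Optional[int], intermediate: Optional[int]) -> int:
    """Fuse one normal-data processing result with one delay-data intermediate
    result by addition; a missing side contributes nothing."""
    return (normal_result or 0) + (intermediate or 0)


def fuse_all(normal: Dict[Key, int], intermediate: Dict[Key, int]) -> Dict[Key, int]:
    """Final results for every key present in either input, in sorted key order."""
    keys = sorted(normal.keys() | intermediate.keys())
    return {k: fuse(normal.get(k), intermediate.get(k)) for k in keys}


normal = {("phone_sales", "10:00"): 12, ("phone_sales", "11:00"): 20}
intermediate = {("phone_sales", "10:00"): 8}
print(fuse_all(normal, intermediate))
# {('phone_sales', '10:00'): 20, ('phone_sales', '11:00'): 20}

# New delay data for 10:00 raises the intermediate result to 11.
# Only the fusion is repeated; the normal result 12 is reused as-is.
intermediate[("phone_sales", "10:00")] = 11
print(fuse_all(normal, intermediate)[("phone_sales", "10:00")])  # 23
```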
In this method, the generation time of the data is obtained by parsing the received data log, and data whose arrival time exceeds its generation time by more than the specified time period is determined to be delay data. The delay data is processed based on the preset data processing mode to obtain an intermediate result, which is fused with the processing result of the corresponding normal data to obtain the final result. Because the normal data is not recomputed, repeated calculation is avoided and computational efficiency improves.
An embodiment of the present invention further provides another data processing method, implemented on the basis of the method shown in Fig. 2.
Data delay is usually caused by abnormal collection at a data acquisition device (equivalent to the edge node device above) or by abnormal jitter in the transmission network. Its main symptom is a large gap between the time data is generated and the time it is processed, so that delayed data is missed by the calculation task and the accuracy of the final result suffers.
In the related art, this accuracy problem can be addressed by recomputing over the full data set to incorporate the delayed data. Because the full data set includes both normally transmitted and delayed data, the normally transmitted data is recomputed each time. Full recomputation wastes computing resources, is costly, and can destabilize the system, and these drawbacks become more pronounced when delayed data arrives in several separate batches.
The method provided by this embodiment identifies and classifies delayed data and stores delayed data and normal data separately. A calculation task processes both, storing the processing result of the normal data and the intermediate result of the delayed data separately; the two are combined into the final processing result when written to the final storage system.
This method can also be called a data fusion method for delayed data. It processes delayed data independently, mainly by executing the calculation task multiple times heterogeneously (here, heterogeneous execution means that different runs read different source data, chiefly the delayed data). The method is applied to a central server connected to edge node devices and comprises the following steps:
(1) The log parsing link (the log being equivalent to the data log above) writes normal data and delayed data to separate data stores. Normal and delayed data are distinguished by the log time (that is, the generation time of the data) and the reference time of the task run (the time at which processing of the data starts; because data is processed in real time, this can also be regarded as the time at which the data reaches the central server). Delayed data is data whose arrival time at the central server exceeds its generation time by more than a preset time threshold.
(2) The calculation task is divided into two parts. The first part performs coarse-grained aggregation to reduce the data volume (equivalent to processing the normal data to obtain its processing result, and processing the delayed data to obtain its intermediate result, based on the preset data processing mode). The second part builds data marts for the different requirements.
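Step (1), routing parsed log rows into separate normal and delayed stores, might look like the Python sketch below. The comma-separated log format, the `route` and `parse_line` helpers, and the in-memory `stores` dictionary are all hypothetical; only the comparison of the log time against the task's reference time follows the text.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

Row = Tuple[datetime, str, int]  # (log time, data type, value)


def parse_line(line: str) -> Row:
    """Parse one line of a hypothetical log format: '<ISO log time>,<data type>,<value>'."""
    log_time, data_type, value = line.strip().split(",")
    return datetime.fromisoformat(log_time), data_type, int(value)


def route(lines: List[str], reference_time: datetime, threshold: timedelta) -> Dict[str, List[Row]]:
    """Split parsed log rows into separate normal and delayed stores.

    reference_time is the task run's reference time, treated (as in the text)
    as the time the data reached the central server under real-time processing.
    """
    stores: Dict[str, List[Row]] = {"normal": [], "delay": []}
    for line in lines:
        row = parse_line(line)
        is_delayed = reference_time - row[0] > threshold
        stores["delay" if is_delayed else "normal"].append(row)
    return stores


logs = [
    "2021-07-21T11:58:00,phone_sales,3",
    "2021-07-21T11:20:00,phone_sales,5",
]
out = route(logs, datetime(2021, 7, 21, 12, 0), timedelta(minutes=10))
print(len(out["normal"]), len(out["delay"]))  # 1 1
```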
A data mart (also called a data market) can be regarded as a database that stores the service indicators for multiple service requirements; here, the service indicators can be the final processing results obtained from the normal data and the delayed data.
In the first part, the first run of the task reads the normal data (which is obtained first) and writes its result to the normal result store; each subsequent run reads the delayed data and writes its result to the delayed result store. The second part of the calculation task then reads the normal result store and the delayed result store together, processes them, and writes the final result to the corresponding store in the data mart layer. This fuses normal and delayed data while reducing computing cost, and meets the requirements of various types of data.
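The two-part task with heterogeneous runs can be modelled in a few lines of Python. This is a hypothetical in-memory sketch, not the production job: the result stores and data mart are plain `Counter`/`dict` objects, the aggregation is a per-key sum, and the class and method names are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str]       # (data type, time period label); illustrative only
Row = Tuple[str, str, int]  # (data type, time period label, value)


def coarse_aggregate(rows: Iterable[Row]) -> Counter:
    """First part: coarse-grained aggregation that shrinks rows to per-key sums."""
    agg: Counter = Counter()
    for data_type, period, value in rows:
        agg[(data_type, period)] += value
    return agg


class TwoPartJob:
    """Hypothetical in-memory model of the two-part calculation task."""

    def __init__(self) -> None:
        self.normal_result_store: Counter = Counter()
        self.delay_result_store: Counter = Counter()
        self.data_mart: Dict[Key, int] = {}
        self.runs = 0

    def run_first_part(self, rows: Iterable[Row]) -> None:
        # The first run reads normal data; every later run reads delayed data.
        if self.runs == 0:
            self.normal_result_store = coarse_aggregate(rows)
        else:
            self.delay_result_store.update(coarse_aggregate(rows))
        self.runs += 1

    def run_second_part(self) -> None:
        # Read both result stores together and write the fused view to the data mart.
        merged: Counter = Counter()
        merged.update(self.normal_result_store)
        merged.update(self.delay_result_store)
        self.data_mart = dict(merged)


job = TwoPartJob()
job.run_first_part([("phone_sales", "10:00", 10), ("phone_sales", "10:00", 2)])  # normal data
job.run_first_part([("phone_sales", "10:00", 3)])                                # delayed batch 1
job.run_first_part([("phone_sales", "10:00", 5), ("phone_sales", "11:00", 1)])   # delayed batch 2
job.run_second_part()
print(job.data_mart)  # {('phone_sales', '10:00'): 20, ('phone_sales', '11:00'): 1}
```

`Counter.update` adds counts rather than replacing them, which is why it is used instead of `dict.update`; like the fusion step, the merge assumes an additive measure.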
This data processing method distinguishes normal data from delayed data by arrival time, computes the normal data once, and updates and recomputes the delayed data as it arrives, reducing errors caused by data delay. The results for the delayed data and the normal data are then merged and written to the storage system, completing the processing of the delayed data. The method improves data processing efficiency and saves computing resources.
Corresponding to the foregoing method embodiment, an embodiment of the present invention provides a data processing apparatus which, as shown in Fig. 4, includes:
a delay data determination module 400, configured to determine delay data from the acquired data based on the time information of the acquired data, the delay data being data that, after it is generated, reaches the current device only after a specified time period has elapsed;
a delay data processing module 402, configured to process the delay data based on a preset data processing mode to obtain an intermediate result corresponding to the delay data; and
a data fusion module 404, configured to fuse the intermediate result corresponding to the delay data with the processing result of the normal data corresponding to the delay data to obtain a final processing result for the delay data and the normal data, the normal data being data that, after it is generated, reaches the current device within the specified time period.
The data processing apparatus determines delay data from the acquired data based on its time information, processes the delay data based on a preset data processing mode to obtain a corresponding intermediate result, and fuses that intermediate result with the processing result of the corresponding normal data to obtain the final processing result for the delay data and the normal data. Because only the delay data is processed into an intermediate result, which is then fused with the existing processing result of the normal data, the apparatus improves data processing efficiency and saves computing resources.
Further, the time information includes the generation time of the data and the time at which the data reaches the current device; the delay data determination module is further configured to: determine whether the time difference between the acquired data's arrival at the current device and its generation time is greater than the specified time period; and, if it is, determine the acquired data as delay data.
Further, the apparatus includes a time acquisition module configured to: receive a data log sent by a preset edge node device; record the time at which the data log arrives at the current device; parse the data log to obtain the generation time of the data in it; and take the arrival time of the data log as the time at which the data in the log reached the current device.
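The time acquisition steps can be sketched in Python as below: the arrival time is recorded once when the whole log is received and then attached to every entry parsed from it. The JSON-array log format with an ISO-8601 `generated_at` field, the injectable `now` clock, and the names `TimedRecord` and `receive_log` are assumptions made for illustration only.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, List


@dataclass
class TimedRecord:
    payload: dict
    generated_at: datetime  # parsed from the log entry
    arrived_at: datetime    # arrival time of the whole data log


def utc_now() -> datetime:
    return datetime.now(timezone.utc)


def receive_log(raw_log: str, now: Callable[[], datetime] = utc_now) -> List[TimedRecord]:
    """Record when a data log arrives, parse it, and use the log's arrival time
    as the time each entry reached the current device.

    The log is assumed (hypothetically) to be a JSON array of objects, each with
    an ISO-8601 'generated_at' field.
    """
    arrived_at = now()  # recorded once, on receipt of the whole log
    entries = json.loads(raw_log)
    return [
        TimedRecord(e, datetime.fromisoformat(e["generated_at"]), arrived_at)
        for e in entries
    ]


fixed = datetime(2021, 7, 21, 12, 0, tzinfo=timezone.utc)
log = (
    '[{"generated_at": "2021-07-21T11:58:00+00:00", "v": 1}, '
    '{"generated_at": "2021-07-21T11:20:00+00:00", "v": 2}]'
)
records = receive_log(log, now=lambda: fixed)
print([(r.arrived_at - r.generated_at).total_seconds() / 60 for r in records])  # [2.0, 40.0]
```

Injecting the clock keeps the example deterministic; a real receiver would simply use the current time.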
Further, the time information includes the generation time of the data and the time at which the data reaches the current device; the apparatus further includes a processing result acquisition module configured to: determine whether the time difference between the acquired data's arrival at the current device and its generation time is less than or equal to the specified time period; if it is, determine the acquired data as normal data; and process the normal data based on the preset data processing mode to obtain the corresponding processing result.
Further, the time information includes the generation time of the data; the apparatus further includes a normal data determination module configured to determine, as the normal data corresponding to the delay data, normal data that has the same data type as the delay data and whose generation time falls in the same time period as that of the delay data.
Further, the apparatus includes an intermediate result updating module configured to update the intermediate result corresponding to the delay data based on new delay data, if new delay data is received.
Further, the intermediate result updating module is configured to: merge the new delay data into the delay data to obtain updated delay data; and process the updated delay data based on the preset data processing mode to obtain the corresponding intermediate result.
Further, there are multiple data processing modes, each corresponding to a service requirement; the delay data processing module is further configured to process the delay data, for each service requirement, with the data processing mode corresponding to that requirement to obtain the intermediate result corresponding to the delay data.
Further, the acquired data includes multiple pieces of information, and the data processing mode includes summing specified information; the delay data processing module is further configured to: extract the specified information from the multiple pieces of information in the delay data; sum the extracted specified information to obtain a summation result; and determine the summation result as the intermediate result corresponding to the delay data.
The data processing apparatus provided by this embodiment has the same implementation principle and technical effects as the foregoing method embodiment. For brevity, for anything not mentioned in the apparatus embodiment, refer to the corresponding content of the method embodiment.
An embodiment of the present invention further provides a server which, as shown in Fig. 5, includes a processor 130 and a memory 131. The memory 131 stores machine-executable instructions executable by the processor 130, and the processor 130 executes them to implement the data processing method described above.
Further, the server shown in Fig. 5 includes a bus 132 and a communication interface 133; the processor 130, the communication interface 133, and the memory 131 are connected through the bus 132.
The memory 131 may include high-speed random access memory (RAM) and may also include non-volatile memory, such as at least one disk memory. The communication connection between this system network element and at least one other network element is implemented through at least one communication interface 133 (wired or wireless), which may use the Internet, a wide area network, a local area network, a metropolitan area network, or the like. The bus 132 may be an ISA bus, a PCI bus, an EISA bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one double-headed arrow is shown in Fig. 5, but this does not mean there is only one bus or one type of bus.
The processor 130 may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be completed by integrated hardware logic circuits in the processor 130 or by instructions in the form of software. The processor 130 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present invention may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may reside in a storage medium well established in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in the memory 131, and the processor 130 reads the information in the memory 131 and completes the steps of the method of the foregoing embodiment in combination with its hardware.
An embodiment of the present invention further provides a machine-readable storage medium storing machine-executable instructions that, when called and executed by a processor, cause the processor to implement the data processing method described above.
The computer program product of the data processing method, apparatus, and server provided by the embodiments of the present invention includes a computer-readable storage medium storing program code, and the instructions in the program code can be used to execute the method described in the foregoing method embodiments. For the specific implementation, refer to the method embodiments; details are not repeated here.
If the functions are implemented as software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. On this understanding, the technical solution of the present invention may be embodied as a software product stored in a storage medium and including instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are only specific embodiments of the present invention, used to illustrate rather than limit its technical solutions, and the protection scope of the present invention is not limited to them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone skilled in the art may, within the technical scope disclosed herein, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes, or substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (12)

1. A data processing method, comprising:
determining delay data from acquired data based on time information of the acquired data, wherein the delay data comprises data that, after being generated, reaches the current device only after a specified time period has elapsed;
processing the delay data based on a preset data processing mode to obtain an intermediate result corresponding to the delay data;
performing fusion processing on the intermediate result corresponding to the delay data and a processing result of normal data corresponding to the delay data to obtain a final processing result corresponding to the delay data and the normal data, wherein the normal data comprises data that, after being generated, reaches the current device within the specified time period.
2. The method of claim 1, wherein the time information comprises a generation time of the data and a time of arrival of the data at the current device;
the step of determining delay data from the acquired data based on the time information of the acquired data includes:
determining whether the time difference between the time at which the acquired data reaches the current device and its generation time is greater than the specified time period; and
if the time difference is greater than the specified time period, determining the acquired data as delay data.
3. The method according to claim 2, wherein the time information of the acquired data is determined by:
receiving a data log sent by a preset edge node device;
recording the time at which the data log arrives at the current device;
parsing the data log to obtain the generation time of the data in the data log; and
determining the arrival time of the data log as the time at which the data in the data log arrives at the current device.
4. The method of claim 1, wherein the time information comprises a generation time of the data and a time of arrival of the data at the current device;
the processing result of the normal data corresponding to the delay data is obtained by:
determining whether the time difference between the time at which the acquired data reaches the current device and its generation time is less than or equal to the specified time period;
if the time difference is less than or equal to the specified time period, determining the acquired data as the normal data; and
processing the normal data based on a preset data processing mode to obtain a processing result corresponding to the normal data.
5. The method of claim 1, wherein the time information includes a generation time of the data;
before the step of performing fusion processing on the intermediate result corresponding to the delayed data and the processing result of the normal data corresponding to the delayed data to obtain the final processing result corresponding to the delayed data and the normal data, the method further includes:
determining, as the normal data corresponding to the delay data, normal data that has the same data type as the delay data and whose generation time falls in the same time period as the generation time of the delay data.
6. The method according to claim 1, wherein before the step of performing fusion processing on the intermediate result corresponding to the delayed data and the processing result of the normal data corresponding to the delayed data to obtain the final processing result corresponding to the delayed data and the normal data, the method further comprises:
if new delay data is received, updating the intermediate result corresponding to the delay data based on the new delay data.
7. The method of claim 6, wherein updating the intermediate result corresponding to the delay data based on the new delay data comprises:
merging the new delay data into the delay data to obtain updated delay data; and
and processing the updated delay data based on a preset data processing mode to obtain an intermediate result corresponding to the updated delay data.
8. The method of claim 1, wherein there are a plurality of data processing modes, each corresponding to a service requirement;
the step of processing the delay data based on a preset data processing mode to obtain an intermediate result corresponding to the delay data comprises the following steps:
processing the delay data, for each service requirement, based on the data processing mode corresponding to that service requirement to obtain the intermediate result corresponding to the delay data.
9. The method of claim 1, wherein the acquired data comprises a plurality of pieces of information, and the data processing mode comprises summation of specified information;
the step of processing the delay data based on a preset data processing mode to obtain an intermediate result corresponding to the delay data comprises the following steps:
extracting the specified information from the plurality of pieces of information in the delay data;
summing the extracted specified information to obtain a summation result; and
determining the summation result as the intermediate result corresponding to the delay data.
10. A data processing apparatus, comprising:
a delay data determination module, configured to determine delay data from acquired data based on time information of the acquired data, wherein the delay data comprises data that, after being generated, reaches the current device only after a specified time period has elapsed;
a delay data processing module, configured to process the delay data based on a preset data processing mode to obtain an intermediate result corresponding to the delay data; and
a data fusion module, configured to perform fusion processing on the intermediate result corresponding to the delay data and a processing result of normal data corresponding to the delay data to obtain a final processing result corresponding to the delay data and the normal data, wherein the normal data comprises data that, after being generated, reaches the current device within the specified time period.
11. A server comprising a processor and a memory, the memory storing machine executable instructions executable by the processor, the processor executing the machine executable instructions to implement the data processing method of any one of claims 1 to 9.
12. A machine-readable storage medium having stored thereon machine-executable instructions which, when invoked and executed by a processor, cause the processor to carry out the data processing method of any one of claims 1 to 9.
CN202110823981.XA 2021-07-21 2021-07-21 Data processing method and device and server Pending CN113535643A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110823981.XA CN113535643A (en) 2021-07-21 2021-07-21 Data processing method and device and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110823981.XA CN113535643A (en) 2021-07-21 2021-07-21 Data processing method and device and server

Publications (1)

Publication Number Publication Date
CN113535643A true CN113535643A (en) 2021-10-22

Family

ID=78100648

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110823981.XA Pending CN113535643A (en) 2021-07-21 2021-07-21 Data processing method and device and server

Country Status (1)

Country Link
CN (1) CN113535643A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103546514A (en) * 2012-07-13 2014-01-29 阿里巴巴集团控股有限公司 Method and system for processing delay-transmitted log data
CN107704373A (en) * 2017-10-31 2018-02-16 北京奇艺世纪科技有限公司 A kind of data processing method and device
CN111444172A (en) * 2019-01-17 2020-07-24 北京京东尚科信息技术有限公司 Data monitoring method, device, medium and equipment
CN112231296A (en) * 2020-09-30 2021-01-15 北京金山云网络技术有限公司 Distributed log processing method, device, system, equipment and medium

Similar Documents

Publication Publication Date Title
CN109104336B (en) Service request processing method and device, computer equipment and storage medium
CN106815254B (en) Data processing method and device
US9832280B2 (en) User profile configuring method and device
WO2021017884A1 (en) Data processing method and apparatus, and gateway server
CN108492150B (en) Method and system for determining entity heat degree
CN112434039A (en) Data storage method, device, storage medium and electronic device
CN110022259B (en) Message arrival rate determining method and device, data statistics server and storage medium
US11062350B2 (en) Method, apparatus, and device for monitoring promotion status data, and non-volatile computer storage medium
CN110990438A (en) Data processing method and device, electronic equipment and storage medium
CN113596078B (en) Service problem positioning method and device
CN110807050B (en) Performance analysis method, device, computer equipment and storage medium
CN117675866A (en) Data processing method, device, equipment and medium based on Bayesian inference
CN111401959B (en) Risk group prediction method, apparatus, computer device and storage medium
WO2020224242A1 (en) Blockchain data processing method and apparatus, server and storage medium
CN113535643A (en) Data processing method and device and server
CN108519909B (en) Stream data processing method and device
CN113094241B (en) Method, device, equipment and storage medium for determining accuracy of real-time program
CN115328734A (en) Cross-service log processing method and device and server
CN114579416A (en) Index determination method, device, server and medium
US20190004885A1 (en) Method and system for aiding maintenance and optimization of a supercomputer
CN110442572B (en) User characteristic value determining method and device
JP7119484B2 (en) Information aggregation device, information aggregation method, and program
CN113037420A (en) Reading time stamp obtaining method and device, electronic equipment and storage medium
CN112131276A (en) Data statistics method, electronic equipment and readable storage medium
WO2019169696A1 (en) Platform client data backflow method, electronic apparatus, device, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination