CN110874268B - Data processing method, device and equipment - Google Patents

Data processing method, device and equipment

Info

Publication number
CN110874268B
Authority
CN
China
Prior art keywords
data
data processing
processing system
processed
determining
Prior art date
Legal status
Active
Application number
CN201811027346.5A
Other languages
Chinese (zh)
Other versions
CN110874268A (en)
Inventor
高顺路
Current Assignee
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201811027346.5A
Publication of CN110874268A
Application granted
Publication of CN110874268B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 - Multiprogramming arrangements
    • G06F9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 - Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 - Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505 - Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 - Multiprogramming arrangements
    • G06F9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 - Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 - Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

Embodiments of the invention provide a data processing method, apparatus, and device. The method first acquires data to be processed and the running state information corresponding to each of a plurality of data processing systems. The processing capability of each data processing system is then determined from its running state information. A target data processing system for a first data block in the data to be processed is selected according to those processing capabilities, and the first data block is sent to the target data processing system for processing. Because several data processing systems are available, each data block can be handled by the system whose processing capability best suits it, so the data blocks are processed in time and all of the data to be processed can be finished within the predetermined time. The user therefore obtains a processing result covering all of the data to be processed, which improves the usability of the data processing result.

Description

Data processing method, device and equipment
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data processing method, apparatus, and device.
Background
For scenarios with a large amount of data to be processed, the common practice is to process that data with a data processing system. After processing, the data processing system sends the processing result of the data to be processed to the terminal device used by the user, so that the user can view the result.
However, because of network instability or internal factors of the data processing system, such as excessive load, the data processing system may be unable to process all of the data within the predetermined time. In that case, the data processing result obtained by the user covers only the portion of the data that was actually processed rather than all of the data, so the usability of the data processing result for the user is low.
Disclosure of Invention
In view of this, embodiments of the present invention provide a data processing method, apparatus, and device to ensure the timeliness of the data processing result and thereby improve its usability.
In a first aspect, an embodiment of the present invention provides a data processing method, including:
acquiring data to be processed;
acquiring running state information corresponding to a plurality of data processing systems;
determining the processing capacity corresponding to each of the plurality of data processing systems according to the running state information;
determining a target data processing system corresponding to a first data block in the data to be processed according to the processing capacity;
and sending the first data block to the target data processing system for processing.
In a second aspect, an embodiment of the present invention provides a data processing apparatus, including:
the data acquisition module is used for acquiring data to be processed;
the information acquisition module is used for acquiring the running state information corresponding to the plurality of data processing systems;
a processing capacity determining module, configured to determine, according to the running state information, processing capacities corresponding to the plurality of data processing systems, respectively;
the target data processing system determining module is used for determining a target data processing system corresponding to a first data block in the data to be processed according to the processing capacity;
and the sending module is used for sending the first data block to the target data processing system for processing.
In a third aspect, an embodiment of the present invention provides a data processing apparatus, including: the system comprises a coordinator, a data source device and a plurality of data processors, wherein the data source device and the data processors are respectively in communication connection with the coordinator;
the data source device is used for providing data to be processed;
the coordinator is used for acquiring the data to be processed; acquiring running state information corresponding to the plurality of data processors; determining the processing capacity corresponding to each of the plurality of data processors according to the running state information; and determining a target data processor corresponding to a first data block in the data to be processed according to the processing capacity, so that the target data processor processes the first data block.
In a fourth aspect, an embodiment of the present invention provides a computer storage medium for storing a computer program which, when executed, causes a computer to implement the data processing method of the first aspect.
With the data processing method provided by the embodiment of the present invention, the coordinator first acquires the data to be processed and the running state information corresponding to each of a plurality of data processing systems, and then determines the processing capability of each data processing system from that running state information. A target data processing system for processing the first data block in the data to be processed is determined according to those processing capabilities, and the first data block is finally sent to the target data processing system for processing. Because a plurality of data processing systems are deployed, each data block can be processed by the data processing system whose processing capability best suits it, so the data blocks are processed in time and the data to be processed can be completely processed within the predetermined time. The user can then obtain a data processing result corresponding to all of the data to be processed, which improves the usability of the data processing result.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a data processing method according to an embodiment of the present invention;
FIG. 2 is a flow chart of another data processing method according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a data processing device according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from the embodiments given herein without creative effort fall within the protection scope of the present invention.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used in the embodiments of the invention and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise; "a plurality" generally means at least two, but does not exclude the case of at least one.
It should be understood that the term "and/or" as used herein merely describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean that A exists alone, that A and B exist simultaneously, or that B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.
The words "if", as used herein, may be interpreted as "at … …" or "at … …" or "in response to a determination" or "in response to a recognition", depending on the context. Similarly, the phrase "if determined" or "if identified (a stated condition or event)" may be interpreted as "when determined" or "in response to a determination" or "when identified (a stated condition or event)" or "in response to an identification (a stated condition or event)", depending on the context.
It should also be noted that the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that an article or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such article or system. Without further limitation, an element preceded by "comprising a ..." does not exclude the presence of additional identical elements in the article or system that comprises the element.
In addition, the sequence of steps in the embodiments of the methods described below is merely an example, and is not strictly limited.
Fig. 1 is a flowchart of a data processing method according to an embodiment of the present invention, where the data processing method according to the embodiment of the present invention may be executed by a coordinator. As shown in fig. 1, the method comprises the steps of:
and S101, acquiring data to be processed.
The data source server may hold a large amount of data to be processed, and the coordinator may obtain the data to be processed directly from the data source server. In one alternative, the data source server divides the data to be processed into a plurality of data blocks and sends those data blocks to the coordinator, so that what the coordinator obtains is already a plurality of data blocks. Alternatively, the data source server may send the data to be processed to the coordinator without any processing, and the coordinator divides the data into data blocks itself. However the data blocks are obtained, any one of them may serve as the first data block.
For the division into data blocks, the coordinator may optionally divide the data according to the timestamp carried by each piece of data; for example, data whose timestamps fall within the same preset time duration may be grouped into one data block. The data to be processed may be resource data such as video or audio, or may be service data accumulated by a service platform over a period of time, such as the number of accesses to the service platform during that period or the number of clicks on each piece of service content. When the data to be processed is resource data, the timestamp carried by the data may be a playing timestamp indicating the playing time of the data. Assuming the preset time duration is 2 minutes, the data whose playing time falls in the range of 00:00:00 to 00:02:00 may be divided into one data block.
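As an illustrative sketch only (the record format, the divide_into_blocks helper, and the 2-minute window are assumptions, not the claimed method), the timestamp-based division might look like this in Python:
    from collections import defaultdict

    BLOCK_SECONDS = 120  # assumed preset time duration of 2 minutes

    def divide_into_blocks(records):
        # Group (timestamp_in_seconds, payload) records so that records whose
        # timestamps fall within the same 2-minute window land in one data block.
        blocks = defaultdict(list)
        for timestamp, payload in records:
            block_index = timestamp // BLOCK_SECONDS  # 0 covers 00:00:00-00:02:00, 1 covers 00:02:00-00:04:00, ...
            blocks[block_index].append((timestamp, payload))
        return [blocks[index] for index in sorted(blocks)]

    # Example: the first two records share the 00:00:00-00:02:00 window.
    records = [(10, "a"), (95, "b"), (130, "c")]
    print(divide_into_blocks(records))  # [[(10, 'a'), (95, 'b')], [(130, 'c')]]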
S102, acquiring the running state information corresponding to each of the plurality of data processing systems.
And S103, determining the processing capacity corresponding to each of the plurality of data processing systems according to the running state information.
The coordinator may then further obtain the running state information of each of the plurality of data processing systems. The running state information may be collected by the coordinator itself or reported to the coordinator by the data processing systems. Optionally, the running state information may include network quality information and/or load pressure information. The network quality information may include a network delay time, and the load pressure information may include the number of failures of the data processing system within a preset time period and/or the processing delay time of the data processing system.
The acquired running state information can be used to characterize the processing capability of a data processing system. Optionally, the coordinator may determine the respective processing capabilities from the network quality information of the plurality of data processing systems; for example, the shorter the network delay, the better the processing capability of the data processing system. Alternatively, the coordinator may determine the respective processing capabilities from the load pressure information of the plurality of data processing systems; for example, the shorter the processing delay time, the better the processing capability of the data processing system, and the fewer the failures, the better the processing capability.
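A minimal Python sketch of ranking systems by a single dimension follows; the system names, the millisecond unit, and the dictionary layout are assumptions for illustration:
    def rank_by_network_delay(network_delay_ms):
        # Shorter network delay is taken to indicate better processing capability,
        # so the best-capability system comes first in the returned list.
        return sorted(network_delay_ms, key=network_delay_ms.get)

    state = {"system_a": 35.0, "system_b": 120.0, "system_c": 60.0}
    print(rank_by_network_delay(state))  # ['system_a', 'system_c', 'system_b']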
And S104, determining a target data processing system corresponding to the first data block in the data to be processed according to the processing capacity.
And S105, sending the first data block to a target data processing system for processing.
Finally, the coordinator may directly determine the data processing system with the best processing capability as the target data processing system corresponding to the first data block, and send the first data block to the target data processing system, so that the target data processing system processes the first data block, thereby obtaining the processing result corresponding to the first data block.
Steps S101 to S104 above can be understood as the process of allocating a single data block of the data to be processed. However, because of network fluctuations or the state of the data processing systems themselves, the processing capability of each data processing system changes in real time. The above allocation is therefore performed for every data block, so that each data block is handled by the data processing system with the best processing capability at that moment, ensuring that every data block is processed in time.
In summary, in the embodiment of the present invention, the coordinator first acquires the data to be processed and the running state information corresponding to each of the plurality of data processing systems. It then determines the processing capability of each data processing system from that running state information, determines a target data processing system for the first data block in the data to be processed according to those processing capabilities, and finally sends the first data block to the target data processing system for processing. Because a plurality of data processing systems are deployed, the coordinator can hand each data block to the data processing system whose processing capability best suits it, so the data blocks are processed in time and the data to be processed can be completely processed within the predetermined time. The user can then obtain a data processing result corresponding to all of the data to be processed, which improves the usability of the data processing result.
In the above embodiment, the data to be processed may be different types of service data of a service platform, for example the number of accesses to the service platform during a certain period or the numbers of clicks on different pieces of service content. The total amount of data differs between service types, and the processing capability of a data processing system is itself limited. Taking both the processing capability of the data processing systems and the efficiency of data processing into account, fig. 2 is a flowchart of another data processing method provided by an embodiment of the present invention. As shown in fig. 2, the method may include the following steps:
s201, according to the service type corresponding to the data to be processed, determining the time granularity for dividing the data blocks.
S202, dividing the data to be processed according to the time granularity to obtain a first data block in the data to be processed.
The data to be processed may carry a timestamp and a service type identifier. The coordinator can determine the service type of the data to be processed from the service type identifier and determine the time granularity from a preset correspondence between service types and division time granularities. The data to be processed of that service type is then divided into data blocks at that time granularity, i.e. data whose timestamps fall within the same granularity interval are grouped into one data block, so that all of the data to be processed is divided into a plurality of data blocks, any one of which may be regarded as the first data block. For example, with the time granularity set to 1 minute, data with timestamps between 10:00:00 and 10:01:00 would be grouped into one data block. This time granularity can be understood as the preset time period mentioned in the above embodiments.
Optionally, the preset correspondence between service type and time granularity may be established according to the following rule: the larger the total data volume of a service type, the smaller its corresponding time granularity, i.e. the total data volume of the data to be processed and the time granularity are inversely related. For example, the access volume of a service platform is usually much larger than the click volume of a specific service within the platform, so when the data to be processed is service data representing the platform access volume the time granularity may be set to 1 minute, and when it is service data representing the click volume of a specific service the time granularity may be set to 10 minutes.
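One possible shape for such a preset correspondence is sketched below; the service type names, minute values, and fallback default are assumptions chosen only to reflect the inverse relation between data volume and time granularity:
    # Service types that generate more data get a smaller time granularity (in minutes).
    GRANULARITY_MINUTES_BY_SERVICE_TYPE = {
        "platform_access_volume": 1,    # large data volume -> 1-minute blocks
        "specific_service_clicks": 10,  # smaller data volume -> 10-minute blocks
    }

    def granularity_for(service_type, default_minutes=5):
        # Fall back to an assumed default when no correspondence is preset.
        return GRANULARITY_MINUTES_BY_SERVICE_TYPE.get(service_type, default_minutes)

    print(granularity_for("platform_access_volume"))   # 1
    print(granularity_for("specific_service_clicks"))  # 10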
The data to be processed may be streaming data or non-streaming data. When it is non-streaming data, the data to be processed is stored in the coordinator in advance. When it is streaming data, the data is continuously generated in the data source server and continuously acquired from it by the coordinator. When the data to be processed is service data, whether streaming or non-streaming, its timestamp may be a generation timestamp indicating when the data was generated.
S203, acquiring the running state information corresponding to the plurality of data processing systems.
The execution process of step 203 is similar to the corresponding steps in the foregoing embodiment, and reference may be made to the related description in the embodiment shown in fig. 1, which is not repeated herein.
And S204, determining the processing capacity corresponding to each of the plurality of data processing systems according to the running state information.
As mentioned in the above embodiments, the running state information of a data processing system may include network quality information and load pressure information, and the coordinator may determine the processing capability of each data processing system from information of a single dimension. Besides the approaches described above, the processing capability of a data processing system can also be determined from multi-dimensional information, i.e. by using the network quality information and the load pressure information together, which improves the accuracy of the determination.
One alternative is to calculate a health index Q for any data processing system according to the following formula:
Q = (PD + ND) / T + e^F
where PD is the processing delay time of the data processing system, ND is its network delay time, T is the time granularity used for dividing data blocks, e is the base of the natural logarithm, and F is the number of failures of the data processing system within the preset time period.
The health index calculated in this way represents the processing capability of the data processing system: the lower the health index, the better the processing capability.
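The health index and the lowest-index selection can be sketched as follows; the millisecond units, system names, and sample numbers are assumptions, while the formula itself is the one given above:
    import math

    def health_index(pd_ms, nd_ms, granularity_ms, failures):
        # Q = (PD + ND) / T + e**F; a lower Q means better processing capability.
        return (pd_ms + nd_ms) / granularity_ms + math.e ** failures

    def best_system(states):
        # Pick the data processing system with the lowest (best) health index.
        return min(states, key=lambda name: health_index(*states[name]))

    # (PD, ND, T, F) per system; T is a 1-minute granularity expressed in ms.
    states = {
        "system_a": (200.0, 50.0, 60_000.0, 0),
        "system_b": (100.0, 30.0, 60_000.0, 2),
    }
    print(best_system(states))  # 'system_a': system_b's two failures dominate its index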
And S205, determining a target data processing system corresponding to the first data block according to the processing capacity.
The coordinator may then determine the target data processing system for the first data block according to the calculated processing capability of each data processing system. Besides the way of determining the target data processing system given in step S104 of the above embodiment, the coordinator may alternatively also obtain the processing capability of the data processing system that handled the data block preceding the first data block, i.e. the second data block. The timestamps of the data in the second data block are adjacent to, and earlier than, the timestamps of the data in the first data block. Taking streaming data as an example, the generation timestamps of the data in the second data block may lie between 10:00:00 and 10:01:00, and those of the data in the first data block between 10:01:00 and 10:02:00.
For simplicity in the following description, the data processing system with the best processing capability among the plurality of data processing systems is called the first data processing system, and the data processing system corresponding to the second data block is called the second data processing system. After obtaining the processing capabilities of these two data processing systems, the coordinator compares them:
and if the processing capacity difference degree between the second data processing system and the first data processing system is smaller than the threshold value, determining that the second data processing system is the target data processing system corresponding to the first data block.
If the processing capacity difference between the second data processing system and the first data processing system is greater than the threshold, the coordinator further determines the data processing system with the best processing capacity among the plurality of data processing systems, and then determines the first data processing system as the target data processing system corresponding to the first data block.
The essence of this comparison is: for a first data block and a second data block whose data have adjacent generation timestamps, if some data processing system has a markedly better processing capability, the first data block is handed to that system for processing; if no data processing system stands out, the first data block continues to be processed by the data processing system that processed the second data block.
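A sketch of this comparison, assuming the health index above is used as the capability measure (so a smaller value is better); the system names and threshold value are illustrative only:
    def choose_target(previous_system, best_system, health_index_of, threshold):
        # Stay with the system that handled the second (previous) data block unless
        # the best system's health index is better by at least `threshold`.
        difference = health_index_of[previous_system] - health_index_of[best_system]
        if difference < threshold:
            return previous_system  # gap is small: keep continuity
        return best_system          # gap is large: switch to the best system

    indices = {"system_a": 1.0, "system_b": 1.2}
    print(choose_target("system_b", "system_a", indices, threshold=0.5))  # 'system_b'
    print(choose_target("system_b", "system_a", indices, threshold=0.1))  # 'system_a'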
And S206, sending the first data block to a target data processing system for processing.
And the coordinator sends the first data block to the determined target data processing system so as to process the first data block, thereby obtaining a processing result corresponding to the first data block.
To keep the receiving and processing of data blocks independent across data processing systems and to keep data block processing efficient, a cache queue may optionally be provided in each data processing system. The coordinator then sends the first data block to the cache queue corresponding to the target data processing system, and the target data processing system processes the data blocks in its cache queue in order, producing a processing result for each data block.
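A minimal sketch of the per-system cache queue, using Python's standard queue module and a worker thread as a stand-in for a data processing system; the names and the integer block identifiers are placeholders:
    import queue
    import threading

    def data_processing_worker(name, cache_queue):
        # Drain the cache queue in arrival order; None is a stop sentinel.
        while True:
            block = cache_queue.get()
            if block is None:
                break
            print(f"{name} processed data block {block}")
            cache_queue.task_done()

    cache_queue = queue.Queue()  # one cache queue per data processing system
    threading.Thread(target=data_processing_worker,
                     args=("system_a", cache_queue), daemon=True).start()

    for block_id in range(3):    # the coordinator enqueues data blocks
        cache_queue.put(block_id)
    cache_queue.join()           # wait until every queued block has been processed
    cache_queue.put(None)        # stop the worker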
In this embodiment, the time granularity used for dividing data blocks is determined from the service type of the data to be processed, and the data to be processed is then divided at that granularity to obtain the first data block. The amount of data contained in each divided data block is therefore reasonable, i.e. better matched to the processing capability of the data processing systems. At the same time, using multi-dimensional running state information greatly improves the accuracy with which the target data processing system of each data block is determined, making the allocation of data blocks more precise. Together, these two aspects allow the data processing systems to process the data blocks smoothly and in time, so the data to be processed can be completely processed within the predetermined time, the user can obtain a data processing result corresponding to all of the data to be processed, and the usability of the data processing result is improved.
In addition, for streaming data to be processed, data whose timestamps fall within the current time granularity may reach the coordinator at different times because of the network or the data processing systems themselves, and since the coordinator usually divides data blocks on a timed basis, such data may end up split across several data blocks. In this case, the coordinator may hand all data blocks containing data whose generation timestamps fall within the current time granularity to the same data processing system for processing.
Furthermore, the data to be processed in the above embodiments corresponds to a single service type. In practice, the data to be processed generated by the data source server may correspond to several service types. The coordinator may then classify the data to be processed according to the service type tags it carries and apply the processing described above to the data belonging to each service type.
The data processing apparatus of one or more embodiments of the present invention is described in detail below. Those skilled in the art will appreciate that these data processing apparatuses can be constructed using commercially available hardware components configured to perform the steps taught in this disclosure.
Fig. 3 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention, and as shown in fig. 3, the apparatus includes: a data acquisition module 11, an information acquisition module 12, a processing capability determination module 13, a target data processing system determination module 14, and a transmission module 15.
And the data acquisition module 11 is configured to acquire data to be processed.
The information obtaining module 12 is configured to obtain running state information corresponding to each of the plurality of data processing systems.
And a processing capability determining module 13, configured to determine, according to the operation state information, a processing capability corresponding to each of the plurality of data processing systems.
And the target data processing system determining module 14 is configured to determine, according to the processing capability, a target data processing system corresponding to the first data block in the data to be processed.
And the sending module 15 is configured to send the first data block to the target data processing system for processing.
Optionally, the apparatus may further include: and a time granularity determining module 21, configured to determine a time granularity for dividing the data block according to the service type corresponding to the data to be processed.
The data obtaining module 11 is configured to divide the data to be processed according to the time granularity to obtain a first data block.
Optionally, the data to be processed is streaming data; the data obtaining module 11 is specifically configured to: and determining data with the generation time stamp within the current time granularity as a first data block according to the generation time stamp of each data in the streaming data.
Optionally, the target data processing system determining module 14 is specifically configured to: and determining the data processing system with the best processing capacity in the plurality of data processing systems as the target data processing system corresponding to the first data block.
Optionally, the target data processing system determining module 14 is specifically configured to:
If the processing capacity difference degree between the data processing system corresponding to the previous second data block of the first data block and the data processing system with the best processing capacity is smaller than the threshold value, determining that the data processing system corresponding to the second data block is the target data processing system corresponding to the first data block;
if the processing capacity difference degree between the data processing system corresponding to the previous second data block of the first data block and the data processing system with the best processing capacity is larger than the threshold value, determining the data processing system with the best processing capacity in the plurality of data processing systems; and determining the data processing system with the best processing capacity as the target data processing system corresponding to the first data block.
Optionally, the operation state information includes: network quality information and load pressure information.
Optionally, the processing capability determining module 13 is specifically configured to calculate a health index for the data processing system according to the following formula: Q = (PD + ND) / T + e^F
Wherein PD is a processing delay time in the load pressure information, ND is a network delay time in the network quality information, T is a time granularity for dividing a data block, e is a base of a natural logarithm, and F is a failure frequency of the data processing system in the load pressure information within a preset time period.
Optionally, the sending module 15 is specifically configured to: and sending the first data block to a cache queue corresponding to the target data processing system.
The apparatus shown in fig. 3 can perform the method of the embodiment shown in fig. 1-2, and the related description of the embodiment shown in fig. 1-2 can be referred to for the part not described in detail in this embodiment. The implementation process and technical effect of the technical solution refer to the descriptions in the embodiments shown in fig. 1 to 2, and are not described herein again.
Fig. 4 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention, and as shown in fig. 4, the data processing apparatus includes: a data source device 31, a coordinator 32, and a plurality of data processors 33. The coordinator 32 is communicatively connected to the data source device 31 and the plurality of data processors 33, respectively.
And the data source device 31 is used for providing data to be processed. Alternatively, the data source device 31 may be understood as the data source server referred to in the above embodiments.
A coordinator 32, configured to acquire data to be processed provided by the data source device 31; acquiring operation state information corresponding to each of the plurality of data processors 33; determining the processing capacity corresponding to each of the plurality of data processors 33 according to the operation state information; determining a target data processor corresponding to a first data block in the data to be processed according to the processing capacity; the first data block is sent to the target data processor for processing of the first data block by the target data processor.
A plurality of data processors 33 for processing the respective received data blocks.
Optionally, the plurality of data processors 33 are further configured to send their running state information to the coordinator 32 at regular intervals. Each data processor 33 may also be provided with a cache queue, and the data processors 33 process data blocks in the order in which they enter the cache queue. The data processors 33 correspond to the data processing systems in the above embodiments.
Optionally, the data processing apparatus further comprises: and a processing result synthesizer 34 for synthesizing the processing results generated by the plurality of data processors 33 to obtain a data processing result.
After the data processors 33 have processed their respective data blocks, they may send the processing results corresponding to those data blocks to the processing result synthesizer 34, which synthesizes them to obtain a complete processing result corresponding to the data to be processed.
For example, the processing result corresponding to the data to be processed may take the form of a graph, such as a graph of the number of visits a service platform receives over a day, and the processing result corresponding to one data block is then a segment of that graph. The role of the processing result synthesizer 34 is to combine the segments into one complete graph.
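A minimal sketch of such a synthesizer, assuming each per-block result is a mapping from a time label to a value (for example, per-minute visit counts); the merge-and-sort logic is an illustration, not the patented synthesizer itself:
    def synthesize(segments):
        # Merge per-block results and order them by time label to obtain
        # the complete result for the data to be processed.
        merged = {}
        for segment in segments:
            merged.update(segment)
        return dict(sorted(merged.items()))

    segments = [{"10:01": 42, "10:00": 37}, {"10:02": 55}]
    print(synthesize(segments))  # {'10:00': 37, '10:01': 42, '10:02': 55}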
For the parts of the embodiment not described in detail, reference may be made to the related description of the embodiment shown in fig. 1-2. The implementation process and technical effect of the technical solution refer to the description in the embodiment shown in fig. 1 to 2, and are not described herein again.
In addition, an embodiment of the present invention provides a computer storage medium for storing computer software instructions for the electronic device, which includes a program for executing the data processing method in the method embodiments shown in fig. 1 to 2.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
From the above description of the embodiments, those skilled in the art will clearly understand that the embodiments can be implemented by means of a necessary general hardware platform, or by a combination of hardware and software. Based on this understanding, the above technical solutions, or the portions of them that contribute to the prior art, may be embodied in the form of a computer program product, which may be stored on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and can implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (7)

1. A data processing method, comprising:
acquiring data to be processed;
determining time granularity for dividing data blocks according to the service type corresponding to the data to be processed;
dividing the data to be processed according to the time granularity to obtain a first data block;
acquiring running state information corresponding to a plurality of data processing systems respectively, wherein the running state information comprises network quality information and load pressure information;
determining the processing capacity corresponding to each of the plurality of data processing systems according to the running state information;
determining a target data processing system corresponding to a first data block in the data to be processed according to the processing capacity;
sending the first data block to the target data processing system for processing; wherein the determining the processing capabilities corresponding to the plurality of data processing systems according to the operating state information includes:
calculating a health index for the data processing system according to the following formula: Q = (PD + ND) / T + e^F
Wherein PD is a processing delay time in the load pressure information, ND is a network delay time in the network quality information, T is a time granularity for dividing data blocks, e is a base of a natural logarithm, and F is a failure frequency of the data processing system in the load pressure information within a preset time period.
2. The method of claim 1, wherein the data to be processed comprises streaming data;
and determining data with the generation timestamp within the current time granularity as the first data block according to the generation timestamp of each data in the streaming data.
3. The method of claim 1, wherein:
and determining the data processing system with the best processing capacity in the plurality of data processing systems as the target data processing system corresponding to the first data block.
4. The method of claim 1, wherein:
if the processing capacity difference degree between the data processing system corresponding to the previous second data block of the first data block and the data processing system with the best processing capacity is smaller than a threshold value, determining that the data processing system corresponding to the second data block is the target data processing system corresponding to the first data block;
if the processing capacity difference degree between the data processing system corresponding to the previous second data block of the first data block and the data processing system with the best processing capacity is larger than a threshold value, determining the data processing system with the best processing capacity in the plurality of data processing systems;
and determining the data processing system with the best processing capacity as a target data processing system corresponding to the first data block.
5. The method of claim 1, wherein sending the first data block to the target data processing system for processing comprises:
and sending the first data block to a cache queue corresponding to the target data processing system.
6. A data processing apparatus, characterized by comprising:
the data acquisition module is used for acquiring data to be processed; dividing the data to be processed according to the time granularity obtained by the time granularity determining module to obtain a first data block;
the time granularity determining module is used for determining the time granularity for dividing the data block according to the service type corresponding to the data to be processed;
the information acquisition module is used for acquiring the running state information corresponding to each of the plurality of data processing systems, wherein the running state information comprises network quality information and load pressure information;
a processing capacity determining module, configured to determine, according to the running state information, processing capacities corresponding to the plurality of data processing systems, respectively;
the target data processing system determining module is used for determining a target data processing system corresponding to a first data block in the data to be processed according to the processing capacity;
the sending module is used for sending the first data block to the target data processing system for processing;
wherein the processing capability determining module is specifically configured to: calculating a health index for the data processing system according to the following formula: Q = (PD + ND) / T + e^F
Wherein PD is a processing delay time in the load pressure information, ND is a network delay time in the network quality information, T is a time granularity for dividing a data block, e is a base of a natural logarithm, and F is a failure frequency of a data processing system in the load pressure information within a preset time period.
7. A data processing apparatus, characterized by comprising: the system comprises a coordinator, a data source device and a plurality of data processors, wherein the data source device and the data processors are respectively in communication connection with the coordinator;
the data source device is used for providing data to be processed;
the coordinator is used for acquiring the data to be processed; determining time granularity for dividing data blocks according to the service type corresponding to the data to be processed; dividing the data to be processed according to the time granularity to obtain a first data block; acquiring running state information corresponding to the plurality of data processors, wherein the running state information comprises network quality information and load pressure information; determining the processing capacity corresponding to each of the plurality of data processors according to the running state information; determining a target data processor corresponding to a first data block in the data to be processed according to the processing capacity, so that the target data processor processes the first data block; wherein the coordinator is specifically configured to:
calculating a health index for the data processing system according to the following formula: Q = (PD + ND) / T + e^F
Wherein PD is a processing delay time in the load pressure information, ND is a network delay time in the network quality information, T is a time granularity for dividing a data block, e is a base of a natural logarithm, and F is a failure frequency of a data processor in the load pressure information within a preset time period.
CN201811027346.5A 2018-09-04 2018-09-04 Data processing method, device and equipment Active CN110874268B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811027346.5A CN110874268B (en) 2018-09-04 2018-09-04 Data processing method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811027346.5A CN110874268B (en) 2018-09-04 2018-09-04 Data processing method, device and equipment

Publications (2)

Publication Number Publication Date
CN110874268A CN110874268A (en) 2020-03-10
CN110874268B true CN110874268B (en) 2023-04-18

Family

ID=69716097

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811027346.5A Active CN110874268B (en) 2018-09-04 2018-09-04 Data processing method, device and equipment

Country Status (1)

Country Link
CN (1) CN110874268B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113434551B (en) * 2021-06-28 2022-05-27 北京百度网讯科技有限公司 Data processing method, device, equipment and computer storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020059274A1 (en) * 2000-03-03 2002-05-16 Hartsell Neal D. Systems and methods for configuration of information management systems
US10277477B2 (en) * 2015-09-25 2019-04-30 Vmware, Inc. Load response performance counters

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6209054B1 (en) * 1998-12-15 2001-03-27 Cisco Technology, Inc. Reliable interrupt reception over buffered bus
US8055726B1 (en) * 2006-10-31 2011-11-08 Qlogic, Corporation Method and system for writing network data
CN104717545A (en) * 2013-12-17 2015-06-17 乐视网信息技术(北京)股份有限公司 Video playing method and device
CN105159610A (en) * 2015-09-01 2015-12-16 浪潮(北京)电子信息产业有限公司 Large-scale data processing system and method
CN107145307A (en) * 2017-04-27 2017-09-08 郑州云海信息技术有限公司 A kind of dynamic metadata optimization method and system based on distributed storage
CN107180102A (en) * 2017-05-25 2017-09-19 北京环境特性研究所 The storage method and system of a kind of target characteristic data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
林海略; 韩燕波. Research on key issues in performance management of multi-tenant applications. Chinese Journal of Computers, 2010, (No. 10), full text. *
肖子达; 朱立谷; 冯东煜; 张迪. Performance optimization of aggregation computation in distributed databases. Journal of Computer Applications, 2017, (No. 05), full text. *

Also Published As

Publication number Publication date
CN110874268A (en) 2020-03-10


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant