CN113687846B - Method, apparatus, device and readable storage medium for processing data


Info

Publication number
CN113687846B
Authority
CN
China
Prior art keywords
data
information
state information
target
computing device
Prior art date
Legal status
Active
Application number
CN202110737890.4A
Other languages
Chinese (zh)
Other versions
CN113687846A (en)
Inventor
楚振江
宋晓东
冀向阳
侯京超
汪瑫
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110737890.4A
Publication of CN113687846A
Application granted
Publication of CN113687846B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 - Arrangements for software engineering
    • G06F8/60 - Software deployment
    • G06F8/65 - Updates
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 - File systems; File servers
    • G06F16/14 - Details of searching files based on file metadata
    • G06F16/148 - File search processing
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 - Arrangements for software engineering
    • G06F8/70 - Software maintenance or management
    • G06F8/71 - Version control; Configuration management
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 - Network arrangements or protocols for supporting network services or applications
    • H04L67/01 - Protocols
    • H04L67/06 - Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 - Network arrangements or protocols for supporting network services or applications
    • H04L67/34 - Network arrangements or protocols for supporting network services or applications involving the movement of software or configuration parameters

Abstract

The present disclosure provides methods, apparatuses, devices and readable storage media for processing data, relating to the field of data processing technology and in particular to big data and intelligent search. A specific implementation is as follows: obtaining, at a first server, allocation information of new version data generated by a data source, the allocation information including a plurality of storage addresses of a plurality of data shards into which the new version data is divided and operation information corresponding to a plurality of ordered operations for the new version data; configuring the operation information using the storage address of a data shard of the plurality of data shards to generate configured operation information for the data shard; determining target state information corresponding to a target operation to be completed in the plurality of ordered operations; and sending the configured operation information and the target state information for the data shard to a second server for processing the data shard. In this way, eventual consistency of the data distribution stage can be achieved and the complexity of the data distribution system is reduced.

Description

Method, apparatus, device and readable storage medium for processing data
Technical Field
The present disclosure relates to the field of data processing technology, and in particular, to a method, an apparatus, a device, and a readable storage medium for processing data in the fields of big data, intelligent search, and the like.
Background
With the development of the internet, complex systems composed of large-scale computer programs have come into widespread use. As such complex systems grow, the subsystems they comprise and the data they must process keep increasing. Because the capabilities of individual computing devices are limited, it is now common for the computation performed by the programs in these subsystems and the data used by those programs to be stored separately. Under such a compute-storage separation scheme, a cloud program service running in a cloud server generally consists of a program file and a data file. As the scale of the program service grows, the amount of data contained in the data file also increases accordingly. However, a number of technical problems remain to be solved in providing data files for program services.
Disclosure of Invention
The present disclosure provides a method, apparatus, device, and storage medium for processing data.
According to a first aspect of the present disclosure, a method for processing data is provided. The method comprises: obtaining, at a first server, allocation information of new version data generated by a data source, the allocation information including a plurality of storage addresses of a plurality of data shards into which the new version data is divided and operation information corresponding to a plurality of ordered operations for the new version data; configuring the operation information using a storage address of a data shard of the plurality of data shards to generate configured operation information for the data shard; determining target state information corresponding to a target operation to be completed in the plurality of ordered operations; and sending the configured operation information and the target state information for the data shard to a second server for processing the data shard.
According to a second aspect of the present disclosure, a method for processing data is provided. The method comprises: receiving, at a second server, from the first server, configured operation information and target state information for a data shard from a plurality of data shards generated by dividing new version data generated by a data source, the configured operation information relating to a plurality of ordered operations for the data shard, the target state corresponding to a target operation to be completed in the plurality of ordered operations; in response to receiving first heartbeat information for the data shard from a first computing device, sending the operation information and the target state information to the first computing device, the first heartbeat information including a current state for the data shard; and updating current state information of the first computing device with the current state of the data shard.
According to a third aspect of the present disclosure, a method for processing data is provided. The method comprises: obtaining, at a first computing device, an identification of a data shard to be processed from a plurality of data shards generated by dividing new version data generated by a data source; sending heartbeat information for the data shard to a second server so as to receive operation information and target state information for the data shard, the operation information relating to a plurality of ordered operations for the data shard, the target state corresponding to a target operation to be completed in the plurality of ordered operations; comparing a current state of the data shard with the target state; and if it is determined that the current state differs from the target state, continuing to perform the plurality of ordered operations until the target operation is completed.
According to a fourth aspect of the present disclosure, an apparatus for processing data is provided. The apparatus comprises: an allocation information acquisition module configured to acquire, at a first server, allocation information of new version data generated by a data source, the allocation information including a plurality of storage addresses of a plurality of data shards into which the new version data is divided and operation information corresponding to a plurality of ordered operations for the new version data; an operation information configuration module configured to configure the operation information using a storage address of a data shard of the plurality of data shards to generate configured operation information for the data shard; a target state information determination module configured to determine target state information corresponding to a target operation to be completed among the plurality of ordered operations; and a transmitting module configured to transmit the configured operation information and the target state information for the data shard to a second server for processing the data shard.
According to a fifth aspect of the present disclosure, an apparatus for processing data is provided. The apparatus comprises: an operation information and target state information receiving module configured to receive, at a second server, from the first server, configured operation information and target state information for a data shard from a plurality of data shards generated by dividing new version data generated by a data source, the configured operation information relating to a plurality of ordered operations for the data shard, the target state corresponding to a target operation to be completed in the plurality of ordered operations; a first operation information and target state information transmitting module configured to transmit the operation information and the target state information to a first computing device in response to receiving first heartbeat information for the data shard from the first computing device, the first heartbeat information including a current state for the data shard; and an updating module configured to update current state information of the first computing device with the current state of the data shard.
According to a sixth aspect of the present disclosure, an apparatus for processing data is provided. The apparatus comprises: an identification acquisition module configured to acquire, at a first computing device, an identification of a data shard to be processed from a plurality of data shards generated by dividing new version data generated by a data source; a heartbeat information sending module configured to send heartbeat information for the data shard to a second server so as to receive operation information and target state information for the data shard, the operation information relating to a plurality of ordered operations for the data shard, the target state corresponding to a target operation to be completed in the plurality of ordered operations; a comparison module configured to compare a current state of the data shard with the target state; and an operation execution module configured to continue executing the plurality of ordered operations until the target operation is completed if the current state is determined to differ from the target state.
According to a seventh aspect of the present disclosure, an electronic device is provided. The electronic device includes at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method according to the first aspect of the present disclosure.
According to an eighth aspect of the present disclosure, an electronic device is provided. The electronic device includes at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method according to the second aspect of the present disclosure.
According to a ninth aspect of the present disclosure, an electronic device is provided. The electronic device includes at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method according to the third aspect of the present disclosure.
According to a tenth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method according to the first aspect of the present disclosure.
According to an eleventh aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method according to the second aspect of the present disclosure.
According to a twelfth aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method according to the third aspect of the present disclosure.
According to a thirteenth aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the steps of the method according to the first aspect of the present disclosure.
According to a fourteenth aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the steps of the method according to the second aspect of the present disclosure.
According to a fifteenth aspect of the present disclosure there is provided a computer program product comprising a computer program which, when executed by a processor, implements the steps of the method according to the third aspect of the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 illustrates a schematic diagram of an environment 100 in which various embodiments of the present disclosure can be implemented;
FIG. 2 illustrates a flow chart of a method 200 for processing data according to some embodiments of the present disclosure;
FIG. 3 illustrates a schematic diagram of an example 300 of a sharding mode, according to some embodiments of the present disclosure;
FIG. 4 illustrates a schematic diagram of an example 400 of multiple distribution phases, according to some embodiments of the present disclosure;
FIG. 5 illustrates a flow chart of a method 500 for processing data according to some embodiments of the present disclosure;
FIG. 6 illustrates a schematic diagram of a structure 600 of a hosting service and agent according to some embodiments of the present disclosure;
FIG. 7 illustrates a flow chart of a method 700 for processing data according to some embodiments of the present disclosure;
FIG. 8 illustrates a schematic diagram of an example 800 of multiple ordered operations for data shards, according to some embodiments of the present disclosure;
FIG. 9 illustrates a schematic diagram of an example 900 of multiple ordered operations for a data shard, according to some embodiments of the present disclosure;
FIG. 10 illustrates a schematic diagram of an example 1000 of multiple ordered operations for data shards, according to some embodiments of the present disclosure;
FIG. 11 illustrates a schematic diagram of an example 1100 of migrating a data shard, according to some embodiments of the present disclosure;
FIG. 12 illustrates a schematic diagram of an example system 1200 for processing data, according to some embodiments of the present disclosure;
FIG. 13 illustrates a block diagram of an apparatus 1300 for processing data, according to some embodiments of the present disclosure;
FIG. 14 illustrates a block diagram of an apparatus 1400 for processing data, according to some embodiments of the present disclosure;
FIG. 15 illustrates a block diagram of an apparatus 1500 for processing data, according to some embodiments of the present disclosure; and
FIG. 16 illustrates a block diagram of a device 1600 capable of implementing various embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In describing embodiments of the present disclosure, the term "comprising" and its like should be taken to be open-ended, i.e., including, but not limited to. The term "based on" should be understood as "based at least in part on". The term "one embodiment" or "the embodiment" should be understood as "at least one embodiment". The terms "first," "second," and the like, may refer to different or the same object. Other explicit and implicit definitions are also possible below.
As the scale of a program service increases, the amount of data contained in its data file grows and generally comes to exceed the physical memory capacity of a single computer. At that point the large data file has to be cut into fragments, splitting one large data file into hundreds or even thousands of data shards. In a distributed system, how to distribute these numerous data shards to the distributed computer programs therefore becomes a technical problem to be solved.
To solve the above problem, one conventional scheme uses a central control service to notify target computers and push a delivery address to them; after a computer has fetched the data, the data is placed on the file path used by the computer program, and the data is thus delivered machine by machine. However, this approach requires a central push system throughout the entire data distribution process. When a push to a single computer fails, only a limited number of re-pushes can be performed, and state consistency ultimately cannot be guaranteed. Meanwhile, new computer nodes that appear during the distribution cannot obtain the latest data files in time; the central push system has to detect the existence of the new nodes and initiate an additional push, so the approach has obvious shortcomings in timeliness and fault tolerance.
Another conventional scheme splits an oversized data file into a plurality of small data files and creates a separate, one-to-one data distribution task for each of them, the distributions of the small files being independent of one another. However, when the data exceeds the capacity of a single computer, the data must be split into small files and independent distribution tasks corresponding to those files must be created, which requires manual splitting and manual creation of the new distributions. The maintenance cost of this approach grows very quickly as the number of small data files expands; moreover, because each small file is distributed independently, the small files cannot be distributed cooperatively, so the approach is unsuitable for large data files that require strict consistency.
Yet another conventional approach makes no explicit separation between program and data. When new data needs to be distributed, the program package is iteratively updated, i.e., the data file used by the program is updated by upgrading the computer program machine by machine. In this scheme, however, the small data files obtained by splitting the large data are bound to computer program upgrades, and every data update requires an iterative change of the computer program, so the computer program and the data files cannot be flexibly decoupled.
To address at least the above problems, an improved scheme for processing data is proposed according to embodiments of the present disclosure. In this scheme, allocation information of new version data generated by a data source is acquired at a first server, the allocation information including a plurality of storage addresses of a plurality of data shards into which the new version data is divided and operation information corresponding to a plurality of ordered operations for the new version data. The first server then configures the operation information using the storage address of each data shard to generate configured operation information for that shard, and determines target state information corresponding to a target operation to be completed in the plurality of ordered operations. The first server sends the configured operation information and the target state information for the data shard to a second server for processing the data shard. In this way, eventual consistency of the data distribution stage can be achieved, the complexity of the data distribution system is significantly reduced, and the stability of data processing is improved.
Fig. 1 illustrates a schematic diagram of an environment 100 in which various embodiments of the present disclosure can be implemented. The example environment 100 includes a server 106, a server 112, and a computing device 114.
The server 106 and the server 112 may be any suitable computing devices, and may in particular be cloud servers (also referred to as cloud computing servers or cloud hosts), a host product in a cloud computing service system that overcomes the drawbacks of traditional physical hosts and VPS ("Virtual Private Server") services, namely high management difficulty and weak service scalability. A server may also be a server of a distributed system or a server combined with a blockchain. For ease of description, server 106 may be referred to as a first server and server 112 as a second server.
The server 106 may be used to monitor whether the version of the data produced by a data source has changed. If new version data 102 is detected, the new version data 102 is acquired. For each application that processes data generated by the data source, there is configuration information corresponding to that data source. The configuration information includes at least the number of data shards into which the new version data is to be partitioned and a plurality of ordered operations for the new version data. Alternatively or additionally, the configuration information also includes a plurality of release phases for releasing the new version data to the computing devices, the cluster identifications of the computing device clusters used by the respective release phases, and so on.
The plurality of ordered operations may also be regarded as a state machine path, i.e., a plurality of operations performed in order on the new version data. The plurality of ordered operations includes at least a download operation and a load operation. In some embodiments the plurality of ordered operations further includes a configuration operation, such as creating a file directory. Alternatively or additionally, the plurality of ordered operations may also include operations subsequent to the load operation, such as processing operations on previously accepted data. The above examples are merely for describing the present disclosure and are not intended as a specific limitation thereof.
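As an illustration only (not part of the patent text), such a state machine path might be modeled as an ordered list of phases; the phase names here are assumptions:

```python
# Minimal sketch of a "state machine path" for one data shard: a fixed,
# ordered list of phases, with DOWNLOAD and LOAD always present and
# optional custom phases before and after them.
from enum import IntEnum

class Phase(IntEnum):
    PREPARE = 0       # e.g. custom phase: create a file directory
    DOWNLOAD = 1      # fixed phase: fetch the shard from its storage address
    LOAD = 2          # fixed phase: load the shard into the application
    POST_PROCESS = 3  # e.g. custom phase: process previously accepted data

# The plurality of ordered operations is simply the phases in ascending order.
ORDERED_OPERATIONS = sorted(Phase)
```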
The server 106 obtains the configuration information and then divides the new version data 102 into a plurality of data shards 104 according to the number of data shards. The server 106 may generate the allocation information based on the configuration information and information related to the data shards 104, such as the storage address and data version of each shard. The allocation information may include the storage address of each data shard, its destination address on the destination computing device, the plurality of release phases for releasing the new version data to the computing devices, the cluster identification of the computing device cluster used by each release phase, the plurality of ordered operations for the new version data, and so on.
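A minimal sketch of what such allocation information could look like as a data structure; all field names are assumptions for illustration only:

```python
# Hypothetical shape of the allocation information assembled by the first server.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class ReleasePhase:
    phase_id: int
    target_cluster_ids: List[str]             # clusters serving this release phase

@dataclass
class AllocationInfo:
    data_version: str                         # version identifier of the new data
    shard_storage_addresses: Dict[int, str]   # shard id -> storage address
    shard_destination_paths: Dict[int, str]   # shard id -> path on the target device
    release_phases: List[ReleasePhase]        # e.g. S0..S3 as in FIG. 4
    ordered_operations: List[str]             # e.g. ["prepare", "download", "load", ...]
```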
The server 106 generates, for each data shard, operation information 108 corresponding to the plurality of ordered operations for that shard. For example, the storage address of each data shard is used to configure the download address of the download operation among the plurality of ordered operations for that shard. The server 106 may also determine the target operation to be performed for these data shards, forming target state information 110. Alternatively or additionally, the target state information 110 is generated using the identifications of the computing device clusters of the different release phases together with the target operation, e.g., with the cluster identification as a field in the target state information.
In FIG. 1 the processes of generating the allocation information, configuring the plurality of ordered operations, and generating the target state information are all performed within server 106; this is merely an example and not a specific limitation of the present disclosure, and these processes may be implemented on different servers as those skilled in the art see fit.
The server 112 receives the operation information 108 and the target state information 110. Upon receiving heartbeat information from the computing device 114, the server 112 issues the operation information and the target state information for the data shards processed on that computing device to the computing device 114. In some embodiments, the heartbeat information includes an identifier of the data shard; the server 112 looks up the corresponding operation information according to that identifier and then issues the operation information and the target state information for the shard. Alternatively or additionally, the heartbeat information includes the cluster identification of the device cluster to which the computing device 114 belongs, and the server 112 matches that cluster identification against the cluster identification in the target state information. The operation information and the target state information are issued to the computing device 114 only when the two match, which ensures that only the computing devices corresponding to each release phase run the application that processes the data shards. The above examples are merely for describing the present disclosure and are not intended as a specific limitation thereof.
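A hedged sketch of how the second server might answer such a heartbeat, matching the reporting device's cluster identification against the target state before issuing anything; the function and field names are assumptions:

```python
# Sketch of a hosting-service heartbeat handler on the second server.
def handle_heartbeat(heartbeat, op_info_by_shard, target_state_by_shard, device_states):
    shard_id = heartbeat["shard_id"]
    target = target_state_by_shard.get(shard_id)
    ops = op_info_by_shard.get(shard_id)
    if target is None or ops is None:
        return None  # nothing published for this shard yet

    # Only devices belonging to the clusters of the current release phase
    # receive the operation information and target state.
    if heartbeat["cluster_id"] not in target["target_cluster_ids"]:
        return None

    # Record the device's reported state for later state reclamation.
    device_states[(heartbeat["device_id"], shard_id)] = heartbeat["current_state"]
    return {"operation_info": ops, "target_state": target}
```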
After acquiring the identification of a data shard to be processed, the computing device 114 sends heartbeat information including that identification to the server 112 in order to acquire the operation information 108 and the target state information 110 for the shard from the server 112. After the computing device 114 has acquired the operation information 108 and the target state information 110, it compares the target state in the target state information 110 with its current state. If the target state does not match the current state, meaning that the operations performed so far have not yet reached the target operation, the computing device continues to perform the next operation of the plurality of ordered operations. If they match, the operation already performed on computing device 114 is the target operation and no further operation needs to be performed.
Computing device 114 includes, but is not limited to, personal computers, server computers, hand-held or laptop devices, mobile devices (such as mobile phones, personal digital assistants (PDAs), and media players), multiprocessor systems, consumer electronics, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. A server may be a cloud server, also called a cloud computing server or cloud host, a host product in a cloud computing service system that overcomes the drawbacks of traditional physical hosts and VPS ("Virtual Private Server") services, namely high management difficulty and weak service scalability. A server may also be a server of a distributed system or a server combined with a blockchain.
FIG. 1 illustrates one server 112 and one computing device 114 merely as an example and not as a specific limitation of the present disclosure; the environment 100 may also include multiple servers 112 and multiple computing devices, in which case the multiple servers 112 receive the operation information and target state information from one server 106 and multiple computing devices 114 receive data information from one server 112.
In this way, eventual consistency of the data distribution stage can be achieved, the complexity of the data distribution system is significantly reduced, and the stability of data processing is improved.
An environment 100 in which the various embodiments of the present disclosure can be implemented is described above in connection with fig. 1. A flowchart of a method 200 for processing data according to some embodiments of the present disclosure is described below in conjunction with fig. 2. The method 200 of fig. 2 may be performed by the server 106 of fig. 1 or any suitable computing device.
At block 202, allocation information of new version data generated by a data source is obtained at a first server, the allocation information including a plurality of storage addresses of a plurality of data shards into which the new version data is partitioned and operation information corresponding to a plurality of ordered operations for the new version data. As shown in FIG. 1, the server 106 obtains the allocation information for the new version data generated by the data source. In some embodiments, the server 106 receives the allocation information from another computing device. In some embodiments, server 106 generates the allocation information itself. The above examples are merely for describing the present disclosure and are not intended as a specific limitation thereof.
In some embodiments, the server 106 monitors whether the data generated by the data source has changed, e.g., whether it has been modified or updated. If the data changes, the version of the data is considered to have changed. In some embodiments, the data version information is represented by any suitable identification information that is time-dependent or user-set; when the data changes, new data version information is generated, and the server examines the version information to determine whether the version of the data has changed. The above examples are merely for describing the present disclosure and are not intended as a specific limitation thereof. If it is determined that the version of the data has changed, the new version data is partitioned into a plurality of data shards 104. In this way, the data shards can be generated quickly.
In some embodiments, the server 106 obtains a predetermined number of data shards to be generated, for example from the configuration information for the data of that data source, and then divides the new version data according to the predetermined number. In this way, the data can be divided quickly. Dividing the new version data 102 into a plurality of data shards 104 speeds up the processing of the data and improves data processing efficiency.
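As a sketch only, assuming a simple key hash (the patent does not prescribe a particular partitioning function), dividing a data version into a predetermined number of shards might look like this:

```python
# Minimal sketch of splitting one data version into a fixed number of shards.
import hashlib

def partition(records, num_shards):
    """records: iterable of (key, value) pairs; returns num_shards lists."""
    shards = [[] for _ in range(num_shards)]
    for key, value in records:
        digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
        shards[int(digest, 16) % num_shards].append((key, value))
    return shards

# Example: split one "large file" of records into 3 shards, as in FIG. 3.
shards = partition([(i, f"row-{i}") for i in range(10)], num_shards=3)
```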
FIG. 3 illustrates a schematic diagram of an example 300 of a sharding mode according to some embodiments of the disclosure. In FIG. 3, in the conventional single-file mode, each version of the data produces one data file, such as data version 1 302, data version 2 304, and data version 3 306. In embodiments of the present disclosure, each version of the data may be partitioned into a predetermined number of data shards. In some embodiments, data version 1 may be partitioned into three data shards: shards 308, 310, and 312. Likewise, subsequent versions of the data are partitioned into the same number of data shards. In some embodiments, the data may be partitioned into any suitable number of data shards. The foregoing is merely an example and not a specific limitation of the present disclosure.
Returning to FIG. 2, in some embodiments server 106 generates the allocation information based on the addresses and versions of the data shards in combination with the configuration information corresponding to the data source. The configuration information includes at least the number of data shards into which the new version data is to be partitioned, the release phases for releasing the new version data 102 to the computing devices, the cluster identifications of the computing device clusters used by each release phase, and the plurality of ordered operations for the new version data; the allocation information is generated from this information. In some embodiments, the server 106 receives the sharding configuration information from another server. The above examples are merely for describing the present disclosure and are not intended as a specific limitation thereof.
At block 204, the operation information is configured using the storage address of a data shard of the plurality of data shards to generate configured operation information for that shard. For example, the server 106 in FIG. 1 configures the operation information for each data shard with the storage address of that shard. Alternatively or additionally, the operation information is configured with other information in the allocation information, for example the destination address of the data shard on the computing device.
In some embodiments, server 106 obtains the storage address of the data shard from the allocation information and then writes the storage address into the portion of the operation information corresponding to the download operation of the plurality of ordered operations, thereby generating the configured operation information for the data shard. In this way, the storage address of the data shard can be determined quickly.
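A small illustrative sketch of writing the storage address into the download step of the operation information, reusing the hypothetical AllocationInfo structure sketched earlier; step and field names are assumptions:

```python
# Sketch of configuring the per-shard operation information.
import copy

def configure_operations(operation_template, allocation_info, shard_id):
    ops = copy.deepcopy(operation_template)  # one configured copy per shard
    for step in ops:
        if step["name"] == "download":
            step["source_address"] = allocation_info.shard_storage_addresses[shard_id]
            step["destination_path"] = allocation_info.shard_destination_paths[shard_id]
    return ops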
At block 206, target state information corresponding to a target operation to be completed in the plurality of ordered operations is determined. For example, the server 106 in FIG. 1 determines target state information corresponding to a target operation to be completed in a plurality of ordered operations.
In some embodiments, data release has only a single phase; in that case the target operation to be completed is set in the target state information for all computing devices that run applications processing the data.
In some embodiments, data release includes a plurality of phases, and the allocation information further includes the target cluster identification of the target computing device cluster corresponding to each of the plurality of release phases of the new version data. In this case, for each release phase, server 106 determines the target cluster identification of the computing device cluster corresponding to that phase and the target operation to be completed among the plurality of ordered operations. The server then generates the target state information from the target operation and the target cluster identification; for example, the target state information includes an operation identifier corresponding to the target operation together with the target cluster identification. In this way, target state information for the computing devices of the different phases can be generated quickly. The plurality of release phases is described below in connection with FIG. 4, which shows a schematic diagram of an example 400 of multiple release phases according to some embodiments of the present disclosure.
In FIG. 4, the data release includes four phases: phase S0 402, phase S1 404, phase S2 406, and phase S3 408. In each phase, a different selected set of computing devices runs the application that processes the data. For example, in phase S1 404 the federated clusters Union_ig1 and Union_ig2 are selected to run the applications that process the data shards; each federated cluster includes multiple computing device clusters (for example, federated cluster Union_ig1 includes computing device clusters ig1, ig2, ig3, and ig4), and each computing device cluster includes a plurality of computing device instances. Thus, for each release phase, the server 106 places the identifications of the corresponding computing device clusters into the target state information, which is later used to decide whether to issue the operation information and target state information to the computing devices of the target clusters. A sketch of how such target state information might be assembled is shown below.
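The following sketch, reusing the hypothetical ReleasePhase structure from the earlier sketch, combines a target operation with the cluster identifications of one release phase; field names are assumptions:

```python
# Sketch of building target state information for a single release phase.
def build_target_state(phase, target_operation):
    return {
        "target_operation": target_operation,              # e.g. "download"
        "target_cluster_ids": list(phase.target_cluster_ids),
        "phase_id": phase.phase_id,
    }

# Example: phase S1 with clusters ig1..ig4, as in FIG. 4.
s1 = ReleasePhase(phase_id=1, target_cluster_ids=["ig1", "ig2", "ig3", "ig4"])
target_state = build_target_state(s1, target_operation="download")
```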
Returning to FIG. 2, at block 208 the configured operation information and target state information for the data shard is sent to a second server for processing the data shard. For example, as shown in FIG. 1, server 106 sends the configured operation information and target state information for a data shard to server 112 for processing that shard.
In some embodiments, the server 106 also obtains current state information of the plurality of computing devices corresponding to the target cluster identification, the current state information indicating which of the plurality of ordered operations those computing devices have completed. The server 106 then determines whether the current state information matches the target state information. If they match, the target state information is updated to correspond to the operation following the target operation, and the server 106 sends the updated target state to the server 112 so that the next operation can be completed. If the two pieces of state information do not match, meaning that some computing devices have not yet executed the target operation, the server acquires the current state information of the computing devices again at predetermined time intervals, waiting until all computing devices have completed the target operation. In this way, consistency of the data processing process can be achieved.
As an example, as shown in FIG. 4, in release phase S1 the server 106 periodically queries the state of each computing device instance. If the state of every instance in a computing device cluster matches the target state, the state of that cluster is set to the state of having completed the target operation; this is also referred to as state reclamation. If the state of every computing device cluster is the state of having completed the target operation, the state of the federated cluster can likewise be set to that state, which may be referred to as state recursion; the target state is thus advanced in parallel across the federated clusters. Once the states of all computing devices in this phase are the target state, the server adjusts the target operation in the target state information to the next operation in the plurality of ordered operations.
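A minimal sketch of this state reclamation and target advancement, with an assumed list of ordered operations and assumed function names:

```python
# Assumed ordered operations, earliest to latest.
OPS = ["prepare", "download", "load", "post_process"]

def cluster_state(instance_states):
    """A cluster has completed an operation only if every instance has
    (state reclamation). instance_states: list of completed operations."""
    if not instance_states or None in instance_states:
        return None
    return OPS[min(OPS.index(s) for s in instance_states)]

def advance_target(cluster_states, target_state):
    """Move the target to the next ordered operation once every cluster
    has reached the current target operation (state recursion)."""
    if all(s == target_state["target_operation"] for s in cluster_states):
        i = OPS.index(target_state["target_operation"])
        if i + 1 < len(OPS):
            target_state["target_operation"] = OPS[i + 1]
    return target_state
```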
In some embodiments, if the server 106 receives release information for a further updated version of the data, the release of that updated version is performed after the ordered operations for the current data version have been completed.
In this way, eventual consistency of the data distribution stage can be achieved, the complexity of the data distribution system is significantly reduced, and the stability of data processing is improved.
A flowchart of the method 200 for processing data according to various embodiments of the present disclosure was described above in connection with FIGS. 2-4. A flowchart of a method 500 for processing data according to some embodiments of the present disclosure is described below in conjunction with FIG. 5. Method 500 in FIG. 5 may be performed by server 112 in FIG. 1 or any suitable computing device.
At block 502, configured operation information and target state information for a data shard, from a plurality of data shards generated by partitioning new version data generated by a data source, is received at a second server from the first server, the configured operation information relating to a plurality of ordered operations for the data shard and the target state corresponding to a target operation to be completed in the plurality of ordered operations. For example, server 112 in FIG. 1 receives the configured operation information and target state information for a data shard from server 106, and stores them for data distribution.
At block 504, it is determined whether first heartbeat information for the data shard has been received from the first computing device. For example, the server 112 monitors whether heartbeat information is received from the computing device 114. If first heartbeat information for the data shard is received from the first computing device 114, then at block 506 the operation information and the target state information are sent to the first computing device, the first heartbeat information including the current state for the data shard. For example, the heartbeat information sent by the computing device 114 to the server 112 includes a current state that indicates the operation for the data shard that has already been completed on the computing device 114.
In some embodiments, the first heartbeat information further includes a reference cluster identification of the reference computing device cluster in which the first computing device is located, and the target state information includes the target cluster identification of the target computing device cluster corresponding to each of the plurality of release phases of the new version data. Before sending the operation information and the target state information to the first computing device 114, the server 112 first matches the reference cluster identification against the target cluster identification. If they match, the first computing device is one of the computing devices selected for the release phase, and the operation information and target state information are sent to it. If they do not match, the first computing device has not been selected for the release phase, and the operation information and target state information need not be sent. In this way, the behavior of each computing device can be controlled accurately.
Acquiring the operation information and target state information through heartbeat information essentially establishes a declarative delivery mechanism, which guarantees the consistency of data processing. The declarative delivery mechanism is implemented by a hosting (master control) service on server 112 and proxy services running on computing devices 114, the hosting service and the proxy services being in a one-to-many relationship. As shown in FIG. 6, multiple proxy services may communicate with the hosting service over the network, synchronizing heartbeat information and retrieving the latest version of the data file. The hosting service maintains the latest version of the data file as well as the current phase of that version for the proxy services to query. Specifically, the hosting service 602 running on the second server holds the plurality of ordered operations, corresponding to a plurality of phases. Multiple agents associated with hosting service 602 run on multiple different computing devices; for example, agents 604, 606, and 608 run on three computing devices, and each agent is responsible for managing the operations of the different phases for data shards 610, 612, and 614.
Returning to FIG. 5, at block 508 the current state information of the first computing device is updated with the current state of the data shard. For example, the server 112 in FIG. 1 updates the current state information of the first computing device with the current state of the data shard, e.g., by storing the current state in a list of computing device state information.
In some embodiments, after the target state information has been updated in the server 106, the server 112 receives the updated target state information from the server 106, the updated target state information corresponding to the operation following the target operation. Then, upon receiving second heartbeat information for the data shard, the server 112 sends the operation information and the updated target state information to the first computing device so that the next operation can be completed. In this way, consistency of operation can be achieved.
In some embodiments, server 112 may also receive, from the first computing device 114, third heartbeat information including the identification of a transferred data shard, i.e., a data shard being moved from a second computing device to the first computing device, and look up the operation information and target state information for the transferred shard in first state information covering the subset of data shards associated with the reference computing device cluster. If the operation information and target state of the transferred shard are not found in the first state information, they are looked up in second state information corresponding to all of the plurality of data shards. The server 112 then sends the operation information and target state found for the transferred shard to the first computing device 114. In this way, the transfer of data shards can be accomplished quickly.
This approach achieves eventual consistency of the data distribution stage, significantly reduces the complexity of the data distribution system, and improves the stability of data processing.
The method 500 for processing data according to various embodiments of the present disclosure was described above in connection with FIGS. 5-6. A flowchart of a method 700 for processing data according to some embodiments of the present disclosure is described below in conjunction with FIG. 7. Method 700 in FIG. 7 may be performed by computing device 114 in FIG. 1 or any suitable computing device.
At block 702, an identification of a data shard to be processed, from a plurality of data shards generated by partitioning new version data generated by a data source, is obtained at a first computing device. For example, computing device 114 obtains the identification of a data shard 104 it is to process.
In some examples, an application control system coupled to the application running on computing device 114 may obtain the information of all data shards and the identifications of the computing devices that can run the applications processing those shards, and then assign each computing device the data shards it can process. Computing device 114 can thus obtain from the application control system the identifications of the data shards it is to process.
At block 704, heartbeat information for the data shard is sent to the second server in order to receive the operation information and target state information for the data shard, the operation information relating to a plurality of ordered operations for the data shard and the target state corresponding to a target operation to be completed in the plurality of ordered operations. For example, computing device 114 sends heartbeat information for a data shard to server 112 in order to receive the operation information and target state information for that shard.
The plurality of ordered operations for each data shard includes at least a download operation and a load operation; these two are fixed operations for processing the data shard. The user may define custom operation phases before and after these two operations, for example a custom operation before downloading (such as creating a file directory) and a custom operation added after the load operation; all of these operations together form the operation information of one data shard. The proxy service reports the current state of the different data units to the master control service on the second server 112 via heartbeat information, and the master control service issues the operation information and target state according to the heartbeat information received from the proxy service. FIG. 8 illustrates a schematic diagram of an example 800 of multiple ordered operations for data shards according to some embodiments of the present disclosure.
As shown in FIG. 8, data shard 1 802 is a data shard generated by a first data source and processed by computing device 114, and data shard 2 806 is a data shard generated by a second data source and processed on computing device 114. There is a set of ordered operations 804 for data shard 1 802 and a set of ordered operations 808 for data shard 2 806, in which phase 0, phase 3, and phase 4 correspond to other, user-defined operations. The proxy service running on computing device 114 may send the state information of data shard 1 to the hosting service of server 112 via heartbeat information, for example state information corresponding to the operation that computing device 114 has currently completed. Likewise, for data shard 2 806, the state information corresponding to the operation currently completed by computing device 114 for that shard may also be sent to the hosting service in server 112. The proxy then receives the operation information and target state information for data shard 1 and for data shard 2 from the master control service. The computing device 114 processes the data shards according to the received operation information and target state information until the operation corresponding to the target state information has been completed.
FIG. 8 above illustrates multiple ordered operations for data shards from different data sources located on the same computing device; multiple ordered operations for a single data shard are described below in connection with FIG. 9.
As shown in FIG. 9, the proxy service on computing device 114 obtains operation information and target state information for data shard 902. The operation information 904 includes a plurality of ordered operations, in which phase 0, phase 3, and phase 4 correspond to custom operations. The computing device 114 compares the target state information with the current state of the data shard on the computing device 114; if the states differ, the shard must continue through the subsequent operations until the current target state of the data file is reached. For operations that lag behind, failure handling is performed according to the retry count and operation timeout defined for each phase, so that the target operation is reached as reliably as possible and eventual consistency of the data file's version phase is achieved. Specifically, for each operation, a retry may be performed if the operation is unsuccessful; if the specified number of retries is exhausted or a time limit is exceeded, failure is reported for the processing of that data shard. For example, the load phase may be allowed 20 retries and must complete within 1200 s; if it does not, the operation is deemed to have failed. This mechanism allows the processing of each data shard to be completed as reliably as possible, and by controlling the execution on each data shard, control over all data shards generated by the data source can be achieved. A sketch of such a retry loop follows.
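A hedged sketch of running one phase with a retry count and a timeout budget; only the 20 retries and 1200 s figure for the load phase come from the example above, the rest (backoff strategy, function names) is assumed:

```python
# Sketch of executing a single phase for a shard with retries and a timeout.
import time

def run_phase(action, max_retries=20, timeout_s=1200):
    deadline = time.monotonic() + timeout_s
    for attempt in range(max_retries):
        if time.monotonic() > deadline:
            break  # timeout budget for this phase exhausted
        try:
            action()
            return True   # phase reached; new state can be reported upward
        except Exception:
            time.sleep(min(2 ** attempt, 30))  # simple backoff before retrying
    return False          # report failure for this shard's processing
```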
When a new computer or container joins the service cluster of the computer program, the newly started proxy service interacts with the master control service to obtain the latest target phase, completes the target phases of data distribution in order, and finally reaches the same data distribution state as the services on the other computing devices in the computing device cluster.
In some embodiments, the heartbeat information includes the current state determined for the data shard. The computing device 114 generates the heartbeat information from the identification of the data shard and the current state. In this way, accurate heartbeat information can be generated.
In some embodiments, the heartbeat information is sent periodically, for example once every 5 s, so that the current state information of the first computing device processing the shard is kept up to date. In this way, the state of the computing device can be updated quickly. A sketch of such a periodic heartbeat sender follows.
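A sketch of a periodic heartbeat sender on the agent side; the 5 s interval comes from the example above, while the transport (a plain HTTP POST with a JSON body) and all names are assumptions:

```python
# Sketch of the agent's periodic heartbeat loop.
import json
import time
import urllib.request

def heartbeat_loop(master_url, device_id, cluster_id, shard_states, interval_s=5):
    """shard_states: shard id -> operation completed so far on this device."""
    while True:
        for shard_id, current_state in shard_states.items():
            payload = json.dumps({
                "device_id": device_id,
                "cluster_id": cluster_id,
                "shard_id": shard_id,
                "current_state": current_state,
            }).encode("utf-8")
            req = urllib.request.Request(master_url, data=payload,
                                         headers={"Content-Type": "application/json"})
            with urllib.request.urlopen(req) as resp:
                reply = json.loads(resp.read() or b"null")
            # reply, if present, carries the operation information and target state
        time.sleep(interval_s)
```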
At block 706, the current state of the data shard is compared with the target state. For example, in FIG. 1 computing device 114 compares the current state of the data shard with the obtained target state. At block 708, if the current state is determined to differ from the target state, execution of the plurality of ordered operations continues until the target operation is completed. If the current state is the same as the target state, the intended operation has already been performed and no further operation is required.
As shown in FIG. 10, large data 1002 is partitioned into multiple data shards, such as data shard 1 1004, data shard 2 1006, and data shard 3 1008. After collecting confirmation that all data shards have completed the current target operation, the master control service changes the target to the next phase for all of the data units simultaneously, so that operations such as loading the multiple data units of the distributed data are carried out by the computer programs at the same time.
In some embodiments, the plurality of ordered operations includes a download operation, and the download operation includes the storage address of the data shard. Computing device 114 performs the download operation to retrieve the data shard from the storage address. In this way, the data shard can be fetched quickly from its storage location. A minimal sketch of the download step follows.
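A minimal sketch of the download step, assuming the storage address is an HTTP URL; any object-store client could be substituted:

```python
# Sketch of fetching a shard from its configured storage address.
import shutil
import urllib.request

def download_shard(source_address, destination_path):
    with urllib.request.urlopen(source_address) as resp, \
            open(destination_path, "wb") as out:
        shutil.copyfileobj(resp, out)  # stream the shard to the local path
```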
In some embodiments, after the data shards have been allocated to the computing devices running the program that processes them, the processing of data shards on different computing devices is dynamically adjusted according to the processing conditions of those computing devices. If it is determined that a data shard assigned to the computing device is to be executed instead by an application on a third computing device, the identification of that data shard is deleted from the data description file of the program running on the computing device so that the shard is no longer processed there. In this way, data transfer can be realized quickly.
In some embodiments, if a computing device has spare processing capacity or has not yet been assigned any data shards to process, data shards assigned to other computing devices may be transferred to that computing device for processing. For example, a program control system that manages the running of programs on the computing devices may reassign data shards to be processed by the computing device 114. If it is determined that the identification of a transferred data shard has been received, heartbeat information including that identification is sent to the second server in order to obtain the operation information and target state for the transferred data shard. The computing device 114 then processes the transferred data shard based on that operation information and target state. In this way, the transfer of data shards can be carried out quickly.
In one example, the computing device 114 obtains the identification of the transferred data shard and adds it to the data description file of the application running on the computing device 114 that processes data shards. The agent then reads the identification from the data description file and looks up the corresponding operation information and target state information by sending heartbeat information including the identification of the transferred data shard to the master control service.
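A minimal sketch of the description-file bookkeeping involved in a shard transfer. The JSON layout and field names are assumptions for illustration; the patent does not specify the on-disk format of the data description file.

```python
import json

def transfer_shard(shard_id: str, src_desc_path: str, dst_desc_path: str) -> None:
    """Add the shard identification to the destination application's data
    description file, then remove it from the source application's file,
    mirroring the transfer flow described above."""
    # Add the new shard description to the destination first.
    with open(dst_desc_path) as f:
        dst = json.load(f)
    if shard_id not in dst["shards"]:
        dst["shards"].append(shard_id)
    with open(dst_desc_path, "w") as f:
        json.dump(dst, f)

    # Then delete the identification from the source so the shard is
    # no longer processed there.
    with open(src_desc_path) as f:
        src = json.load(f)
    src["shards"] = [s for s in src["shards"] if s != shard_id]
    with open(src_desc_path, "w") as f:
        json.dump(src, f)
```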
As shown in fig. 11, an application 1114 that processes data shards is running in computing device 1 1110 and an application 1128 that processes data shards is running in computing device 2. Agent 1118 for application 1114 and agent 1124 for application 1128 communicate with the master control service 1102. Data shard 1 1120 and data shard 2 1122 are assigned to application 1114 in computing device 1, and data shard 3 1130 is assigned to application 1128 in computing device 2. The identifications of the data shards used by each application are recorded in that application's data description file (1116 for application 1114 and 1126 for application 1128).
When the program control system changes data shard 2 1122 from being processed by the application 1114 to being processed by the application 1128, a new shard description is first added to the data description file 1126 of the application 1128; the contents of that data description file then list data shard 2 and data shard 3. The agent 1124 of the application 1128 uses this data description information in its heartbeat communication with the master control service 1102. The master control service first performs a first-level query for data shard 2 in the key information ig_key2 1106 of the computing device cluster corresponding to the distribution version information, and does not find the operation information and target state information of data shard 2 there, because that shard was assigned to the computing device cluster whose key information is ig_key1 1104. The master control service then performs a second-level query against the global dictionary-dimension storage information field_key and finds the current version target stage information and operation information for the data distribution of data shard 2, so that the correct information for data shard 2 is issued to the proxy service. After the proxy service obtains the operation information and target state information of the data shard, it can complete the operations on that data shard. Alternatively or additionally, the identification of data shard 2 1122 is deleted from the data description file 1116 of the application 1114.
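A minimal sketch of this two-level lookup on the master control service. The dictionary structure and parameter names are assumptions; only the first-level (per-cluster key information) and second-level (global dictionary) order of lookup comes from the description above.

```python
def lookup_shard_info(shard_id: str, cluster_key: str,
                      per_cluster_info: dict, global_info: dict):
    """First-level query: look for the shard under the key information of the
    requesting cluster. If it is not found there (e.g. the shard was originally
    assigned to another cluster), fall back to the second-level, global
    dictionary covering all shards of the distribution."""
    first_level = per_cluster_info.get(cluster_key, {})
    info = first_level.get(shard_id)
    if info is not None:
        return info                      # operation info + target state found
    return global_info.get(shard_id)     # second-level, global lookup

# Usage sketch mirroring fig. 11: shard 2 lives under ig_key1, so a query
# from the ig_key2 cluster is resolved by the global dictionary.
per_cluster = {"ig_key1": {"shard-2": {"target_state": "load"}}, "ig_key2": {}}
global_dict = {"shard-2": {"target_state": "load"}}
assert lookup_shard_info("shard-2", "ig_key2", per_cluster, global_dict) is not None
```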
In this way, eventual consistency of the data distribution stages can be achieved, the complexity of the data distribution system is markedly reduced, and the stability of data processing is improved.
A method 700 for processing data according to various embodiments of the present disclosure has been described above in connection with figs. 7-11. A schematic diagram of a system 1200 for processing data according to some embodiments of the present disclosure is described below in connection with fig. 12. As shown in fig. 12, the system includes a build module 1204. The build module 1204 is responsible for update checking of the data source: it periodically inspects whether each data file has new content, comparing against historical version information. If new data content is found for a data file, it generates new version data for that file, assembles it with the template configuration information of the data distribution to produce one distributed data distribution to be executed, and hands that distribution to the state machine driver 1206 for execution. The state machine driver 1206 is responsible for advancing the state of the distributed data distribution, applying different driving control for the single mode and the sharding mode, and coordinating the distribution target machines at each stage to achieve state coordination and progress control of the sharded data. Once the distributed data are coordinated, it collects and checks the state of the current stage and pushes execution to the next stage, until the entire data distribution is finished.
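A minimal sketch of the build module's change detection. The patent only says that current content is compared against historical version information; the use of a content hash here is an assumption, and `data_files` (name → bytes) and `version_history` (name → last recorded digest) are hypothetical structures.

```python
import hashlib

def check_for_new_version(data_files: dict, version_history: dict) -> dict:
    """Inspect each data file's current content (bytes), compare it with the
    recorded historical version, and return the files whose content changed;
    each change would be assembled into one data distribution to execute."""
    new_versions = {}
    for name, content in data_files.items():
        digest = hashlib.sha256(content).hexdigest()
        if version_history.get(name) != digest:
            new_versions[name] = digest      # new content detected
            version_history[name] = digest   # record the new version
    return new_versions                      # empty dict => nothing to distribute
```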
The system 1200 also includes a master control service module 1208. The master control service module 1208 receives and updates the version information of the data file for each operation, maintains the current version information, keeps communication with the proxy services, receives heartbeat information from the proxy services, and delivers operation information and target state information to them. At the same time, it aggregates the data distribution state of the computing devices and records the instance version information of the programs.
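A minimal sketch of how the master control service module might handle one heartbeat: record the reported current state and reply with the operation information and target state information for that shard. The field names and in-memory structures are assumptions for illustration.

```python
def handle_heartbeat(sender_id: str, heartbeat: dict,
                     device_states: dict, shard_targets: dict) -> dict:
    """Update the aggregated distribution state for the reporting computing
    device, then return the configured operation information and target state
    information that the proxy service needs for the shard it reported."""
    shard_id = heartbeat["shard_id"]
    # Aggregate the data distribution state per device and per shard.
    device_states.setdefault(sender_id, {})[shard_id] = heartbeat["current_state"]
    # Deliver operation info and target state info for this shard (if known).
    target = shard_targets.get(shard_id, {})
    return {
        "operation_info": target.get("operation_info"),
        "target_state": target.get("target_state"),
    }
```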
The system 1200 also includes a proxy module 1212. The proxy module 1212 is responsible for declarative execution of the data distribution and for updating the status of the data version based on the information and final state obtained from the master control service. It collects the environment information and current data state of the computer on which the proxy service runs, determines the data to be reported and collected by the proxy service, and obtains the final data issued by the master control service. Likewise, the master control service 1208 may also process other data 1210 and 1214.
Thus, for the distribution of a new version of data, the data source first generates the new content, the build module detects it, assembles it with the distribution configuration, and generates one data distribution whose execution it initiates. Each stage of the data distribution is controlled by the state machine driver and sent to the master control service module. When the master control service module receives heartbeat reports from a proxy service, it notifies the proxy service of, and issues, the new version information of the data file. The proxy service then performs consistency catch-up and data distribution according to the version information and final state of the data file.
Fig. 13 shows a schematic block diagram of an apparatus 1300 for processing data according to an embodiment of the disclosure. As shown in fig. 13, the apparatus 1300 includes: an allocation information acquisition module 1302 configured to acquire allocation information of the new version data generated by the data source at the first server, the allocation information including a plurality of storage addresses of a plurality of data fragments into which the new version data is divided and operation information corresponding to a plurality of ordered operations for the new version data; an operation information configuration module 1304 configured to configure the operation information using the storage addresses of the data slices of the plurality of data slices to generate configured operation information for the data slices; a target state information determination module 1306 configured to determine target state information corresponding to a target operation to be completed among the plurality of ordered operations; and a transmitting module 1308 configured to transmit the configured operation information and the target state information for the data shard to the second server for processing the data shard.
In some embodiments, wherein the operation information configuration module 1304 comprises: a storage address acquisition module configured to acquire a storage address for a data fragment from the allocation information; and a storage address association module configured to associate a storage address into part of information in the operation information to generate configured operation information for the data fragment, the part of information corresponding to a download operation among the plurality of ordered operations.
In some embodiments, wherein the allocation information further comprises a target cluster identification of a target computing device cluster corresponding to each of a plurality of release phases of the new version data, wherein the target state information determination module 1306 comprises: the target cluster identification determining module is configured to determine a target cluster identification corresponding to the release phase aiming at the release phase; a target operation determination module configured to determine a target operation to be completed among the plurality of ordered operations; and a target state information generation module configured to generate target state information based on the target operation and the target cluster identity.
In some embodiments, the apparatus 1300 further comprises: a current state information acquisition module configured to acquire current state information of a plurality of computing devices corresponding to a target cluster identifier, the current state information indicating an operation of a plurality of ordered operations that the plurality of computing devices have completed; a match determination module configured to determine whether the current state information matches the target state information; an updating module configured to update the target state information to correspond to a next operation of the target operation if it is determined that the current state information matches the target state information; and an update transmission module configured to transmit the updated target state to the second server for completing the next operation.
In some embodiments, the apparatus 1300 further comprises: a monitoring module configured to monitor whether a version of data generated by the data source has changed; and a first partitioning module configured to partition the new version data into a plurality of data slices if it is determined that the version of the data has changed.
In some embodiments, wherein the first partitioning module comprises: a predetermined data acquisition module configured to acquire a predetermined number of data slices to be generated; and a second dividing module configured to divide the new version data based on a predetermined number.
Fig. 14 shows a schematic block diagram of an apparatus 1400 for processing data according to an embodiment of the disclosure. As shown in fig. 14, the apparatus 1400 includes: an operation information and target state information receiving module 1402 configured to receive, at the second server, from the first server, configured operation information and target state information for a data shard from a plurality of data shards generated by dividing new version data generated by the data source, the configured operation information being related to a plurality of ordered operations for the data shard, the target state corresponding to a target operation to be completed in the plurality of ordered operations; a first operation information and target state information sending module 1404 configured to send operation information and target state information to the first computing device in response to receiving first heartbeat information for the data shard from the first computing device, the first heartbeat information including a current state for the data shard; and an update module 1406 configured to update current state information of the first computing device with the current state of the data shards.
In some embodiments, the apparatus 1400 further comprises: an updated target state information receiving module configured to receive updated target state information from the first server, the updated target state information corresponding to a next operation of the target operation; and an operation information and updated target state information transmitting module configured to transmit the operation information and the updated target state information to the first computing device for completing a next operation in response to receiving the second heartbeat information for the data slice.
In some embodiments, the first heartbeat information further includes a reference cluster identity of a reference computing device cluster in which the first computing device is located, and the target state information includes a target cluster identity of a target computing device cluster corresponding to each of a plurality of release phases of the new version data; wherein the first operation information and target state information sending module 1404 includes: the matching module is configured to match the reference cluster identifier with the target cluster identifier; and a second operation information and target state information transmitting module configured to transmit the operation information and the target state information to the first computing device if it is determined that the reference cluster identity matches the target cluster identity.
In some embodiments, the apparatus 1400 further comprises: a first lookup module configured to, in response to receiving third heartbeat information from the first computing device including an identification of a transferred data fragment, lookup operation information and target state information of the transferred data fragment in first state information of a portion of the data fragments associated with the reference computing device cluster, the transferred data fragment being transferred from the second computing device onto the first computing device; a second search module configured to search for the operation information and the target state of the transferred data patch in second state information corresponding to the plurality of data patches if the operation information and the target state of the transferred data patch are not found in the first state information; and a search result transmitting module configured to transmit the searched operation information and the target state of the transfer data fragment to the first computing device.
Fig. 15 shows a schematic block diagram of an apparatus 1500 for processing data according to an embodiment of the disclosure. As shown in fig. 15, the apparatus 1500 includes: an identification acquisition module 1502 configured to acquire, at a first computing device, an identification of a data shard to be processed from a plurality of data shards generated by partitioning new version data generated by a data source; a heartbeat information sending module 1504 configured to send a heartbeat message for the data shard to the second server for receiving operation information for the data shard and target state information, the operation information being related to a plurality of ordered operations for the data shard, the target state corresponding to a target operation to be completed among the plurality of ordered operations; a comparison module 1506 configured to compare the current state of the data slice with the target state; and an operation execution module 1508 configured to continue executing the plurality of ordered operations to complete the target operation if the current state is determined to be different from the target state.
In some embodiments, the heartbeat information sending module 1504 includes: a current state determination module configured to determine a current state for the data slice; and a heartbeat information generating module configured to generate heartbeat information based on the identification of the data fragments and the current state.
In some embodiments, wherein the heartbeat information sending module 1504 further includes: a periodicity module configured to periodically send heartbeat information for updating current state information of the first computing device processing the shards.
In some embodiments, wherein the plurality of ordered operations includes a download operation, the download operation including a storage address of the data chunk; the apparatus further includes a download operation execution module configured to execute a download operation to obtain the data fragments from the storage address.
In some embodiments, the apparatus 1500 further comprises: and a deletion module configured to delete the identification of the data shard if it is determined that the data shard is to be executed by an application on the third computing device.
In some embodiments, the apparatus 1500 further comprises: a transmission module configured to, if it is determined that the identification of the transferred data patch is received, transmit heartbeat information including the identification of the transferred data patch to the second server for obtaining operation information and a target state for the transferred data patch, the transferred data patch being transferred from the second computing device onto the first computing device; and a transfer data shard processing module configured to process the transfer data shards based on the operation information and the target state of the transfer data shards.
In the technical solution of the present disclosure, the acquisition, storage, and application of the personal information of users involved all comply with the provisions of relevant laws and regulations and do not violate public order and good customs.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 16 illustrates a schematic block diagram of an example electronic device 1600 that can be used to implement embodiments of the present disclosure. The example electronic device 1600 may be used to implement the server 106, the server 112, and the computing device 114 of fig. 1. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 16, the apparatus 1600 includes a computing unit 1601 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1602 or a computer program loaded from a storage unit 1608 into a Random Access Memory (RAM) 1603. In RAM 1603, various programs and data required for operation of device 1600 may also be stored. The computing unit 1601, ROM 1602, and RAM 1603 are connected to each other by a bus 1604. An input/output (I/O) interface 1605 is also connected to the bus 1604.
Various components in device 1600 are connected to I/O interface 1605, including: an input unit 1606 such as a keyboard, a mouse, and the like; an output unit 1607 such as various types of displays, speakers, and the like; a storage unit 1608, such as a magnetic disk, an optical disk, or the like; and a communication unit 1609, such as a network card, modem, wireless communication transceiver, or the like. Communication unit 1609 allows device 1600 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunications networks.
The computing unit 1601 may be a variety of general purpose and/or special purpose processing components with processing and computing capabilities. Some examples of computing unit 1601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 1601 performs the various methods and processes described above, such as methods 200, 500, and 700. For example, in some embodiments, the methods 200, 500, and 700 may be implemented as computer software programs tangibly embodied on a machine-readable medium, such as the storage unit 1608. In some embodiments, some or all of the computer programs may be loaded and/or installed onto device 1600 via ROM 1602 and/or communication unit 1609. One or more of the steps of methods 200, 500, and 700 described above may be performed when the computer program is loaded into RAM 1603 and executed by computing unit 1601. Alternatively, in other embodiments, computing unit 1601 may be configured to perform methods 200, 500, and 700 by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs, which may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (30)

1. A method for processing data, comprising:
obtaining, at a first server, allocation information of a new version of data generated by a data source, the allocation information including a plurality of storage addresses of a plurality of data slices into which the new version of data is partitioned and operation information corresponding to a plurality of ordered operations for the new version of data, and a target cluster identification of a target computing device cluster corresponding to each of a plurality of release phases of the new version of data;
Configuring the operation information using a storage address of a data shard of the plurality of data shards to generate configured operation information for the data shard;
determining target state information corresponding to a target operation to be completed in the plurality of ordered operations; and
sending the configured operational information and the target state information for the data shard to a second server for processing the data shard;
wherein determining the target state information comprises:
determining a target cluster identifier corresponding to each release phase;
determining a target operation to be completed in the plurality of ordered operations; and
and generating the target state information based on the target operation and the target cluster identification.
2. The method of claim 1, wherein configuring the operation information comprises:
acquiring a storage address for the data fragment from the allocation information; and
the memory address is associated into partial information in the operation information to generate the configured operation information for the data slice, the partial information corresponding to a download operation of the plurality of ordered operations.
3. The method of claim 1, further comprising:
obtaining current state information of a plurality of computing devices corresponding to the target cluster identification, the current state information indicating operations of the plurality of ordered operations that the plurality of computing devices have completed;
determining whether the current state information matches the target state information;
if the current state information is determined to be matched with the target state information, updating the target state information to correspond to the next operation of the target operation; and
the updated target state information is sent to a second server for completion of the next operation.
4. The method of claim 1, further comprising:
monitoring whether a version of data generated by the data source changes; and
and if the version of the data is determined to change, dividing the new version of data into the plurality of data fragments.
5. The method of claim 4, wherein dividing the new version of data into a plurality of data slices comprises:
acquiring a predetermined number of data fragments to be generated; and
the new version data is partitioned based on the predetermined number.
6. A method for processing data, comprising:
receiving, at a second server, from a first server, configured operation information and target state information for a data shard from a plurality of data shards generated by partitioning new version data generated by a data source, the configured operation information relating to a plurality of ordered operations for the data shard, the target state information corresponding to a target operation to be completed among the plurality of ordered operations;
in response to receiving first heartbeat information for the data shard from a first computing device, sending the operation information and the target state information to the first computing device, the first heartbeat information including a reference cluster identity for a current state of the data shard and a reference computing device cluster in which the first computing device is located, the target state information including a target cluster identity of a target computing device cluster corresponding to each of a plurality of release phases of the new version data; and
updating current state information of the first computing device with the current state of the data slice;
Wherein sending the operation information and the target state information to the first computing device comprises:
matching the reference cluster identifier with the target cluster identifier; and
and if the reference cluster identity is determined to match the target cluster identity, transmitting the operation information and the target state information to the first computing device.
7. The method of claim 6, further comprising:
receiving updated target state information from the first server, the updated target state information corresponding to a next operation of the target operation; and
in response to receiving second heartbeat information for the data slice, the operation information and the updated target state information are sent to the first computing device for completion of the next operation.
8. The method of claim 6, further comprising:
in response to receiving third heartbeat information from the first computing device including an identification of a transferred data shard, looking up operation information and target state information of the transferred data shard in first state information of partial data shards associated with the reference computing device cluster, the transferred data shard transferred from a second computing device onto the first computing device;
If the operation information and the target state information of the transferred data fragments are not found in the first state information, the operation information and the target state information of the transferred data fragments are found in the second state information corresponding to the plurality of data fragments; and
and sending the searched operation information and target state information of the transferred data fragment to the first computing equipment.
9. A method for processing data, comprising:
obtaining, at a first computing device, an identification of a data shard to be processed from a plurality of data shards generated by partitioning new version data generated by a data source;
sending a heartbeat message for the data shard to a second server for receiving operation information and target state information for the data shard, the operation information relating to a plurality of ordered operations for the data shard, the target state information corresponding to a target operation to be completed in the plurality of ordered operations;
wherein sending the heartbeat message includes:
determining a current state for the data slice;
generating the heartbeat information based on the identification of the data fragment and the current state; and
Periodically transmitting the heartbeat information for updating current state information of a first computing device processing the shard;
comparing the current state of the data fragment with the target state information; and
if it is determined that the current state is different from the target state information, continuing to perform the plurality of ordered operations to complete the target operation.
10. The method of claim 9, wherein the plurality of ordered operations comprises a download operation comprising a storage address of the data chunk; the method further comprises the steps of:
and executing the downloading operation to acquire the data fragments from the storage address.
11. The method of claim 9, further comprising:
if it is determined that the data shard is to be executed by an application on a third computing device, the identification of the data shard is deleted.
12. The method of claim 9, further comprising:
if it is determined that the identification of the transferred data fragment is received, sending heartbeat information including the identification of the transferred data fragment to a second server for obtaining operation information and target state information for the transferred data fragment, the transferred data fragment being transferred from a second computing device to the first computing device; and
The transferred data shard is processed based on the operation information and the target state information of the transferred data shard.
13. An apparatus for processing data, comprising:
an allocation information acquisition module configured to acquire allocation information of a new version of data generated by a data source at a first server, the allocation information including a plurality of storage addresses of a plurality of data slices into which the new version of data is divided and operation information corresponding to a plurality of ordered operations for the new version of data, and a target cluster identification of a target computing device cluster corresponding to each of a plurality of release phases of the new version of data;
an operation information configuration module configured to configure the operation information using a storage address of a data shard of the plurality of data shards to generate configured operation information for the data shard;
a target state information determination module configured to determine target state information corresponding to a target operation to be completed among the plurality of ordered operations; and
a transmitting module configured to transmit the configured operation information and the target state information for the data shard to a second server for processing the data shard;
Wherein the target state information determination module comprises:
a target cluster identity determination module configured to determine, for each publication phase, a target cluster identity corresponding to the publication phase;
a target operation determination module configured to determine a target operation to be completed among the plurality of ordered operations; and
a target state information generation module configured to generate the target state information based on the target operation and the target cluster identity.
14. The apparatus of claim 13, wherein the operation information configuration module comprises:
a storage address acquisition module configured to acquire a storage address for the data fragment from the allocation information; and
a storage address association module configured to associate the storage address into partial information in the operation information to generate the configured operation information for the data fragment, the partial information corresponding to a download operation of the plurality of ordered operations.
15. The apparatus of claim 13, further comprising:
a current state information acquisition module configured to acquire current state information of a plurality of computing devices corresponding to the target cluster identification, the current state information indicating an operation of the plurality of ordered operations that the plurality of computing devices have completed;
A match determination module configured to determine whether the current state information matches the target state information;
an updating module configured to update the target state information to correspond to a next operation of the target operation if it is determined that the current state information matches the target state information; and
and an update transmission module configured to transmit the updated target state information to a second server for completing the next operation.
16. The apparatus of claim 13, further comprising:
a monitoring module configured to monitor whether a version of data generated by the data source has changed; and
and the first dividing module is configured to divide the new version data into the plurality of data fragments if the version of the data is determined to change.
17. The device of claim 16, wherein the first partitioning module comprises:
a predetermined data acquisition module configured to acquire a predetermined number of data slices to be generated; and
and a second dividing module configured to divide the new version data based on the predetermined number.
18. An apparatus for processing data, comprising:
An operation information and target state information receiving module configured to receive, at a second server, configured operation information and target state information for a data shard from a first server, the data shard being from a plurality of data shards generated by dividing new version data generated by a data source, the configured operation information being related to a plurality of ordered operations for the data shard, the target state information corresponding to a target operation to be completed among the plurality of ordered operations;
a first operation information and target state information sending module configured to send the operation information and the target state information to a first computing device in response to receiving first heartbeat information for the data shard from the first computing device, the first heartbeat information including a reference cluster identity for a current state of the data shard and a reference computing device cluster in which the first computing device is located, the target state information including a target cluster identity of a target computing device cluster corresponding to each of a plurality of release phases of the new version data; and
an update module configured to update current state information of the first computing device with a current state of the data slice;
Wherein the first operation information and target state information transmitting module includes:
a matching module configured to match the reference cluster identity with the target cluster identity; and
and a second operation information and target state information sending module configured to send the operation information and the target state information to the first computing device if it is determined that the reference cluster identity matches the target cluster identity.
19. The apparatus of claim 18, further comprising:
an updated target state information receiving module configured to receive the updated target state information from the first server, the updated target state information corresponding to a next operation of the target operation; and
an operation information and updated target state information transmission module configured to transmit the operation information and updated target state information to the first computing device for completing the next operation in response to receiving second heartbeat information for the data slice.
20. The apparatus of claim 18, further comprising:
a first lookup module configured to, in response to receiving third heartbeat information from the first computing device including an identification of a transferred data patch, lookup operation information and target state information of the transferred data patch in first state information of a portion of the data patch associated with the reference computing device cluster, the transferred data patch transferred from a second computing device onto the first computing device;
A second search module configured to search, if the operation information and the target state information of the transferred data patch are not found in the first state information, the operation information and the target state information of the transferred data patch in second state information corresponding to the plurality of data patches; and
and the searching result sending module is configured to send the searched operation information and target state information of the transfer data fragments to the first computing device.
21. An apparatus for processing data, comprising:
an identification acquisition module configured to acquire, at a first computing device, an identification of a data shard to be processed from a plurality of data shards generated by partitioning new version data generated by a data source;
a heartbeat information sending module configured to send a heartbeat message for the data shard to a second server for receiving operation information for the data shard and target state information, the operation information relating to a plurality of ordered operations for the data shard, the target state information corresponding to a target operation to be completed among the plurality of ordered operations;
Wherein the heartbeat information sending module includes:
a current state determination module configured to determine a current state for the data slice; and
a heartbeat information generating module configured to generate the heartbeat information based on the identification of the data slice and the current state;
a periodicity module configured to periodically send the heartbeat information for updating current state information of the first computing device processing the shard;
A comparison module configured to compare a current state of the data slice with the target state information; and
and an operation execution module configured to continue executing the plurality of ordered operations to complete the target operation if the current state is determined to be different from the target state information.
22. The apparatus of claim 21, wherein the plurality of ordered operations comprises a download operation comprising a storage address of the data chunk; the apparatus further comprises:
and the download operation execution module is configured to execute the download operation to acquire the data fragments from the storage address.
23. The apparatus of claim 21, further comprising:
And a deletion module configured to delete the identification of the data shard if it is determined that the data shard is to be executed by an application on a third computing device.
24. The apparatus of claim 21, further comprising:
a sending module configured to send, if it is determined that an identification of a transferred data patch is received, heartbeat information including the identification of the transferred data patch to a second server for obtaining operation information and target state information for the transferred data patch, the transferred data patch being transferred from a second computing device onto the first computing device; and
and a transfer data fragment processing module configured to process the transfer data fragment based on the operation information and the target state information of the transfer data fragment.
25. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
26. An electronic device, comprising:
At least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 6-8.
27. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 9-12.
28. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-5.
29. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 6-8.
30. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 9-12.
CN202110737890.4A 2021-06-30 2021-06-30 Method, apparatus, device and readable storage medium for processing data Active CN113687846B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110737890.4A CN113687846B (en) 2021-06-30 2021-06-30 Method, apparatus, device and readable storage medium for processing data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110737890.4A CN113687846B (en) 2021-06-30 2021-06-30 Method, apparatus, device and readable storage medium for processing data

Publications (2)

Publication Number Publication Date
CN113687846A CN113687846A (en) 2021-11-23
CN113687846B true CN113687846B (en) 2023-07-18

Family

ID=78576826

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110737890.4A Active CN113687846B (en) 2021-06-30 2021-06-30 Method, apparatus, device and readable storage medium for processing data

Country Status (1)

Country Link
CN (1) CN113687846B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014079348A1 (en) * 2012-11-26 2014-05-30 Tencent Technology (Shenzhen) Company Limited Software download method and software download apparatus
CN104239417A (en) * 2014-08-19 2014-12-24 天津南大通用数据技术股份有限公司 Dynamic adjustment method and dynamic adjustment device after data fragmentation in distributed database
WO2018087311A1 (en) * 2016-11-10 2018-05-17 Telefonaktiebolaget Lm Ericsson (Publ) Resource segmentation to improve delivery performance
CN107895023A (en) * 2017-11-16 2018-04-10 百度在线网络技术(北京)有限公司 A kind of view data quality detecting method, device, server and storage medium
CN109088929A (en) * 2018-08-09 2018-12-25 北京百度网讯科技有限公司 For sending the method and device of information
CN110830580A (en) * 2019-11-12 2020-02-21 腾讯云计算(北京)有限责任公司 Storage data synchronization method and device
CN112148350A (en) * 2020-09-04 2020-12-29 深圳市大富网络技术有限公司 Remote version management method for works, electronic device and computer storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A Survey of Distributed Stream Processing Technology; Cui Xingcan; Yu Xiaohui; Liu Yang; Lyu Chaoyang; Journal of Computer Research and Development (Issue 02); full text *
Dynamic Management and Control of Versions in an Engineering Database Management System; Zhong Yuning; Xie Yueyun; Weng Ping; Yang Shuzi; Journal of Wuhan University of Technology (Information & Management Engineering Edition) (Issue 01); full text *

Also Published As

Publication number Publication date
CN113687846A (en) 2021-11-23

Similar Documents

Publication Publication Date Title
CA3065118C (en) Distributed searching and index updating method and system, servers, and computer devices
EP4160440A1 (en) Federated computing processing method and apparatus, electronic device, and storage medium
US11210277B2 (en) Distributing and processing streams over one or more networks for on-the-fly schema evolution
CN110083627B (en) Data processing method, system, computer device and storage medium
US9021071B2 (en) Methods of federating applications providing modular data
CN111694857B (en) Method, device, electronic equipment and computer readable medium for storing resource data
CN113886434A (en) Database cluster-based query and storage method, device and equipment
CN111858628A (en) Database-based management method, database-based management platform, electronic device and storage medium
CN112380184A (en) Transaction processing method and device, electronic equipment and readable storage medium
CN115495473A (en) Database query method and device, electronic equipment and storage medium
CN111858796A (en) Geographic information system engine system, implementation method, device and storage medium
CN113076186B (en) Task processing method, device, electronic equipment and storage medium
CN110765075A (en) Storage method and equipment of automatic driving data
CN113687846B (en) Method, apparatus, device and readable storage medium for processing data
CN112182328A (en) Method and device for expanding search engine, electronic equipment and storage medium
CN113760638A (en) Log service method and device based on kubernets cluster
CN114721686A (en) Configuration data updating method and device, electronic equipment and storage medium
CN114218266A (en) Data query method and device, electronic equipment and storage medium
CN113835728A (en) Data updating method and device, electronic equipment and storage medium
CN113326038A (en) Method, apparatus, device, storage medium and program product for providing service
CN111782633A (en) Data processing method and device and electronic equipment
CN113360689B (en) Image retrieval system, method, related device and computer program product
CN113268488B (en) Method and device for data persistence
CN112527368B (en) Cluster kernel version updating method and device, electronic equipment and storage medium
CN117667102A (en) Dependency analysis method, device, system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant