CN113687846B - Method, apparatus, device and readable storage medium for processing data
- Publication number
- CN113687846B (application number CN202110737890.4A)
- Authority
- CN
- China
- Prior art keywords
- data
- information
- target
- state information
- computing device
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/60—Software deployment
- G06F8/65—Updates
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/14—Details of searching files based on file metadata
- G06F16/148—File search processing
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/70—Software maintenance or management
- G06F8/71—Version control; Configuration management
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/06—Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/34—Network arrangements or protocols for supporting network services or applications involving the movement of software or configuration parameters
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Security & Cryptography (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Library & Information Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Technical field
The present disclosure relates to the technical field of data processing, and in particular to a method, apparatus, device, and readable storage medium for processing data in fields such as big data and intelligent search.
Background
With the development of the Internet, complex systems composed of large-scale computer programs have gradually come into use. As such systems evolve, the number of related subsystems and the volume of data to be processed keep growing. Because the capacity of a computing device is limited, the computation of the programs in each subsystem and the data used by those programs are now usually stored separately. In such a storage-separated solution, a cloud program service running on a cloud server usually consists of program files and data files. As the program service grows in scale, the data contained in the data files grows accordingly. However, many technical problems remain to be solved in the process of providing data files to program services.
Summary
The present disclosure provides a method, apparatus, device, and storage medium for processing data.
According to a first aspect of the present disclosure, a method for processing data is provided. The method includes: acquiring, at a first server, allocation information of new version data generated by a data source, the allocation information including multiple storage addresses of multiple data shards into which the new version data is divided and operation information corresponding to multiple ordered operations for the new version data; configuring the operation information using the storage address of a data shard among the multiple data shards to generate configured operation information for the data shard; determining target state information corresponding to a target operation to be completed among the multiple ordered operations; and sending the configured operation information and the target state information for the data shard to a second server for processing the data shard.
According to a second aspect of the present disclosure, a method for processing data is provided. The method includes: receiving, at a second server from a first server, configured operation information and target state information for a data shard, the data shard being one of multiple data shards generated by dividing new version data generated by a data source, the configured operation information relating to multiple ordered operations for the data shard, and the target state corresponding to a target operation to be completed among the multiple ordered operations; in response to receiving first heartbeat information for the data shard from a first computing device, sending the operation information and the target state information to the first computing device, the first heartbeat information including a current state for the data shard; and updating current state information of the first computing device with the current state of the data shard.
According to a third aspect of the present disclosure, a method for processing data is provided. The method includes: acquiring, at a first computing device, an identifier of a data shard to be processed, the data shard being one of multiple data shards generated by dividing new version data generated by a data source; sending a heartbeat message for the data shard to a second server in order to receive operation information and target state information for the data shard, the operation information relating to multiple ordered operations for the data shard, and the target state corresponding to a target operation to be completed among the multiple ordered operations; comparing a current state of the data shard with the target state; and, if the current state is determined to differ from the target state, continuing to perform the multiple ordered operations to complete the target operation.
According to a fourth aspect of the present disclosure, an apparatus for processing data is provided. The apparatus includes: an allocation information acquisition module configured to acquire, at a first server, allocation information of new version data generated by a data source, the allocation information including multiple storage addresses of multiple data shards into which the new version data is divided and operation information corresponding to multiple ordered operations for the new version data; an operation information configuration module configured to configure the operation information using the storage address of a data shard among the multiple data shards to generate configured operation information for the data shard; a target state information determination module configured to determine target state information corresponding to a target operation to be completed among the multiple ordered operations; and a sending module configured to send the configured operation information and the target state information for the data shard to a second server for processing the data shard.
According to a fifth aspect of the present disclosure, an apparatus for processing data is provided. The apparatus includes: an operation information and target state information receiving module configured to receive, at a second server from a first server, configured operation information and target state information for a data shard, the data shard being one of multiple data shards generated by dividing new version data generated by a data source, the configured operation information relating to multiple ordered operations for the data shard, and the target state corresponding to a target operation to be completed among the multiple ordered operations; a first operation information and target state information sending module configured to send the operation information and the target state information to a first computing device in response to receiving first heartbeat information for the data shard from the first computing device, the first heartbeat information including a current state for the data shard; and an update module configured to update current state information of the first computing device with the current state of the data shard.
According to a sixth aspect of the present disclosure, an apparatus for processing data is provided. The apparatus includes: an identifier acquisition module configured to acquire, at a first computing device, an identifier of a data shard to be processed, the data shard being one of multiple data shards generated by dividing new version data generated by a data source; a heartbeat information sending module configured to send a heartbeat message for the data shard to a second server in order to receive operation information and target state information for the data shard, the operation information relating to multiple ordered operations for the data shard, and the target state corresponding to a target operation to be completed among the multiple ordered operations; a comparison module configured to compare a current state of the data shard with the target state; and an operation execution module configured to continue performing the multiple ordered operations to complete the target operation if the current state is determined to differ from the target state.
According to a seventh aspect of the present disclosure, an electronic device is provided. The electronic device includes at least one processor and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the method according to the first aspect of the present disclosure.
According to an eighth aspect of the present disclosure, an electronic device is provided. The electronic device includes at least one processor and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the method according to the second aspect of the present disclosure.
According to a ninth aspect of the present disclosure, an electronic device is provided. The electronic device includes at least one processor and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the method according to the third aspect of the present disclosure.
According to a tenth aspect of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions is provided, the computer instructions causing a computer to perform the method according to the first aspect of the present disclosure.
According to an eleventh aspect of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions is provided, the computer instructions causing a computer to perform the method according to the second aspect of the present disclosure.
According to a twelfth aspect of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions is provided, the computer instructions causing a computer to perform the method according to the third aspect of the present disclosure.
According to a thirteenth aspect of the present disclosure, a computer program product is provided, including a computer program that, when executed by a processor, implements the steps of the method according to the first aspect of the present disclosure.
According to a fourteenth aspect of the present disclosure, a computer program product is provided, including a computer program that, when executed by a processor, implements the steps of the method according to the second aspect of the present disclosure.
According to a fifteenth aspect of the present disclosure, a computer program product is provided, including a computer program that, when executed by a processor, implements the steps of the method according to the third aspect of the present disclosure.
It should be understood that the content described in this section is not intended to identify key or important features of the embodiments of the present disclosure, nor to limit the scope of the present disclosure. Other features of the present disclosure will become readily understood from the following description.
Brief description of the drawings
The accompanying drawings are provided for a better understanding of the present solution and do not limit the present disclosure. In the drawings:
FIG. 1 shows a schematic diagram of an environment 100 in which multiple embodiments of the present disclosure can be implemented;
FIG. 2 shows a flowchart of a method 200 for processing data according to some embodiments of the present disclosure;
FIG. 3 shows a schematic diagram of an example 300 of a sharding mode according to some embodiments of the present disclosure;
FIG. 4 shows a schematic diagram of an example 400 of multiple distribution stages according to some embodiments of the present disclosure;
FIG. 5 shows a flowchart of a method 500 for processing data according to some embodiments of the present disclosure;
FIG. 6 shows a schematic diagram of a structure 600 of a master control service and an agent according to some embodiments of the present disclosure;
FIG. 7 shows a flowchart of a method 700 for processing data according to some embodiments of the present disclosure;
FIG. 8 shows a schematic diagram of an example 800 of multiple ordered operations for a data shard according to some embodiments of the present disclosure;
FIG. 9 shows a schematic diagram of an example 900 of multiple ordered operations for a data shard according to some embodiments of the present disclosure;
FIG. 10 shows a schematic diagram of an example 1000 of multiple ordered operations for a data shard according to some embodiments of the present disclosure;
FIG. 11 shows a schematic diagram of an example 1100 of migrating a data shard according to some embodiments of the present disclosure;
FIG. 12 shows a schematic diagram of an example system 1200 for processing data according to some embodiments of the present disclosure;
FIG. 13 shows a block diagram of an apparatus 1300 for processing data according to some embodiments of the present disclosure;
FIG. 14 shows a block diagram of an apparatus 1400 for processing data according to some embodiments of the present disclosure;
FIG. 15 shows a block diagram of an apparatus 1500 for processing data according to some embodiments of the present disclosure; and
FIG. 16 shows a block diagram of a device 1600 capable of implementing multiple embodiments of the present disclosure.
Detailed description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described here without departing from the scope and spirit of the present disclosure. Likewise, descriptions of well-known functions and structures are omitted from the following description for clarity and conciseness.
In the description of the embodiments of the present disclosure, the term "including" and similar terms should be understood as open-ended inclusion, that is, "including but not limited to". The term "based on" should be understood as "based at least in part on". The term "one embodiment" or "the embodiment" should be understood as "at least one embodiment". The terms "first", "second", and so on may refer to different objects or to the same object. Other explicit and implicit definitions may also be included below.
As a program service grows in scale, the data contained in its data files grows as well and often exceeds the capacity of a single computer's physical memory. Large data files then need to be split into shards, with one large data file divided into hundreds or thousands of data shards. In distributed system solutions, how to distribute this large number of data shards to the distributed computer programs therefore becomes a technical problem to be solved.
To address the above problem, in one conventional solution a central control service program notifies each target computer of the delivery address and pushes it to that computer; after the data is obtained on the computer, it is placed on the file path used by the computer program, so that the data is delivered computer by computer. However, this solution requires a central push system across the entire data delivery mechanism to distribute and push the data. If a push to a single computer fails, only a limited number of re-pushes can be made, and a consistent state ultimately cannot be reached. In addition, computer nodes newly created during distribution cannot obtain the latest data files in time; the central push system has to detect the new nodes and initiate additional pushes. The solution therefore has clear shortcomings in both timeliness and fault tolerance.
Another conventional solution splits an oversized data file into multiple small data files and creates a corresponding data delivery for each one, so that the small data files are delivered independently of one another. However, whenever data exceeds the capacity of a single computer, the data must be split into small data files and an independent delivery task must be created for each small file, and this splitting and task creation has to be done manually. The approach incurs very high maintenance costs as the number of small data files grows. Moreover, because the deliveries are independent of one another, the small data files cannot be delivered in a coordinated manner, so the approach is not applicable to large data files that must remain strictly consistent.
A further conventional solution makes no clear separation between program and data. When new data needs to be delivered, the program package is changed and iteratively upgraded, and the data files used by the program are updated by upgrading the computer programs one by one. With this solution, however, the small data files obtained by splitting the large data are bound to the computer program upgrade: every data update has to go through an iterative change of the computer program, upgraded one by one. The computer program and the data files therefore cannot be flexibly decoupled, and technical decoupling of the two is also difficult to achieve.
To address at least the above problems, embodiments of the present disclosure propose an improved solution for processing data. In this solution, allocation information of new version data generated by a data source is acquired at a first server. The allocation information includes multiple storage addresses of the multiple data shards into which the new version data is divided and operation information corresponding to multiple ordered operations for the new version data. The first server then configures the operation information using the storage address of a data shard among the multiple data shards to generate configured operation information for that data shard, and determines target state information corresponding to a target operation to be completed among the multiple ordered operations. The first server sends the configured operation information and the target state information for the data shard to a second server for processing the data shard. This method makes it possible to achieve eventual state consistency in the data delivery stage, significantly reduce the complexity of the data delivery system, and improve the stability of data processing.
FIG. 1 shows a schematic diagram of an environment 100 in which multiple embodiments of the present disclosure can be implemented. The example environment 100 includes a server 106, a server 112, and a computing device 114.
The server 106 and the server 112 may be any suitable computing devices. They may also be cloud servers, also known as cloud computing servers or cloud hosts, a host product in the cloud computing service system that addresses the difficult management and weak business scalability of traditional physical hosts and Virtual Private Server (VPS) services. A server may also be a server of a distributed system or a server combined with a blockchain. For ease of description, the server 106 may be referred to as the first server and the server 112 may be referred to as the second server.
The server 106 may be used to monitor whether the version of the data produced by a data source has changed. If new version data 102 is detected, the server 106 acquires the new version data 102. For each application that processes data produced by a data source, there is configuration information corresponding to that data source. The configuration information includes at least the number of data shards into which the new version data is to be divided and the multiple ordered operations for the new version data. Alternatively or additionally, the configuration information further includes multiple release stages for releasing the new version data to computing devices, the cluster identifiers of the computing device clusters used in each release stage, and the like.
The multiple ordered operations can also be regarded as a state machine path comprising multiple operations performed in sequence on the new version data. The multiple ordered operations include at least a download operation and a load operation. In some embodiments, they further include a configuration operation, such as creating a file directory. Alternatively or additionally, they may also include follow-up operations after the load operation, for example a processing operation on previously received data. The above examples are merely intended to describe the present disclosure and do not specifically limit it.
The server 106 obtains the configuration information and then divides the new version data 102 into multiple data shards 104 according to the configured number of data shards. The server 106 may generate the allocation information from the configuration information and from information about the data shards 104, such as each shard's storage address and data version. The allocation information may include the storage addresses of the data shards, the destination addresses of the data shards in the destination computing devices, the multiple release stages for releasing the new version data to computing devices, the cluster identifiers of the computing device clusters used in each release stage, the multiple ordered operations for the new version data, and the like.
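The sharding and allocation step can be pictured with a minimal Python sketch. Everything named here is an assumption made for illustration: the `AllocationInfo` fields, the `config` keys, the byte-range splitting rule, and the `storage_prefix` address scheme are hypothetical and are not specified by the disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AllocationInfo:
    """Hypothetical container for the allocation information."""
    version: str
    shard_addresses: List[str]   # storage address of each data shard
    destination_path: str        # destination address on the computing devices
    ordered_ops: List[str]       # e.g. ["download", "load"]
    stages: List[Dict[str, str]] = field(default_factory=list)


def split_into_shards(data: bytes, shard_count: int) -> List[bytes]:
    """Cut data into shard_count contiguous pieces.

    Trailing pieces may be shorter than the others, or empty.
    """
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    size = -(-len(data) // shard_count)  # ceiling division
    return [data[i * size:(i + 1) * size] for i in range(shard_count)]


def build_allocation(version: str, data: bytes, config: dict,
                     storage_prefix: str) -> AllocationInfo:
    """Combine configuration information with per-shard information."""
    shards = split_into_shards(data, config["shard_count"])
    # Assumed address scheme: one address per shard under the version.
    addresses = [f"{storage_prefix}/{version}/shard-{i}"
                 for i in range(len(shards))]
    return AllocationInfo(
        version=version,
        shard_addresses=addresses,
        destination_path=config["destination_path"],
        ordered_ops=list(config["ordered_ops"]),
        stages=list(config.get("stages", [])),
    )
```

In this sketch the shard count comes from the configuration information rather than from the data size, which mirrors the description above.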
For each data shard, the server 106 generates operation information 108 corresponding to the multiple ordered operations for that shard. For example, the storage address of each data shard is used to configure the download address of the download operation among that shard's ordered operations. The server 106 also determines, for these data shards, the target operation to be performed, which forms the target state information 110. Alternatively or additionally, for each release stage, the target state information 110 is generated from the identifier of the computing device cluster for that stage together with the target operation, for example by including the cluster identifier as a field in the target state information.
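As a rough illustration of this configuration step, the sketch below fills a shared operation template with one shard's storage address and builds a target state record. The dictionary layout, the `source` key, and the function names are assumptions for the example only; the disclosure does not define a concrete data format.

```python
import copy

# Hypothetical template of ordered operations shared by all shards.
OPERATION_TEMPLATE = [
    {"op": "download", "source": None},  # download address filled in per shard
    {"op": "load"},
]


def configure_operations(template, shard_address):
    """Return a per-shard copy of the template with the download address set."""
    ops = copy.deepcopy(template)  # keep the shared template unchanged
    for op in ops:
        if op["op"] == "download":
            op["source"] = shard_address
    return ops


def make_target_state(target_op, cluster_id=None):
    """Build target state information, optionally scoped to one device cluster."""
    state = {"target_op": target_op}
    if cluster_id is not None:
        # Cluster identifier carried as a field, as in the staged release case.
        state["cluster_id"] = cluster_id
    return state
```

Copying the template per shard keeps each shard's configured operation information independent, so a later change to one shard's download address does not leak into the others.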
In FIG. 1, generating the allocation information, configuring the multiple ordered operations, and generating the target state information are all performed in the server 106. This is only an example and does not specifically limit the present disclosure; these processes may be implemented on different servers, and those skilled in the art can arrange them as needed.
The server 112 receives the operation information 108 and the target state information 110. On receiving heartbeat information from the computing device 114, the server 112 delivers to the computing device 114 the operation information and target state information of the data shard processed on that device. In some embodiments, the heartbeat information includes the identifier of the data shard; the server 112 looks up the corresponding operation information by that identifier and then delivers the operation information and target state information for the shard. Alternatively or additionally, the heartbeat information includes the cluster identifier of the device cluster to which the computing device 114 belongs, and the server 112 matches it against the cluster identifier in the target state information. Only when the two match are the operation information and target state information delivered to the computing device 114, which ensures that only the computing devices corresponding to each release stage run the application that processes the data shards. The above examples are merely intended to describe the present disclosure and do not specifically limit it.
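The heartbeat handling on the second server could look roughly like the following in-memory sketch. The class name `SecondServer`, the heartbeat keys (`device_id`, `shard_id`, `cluster_id`, `current_state`), and the choice to return `None` when nothing is delivered are all hypothetical; a real service would sit behind a network protocol that the text does not specify.

```python
class SecondServer:
    """Minimal in-memory stand-in for the server that answers heartbeats."""

    def __init__(self):
        self._ops = {}            # shard_id -> configured operation information
        self._targets = {}        # shard_id -> target state information
        self.current_states = {}  # (device_id, shard_id) -> last reported state

    def publish(self, shard_id, ops, target_state):
        """Store what the first server sent for one data shard."""
        self._ops[shard_id] = ops
        self._targets[shard_id] = target_state

    def on_heartbeat(self, heartbeat):
        """Record the reported state and return a reply, or None."""
        shard_id = heartbeat["shard_id"]
        device_id = heartbeat["device_id"]
        # Update the device's current state from the heartbeat.
        self.current_states[(device_id, shard_id)] = heartbeat.get("current_state")

        target = self._targets.get(shard_id)
        if target is None:
            return None  # nothing published for this shard yet

        wanted_cluster = target.get("cluster_id")
        if wanted_cluster is not None and heartbeat.get("cluster_id") != wanted_cluster:
            return None  # device is outside the cluster of this release stage

        return {"ops": self._ops[shard_id], "target_state": target}
```

Because devices pull their instructions by heartbeat, a node that joins late simply receives the current operation information and target state on its first heartbeat, without the server having to discover it.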
After obtaining the identifier of the data shard to be processed, the computing device 114 sends heartbeat information including that identifier to the server 112 in order to obtain the operation information 108 and the target state information 110 for the shard. After obtaining them, the computing device 114 compares the target state in the target state information 110 with the current state in the computing device 114. If the target state does not match the current state, the operations performed so far have not yet reached the target operation, and the computing device continues to perform the operations in the multiple ordered operations. If they match, the operation performed in the computing device 114 is already the target operation, and the subsequent operations no longer need to be performed.
The computing device 114 includes, but is not limited to, a personal computer, a server computer, a handheld or laptop device, a mobile device (such as a mobile phone, a personal digital assistant (PDA), or a media player), a multiprocessor system, a consumer electronics product, a minicomputer, a mainframe computer, a distributed computing environment including any of the above systems or devices, and the like. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in the cloud computing service system that addresses the difficult management and weak business scalability of traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system, or a server combined with a blockchain.
FIG. 1 shows one server 112 and one computing device 114, which is only an example and not a specific limitation of the present disclosure; the environment 100 may also include multiple servers 112 and multiple computing devices. In that case, the multiple servers 112 receive the operation information and target state information from one server 106, and multiple computing devices 114 receive data information from one server 112.
With this method, final-state consistency in the data distribution stage can be achieved, the complexity of the data distribution system is significantly reduced, and the stability of data processing is improved.
The environment 100 in which multiple embodiments of the present disclosure can be implemented has been described above with reference to FIG. 1. A flowchart of a method 200 for processing data according to some embodiments of the present disclosure is described below with reference to FIG. 2. The method 200 in FIG. 2 may be performed by the server 106 in FIG. 1 or by any suitable computing device.
At block 202, allocation information of new version data generated by a data source is obtained at a first server. The allocation information includes multiple storage addresses of the multiple data shards into which the new version data is divided, and operation information corresponding to multiple ordered operations for the new version data. As shown in FIG. 1, the server 106 obtains the allocation information of the new version data generated by the data source. In some embodiments, the server 106 receives the allocation information from other computing devices. In some embodiments, the server 106 generates the allocation information. The above examples are only used to describe the present disclosure and are not a specific limitation of the present disclosure.
In some embodiments, the server 106 monitors whether the data generated by the data source has changed, for example whether it has been modified or updated. If the data has changed, the data version is considered to have changed. In some embodiments, the data version information is expressed by a time or by any suitable identification information set by the user. When the data changes, new data version information is generated, and the server checks the version information to determine whether the data version has changed. The above examples are only used to describe the present disclosure and are not a specific limitation of the present disclosure. If it is determined that the version of the data has changed, the new version data is divided into multiple data shards 104. In this way, the data shards can be produced quickly.
In some embodiments, the server 106 obtains a predetermined number of data shards to be generated. For example, the server 106 obtains the predetermined number from the configuration information for the data of the data source. The server 106 then divides the new version data according to the predetermined number. In this way, the data can be divided quickly. Dividing the new version data 102 into multiple data shards 104 speeds up the processing of the data and improves data processing efficiency.
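Dividing data according to a predetermined shard count might be sketched as follows. This is only a minimal illustration: the disclosure does not say how the split is performed, so the contiguous, near-equal byte ranges and the function name `split_into_shards` are assumptions.

```python
def split_into_shards(data: bytes, shard_count: int) -> list[bytes]:
    """Split data into exactly shard_count contiguous shards of near-equal size.

    Assumption: shards are contiguous byte ranges; the disclosure does not
    specify the partitioning rule.
    """
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    base, extra = divmod(len(data), shard_count)
    shards = []
    start = 0
    for i in range(shard_count):
        # The first `extra` shards take one additional byte each.
        size = base + (1 if i < extra else 0)
        shards.append(data[start:start + size])
        start += size
    return shards
```

Every version split this way yields the same number of shards, which matches the requirement that later data versions are divided into the same number of data shards.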
FIG. 3 shows a schematic diagram of an example 300 of a sharding mode according to some embodiments of the present disclosure. In FIG. 3, in the traditional single mode, each version of the data generates one data file, for example data version 1 302, data version 2 304, and data version 3 306. In embodiments of the present disclosure, each version of the data may be divided into a predetermined number of data shards. In some embodiments, data version 1 may be divided into three data shards: shards 308, 310, and 312. Subsequent data versions are likewise divided into the same number of data shards. In some embodiments, the data may be divided into any suitable number of data shards. The above is only an example and not a specific limitation of the present disclosure.
Returning now to FIG. 2, in some embodiments the server 106 generates the allocation information from the addresses and version of the data shards, combined with the configuration information corresponding to the data source. The configuration information includes at least the number of data shards into which the new version data is to be divided, the multiple release stages in which the new version data 102 is released to the computing devices, the cluster identifiers of the computing device clusters used in each release stage, and the multiple ordered operations for the new version data. The allocation information is generated based on this information. In some embodiments, the server 106 receives the configuration information from other servers. The above examples are only used to describe the present disclosure and are not a specific limitation of the present disclosure.
At block 204, the operation information is configured using the storage address of a data shard among the multiple data shards, to generate configured operation information for that data shard. For example, the server 106 in FIG. 1 uses the storage address of each data shard to configure the operation information for that data shard. Alternatively or additionally, other information in the allocation information is used to configure the operation information. For example, the storage address of the data shard in the computing device may also be configured.
In some embodiments, the server 106 obtains the storage address for the data shard from the allocation information. The server 106 then associates the storage address with a portion of the operation information, namely the portion corresponding to the download operation among the multiple ordered operations, to generate the configured operation information for the data shard. In this way, the storage address of the data shard can be determined quickly.
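Attaching a shard's storage address to the download step might be sketched as follows. The operation names, the `source` field, and the `store://` address format are hypothetical; the disclosure only states that the address is associated with the part of the operation information that corresponds to the download operation.

```python
import copy

# Hypothetical template of ordered operations shared by all shards of a data source.
OPERATION_TEMPLATE = [
    {"name": "prepare"},   # assumed user-defined stage before download
    {"name": "download"},  # fixed operation
    {"name": "load"},      # fixed operation
]


def configure_operations(template, storage_address):
    """Return a per-shard copy of the template with the address on the download step."""
    configured = copy.deepcopy(template)  # leave the shared template untouched
    for op in configured:
        if op["name"] == "download":
            op["source"] = storage_address
    return configured
```

For example, `configure_operations(OPERATION_TEMPLATE, "store://v2/shard-0")` yields operation information whose download step carries that address, while the template itself stays unchanged for the next shard.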
At block 206, target state information corresponding to the target operation to be completed among the multiple ordered operations is determined. For example, the server 106 in FIG. 1 determines the target state information corresponding to the target operation to be completed among the multiple ordered operations.
In some embodiments, data release has only one stage. The target state information then applies to all computing devices running the application that processes the data, and the target operation to be completed is set in the target state information.
In some embodiments, data release includes multiple stages, and the allocation information further includes the target cluster identifier of the target computing device cluster corresponding to each of the multiple release stages of the new version data. In this case, for each release stage, the server 106 determines the target cluster identifier of the computing device cluster corresponding to that release stage. The server also determines the target operation to be completed among the multiple ordered operations. The server then generates the target state information based on the target operation and the target cluster identifier; for example, the target state information includes the operation identifier corresponding to the target operation and the target cluster identifier. In this way, target state information for the computing devices in different stages can be generated quickly. The multiple release stages are described below with reference to FIG. 4, which shows a schematic diagram of an example 400 of multiple release stages according to some embodiments of the present disclosure.
In FIG. 4, data release includes four stages: stage S0 402, stage S1 404, stage S2 406, and stage S3 408. In each stage, a different set of selected computing devices executes the application that processes the data. For example, in stage S1 404, the union clusters Union_ig1 and Union_ig2 are selected to run the application that processes the data shards. Each union cluster includes multiple computing device clusters; for example, the union cluster Union_ig1 includes the computing device clusters ig1, ig2, ig3, and ig4. Each computing device cluster includes multiple computing device instances. Therefore, for each release stage, the server 106 puts the identifiers of the corresponding computing device clusters into the target state information, so that the operation information and target state information are delivered to the computing devices in the target clusters.
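Building the target state information for one release stage might be sketched as follows. The dictionary layout and the function name are assumptions, and the members listed for Union_ig2 are invented for illustration; only the membership of Union_ig1 comes from the description of FIG. 4.

```python
UNION_CLUSTERS = {
    "Union_ig1": ["ig1", "ig2", "ig3", "ig4"],  # as described for FIG. 4
    "Union_ig2": ["ig5", "ig6"],                # assumed members, illustration only
}


def build_target_state(stage, union_names, target_operation):
    """Combine the target operation with the cluster identifiers of one release stage."""
    cluster_ids = sorted(
        cluster for union in union_names for cluster in UNION_CLUSTERS[union]
    )
    return {
        "stage": stage,
        "target_operation": target_operation,
        "cluster_ids": cluster_ids,
    }
```

A call such as `build_target_state("S1", ["Union_ig1", "Union_ig2"], "download")` produces one record that names both the operation to reach and the clusters allowed to receive it.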
Returning to FIG. 2, at block 208 the configured operation information and the target state information for the data shard are sent to a second server for use in processing the data shard. For example, as shown in FIG. 1, the server 106 sends the configured operation information and the target state information for the data shard to the server 112 for use in processing the data shard.
In some embodiments, the server 106 also obtains current state information of the multiple computing devices corresponding to the target cluster identifier; the current state information indicates which of the multiple ordered operations those computing devices have completed. The server 106 then determines whether the current state information matches the target state information. If it does, the target state information is updated to correspond to the operation that follows the target operation, and the server 106 sends the updated target state to the server 112 for use in completing that next operation. If the two do not match, some computing device has not yet reached the target operation, and the current state information of the multiple computing devices is obtained again at a predetermined time interval until all computing devices have completed the target operation. In this way, consistency during data processing can be achieved.
As an example, as shown in FIG. 4, in release stage S1 404 the server 106 periodically queries the state of each computing device instance. If the state of every instance in a computing device cluster matches the target state, the state of that cluster is set to the state of having completed the target operation; this is also referred to as state collection. If the state of every computing device cluster is the state of having completed the target operation, the state of the union cluster can be set to the state of having completed the target operation; this may also be referred to as state recursion. Execution toward the target state is then advanced in parallel across the union clusters. Once the states of all computing devices in the stage are the target state, the server adjusts the target operation in the target state information to the next operation among the multiple ordered operations.
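The bottom-up state aggregation described above (instances, then clusters, then union clusters) might be sketched as follows. The operation list, the nested dictionary layout, and the choice to treat "matches" as equality of operation names are assumptions made for illustration.

```python
OPERATIONS = ["download", "load"]  # assumed ordered operations


def cluster_done(instance_states, target):
    """State collection: a cluster is done when every instance reports the target."""
    return all(state == target for state in instance_states)


def union_done(clusters, target):
    """State recursion: a union cluster is done when every member cluster is done."""
    return all(cluster_done(states, target) for states in clusters.values())


def next_target(unions, target):
    """Advance the target operation only after every union cluster has reached it."""
    if not all(union_done(clusters, target) for clusters in unions.values()):
        return target  # keep waiting; poll again after the predetermined interval
    index = OPERATIONS.index(target)
    return OPERATIONS[min(index + 1, len(OPERATIONS) - 1)]
```

The caller would invoke `next_target` on each polling round; the target only moves forward when no instance is lagging, which is what keeps all devices of a stage in step.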
In some embodiments, if the server 106 receives release information for a still newer version of the data, it waits until the ordered operations for the current data version are complete before performing the release operation for the newer version.
With this method, final-state consistency in the data distribution stage can be achieved, the complexity of the data distribution system is significantly reduced, and the stability of data processing is improved.
The flowchart of the method 200 for processing data according to multiple embodiments of the present disclosure has been described above with reference to FIGS. 2 to 4. A flowchart of a method 500 for processing data according to some embodiments of the present disclosure is described below with reference to FIG. 5. The method 500 in FIG. 5 may be performed by the server 112 in FIG. 1 or by any suitable computing device.
At block 502, configured operation information and target state information for a data shard are received at a second server from a first server. The data shard is one of multiple data shards generated by dividing new version data produced by a data source, the configured operation information relates to multiple ordered operations for the data shard, and the target state corresponds to the target operation to be completed among the multiple ordered operations. For example, the server 112 in FIG. 1 receives the configured operation information and target state information for the data shard from the server 106. After receiving the operation information and target state information for the data shard, the server 112 stores them.
At block 504, it is determined whether first heartbeat information for the data shard has been received from a first computing device. For example, the server 112 monitors whether heartbeat information has been received from the computing device 114. If first heartbeat information for the data shard is received from the first computing device 114, then at block 506 the operation information and target state information are sent to the first computing device. The first heartbeat information includes the current state for the data shard. For example, the heartbeat information sent by the computing device 114 to the server 112 includes a current state indicating the operations for the data shard that have been completed on the computing device 114.
In some embodiments, the first heartbeat information further includes the reference cluster identifier of the reference computing device cluster in which the first computing device is located, and the target state information includes the target cluster identifier of the target computing device cluster corresponding to each of the multiple release stages of the new version data. Before sending the operation information and target state information to the first computing device 114, the server 112 first matches the reference cluster identifier against the target cluster identifier. If they match, the first computing device is one of the computing devices selected for the release stage, and the operation information and target state information are sent to it. If they do not match, the first computing device was not selected for the release stage, and the operation information and target state information need not be sent. In this way, the operations of each computing device can be controlled precisely.
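The cluster check performed on each heartbeat might be sketched as follows. The message fields (`shard_id`, `cluster_id`), the in-memory dictionaries, and the choice to return `None` when nothing is delivered are all assumptions; the disclosure does not define a wire format.

```python
def handle_heartbeat(heartbeat, operation_info, target_state):
    """Return the payload for a device in a target cluster, or None otherwise.

    heartbeat:      {"shard_id": ..., "cluster_id": ...} (assumed field names)
    operation_info: shard_id -> configured operation list
    target_state:   {"target_operation": ..., "cluster_ids": [...]}
    """
    if heartbeat["cluster_id"] not in target_state["cluster_ids"]:
        return None  # device not selected for the current release stage
    operations = operation_info.get(heartbeat["shard_id"])
    if operations is None:
        return None  # no operation information stored for this shard
    return {
        "operations": operations,
        "target_operation": target_state["target_operation"],
    }
```

Because the decision is made per heartbeat, moving a stage forward only requires changing the stored target state; devices pick up the change on their next heartbeat.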
Obtaining the operation information and target state information through the heartbeat information described above in effect establishes a declarative distribution mechanism that ensures the consistency of data processing. The declarative distribution mechanism is implemented by a master service on the server 112 and agent services running on the computing devices 114; the master service and the agent services are in a one-to-many relationship. As shown in FIG. 6, multiple agent services communicate with the master service over the network to synchronize heartbeat information and obtain the latest version of the data file. The master service maintains the latest version of the data file, together with the current stage of that version, for the agent services to query. Specifically, the master service 602 running on the second server includes multiple ordered operations corresponding to multiple stages. Multiple agents associated with the master service 602 run on multiple different computing devices; for example, agents 604, 606, and 608 run on three computing devices. Each agent is responsible for managing the operations of the multiple different stages for the data shards 610, 612, and 614.
Returning to FIG. 5, at block 508 the current state information of the first computing device is updated using the current state of the data shard. For example, the server 112 in FIG. 1 uses the current state of the data shard to update the current state information of the first computing device, for instance by storing the current state in a state information list for the computing device.
In some embodiments, after the target state information is updated in the server 106, the server 112 receives the updated target state information from the server 106, the updated target state information corresponding to the operation that follows the target operation. Then, after receiving second heartbeat information for the data shard, the server 112 sends the operation information and the updated target state information to the first computing device for use in completing that next operation. In this way, consistency of operations can be achieved.
In some embodiments, the server 112 also receives from the first computing device 114 third heartbeat information that includes the identifier of a transferred data shard, the transferred data shard having been transferred from a second computing device to the first computing device. The server 112 first looks up the operation information and target state information of the transferred data shard in first state information, which covers the subset of data shards associated with the reference computing device cluster. If they are not found in the first state information, the server 112 looks them up in second state information, which corresponds to the multiple data shards. The server 112 then sends the operation information and target state that were found to the first computing device 114. In this way, the transfer of data shards can be carried out quickly.
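The two-level lookup for a transferred shard might be sketched as follows. Modelling the first and second state information as plain dictionaries keyed by shard identifier is an assumption, as are the names used here.

```python
def find_shard_entry(shard_id, cluster_state, global_state):
    """Look up a shard in the cluster-local state first, then in the global state.

    cluster_state: entries for the shards associated with the device's own cluster
    global_state:  entries for all shards of the data source
    Returns the entry (operation information and target state) or None.
    """
    entry = cluster_state.get(shard_id)
    if entry is None:
        # The shard was moved in from a device outside this cluster's subset.
        entry = global_state.get(shard_id)
    return entry
```

Checking the smaller cluster-local table first keeps the common case cheap, and the global table only needs to be consulted for shards that have just been moved.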
With this method, final-state consistency in the data distribution stage can be achieved, the complexity of the data distribution system is significantly reduced, and the stability of data processing is improved.
The method 500 for processing data according to multiple embodiments of the present disclosure has been described above with reference to FIGS. 5 and 6. A flowchart of a method 700 for processing data according to some embodiments of the present disclosure is described below with reference to FIG. 7. The method 700 in FIG. 7 may be performed by the computing device 114 in FIG. 1 or by any suitable computing device.
At block 702, the identifier of a data shard to be processed is obtained at a first computing device, the data shard being one of multiple data shards generated by dividing new version data produced by a data source. For example, the computing device 114 obtains the identifier of the data shard 104 to be processed.
In some examples, an application control system connected to the application running on the computing device 114 can obtain information on all data shards, together with the identifiers of the computing devices able to run the application that processes the data shards, and then assign to each computing device the data shards it can process. The computing device 114 can therefore obtain from the application control system the identifier of the data shard it is to process.
At block 704, a heartbeat message for the data shard is sent to the second server in order to receive the operation information and target state information for the data shard. The operation information relates to multiple ordered operations for the data shard, and the target state corresponds to the target operation to be completed among the multiple ordered operations. For example, the computing device 114 sends a heartbeat message for the data shard to the server 112 in order to receive the operation information and target state information for the data shard.
The multiple ordered operations for each data shard include at least a download operation and a load operation, which are fixed operations for processing a data shard. The user may define additional custom operation stages before and after these two operations, for example a custom operation before the download such as creating a file directory, or custom operations added after the load operation. All of these operations together make up the operation information of a data shard. The agent service reports the current state of the different data units to the master service on the second server 112 through heartbeat information, and the master service delivers the operation information and target state according to the heartbeat information received from the agent service. FIG. 8 shows a schematic diagram of an example 800 of multiple ordered operations for data shards according to some embodiments of the present disclosure.
As shown in FIG. 8, data shard 1 802 is a data shard generated by a first data source and processed on the computing device 114, and data shard 2 806 is a data shard generated by a second data source and processed on the computing device 114. There are multiple ordered operations 804 for data shard 1 802 and multiple ordered operations 808 for data shard 2 806, in which stage 0, stage 3, and stage 4 correspond to other user-defined operations. The agent service running on the computing device 114 can send the state information of data shard 1 to the master service of the server 112 through heartbeat information, for example the state information corresponding to the current operation that the computing device 114 has completed. Likewise, for data shard 2 806, the state information corresponding to the current operation completed by the computing device 114 for that data shard can be sent to the master service in the server 112. The operation information and target state information for data shard 1, and those for data shard 2, are then received from the master service. The computing device 114 processes the data shards according to the received operation information and target state information until the operation corresponding to the target state information is complete.
FIG. 8 above shows multiple ordered operations for data shards from different data sources located on the same computing device. The multiple ordered operations for a single data shard are described below with reference to FIG. 9.
As shown in FIG. 9, the agent service on the computing device 114 obtains the operation information and target state information for the data shard 902. The operation information 904 includes multiple ordered operations, in which stage 0, stage 3, and stage 4 correspond to custom operations. The computing device 114 compares the target state information with the current state of the data shard on the computing device 114. If the states differ, the remaining operations on the data shard must be carried out until the current target state of the data file is reached. For operations that lag behind, failure handling is performed according to the retry count and run timeout defined for each stage, so that the target operation is reached as reliably as possible and the version stage of the data file becomes eventually consistent. Specifically, an operation that does not succeed can be retried; if it still fails after the specified number of retries, or exceeds a given time limit, the processing of the data shard is reported as unsuccessful. For example, the load stage may be retried 20 times and must complete within 1200 s; otherwise the operation is deemed to have failed. This mechanism allows the processing of each data shard to complete as reliably as possible, and by controlling the execution for each data shard, all data shards generated by the data source can be controlled.
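The per-stage retry count and run timeout might be sketched as follows. The function name and the injectable `clock` parameter are hypothetical, and the sketch interprets the retry count as the total number of attempts, which the disclosure does not spell out.

```python
import time


def run_with_retries(operation, max_attempts, timeout_s, clock=time.monotonic):
    """Run operation until it succeeds, attempts run out, or the deadline passes.

    Returns True on success and False otherwise. Assumption: max_attempts is
    the total number of attempts, and the deadline is only checked between
    attempts, so a single attempt that blocks is not interrupted.
    """
    deadline = clock() + timeout_s
    for _ in range(max_attempts):
        if clock() > deadline:
            break  # time budget for this stage is exhausted
        try:
            operation()
            return True
        except Exception:
            continue  # failed attempt; try again if budget remains
    return False
```

With the figures from the example above, the load stage would be invoked as `run_with_retries(load_shard, max_attempts=20, timeout_s=1200)`, where `load_shard` stands for whatever callable performs the load; a `False` result would be reported as a failure for the data shard.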
When a new computer or container joins the service cluster of the computer program, the newly started agent service interacts with the master control service to obtain the latest target stage, completes the target stages of the data distribution in sequence, and eventually becomes consistent, in terms of data distribution, with the services on the other computing devices in the computing device cluster.
In some embodiments, sending the heartbeat message includes determining the current state of the data shard. The computing device 114 generates the heartbeat information using the identifier of the data shard and its current state. In this way, accurate heartbeat information can be generated.
In some embodiments, the heartbeat information is sent periodically, for example once every 5 s, to update the current state information of the first computing device that processes the shard. In this way, the state of the computing device can be updated quickly.
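As an illustration of how heartbeat information might be assembled from shard identifiers and current states, the following Python sketch serializes them into a single message. The JSON layout, the field names, and the function name `build_heartbeat` are assumptions made for illustration; the disclosure does not define a wire format.

```python
import json
from typing import Mapping

HEARTBEAT_INTERVAL_S = 5  # example period taken from the text


def build_heartbeat(device_id: str, shard_states: Mapping[str, str]) -> str:
    """Serialize the shard identifiers and their current states into one message."""
    payload = {
        "device": device_id,  # hypothetical field name
        "shards": [
            {"id": shard_id, "state": state}
            for shard_id, state in sorted(shard_states.items())
        ],
    }
    return json.dumps(payload, sort_keys=True)
```

A sender would call `build_heartbeat` once per `HEARTBEAT_INTERVAL_S` seconds and transmit the result; the transport is left open here.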
At block 706, the current state of the data shard is compared with the target state. For example, in FIG. 1 the computing device 114 compares the current state of the data shard with the obtained target state. At block 708, if the current state is determined to differ from the target state, execution of the plurality of ordered operations continues in order to complete the target operation. If the current state is the same as the target state, the intended operation has already been reached and no subsequent operation is needed.
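The comparison at blocks 706 and 708 can be sketched in Python as a small reconciliation loop. The sketch assumes that a state is represented as the index of the last completed stage (with -1 meaning that no stage has completed) and that `execute` is a hypothetical callback that runs one stage and reports success; neither representation comes from the disclosure.

```python
from typing import Callable, Sequence


def reconcile(stages: Sequence[str],
              current: int,
              target: int,
              execute: Callable[[str], bool]) -> int:
    """Run the stages after `current` up to and including `target`.

    Returns the index of the last completed stage (-1 means none completed).
    """
    while current < target:
        next_stage = stages[current + 1]
        if not execute(next_stage):
            break  # stop at the first failed stage (assumed: a later pass may retry)
        current += 1
    return current
```

When `current` already equals `target`, the loop body never runs, which corresponds to the case in which no subsequent operation is needed.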
As shown in FIG. 10, large data 1002 is divided into multiple data shards, for example data shard 1 1004, data shard 2 1006, and data shard 3 1008. By controlling the target operation among the plurality of ordered operations for the multiple data shards, the agent services are made to execute according to the target state, for example all executing up to the stage at which the data download is complete. Once it has been collected that all data shards have completed the current target operation, the master control service changes the target state of the multiple data units to the next stage at the same time, so that operations such as loading are performed on the multiple data units of the distributed data by the computer program simultaneously.
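The barrier-like behavior described for FIG. 10 can be sketched as follows: the target stage moves forward only after every shard has reported reaching the current target. This is a minimal sketch that assumes stages are numbered consecutively; `advance_target` and its parameters are hypothetical names.

```python
from typing import Mapping


def advance_target(target: int, shard_states: Mapping[str, int], last_stage: int) -> int:
    """Return the next target stage once every shard has reached the current one."""
    all_reached = bool(shard_states) and all(
        state >= target for state in shard_states.values()
    )
    if all_reached and target < last_stage:
        return target + 1
    return target  # at least one shard is still behind, or the last stage is reached
```

Because the target changes for all shards at once, no shard starts the next stage (for example, loading) before the slowest shard has finished the current one.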
In some embodiments, the plurality of ordered operations includes a download operation, and the download operation includes the storage address of the data shard. The computing device 114 further performs the download operation to obtain the data shard from the storage address. In this way, the data shard can be obtained quickly from its storage location.
In some embodiments, after data shards have been allocated to the computing devices that run the program for processing data shards, the processing of data shards on different computing devices is also adjusted dynamically according to the processing status of the computing devices. If it is determined that a data shard allocated to the computing device is to be processed by an application on a third computing device, the identifier of the data shard needs to be deleted from the description file of the program running on the computing device, so that the data shard is no longer processed there. In this way, the transfer of data can be implemented quickly.
In some embodiments, if the computing device has strong processing capability and no data shard left to process, data shards allocated to other computing devices may be reassigned to it. For example, a program control system that manages the programs run by the computing devices reassigns such a data shard to the computing device 114. If it is determined that the identifier of a transferred data shard has been received, heartbeat information including that identifier is sent to the second server in order to obtain the operation information and target state for the transferred data shard. The computing device 114 processes the transferred data shard based on that operation information and target state. In this way, processing of the transferred data shard can be implemented quickly.
In one example, the computing device 114 obtains the identifier of the transferred data shard and adds it to the data description file of the application that runs on the computing device 114 and processes data shards. The agent then reads the identifier from the data description file and looks up the corresponding operation information and target state information by sending heartbeat information that includes the identifier of the transferred data shard to the master control service.
As shown in FIG. 11, an application 1114 that processes data shards runs in computing device 1 1110, and an application 1128 that processes data shards runs in computing device 2. An agent 1118 for the application 1114 and an agent 1124 for the application 1128 communicate with the master control service 1102. Data shard 1 1120 and data shard 2 1122 are allocated to the application 1114 in computing device 1 for processing, and data shard 3 1130 is allocated to the application 1128 in computing device 2 for processing. For the applications 1114 and 1128, the identifiers of the shard data files used by the program are recorded in the data description files 1116 and 1126 for the respective application.
When the program application control system changes data shard 2 1122 from being processed by the application 1114 to being processed by the application 1128, a new shard description is first added to the data description file 1126 for the application 1128, so that the content of that data description file becomes data shard 2 and data shard 3. The agent 1124 of the application 1128 uses the data description information in its heartbeat communication with the master control service 1102. The master control service first performs a first-level query, searching the key information ig_key2 1106 of the computing device cluster that corresponds to the distribution version information for information on data shard 2. This query finds no operation information or target state information for data shard 2, because the shard was allocated to the computing device cluster whose key information is ig_key1 1104. The master control service then performs a second-level query, searching the stored information Field_key 1106, which is kept at the dictionary-information dimension and contains global information. This query finds the current-version target stage information and operation information for the data distribution of data shard 2, so the correct information for data shard 2 is delivered to the agent service. After obtaining the operation information and target state information of the data shard, the agent service completes the operations on the data shard. Alternatively or additionally, the identifier of data shard 2 1122 is deleted from the data description file 1116 for the application 1114.
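The two-level query described for FIG. 11 can be sketched in Python as a lookup that first consults the index for the requesting cluster and then falls back to a global index. The in-memory dictionaries and the name `find_shard_info` are assumptions for illustration; the disclosure does not state how the master control service stores this information.

```python
from typing import Mapping, Optional


def find_shard_info(shard_id: str,
                    cluster_key: str,
                    cluster_index: Mapping[str, Mapping[str, dict]],
                    global_index: Mapping[str, dict]) -> Optional[dict]:
    """First-level query by cluster key, then second-level query on the global index."""
    info = cluster_index.get(cluster_key, {}).get(shard_id)
    if info is None:
        # The shard was allocated under another cluster key, so use the global view.
        info = global_index.get(shard_id)
    return info
```

In the scenario above, a request for data shard 2 under `ig_key2` misses at the first level and is answered from the global index.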
In this way, eventual state consistency across the data distribution stages can be achieved, the complexity of the data distribution system is significantly reduced, and the stability of data processing is improved.
The method 700 for processing data according to embodiments of the present disclosure has been described above with reference to FIGS. 7-11. A system 1200 for processing data according to some embodiments of the present disclosure is described below with reference to the schematic diagram of FIG. 12. As shown in FIG. 12, the system includes a build module 1204. The build module 1204 is responsible for checking data sources for updates: it periodically inspects each data file for new content changes and compares against historical version information. If the check finds new data content, it generates a new version of the data file, assembles it with the template configuration information for data distribution to produce one distributed data distribution to be executed, and hands that distribution to the state machine driver 1206 for execution. The state machine driver 1206 is responsible for advancing the state of the distributed data distribution. It applies different driver control for single mode and shard mode and coordinates the distribution target machines within each stage, thereby achieving state coordination and progress control for sharded data. After the coordinated operation on the distributed data is achieved, it collects and checks the state of the current stage and drives execution to the next stage until the overall data distribution ends.
The system 1200 further includes a master control service module 1208. The master control service module 1208 receives and updates the data file version information for each operation, maintains the current version information, maintains communication with the agent services, receives heartbeat information from the agent services, and passes operation information and target state information to the agent services. It also aggregates the data distribution status of the computing devices and records the instance version information of the program.
The system 1200 further includes an agent module 1212. The agent module 1212 is responsible for declarative execution of data distribution: based on the information and final state obtained from the master control service, it updates the state of the data version. It collects information on the computer environment in which the agent service runs and the current data state, periodically reports the collected data, and obtains the final data delivered by the master control service. Likewise, the master control service 1208 also processes other data 1210 and data 1214.
Thus, in the distribution of one new version of data, the data source first produces new content. The build module detects the data, performs assembly and decision-making with the distribution module, and produces one data distribution whose execution is then initiated. Each stage of the data distribution is controlled by the state driver and delivered to the master control service module. When the master control service module receives the heartbeat report of an agent service, it delivers the notification of the new version of the data file. The agent service then catches up to consistency and completes the data distribution according to the version information and final state of the data file.
FIG. 13 shows a schematic block diagram of an apparatus 1300 for processing data according to an embodiment of the present disclosure. As shown in FIG. 13, the apparatus 1300 includes: an allocation information acquisition module 1302 configured to acquire, at a first server, allocation information of new version data generated by a data source, the allocation information including a plurality of storage addresses of a plurality of data shards into which the new version data is divided and operation information corresponding to a plurality of ordered operations for the new version data; an operation information configuration module 1304 configured to configure the operation information using the storage address of a data shard among the plurality of data shards to generate configured operation information for the data shard; a target state information determination module 1306 configured to determine target state information corresponding to a target operation to be completed among the plurality of ordered operations; and a sending module 1308 configured to send the configured operation information and the target state information for the data shard to a second server for processing the data shard.
In some embodiments, the operation information configuration module 1304 includes: a storage address acquisition module configured to acquire the storage address for the data shard from the allocation information; and a storage address association module configured to associate the storage address with partial information in the operation information to generate the configured operation information for the data shard, the partial information corresponding to the download operation among the plurality of ordered operations.
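Associating a shard's storage address with the download portion of the operation information can be sketched as below. The list-of-dictionaries layout, the `"name"` and `"source"` keys, and the function name `configure_operations` are assumptions for illustration; the sketch copies the shared template so that each shard gets its own configured operation information.

```python
import copy
from typing import Dict, List


def configure_operations(template: List[Dict[str, str]],
                         storage_address: str) -> List[Dict[str, str]]:
    """Return a per-shard copy of the template with the address bound to the download step."""
    configured = copy.deepcopy(template)  # leave the shared template untouched
    for operation in configured:
        if operation["name"] == "download":
            operation["source"] = storage_address  # hypothetical key
    return configured
```

Calling the function once per data shard, each time with that shard's storage address, yields one configured operation list per shard from a single template.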
In some embodiments, the allocation information further includes a target cluster identifier of a target computing device cluster corresponding to each of a plurality of release stages of the new version data, and the target state information determination module 1306 includes: a target cluster identifier determination module configured to determine, for a release stage, the target cluster identifier corresponding to that release stage; a target operation determination module configured to determine the target operation to be completed among the plurality of ordered operations; and a target state information generation module configured to generate the target state information based on the target operation and the target cluster identifier.
In some embodiments, the apparatus 1300 further includes: a current state information acquisition module configured to acquire current state information of a plurality of computing devices corresponding to the target cluster identifier, the current state information indicating which of the plurality of ordered operations the plurality of computing devices have completed; a match determination module configured to determine whether the current state information matches the target state information; an update module configured to, if the current state information is determined to match the target state information, update the target state information to correspond to the operation following the target operation; and an update sending module configured to send the updated target state to the second server for completing that next operation.
In some embodiments, the apparatus 1300 further includes: a monitoring module configured to monitor whether the version of the data generated by the data source has changed; and a first division module configured to divide the new version data into a plurality of data shards if the version of the data is determined to have changed.
In some embodiments, the first division module includes: a predetermined data acquisition module configured to acquire a predetermined number of data shards to be generated; and a second division module configured to divide the new version data based on the predetermined number.
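Dividing new version data based on a predetermined number of shards can be sketched as follows. The disclosure does not say how shard boundaries are chosen; this sketch assumes contiguous byte ranges of near-equal size, and `split_into_shards` is a hypothetical name.

```python
from typing import List


def split_into_shards(data: bytes, count: int) -> List[bytes]:
    """Split data into `count` contiguous shards whose sizes differ by at most one byte."""
    if count <= 0:
        raise ValueError("count must be positive")
    size, extra = divmod(len(data), count)
    shards, start = [], 0
    for index in range(count):
        # The first `extra` shards take one additional byte each.
        end = start + size + (1 if index < extra else 0)
        shards.append(data[start:end])
        start = end
    return shards
```

Concatenating the returned shards in order reproduces the original data, so the split loses nothing.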
FIG. 14 shows a schematic block diagram of an apparatus 1400 for processing data according to an embodiment of the present disclosure. As shown in FIG. 14, the apparatus 1400 includes: an operation information and target state information receiving module 1402 configured to receive, at a second server and from a first server, configured operation information and target state information for a data shard, the data shard being one of a plurality of data shards generated by dividing new version data generated by a data source, the configured operation information relating to a plurality of ordered operations for the data shard, and the target state corresponding to a target operation to be completed among the plurality of ordered operations; a first operation information and target state information sending module 1404 configured to send the operation information and the target state information to a first computing device in response to receiving, from the first computing device, first heartbeat information for the data shard, the first heartbeat information including the current state of the data shard; and an update module 1406 configured to update the current state information of the first computing device using the current state of the data shard.
In some embodiments, the apparatus 1400 further includes: an updated target state information receiving module configured to receive updated target state information from the first server, the updated target state information corresponding to the operation following the target operation; and an operation information and updated target state information sending module configured to send, in response to receiving second heartbeat information for the data shard, the operation information and the updated target state information to the first computing device for completing the next operation.
In some embodiments, the first heartbeat information further includes a reference cluster identifier of a reference computing device cluster in which the first computing device is located, and the target state information includes a target cluster identifier of a target computing device cluster corresponding to each of a plurality of release stages of the new version data; the first operation information and target state information sending module 1404 includes: a matching module configured to match the reference cluster identifier against the target cluster identifier; and a second operation information and target state information sending module configured to send the operation information and the target state information to the first computing device if the reference cluster identifier is determined to match the target cluster identifier.
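The cluster check performed before answering a heartbeat can be sketched in a few lines of Python. The sketch assumes that "matching" means simple equality of identifiers and uses the hypothetical name `respond_to_heartbeat`; the disclosure does not define the matching rule.

```python
from typing import Optional, Tuple


def respond_to_heartbeat(reference_cluster_id: str,
                         target_cluster_id: str,
                         operations: list,
                         target_state: str) -> Optional[Tuple[list, str]]:
    """Return (operations, target_state) only for devices in the target cluster."""
    if reference_cluster_id != target_cluster_id:
        return None  # device is outside the cluster for this release stage
    return operations, target_state
```

Gating the response on the cluster identifier means that only devices in the cluster of the current release stage receive the new target state.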
In some embodiments, the apparatus 1400 further includes: a first lookup module configured to, in response to receiving from the first computing device third heartbeat information that includes the identifier of a transferred data shard, look up the operation information and target state information of the transferred data shard in first state information of the subset of data shards associated with the reference computing device cluster, the transferred data shard having been transferred from a second computing device to the first computing device; a second lookup module configured to, if the operation information and target state of the transferred data shard are not found in the first state information, look them up in second state information corresponding to the plurality of data shards; and a lookup result sending module configured to send the found operation information and target state of the transferred data shard to the first computing device.
FIG. 15 shows a schematic block diagram of an apparatus 1500 for processing data according to an embodiment of the present disclosure. As shown in FIG. 15, the apparatus 1500 includes: an identifier acquisition module 1502 configured to acquire, at a first computing device, the identifier of a data shard to be processed, the data shard being one of a plurality of data shards generated by dividing new version data generated by a data source; a heartbeat information sending module 1504 configured to send a heartbeat message for the data shard to a second server in order to receive operation information and target state information for the data shard, the operation information relating to a plurality of ordered operations for the data shard, and the target state corresponding to a target operation to be completed among the plurality of ordered operations; a comparison module 1506 configured to compare the current state of the data shard with the target state; and an operation execution module 1508 configured to continue executing the plurality of ordered operations to complete the target operation if the current state is determined to differ from the target state.
In some embodiments, the heartbeat information sending module 1504 includes: a current state determination module configured to determine the current state of the data shard; and a heartbeat information generation module configured to generate the heartbeat information based on the identifier and the current state of the data shard.
In some embodiments, the heartbeat information sending module 1504 further includes: a periodic module configured to periodically send the heartbeat information for updating the current state information of the first computing device that processes the shard.
In some embodiments, the plurality of ordered operations includes a download operation, the download operation including the storage address of the data shard; the apparatus further includes: a download operation execution module configured to perform the download operation to obtain the data shard from the storage address.
In some embodiments, the apparatus 1500 further includes: a deletion module configured to delete the identifier of the data shard if it is determined that the data shard is to be processed by an application on a third computing device.
In some embodiments, the apparatus 1500 further includes: a sending module configured to, if it is determined that the identifier of a transferred data shard has been received, send heartbeat information including the identifier of the transferred data shard to the second server in order to obtain operation information and a target state for the transferred data shard, the transferred data shard having been transferred from a second computing device to the first computing device; and a transferred data shard processing module configured to process the transferred data shard based on its operation information and target state.
In the technical solutions of the present disclosure, the acquisition, storage, and use of any user personal information involved comply with relevant laws and regulations and do not violate public order and good morals.
According to embodiments of the present disclosure, the present disclosure further provides an electronic device, a readable storage medium, and a computer program product.
FIG. 16 shows a schematic block diagram of an example electronic device 1600 that can be used to implement embodiments of the present disclosure. The example electronic device 1600 can be used to implement the server 106, the server 112, and the computing device 114 in FIG. 1. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are examples only and are not intended to limit the implementations of the disclosure described and/or claimed herein.
As shown in FIG. 16, the device 1600 includes a computing unit 1601 that can perform various appropriate actions and processing according to a computer program stored in a read-only memory (ROM) 1602 or a computer program loaded from a storage unit 1608 into a random access memory (RAM) 1603. The RAM 1603 can also store various programs and data required for the operation of the device 1600. The computing unit 1601, the ROM 1602, and the RAM 1603 are connected to one another via a bus 1604. An input/output (I/O) interface 1605 is also connected to the bus 1604.
Multiple components in the device 1600 are connected to the I/O interface 1605, including: an input unit 1606, such as a keyboard or a mouse; an output unit 1607, such as various types of displays and speakers; a storage unit 1608, such as a magnetic disk or an optical disc; and a communication unit 1609, such as a network card, a modem, or a wireless communication transceiver. The communication unit 1609 allows the device 1600 to exchange information/data with other devices over a computer network such as the Internet and/or various telecommunication networks.
The computing unit 1601 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 1601 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, or microcontroller. The computing unit 1601 performs the methods and processing described above, for example the methods 200, 500, and 700. For example, in some embodiments, the methods 200, 500, and 700 may be implemented as a computer software program that is tangibly embodied in a machine-readable medium, such as the storage unit 1608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 1600 via the ROM 1602 and/or the communication unit 1609. When the computer program is loaded into the RAM 1603 and executed by the computing unit 1601, one or more steps of the methods 200, 500, and 700 described above may be performed. Alternatively, in other embodiments, the computing unit 1601 may be configured to perform the methods 200, 500, and 700 in any other suitable manner (for example, by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor. The programmable processor may be a special-purpose or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, so that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine, or entirely on a remote machine or server.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by, or in combination with, an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide interaction with a user, the systems and techniques described herein may be implemented on a computer having: a display device (for example, a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing device (for example, a mouse or a trackball) through which the user can provide input to the computer. Other kinds of devices may also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (for example, visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form (including acoustic input, speech input, or tactile input).
The systems and techniques described herein may be implemented in a computing system that includes a back-end component (for example, as a data server), or a computing system that includes a middleware component (for example, an application server), or a computing system that includes a front-end component (for example, a user computer having a graphical user interface or a web browser through which a user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system may be interconnected by any form or medium of digital data communication (for example, a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), and the Internet.
A computer system may include a client and a server. The client and the server are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
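As a non-limiting illustration of the client-server relationship described above, the following Python sketch runs a server program and a client program that interact over a network connection. The handler name `UppercaseHandler`, the helper names `start_server` and `request`, the loopback address, and the line-based message format are all assumptions made for this example; none of them is taken from the disclosure, and the sketch does not implement methods 200, 500 or 700.

```python
import socket
import socketserver
import threading


class UppercaseHandler(socketserver.StreamRequestHandler):
    """Hypothetical server-side program: replies with the upper-cased request line."""

    def handle(self):
        line = self.rfile.readline().strip()
        self.wfile.write(line.upper() + b"\n")


def start_server():
    """Start the server on a free loopback port and serve in a background thread."""
    server = socketserver.TCPServer(("127.0.0.1", 0), UppercaseHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server


def request(address, message):
    """Hypothetical client-side program: send one line and return the reply line."""
    with socket.create_connection(address) as conn:
        conn.sendall(message.encode("utf-8") + b"\n")
        with conn.makefile("rb") as reader:
            return reader.readline().strip().decode("utf-8")


if __name__ == "__main__":
    server = start_server()
    try:
        print(request(server.server_address, "status"))
    finally:
        server.shutdown()
        server.server_close()
```

In this sketch the two roles are defined only by the programs that run on each side, which mirrors the statement that the relationship arises from the computer programs rather than from the hardware. In a real deployment the client and server would typically run on separate machines connected by a LAN, a WAN, or the Internet.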
It should be understood that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order, as long as the desired result of the technical solution disclosed herein can be achieved; no limitation is imposed in this regard.
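As a non-limiting illustration of the statement above, the following Python sketch executes the same set of steps sequentially, in parallel, and in a different order, and obtains the same result each time. The step functions `step_a`, `step_b` and `step_c` are hypothetical placeholders that are assumed to be independent of one another; they do not correspond to any step of the disclosed methods.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical, mutually independent steps used only for illustration.
def step_a(value):
    return value + 1


def step_b(value):
    return value * 2


def step_c(value):
    return value - 3


STEPS = [step_a, step_b, step_c]


def run_sequentially(value, steps=STEPS):
    """Execute the steps one after another in their listed order."""
    return [step(value) for step in steps]


def run_in_parallel(value, steps=STEPS):
    """Submit all steps to a thread pool and collect results in listed order."""
    with ThreadPoolExecutor(max_workers=len(steps)) as pool:
        futures = [pool.submit(step, value) for step in steps]
        return [future.result() for future in futures]


def run_reordered(value, steps=STEPS):
    """Execute the steps in reverse order but report results in listed order."""
    results = {}
    for index in reversed(range(len(steps))):
        results[index] = steps[index](value)
    return [results[index] for index in range(len(steps))]


if __name__ == "__main__":
    print(run_sequentially(10))
    print(run_in_parallel(10))
    print(run_reordered(10))
```

The equivalence holds here only because the placeholder steps share no state. Steps that depend on one another's output would need to keep their relative order.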
The specific implementations described above do not limit the scope of protection of the present disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may be made depending on design requirements and other factors. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present disclosure shall fall within the scope of protection of the present disclosure.
Claims (30)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202110737890.4A CN113687846B (en) | 2021-06-30 | 2021-06-30 | Method, device, device and readable storage medium for processing data |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN113687846A (en) | 2021-11-23 |
| CN113687846B (en) | 2023-07-18 |
Family
ID=78576826
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN202110737890.4A Active CN113687846B (en) | 2021-06-30 | 2021-06-30 | Method, device, device and readable storage medium for processing data |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN113687846B (en) |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114840544A (en) * | 2022-05-07 | 2022-08-02 | 百度在线网络技术(北京)有限公司 | Data publishing method, data updating method, device, equipment and storage medium |
| CN119807213B (en) * | 2024-12-20 | 2025-10-10 | 北京百度网讯科技有限公司 | Data updating method, device, equipment and storage medium |
Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2014079348A1 (en) * | 2012-11-26 | 2014-05-30 | Tencent Technology (Shenzhen) Company Limited | Software download method and software download apparatus |
| CN104239417A (en) * | 2014-08-19 | 2014-12-24 | 天津南大通用数据技术股份有限公司 | Dynamic adjustment method and dynamic adjustment device after data fragmentation in distributed database |
| CN107895023A (en) * | 2017-11-16 | 2018-04-10 | 百度在线网络技术(北京)有限公司 | A kind of view data quality detecting method, device, server and storage medium |
| WO2018087311A1 (en) * | 2016-11-10 | 2018-05-17 | Telefonaktiebolaget Lm Ericsson (Publ) | Resource segmentation to improve delivery performance |
| CN109088929A (en) * | 2018-08-09 | 2018-12-25 | 北京百度网讯科技有限公司 | For sending the method and device of information |
| CN110830580A (en) * | 2019-11-12 | 2020-02-21 | 腾讯云计算(北京)有限责任公司 | Storage data synchronization method and device |
| CN112148350A (en) * | 2020-09-04 | 2020-12-29 | 深圳市大富网络技术有限公司 | Remote version management method for works, electronic device and computer storage medium |
Non-Patent Citations (2)
| Title |
|---|
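| Survey of distributed stream processing technology (分布式流处理技术综述); 崔星灿; 禹晓辉; 刘洋; 吕朝阳; Journal of Computer Research and Development (计算机研究与发展), No. 02; full text * |
| Dynamic management and control of versions in engineering database management systems (工程数据库管理系统中版本的动态管理与控制); 钟毓宁; 谢月云; 翁平; 杨叔子; Journal of Wuhan University of Technology, Information and Management Engineering Edition (武汉理工大学学报(信息与管理工程版)), No. 01; full text * |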
Also Published As
| Publication number | Publication date |
|---|---|
| CN113687846A (en) | 2021-11-23 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11829742B2 (en) | | Container-based server environments |
| CN106164865B (en) | | Method and system for dependency-aware transaction batching for data replication |
| CN107273540B (en) | | Distributed search and index update method, system, server and computer equipment |
| CN113760847A (en) | | Log data processing method, device, equipment and storage medium |
| US11263174B2 (en) | | Reducing resource consumption in container image management |
| CN113760638B (en) | | A log service method and device based on kubernetes cluster |
| US11210277B2 (en) | | Distributing and processing streams over one or more networks for on-the-fly schema evolution |
| CN112181942A (en) | | Time series database system and data processing method and device |
| CN112541513B (en) | | Model training method, device, equipment and storage medium |
| CN113687846B (en) | | Method, device, device and readable storage medium for processing data |
| CN114391141A (en) | | Automatic derivation of shard key values and transparent multi-shard transaction and query support |
| WO2022174553A1 (en) | | File processing method and apparatus, electronic device, and storage medium |
| CN117667102A (en) | | Dependency analysis method, device, system and storage medium |
| CN116049142A (en) | | Data processing method, device, electronic device and storage medium |
| US20250077301A1 (en) | | Statefulsets graceful termination for cloud computing platforms |
| CN114444719A (en) | | Model updating method, apparatus, storage medium and electronic device |
| CN106294496A (en) | | A kind of data migration method based on hadoop cluster and instrument |
| CN111752960B (en) | | Data processing method and device |
| CN113312351A (en) | | Data processing method and device |
| CN113360689A (en) | | Image retrieval system, method, related device and computer program product |
| CN110196854A (en) | | Data processing method and device |
| US20210211520A1 (en) | | Environment for developing of distributed multicloud applications |
| US20250202978A1 (en) | | Mirror processing method and apparatus, device, storage medium, and program product |
| CN119829546A (en) | | Data processing method and device, electronic equipment and storage medium |
| CN118503325A (en) | | Data processing method, apparatus, computer device, storage medium, and program product |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |