WO2024011896A1 - Data processing method and device and storage medium - Google Patents

Data processing method and device and storage medium

Info

Publication number
WO2024011896A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
host
identification information
pod
category
Prior art date
Application number
PCT/CN2023/074616
Other languages
French (fr)
Chinese (zh)
Inventor
刘土明
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2024011896A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/04 Network management architectures or arrangements
    • H04L41/042 Network management architectures or arrangements comprising distributed management centres cooperatively managing the network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/70 Admission control; Resource allocation
    • H04L47/78 Architectures of resource allocation
    • H04L47/783 Distributed allocation of resources, e.g. bandwidth brokers

Definitions

  • the present disclosure relates to the field of data management technology, and in particular, to a data processing method, equipment and storage medium.
  • HDFS Hadoop Distributed File System
  • Ceph Ceph
  • GIS Geographic Information System
  • the GIS system reads and writes data
  • the distributed file system needs to serialize and deserialize a large amount of data.
  • technical problems such as complex data processing and low data processing efficiency, which affect the user experience.
  • the present disclosure provides a data processing method, equipment and storage medium, aiming to solve the technical problems of complex data processing procedures and low data processing efficiency.
  • the present disclosure provides a data processing method, which includes: in response to a data acquisition request sent by a terminal device, determining identification information of the first data based on the data acquisition request; determining each host and the POD corresponding to each host according to the identification information of the first data; obtaining second data from at least one POD according to preset data acquisition rules, wherein the second data is obtained by the host processing the first data in the at least one POD according to preset data processing rules; and sending the second data to the terminal device.
  • the present disclosure also provides a data processing device, including a processor, a memory, a computer program stored on the memory and executable by the processor, and a data bus for realizing connection communication between the processor and the memory, When the computer program is executed by the processor, the steps of any data processing method provided by this disclosure are implemented.
  • the present disclosure also provides a storage medium for computer-readable storage.
  • the storage medium stores one or more programs, and the one or more programs can be executed by one or more processors to implement the steps of any data processing method provided by the present disclosure.
  • Figure 1 is a schematic flow chart of a data processing method provided by the present disclosure
  • Figure 2 is a schematic diagram of a data processing framework related to an embodiment of the present disclosure
  • Figure 3 is a data processing interaction diagram related to an embodiment of the present disclosure
  • Figure 4 is a data processing interaction diagram related to an embodiment of the present disclosure
  • Figure 5 is a schematic structural block diagram of a data processing device provided by the present disclosure.
  • a distributed file system is used in a GIS system
  • the GIS system reads and writes data
  • the distributed file system needs to perform serialization, deserialization, and network transmission of a large amount of data.
  • the data processing flow is complex, which can easily lead to low data processing efficiency and affect the user experience.
  • the present disclosure provides a data processing method, device and storage medium.
  • the data processing method can be applied to terminal equipment, which can be electronic equipment such as mobile phones, tablet computers, notebook computers, and desktop computers. It can also be applied to servers.
  • the server can be a standalone server, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, a Content Delivery Network (CDN), and big data and artificial intelligence platforms.
  • CDN Content Delivery Network
  • Figure 1 is a schematic flow chart of a data processing method provided by the present disclosure.
  • FIG. 2 is a schematic diagram of a data processing framework according to an embodiment of the present disclosure.
  • the data processing framework is deployed in a Kubernetes environment.
  • the data processing framework includes a Master, a host, and a POD.
  • the Master can connect to and communicate with multiple hosts, and the host can accommodate multiple PODs.
  • the Master is used to manage clusters in the Kubernetes environment, such as hosts and PODs; the hosts are used to communicate with the Master; the POD is the smallest deployment unit in the Kubernetes environment and is used to store data.
  • the data processing method includes steps S101 to S104.
  • Step S101 In response to the data acquisition request sent by the terminal device, determine the identification information of the first data based on the data acquisition request.
  • the data processing method can be applied to any system that needs to store a large amount of data in a distributed manner or to call a large amount of data, such as a GIS system, to process relevant data of the GIS system.
  • in a GIS system, in response to the user's related operations on the terminal device, the GIS system needs to present the corresponding geographical image to the user. For example, presenting a geographical image requires obtaining the corresponding data from the GIS system and processing the acquired data to obtain the geographical image.
  • the data of the GIS system is stored in a distributed manner in the PODs of the data processing framework.
  • when the terminal device requests to render a certain geographical image, this is equivalent to receiving a data acquisition request sent by the terminal device.
  • the Master in the data processing framework can analyze and determine the first data required to present the corresponding geographical image based on the data acquisition request. For example, the Master, in response to the data acquisition request sent by the terminal device, determines the identification information of the first data based on the data acquisition request.
  • Step S102 Determine each host and the POD corresponding to each host according to the identification information of the first data.
  • the Master can determine the storage location of the first data based on the identification information of the first data, for example, determine each host where each POD storing the first data is located and the corresponding POD.
  • determining each host and the POD corresponding to each host according to the identification information of the first data includes: according to the identification information of the first data, determining each host and the POD corresponding to each host from the preset association between data identification information, host identification information and POD identification information.
  • the database can be a lightweight database.
  • the database can be used to store the preset association between data identification information, host identification information and POD identification information, and/or to store the second data obtained from the POD.
  • HDFS needs to respond to data acquisition instructions, obtain the first data, and perform serialization, network transmission, deserialization, etc. on the first data.
  • any data is stored in the form of objects in the Master in HDFS. For example, if each object occupies about 150 bytes, then when there are one million objects, the Master in HDFS stores each object.
  • multiple databases are provided in the Master to store various objects, such as the preset association between data identification information, host identification information and POD identification information, and/or the second data obtained from the POD. This keeps the memory usage of the Master stable at a low level, so that it does not increase as the number of objects grows, which is conducive to cluster expansion.
  • the preset association relationship between the data identification information, the host identification information and the POD identification information may be set in advance or updated in real time.
  • an identification information reporting instruction may also be sent to each host to instruct each POD in each host to report data identification information, host identification information and POD identification information respectively; the association between data identification information, host identification information and POD identification information is then determined based on the reported data identification information, host identification information and POD identification information.
  • the Master can send identification information reporting instructions to each host regularly or when it detects changes in the host to update the association between data identification, host identification information, and POD identification information. There are no restrictions here.
  • according to the identification information of the first data, the Master determines, from the preset association between data identification information, host identification information and POD identification information, the host identification information and the POD identification information associated with the identification information of the first data. For example, each host and its corresponding POD can be determined through the determined host identification information and POD identification information, so as to determine the storage location of the first data.
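  • the reporting and lookup steps described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the `Report` type, the function names `build_association` and `locate`, and the in-memory dictionary standing in for the Master's lightweight database are all assumptions introduced here.

```python
from collections import defaultdict
from typing import NamedTuple


class Report(NamedTuple):
    """One report sent by a POD (hypothetical shape): what it stores and where it lives."""
    data_id: str
    host_id: str
    pod_id: str


def build_association(reports):
    """Map each data identifier to the set of (host_id, pod_id) pairs that store it."""
    association = defaultdict(set)
    for report in reports:
        association[report.data_id].add((report.host_id, report.pod_id))
    return dict(association)


def locate(association, data_id):
    """Return the sorted (host_id, pod_id) locations of a data identifier, or [] if unknown."""
    return sorted(association.get(data_id, ()))


# Example reports; the identifiers are illustrative only.
reports = [
    Report("A.zip", "host1", "pod1"),
    Report("A.zip", "host3", "pod1"),
    Report("B.zip", "host2", "pod2"),
]
association = build_association(reports)
print(locate(association, "A.zip"))
```

  • rebuilding the mapping from fresh reports, rather than patching it in place, is one simple way to keep it consistent when the Master re-polls the hosts regularly or after a host change.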
  • the Master can manage data stored in each host and the POD corresponding to each host. In an exemplary embodiment, the Master detects whether third data exists; if third data exists, it determines, based on preset data storage rules, each host that stores the third data of each category and the POD corresponding to each host, and instructs the corresponding host to store the third data of the corresponding category in the corresponding POD.
  • a corresponding number of copies of the third data of the same category can be created and stored in PODs corresponding to different hosts.
  • determining each host that stores third data of each category and the POD corresponding to each host includes: determining the number of categories corresponding to the third data; determining the number of copies corresponding to each category of third data according to the number of categories of the third data and the number of hosts; determining the number of hosts that store the corresponding category of third data according to the number of copies corresponding to each category of third data; and determining each host that stores the third data of each category and the POD corresponding to each host according to the preset association between the data identification information, the host identification information and the POD identification information.
  • the Master can obtain the identification information of the third data. It can be understood that third data of different categories has different identification information. For example, the Master can determine the number of categories corresponding to the third data based on the number of distinct items of identification information of the third data, although it is not limited to this.
  • the number of copies corresponding to each category of third data is determined based on the number of categories of third data and the number of hosts, and the third data of each category is copied until that number of copies is reached, so that multiple copies of the third data of the same category are stored in different hosts and the PODs corresponding to each host.
  • after the number of copies corresponding to each category of third data is determined, the number of categories of data currently stored in each host and in each POD of that host is determined, for example, based on the preset association between the data identification information, the host identification information and the POD identification information, and is used to determine each host that stores the third data of each category and the POD corresponding to each host.
  • FIG. 3 is a data processing interaction diagram according to an embodiment of the present disclosure.
  • the third data detected by the Master is temporarily stored in HDFS, for example, where the third data includes A.zip, B.zip, C.zip and D.zip, and the hosts currently connected to the Master include Host 1, Host 2 and Host 3.
  • the Master determines, based on the preset storage rules and according to the number of third data categories being 4 and the number of hosts being 3, that the number of copies corresponding to each of A.zip, B.zip, C.zip and D.zip is 2, so as to control the number of copies of A.zip, B.zip, C.zip and D.zip. At the same time, according to the preset association between data identification information, host identification information and POD identification information, it instructs host 1 to store A.zip, C.zip and D.zip, host 2 to store B.zip and D.zip, and host 3 to store A.zip, B.zip and C.zip, so that A.zip, B.zip, C.zip and D.zip are distributed across host 1, host 2 and host 3 in a balanced manner.
  • the number of categories corresponding to the data currently stored in each POD of host 1, host 2 and host 3 can also be determined based on the preset association between the data identification information, the host identification information and the POD identification information, thereby determining the PODs in host 1, host 2 and host 3 that store A.zip, B.zip, C.zip and D.zip respectively, so as to balance the number of categories corresponding to the data stored in each POD of host 1, host 2 and host 3.
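  • a balanced placement of this kind can be sketched with a greedy least-loaded rule. The disclosure does not state how the copy count is derived or how hosts are chosen, so the copy count is taken here as an input and the function `place_copies` and its tie-breaking rule are assumptions made for illustration.

```python
def place_copies(categories, hosts, copies):
    """Assign `copies` replicas of each category to distinct hosts,
    always choosing the hosts that currently store the fewest categories."""
    if copies > len(hosts):
        raise ValueError("cannot place more copies than there are hosts")
    load = {host: [] for host in hosts}
    for category in categories:
        # Least-loaded hosts first; ties are broken by the order of `hosts`.
        ranked = sorted(hosts, key=lambda h: (len(load[h]), hosts.index(h)))
        for host in ranked[:copies]:
            load[host].append(category)
    return load


placement = place_copies(
    ["A.zip", "B.zip", "C.zip", "D.zip"],
    ["host1", "host2", "host3"],
    copies=2,
)
for host, stored in placement.items():
    print(host, stored)
```

  • with 4 categories, 3 hosts and 2 copies, this rule stores 3, 3 and 2 categories on the three hosts. The per-host counts are as balanced as in the Figure 3 example, although the exact assignment of files to hosts differs from it.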
  • Step S103 Obtain second data from at least one POD according to preset data acquisition rules, where the second data is obtained by the host processing the first data in at least one POD according to preset data processing rules.
  • the terminal device still needs to process the first data accordingly.
  • HDFS high-speed distribution protocol
  • the data is sent to the GIS system.
  • the GIS system still needs to process the first data to generate a geographical image, and then display the geographical image. The process is complicated and takes a lot of time, and the data processing efficiency is relatively low.
  • HDFS needs to respond to the data acquisition instruction, acquire the first data, and perform serialization, network transmission, deserialization and other processing on the first data
  • the corresponding delay of the data processing is relatively
  • the storage device carrier used to store data in HDFS such as a hard disk, needs to maintain very high real-time performance when reading and writing data, and the requirements for the storage device carrier are relatively high.
  • the second data can be directly obtained from at least one POD according to preset data acquisition rules, wherein the second data is obtained by the host processing the first data in the at least one POD according to preset data processing rules.
  • the host processes the first data required for the GIS system to generate a geographical image, generates a geographical image, and feeds the generated geographical image back to the Master as the second data.
  • the Master can send the geographical image to the terminal device, and the terminal device no longer needs to process the first data, which shortens the data processing flow and improves the efficiency of data processing.
  • the delay corresponding to the data processing process is relatively small, so the POD can flexibly use different hard disks, such as mechanical hard disks or other hard disks, and places lower requirements on the storage device carrier.
  • data of the same category can be stored in the PODs corresponding to multiple hosts by creating copies. Data of the same category can then be obtained from at least one POD according to the preset data acquisition rules.
  • obtaining the second data from at least one POD according to preset data acquisition rules includes: when the data acquisition request indicates the acquisition of data of the same category, determining, according to the amount of data of that category that needs to be acquired, the amount of data that needs to be fed back by each POD that stores data of the corresponding category; and obtaining the second data corresponding to that amount of data from each corresponding POD.
  • obtaining the second data from at least one POD according to preset data acquisition rules includes: when the data acquisition request indicates the acquisition of data of different categories, determining, according to the amount of data of any category that needs to be acquired, the amount of data that needs to be fed back by any POD that stores data of that category; and obtaining the second data corresponding to that amount of data from that POD.
  • for example, the amount of data that needs to be fed back by POD1 of host 1 is one copy of A.zip, the amount of data that needs to be fed back by POD2 of host 2 is one copy of B.zip, and the amount of data that needs to be fed back by POD1 of host 3 is one copy of C.zip.
  • host 1, host 2 and host 3 can each process the data fed back by their POD to obtain the second data corresponding to that data amount, and feed the second data back to the Master. Balancing, concurrently, the amount of data fed back by each host and the POD corresponding to each host improves data processing efficiency.
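  • for the same-category case, one possible acquisition rule is to divide the requested amount as evenly as possible among the PODs that hold a copy. The disclosure only says that each POD's share is determined from the requested amount, so the even split and the name `split_request` below are assumptions.

```python
def split_request(total_units, pods):
    """Divide `total_units` of one category across the PODs that store it.
    The first `total_units % len(pods)` PODs each serve one extra unit."""
    if not pods:
        raise ValueError("no POD stores the requested category")
    base, extra = divmod(total_units, len(pods))
    return {pod: base + (1 if index < extra else 0) for index, pod in enumerate(pods)}


# Five units of one category, held by two PODs (identifiers are illustrative).
shares = split_request(5, ["host1/pod1", "host3/pod1"])
print(shares)
```

  • each host can then process only its own share, so the work proceeds in parallel and no single POD serves the whole request.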
  • Step S104 Send the second data to the terminal device.
  • the terminal device includes, for example, a terminal device of a GIS system.
  • the geographical image can be directly presented to the user, thereby improving the user experience.
  • the data processing method further includes: detecting whether there is a host change; and when a host change is detected, adjusting the data in each host and the POD corresponding to each host based on the matching between the data identification information after the host change and the data identification information before the host change.
  • the host change category includes any one of adding a host, reducing a host, and replacing a host.
  • changes to the host may be tracked.
  • the host change category may be determined based on changes in the host within adjacent first and second time thresholds, where the first time threshold is earlier than the second time threshold. For example, if an increase in hosts is detected within the first time threshold and a decrease in hosts is detected within the second time threshold, the host change category can be determined to be host replacement; if an increase in hosts is detected within the first time threshold and no host change is detected within the second time threshold, the host change category can be determined to be adding a host; if a decrease in hosts is detected within the first time threshold and no host change is detected within the second time threshold, the host change category can be determined to be host reduction.
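  • the three rules above can be written as a small decision function. This is a sketch: representing each time window by the net change in host count, and returning `UNSPECIFIED` for combinations the text does not cover, are assumptions made here.

```python
from enum import Enum


class HostChange(Enum):
    ADD = "add host"
    REDUCE = "reduce host"
    REPLACE = "replace host"
    UNSPECIFIED = "not covered by the stated rules"


def classify_host_change(first_delta: int, second_delta: int) -> HostChange:
    """Classify a host change from the net host-count change observed in
    the first (earlier) window and the second (later) window."""
    if first_delta > 0 and second_delta < 0:
        return HostChange.REPLACE  # increase, then decrease
    if first_delta > 0 and second_delta == 0:
        return HostChange.ADD      # increase, then nothing
    if first_delta < 0 and second_delta == 0:
        return HostChange.REDUCE   # decrease, then nothing
    return HostChange.UNSPECIFIED


print(classify_host_change(1, -1).value)
print(classify_host_change(1, 0).value)
print(classify_host_change(-1, 0).value)
```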
  • the missing data identification information after the change of the host can be determined based on the matching of the data identification information after the change of the host and the data identification information before the change of the host. This allows the missing data to be determined.
  • the lost data can be downloaded to the replacement host, and/or the POD corresponding to each category of lost data can be determined to be stored in the replacement host according to preset storage rules. For example, the lost data can be stored in the corresponding POD of the replaced host according to the data storage time and acquisition frequency. For example, the closer the storage time of data is to the change time of the host, the corresponding data will be stored first; for example, the higher the frequency of data acquisition, the corresponding data will be stored first to avoid data loss.
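  • finding the lost data and ordering its restoration can be sketched as a set difference followed by a sort. How storage time and acquisition frequency are combined is not specified in the text; the sketch below assumes recency is the primary key and acquisition frequency the tie-breaker, and the `Item` fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    data_id: str
    stored_at: float   # time the data was stored, in seconds
    fetch_count: int   # how often the data has been acquired


def lost_ids(ids_before, ids_after):
    """Identifiers present before the host change but missing after it."""
    return set(ids_before) - set(ids_after)


def restore_order(items, lost, change_time):
    """Order lost items: stored closest to the change time first,
    then more frequently acquired first."""
    candidates = [item for item in items if item.data_id in lost]
    return sorted(
        candidates,
        key=lambda item: (abs(change_time - item.stored_at), -item.fetch_count),
    )


catalog = [
    Item("A.zip", stored_at=100, fetch_count=5),
    Item("B.zip", stored_at=900, fetch_count=1),
    Item("C.zip", stored_at=900, fetch_count=9),
    Item("D.zip", stored_at=500, fetch_count=3),
]
lost = lost_ids({"A.zip", "B.zip", "C.zip", "D.zip"}, {"D.zip"})
print([item.data_id for item in restore_order(catalog, lost, change_time=1000)])
```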
  • when the host change is the addition of a host, the difference in the number of categories corresponding to the data stored in each host and the POD corresponding to each host is determined based on the matching between the data identification information after the host change and the data identification information before the host change. According to that difference, the copy amount of the corresponding category of data is adjusted and/or data of the corresponding category is migrated to the corresponding POD of the added host, so as to balance the number of categories corresponding to the data stored in each host and the POD corresponding to each host.
  • when the host change is a reduction of hosts, the lost data, as well as the difference in the number of categories corresponding to the data stored in each host and the POD corresponding to each host, is determined based on the matching between the data identification information after the host change and the data identification information before the host change.
  • according to the storage time and acquisition frequency of the data, the lost data can be stored in the corresponding host and the corresponding POD of that host, and/or the copy amount of the corresponding category of data can be adjusted, and/or data of the corresponding category can be migrated to the POD corresponding to the corresponding host, so as to balance the number of categories corresponding to the data stored in each host and the POD corresponding to each host.
  • each POD in each host reports data identification information, host identification information and POD identification information to the Master respectively, so that the Master updates the association between data identification information, host identification information and POD identification information.
  • the data processing method can also detect the storage time and acquisition frequency of data. For example, when the storage time of data is greater than the preset storage time threshold and the data acquisition frequency is lower than the preset acquisition frequency threshold, the corresponding data can be deleted to reduce the redundant data stored in each host and the POD corresponding to each host.
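  • the deletion rule combines both thresholds with a logical AND, which can be sketched as a filter. The record layout `(data_id, stored_at, fetch_count)` and the name `stale_ids` are assumptions introduced for this illustration.

```python
def stale_ids(records, now, max_age, min_fetches):
    """Return identifiers of data that is both older than `max_age`
    and acquired fewer than `min_fetches` times."""
    return [
        data_id
        for data_id, stored_at, fetch_count in records
        if now - stored_at > max_age and fetch_count < min_fetches
    ]


records = [
    ("A.zip", 0, 1),     # old and rarely acquired: deleted
    ("B.zip", 0, 50),    # old but frequently acquired: kept
    ("C.zip", 900, 1),   # rarely acquired but recent: kept
]
print(stale_ids(records, now=1000, max_age=500, min_fetches=10))
```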
  • FIG. 5 is a schematic structural block diagram of a data processing device provided by the present disclosure.
  • the data processing device 300 includes a processor 301 and a memory 302.
  • the processor 301 and the memory 302 are connected through a bus 303, which is, for example, an I2C (Inter-integrated Circuit) bus.
  • I2C Inter-integrated Circuit
  • the processor 301 is used to provide computing and control capabilities to support the operation of the entire data processing device.
  • the processor 301 can be a central processing unit (Central Processing Unit, CPU).
  • the processor 301 can also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • the general processor may be a microprocessor or the processor may be any conventional processor.
  • the memory 302 may be a Flash chip, a read-only memory (ROM, Read-Only Memory) disk, an optical disk, a USB disk, a mobile hard disk, or the like.
  • ROM read-only memory
  • FIG. 5 is only a block diagram of a partial structure related to the disclosed solution, and does not constitute a limitation on the data processing equipment to which the disclosed solution is applied.
  • the specific server may include more or fewer parts than shown, or combine certain parts, or have a different arrangement of parts.
  • the processor is used to run the computer program stored in the memory, and implement any data processing method provided by the present disclosure when executing the computer program.
  • the processor is configured to run a computer program stored in the memory, and implement the following steps when executing the computer program: in response to a data acquisition request sent by the terminal device, determine the identification information of the first data based on the data acquisition request; According to the identification information of the first data, each host and the corresponding POD of each host are determined; according to the preset data acquisition rules, the second data is obtained from at least one POD, wherein the second data is obtained by the host in at least one In the POD, the first data is processed according to the preset data processing rules; the second data is sent to the terminal device.
  • when determining each host and the POD corresponding to each host according to the identification information of the first data, the processor is configured to: determine, based on the identification information of the first data, each host and the POD corresponding to each host from the preset association between data identification information, host identification information and POD identification information.
  • when obtaining the second data from at least one POD according to the preset data acquisition rules, the processor is configured to: when the data acquisition request indicates the acquisition of data of the same category, determine, according to the amount of data of that category that needs to be acquired, the amount of data that needs to be fed back by each POD that stores data of the corresponding category; and obtain the second data corresponding to that amount of data from each corresponding POD.
  • when obtaining the second data from at least one POD according to the preset data acquisition rules, the processor is configured to: when the data acquisition request indicates the acquisition of data of different categories, determine, according to the amount of data of any category that needs to be acquired, the amount of data that needs to be fed back by any POD that stores data of that category; and obtain the second data corresponding to that amount of data from that POD.
  • when implementing the data processing method, the processor is configured to: detect whether third data exists; if third data exists, determine, based on preset data storage rules, each host that stores third data of each category and the POD corresponding to each host; and instruct the corresponding host to store the third data of the corresponding category in the corresponding POD.
  • when determining each host that stores third data of each category and the POD corresponding to each host based on preset data storage rules, the processor is configured to: determine the number of categories corresponding to the third data; determine the number of copies corresponding to each category of third data according to the number of categories of third data and the number of hosts; determine the number of hosts that store the corresponding category of third data according to the number of copies corresponding to each category of third data; and determine each host that stores the third data of each category and the POD corresponding to each host according to the preset association between the data identification information, the host identification information and the POD identification information.
  • before determining the identification information of the first data based on the data acquisition request in response to the data acquisition request sent by the terminal device, the processor is configured to: send an identification information reporting instruction to each host to instruct each POD in each host to report data identification information, host identification information and POD identification information respectively; and determine the association between the data identification information, host identification information and POD identification information based on the reported data identification information, host identification information and POD identification information.
  • when implementing the data processing method, the processor is configured to: detect whether there is a host change; and when a host change is detected, adjust the data in each host and the POD corresponding to each host based on the matching between the data identification information after the host change and the data identification information before the host change.
  • the present disclosure also provides a storage medium for computer-readable storage.
  • the storage medium stores one or more programs.
  • the one or more programs can be executed by one or more processors to implement the steps of any data processing method provided by the present disclosure.
  • the storage medium may be an internal storage unit of the data processing device described in the previous embodiment, such as a hard disk or memory of the data processing device.
  • the storage medium may also be an external storage device of the data processing device, such as a plug-in hard disk, a smart memory card (Smart Media Card, SMC), a secure digital (SD) card, a flash card, etc., equipped on the data processing device.
  • the present disclosure provides a data processing method, device and storage medium.
  • in response to the data acquisition request sent by the terminal device, the present disclosure determines the identification information of the first data based on the data acquisition request; determines each host and the POD corresponding to each host according to the identification information of the first data; obtains the second data from at least one POD according to the preset data acquisition rules, wherein the second data is obtained by the host processing the first data in the at least one POD according to the preset data processing rules; and sends the second data to the terminal device. This addresses the problem of complex data processing procedures, simplifying the data processing flow, improving data processing efficiency, and improving the user experience.
  • the technical solution disclosed in this disclosure aims to simplify the data processing process, improve data processing efficiency, and enhance user experience.
  • Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media).
  • computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
  • Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disk (DVD) or other optical disk storage, magnetic cassettes, tapes, disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.
  • communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure relates to the technical field of data management, and provides a data processing method and device and a storage medium. The method comprises: in response to a data acquisition request sent by a terminal device, determining identification information of first data on the basis of the data acquisition request; according to the identification information of the first data, determining each host machine and the POD corresponding to each host machine; acquiring second data from at least one POD according to a preset data acquisition rule, wherein the second data is obtained by each host machine processing, according to a preset data processing rule, the first data in the at least one POD; and sending the second data to the terminal device.

Description

Data processing method, device and storage medium
Cross-reference to related applications
This disclosure claims priority to Chinese patent application CN202210833724.9, entitled "Data processing method, device and storage medium" and filed on July 15, 2022, the entire content of which is incorporated into this disclosure by reference.
Technical field
The present disclosure relates to the field of data management technology, and in particular to a data processing method, device and storage medium.
Background
At present, the industry generally uses distributed file systems such as HDFS (Hadoop Distributed File System) or Ceph to store and/or retrieve data. For example, if a distributed file system is used in a GIS (Geographic Information System), then whenever the GIS reads or writes data, the distributed file system has to serialize, deserialize and transmit large amounts of data over the network. The data processing flow is therefore complex and data processing efficiency is low, which degrades the user experience.
Summary
The present disclosure provides a data processing method, device and storage medium, aiming to solve the technical problems of a complex data processing flow and low data processing efficiency.
In a first aspect, the present disclosure provides a data processing method, including: in response to a data acquisition request sent by a terminal device, determining identification information of first data based on the data acquisition request; determining, according to the identification information of the first data, each host and the POD corresponding to each host; obtaining second data from at least one POD according to a preset data acquisition rule, wherein the second data is obtained by the host processing the first data in the at least one POD according to a preset data processing rule; and sending the second data to the terminal device.
In a second aspect, the present disclosure also provides a data processing device, including a processor, a memory, a computer program stored on the memory and executable by the processor, and a data bus for connection and communication between the processor and the memory, wherein the computer program, when executed by the processor, implements the steps of any data processing method provided by this disclosure.
In a third aspect, the present disclosure also provides a storage medium for computer-readable storage. The storage medium stores one or more programs, and the one or more programs can be executed by one or more processors to implement the steps of any data processing method provided by this disclosure.
Description of the drawings
To explain the technical solutions of the present disclosure more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show some embodiments of the present disclosure; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Figure 1 is a schematic flow chart of a data processing method provided by the present disclosure;
Figure 2 is a schematic diagram of a data processing framework according to an embodiment of the present disclosure;
Figure 3 is a data processing interaction diagram according to an embodiment of the present disclosure;
Figure 4 is a data processing interaction diagram according to an embodiment of the present disclosure;
Figure 5 is a schematic structural block diagram of a data processing device provided by the present disclosure.
Detailed description
The technical solutions of this disclosure are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, rather than all, of the embodiments of this disclosure. All other embodiments obtained by a person of ordinary skill in the art from the embodiments in this disclosure without creative effort fall within the scope of protection of this disclosure.
The flowcharts shown in the accompanying drawings are only examples. They need not include every item and operation/step, nor must the steps be performed in the order described. For example, some operations/steps may be split, combined or partially merged, so the actual order of execution may change according to the actual situation.
It should be understood that the terminology used in this disclosure is for the purpose of describing particular embodiments only and is not intended to limit the disclosure. As used in this disclosure and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms unless the context clearly dictates otherwise.
If a distributed file system is used in a GIS system, then whenever the GIS system reads or writes data, the distributed file system has to serialize, deserialize and transmit large amounts of data over the network. The data processing flow is complex, which easily leads to low data processing efficiency and degrades the user experience.
The present disclosure provides a data processing method, device and storage medium. The data processing method can be applied to a terminal device, which may be an electronic device such as a mobile phone, tablet computer, notebook computer or desktop computer. It can also be applied to a server, which may be a standalone server or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, a Content Delivery Network (CDN), and big data and artificial intelligence platforms.
Some embodiments of the present disclosure are described in detail below with reference to the accompanying drawings. Where no conflict arises, the following embodiments and the features in them may be combined with each other.
Please refer to Figure 1, which is a schematic flow chart of a data processing method provided by the present disclosure.
Please refer to Figure 2, which is a schematic diagram of a data processing framework according to an embodiment of the present disclosure.
In an exemplary embodiment, the data processing framework is deployed in a Kubernetes environment. For example, as shown in Figure 2, the data processing framework includes a Master, hosts and PODs. The Master can connect to and communicate with multiple hosts, and each host can accommodate multiple PODs.
In some embodiments, the Master manages the cluster in the Kubernetes environment, for example the hosts and PODs; the hosts communicate with the Master; and the POD, the smallest deployment unit in the Kubernetes environment, is used to store data.
As shown in Figure 1, the data processing method includes steps S101 to S104.
Step S101: in response to a data acquisition request sent by a terminal device, determine identification information of first data based on the data acquisition request.
In an exemplary embodiment, the data processing method can be applied to any system that needs to store large amounts of data in a distributed manner or to retrieve large amounts of data, for example a GIS system, to process the relevant data of that system. In some embodiments, in response to the user's operations on the terminal device, the GIS system needs to present a corresponding geographic image to the user. For example, presenting a geographic image requires obtaining the corresponding data from the GIS system and processing the obtained data into the geographic image.
In an exemplary embodiment, the data of the GIS system is stored in a distributed manner across the PODs in the data processing framework. In some embodiments, a request from the terminal device to render a particular geographic image is treated as a data acquisition request sent by that terminal device. In an exemplary embodiment, the Master in the data processing framework can analyze the data acquisition request to determine the first data needed to present the corresponding geographic image. For example, in response to the data acquisition request sent by the terminal device, the Master determines the identification information of the first data based on the data acquisition request.
Step S102: determine, according to the identification information of the first data, each host and the POD corresponding to each host.
In some embodiments, the Master can determine the storage location of the first data from the identification information of the first data, that is, determine each POD that stores the first data and the host on which each such POD resides.
In an exemplary embodiment, determining each host and the POD corresponding to each host according to the identification information of the first data includes: looking up the identification information of the first data in a preset association between data identification information, host identification information and POD identification information, and thereby determining each host and the POD corresponding to each host.
In an exemplary embodiment, multiple databases are provided in the Master, for example lightweight databases. Such a database can be used to store the preset association between data identification information, host identification information and POD identification information, and/or to store the second data obtained from the PODs. By contrast, in some embodiments that use HDFS for distributed storage of GIS data, HDFS has to respond to a data acquisition instruction, obtain the first data, and serialize, transmit and deserialize it, and all data is held as objects in the memory of the HDFS Master. For example, if each object occupies about 150 bytes, then with one million objects the HDFS Master needs roughly 2 GB of memory to store them, and its memory requirement grows as the number of objects grows. The memory capacity of the HDFS Master therefore severely restricts cluster expansion. In an exemplary embodiment, because the Master holds these objects in multiple databases, such as lightweight databases storing the preset association between data identification information, host identification information and POD identification information and/or the second data obtained from the PODs, the Master's memory usage stays at a stable, low level and does not grow with the number of objects, which favors cluster expansion.
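As an illustration of keeping the association in a lightweight database rather than in the Master's heap, the following is a minimal sketch using SQLite. The table name, column names and the `locate` helper are assumptions made for this example; the disclosure does not name a specific database or schema.

```python
import sqlite3

# Hypothetical schema: one row per (data id, host id, POD id) association.
# ":memory:" keeps the sketch self-contained; an on-disk file path would be
# used when the goal is to keep the association out of the Master's memory.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE association ("
    " data_id TEXT NOT NULL,"
    " host_id TEXT NOT NULL,"
    " pod_id TEXT NOT NULL,"
    " PRIMARY KEY (data_id, host_id, pod_id))"
)

# Sample rows; the values are illustrative only.
rows = [
    ("A.zip", "host1", "POD1"),
    ("A.zip", "host3", "POD3"),
    ("B.zip", "host2", "POD2"),
]
conn.executemany("INSERT INTO association VALUES (?, ?, ?)", rows)
conn.commit()


def locate(data_id):
    """Return the (host id, POD id) pairs recorded for a data identifier."""
    cur = conn.execute(
        "SELECT host_id, pod_id FROM association"
        " WHERE data_id = ? ORDER BY host_id, pod_id",
        (data_id,),
    )
    return cur.fetchall()
```

Calling `locate("A.zip")` on this sample returns the two recorded locations, which corresponds to the lookup performed in step S102.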
In an exemplary embodiment, the preset association between data identification information, host identification information and POD identification information may be configured in advance or updated in real time.
For example, before determining the identification information of the first data in response to the data acquisition request sent by the terminal device, the method may further include: sending an identification information reporting instruction to each host, instructing each POD in each host to report its data identification information, host identification information and POD identification information; and determining, from the reported information, the association between the data identification information, the host identification information and the POD identification information. The method is not limited to this. For example, the Master may send the identification information reporting instruction to each host periodically, or whenever it detects a host change, in order to update the association between data identification information, host identification information and POD identification information.
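The reporting step can be sketched as a simple aggregation. The report shape used here, a `(data_id, host_id, pod_id)` tuple per stored item, is an assumption for illustration; the disclosure only states which three pieces of identification information are reported.

```python
from collections import defaultdict


def build_association(reports):
    """Map each data id to the sorted (host id, POD id) pairs that reported it.

    `reports` is an iterable of (data_id, host_id, pod_id) tuples, a
    hypothetical shape for what each POD reports to the Master.
    """
    index = defaultdict(set)
    for data_id, host_id, pod_id in reports:
        # A set absorbs duplicate reports, e.g. from periodic re-reporting.
        index[data_id].add((host_id, pod_id))
    return {data_id: sorted(pairs) for data_id, pairs in index.items()}
```

Rebuilding the mapping from a fresh round of reports is one simple way to realize the periodic or change-triggered update mentioned above.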
In some embodiments, the Master uses the identification information of the first data to look up, in the preset association between data identification information, host identification information and POD identification information, the host identification information and POD identification information associated with the first data. The identified hosts and their corresponding PODs give the storage location of the first data.
In an exemplary embodiment, the Master can manage the data stored in each host and in the POD corresponding to each host. In an exemplary embodiment, the Master detects whether third data exists; if third data exists, it determines, based on a preset data storage rule, each host that is to store each category of the third data and the POD corresponding to each such host; and it instructs each of those hosts to store the corresponding category of third data in the corresponding POD.
In some embodiments, based on the preset storage rule, a corresponding number of copies of third data of the same category can be created and stored in PODs of different hosts.
For example, determining, based on the preset data storage rule, each host that stores each category of third data and the POD corresponding to each host includes: determining the number of categories of the third data; determining the number of copies for each category of third data from the number of categories and the number of hosts; determining, from the number of copies for each category, the number of hosts that store that category; and determining, from the preset association between data identification information, host identification information and POD identification information, each host that stores each category of third data and the POD corresponding to each host.
In an exemplary embodiment, the Master can obtain the identification information of the third data. Third data of different categories has different identification information, so the Master can, for example, take the number of distinct identifiers among the third data as the number of categories, although the method is not limited to this.
In some embodiments, the number of copies for each category of third data is determined from the number of categories and the number of hosts. Each category is then replicated until it reaches that number of copies, and the copies of the same category are stored on different hosts, in the PODs corresponding to those hosts.
In an exemplary embodiment, after the number of copies for each category of third data has been determined, the number of data categories currently stored in each POD of each host is determined, for example from the preset association between data identification information, host identification information and POD identification information. This in turn determines which hosts, and which PODs on them, store each category of third data.
Please refer to Figure 3, which is a data processing interaction diagram according to an embodiment of the present disclosure.
As shown in Figure 3, the third data detected by the Master is, for example, held temporarily in HDFS. The third data includes A.zip, B.zip, C.zip and D.zip, and the hosts currently connected to the Master are host 1, host 2 and host 3. In an exemplary embodiment, based on the preset storage rule, with 4 categories of third data and 3 hosts, the Master determines that A.zip, B.zip, C.zip and D.zip each have 2 copies. Based on the preset association between data identification information, host identification information and POD identification information, it then instructs host 1 to store A.zip, C.zip and D.zip, host 2 to store B.zip and D.zip, and host 3 to store A.zip, B.zip and C.zip, so that A.zip, B.zip, C.zip and D.zip are distributed evenly across host 1, host 2 and host 3.
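One way to obtain a balanced layout of this kind is a greedy least-loaded placement, sketched below. This is an assumed strategy, not the rule stated in the disclosure: the function name and the tie-breaking order are hypothetical, the replica count is taken as an input rather than derived, and the resulting layout need not match Figure 3 host for host.

```python
def place_replicas(categories, hosts, replicas):
    """Assign each category to `replicas` distinct hosts, least-loaded first."""
    if replicas > len(hosts):
        raise ValueError("replica count cannot exceed the number of hosts")
    load = {host: 0 for host in hosts}
    placement = {}
    for category in categories:
        # sorted() is stable, so ties keep the order in which hosts are listed.
        chosen = sorted(hosts, key=lambda h: load[h])[:replicas]
        for host in chosen:
            load[host] += 1
        placement[category] = chosen
    return placement


layout = place_replicas(
    ["A.zip", "B.zip", "C.zip", "D.zip"], ["host1", "host2", "host3"], 2
)
```

With four categories, three hosts and two copies each, every category lands on two different hosts and the per-host counts differ by at most one, which is the balancing property the example above describes.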
In some embodiments, the number of data categories currently stored in each POD of host 1, host 2 and host 3 can also be determined from the preset association between data identification information, host identification information and POD identification information. The PODs in host 1, host 2 and host 3 that store A.zip, B.zip, C.zip and D.zip are then chosen so as to balance the number of data categories stored in each POD.
Step S103: obtain second data from at least one POD according to a preset data acquisition rule, wherein the second data is obtained by the host processing the first data in the at least one POD according to a preset data processing rule.
In some embodiments, if the Master, after determining the storage location of the first data from its identification information, simply fetched the first data and sent it to the terminal device, the terminal device would still need to process the first data. For example, when HDFS is used for distributed storage of GIS data, HDFS has to respond to the data acquisition instruction, obtain the first data, and serialize, transmit and deserialize it before the first data reaches the GIS system. The GIS system then still has to process the first data to generate the geographic image before displaying it. The flow is complex and time-consuming, and data processing efficiency is relatively low. In an exemplary embodiment, because HDFS has to obtain, serialize, transmit and deserialize the first data, the processing latency is relatively high. Accordingly, the storage devices that hold the data in HDFS, such as hard disks, must sustain very high real-time read and write performance, which places relatively high demands on the storage hardware.
In an exemplary embodiment, the second data can be obtained directly from at least one POD according to the preset data acquisition rule, wherein the second data is obtained by the host processing the first data in the at least one POD according to the preset data processing rule. For example, based on the preset data processing rule, the host processes the first data that the GIS system needs for a geographic image, generates the geographic image, and returns it to the Master as the second data. The Master can then send the geographic image to the terminal device, and the terminal device no longer needs to process the first data, which shortens the data processing flow and improves data processing efficiency. In an exemplary embodiment, because the shortened flow keeps processing latency relatively low, the PODs can flexibly use different kinds of disks, such as mechanical hard disks or other disks, which lowers the requirements on the storage hardware.
In an exemplary embodiment, because the preset data storage rule allows copies of the same category of data to be stored in the PODs of multiple hosts, data of one category can be obtained from at least one of those PODs according to the preset data acquisition rule.
In an exemplary embodiment, obtaining second data from at least one POD according to the preset data acquisition rule includes: when the data acquisition request indicates data of a single category, determining, from the amount of that category's data to be obtained, the amount of data that each POD storing that category must return; and obtaining the corresponding amount of second data from each of those PODs.
In an exemplary embodiment, with reference to Figure 4, suppose the data acquisition request indicates data of a single category, for example A.zip, and ten copies of A.zip are needed. The amount of data each POD storing A.zip must return can then be determined. For example, if POD1 of host 1 and POD3 of host 3 both store A.zip, POD1 of host 1 returns 5 copies of A.zip and POD3 of host 3 returns 5 copies of A.zip. Host 1 and host 3 can thus process A.zip at the same time, each producing its share of the second data and returning it to the Master. This increases the concurrency of data processing and thereby improves data processing efficiency.
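The even split in this example can be sketched as follows. The function name is hypothetical, and the rule for distributing a remainder (earlier PODs take one extra copy) is an assumption, since the disclosure only gives the evenly divisible case.

```python
def split_request(total, pods):
    """Divide `total` requested copies as evenly as possible across `pods`."""
    if not pods:
        raise ValueError("no POD stores the requested category")
    base, extra = divmod(total, len(pods))
    # The first `extra` PODs take one additional copy each.
    return {pod: base + (1 if i < extra else 0) for i, pod in enumerate(pods)}


shares = split_request(10, [("host1", "POD1"), ("host3", "POD3")])
# shares == {("host1", "POD1"): 5, ("host3", "POD3"): 5}
```

Each host can then process its share in parallel, which is where the concurrency gain described above comes from.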
In an exemplary embodiment, obtaining second data from at least one POD according to the preset data acquisition rule includes: when the data acquisition request indicates data of different categories, determining, for each requested category, the amount of data that one POD storing that category must return, based on the amount of that category's data to be obtained; and obtaining the corresponding amount of second data from that POD.
In an exemplary embodiment, with reference to Figure 4, suppose the data acquisition request indicates data of different categories, for example one copy each of A.zip, B.zip and C.zip. For each category, one POD storing it is chosen and the amount it must return is determined. For example, host 1 stores A.zip in POD1 and C.zip in POD2, host 2 stores B.zip in POD2, and host 3 stores C.zip in POD1 and A.zip in POD3. In an exemplary embodiment, POD1 of host 1 is chosen to return one copy of A.zip, POD2 of host 2 to return one copy of B.zip, and POD1 of host 3 to return one copy of C.zip. Host 1, host 2 and host 3 can then each process the data returned by their POD, produce the corresponding second data and return it to the Master. This increases the concurrency of data processing and balances the amount of data returned by each host and its PODs, thereby improving data processing efficiency.
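The selection in this example is consistent with a greedy rule that picks, for each requested category, the candidate POD whose host has been assigned the least work so far. The sketch below assumes that rule and a hypothetical `association` mapping from category to `(host, POD)` candidates; the disclosure does not state the selection algorithm.

```python
def assign_sources(requested, association):
    """Pick one (host, POD) per requested category, preferring idle hosts."""
    load = {}
    assignment = {}
    for category in requested:
        candidates = association[category]
        # min() returns the first candidate among ties, so the result is
        # deterministic for a given candidate order.
        host, pod = min(candidates, key=lambda hp: load.get(hp[0], 0))
        load[host] = load.get(host, 0) + 1
        assignment[category] = (host, pod)
    return assignment


association = {
    "A.zip": [("host1", "POD1"), ("host3", "POD3")],
    "B.zip": [("host2", "POD2")],
    "C.zip": [("host1", "POD2"), ("host3", "POD1")],
}
plan = assign_sources(["A.zip", "B.zip", "C.zip"], association)
```

On the layout above this assigns A.zip to POD1 of host 1, B.zip to POD2 of host 2 and C.zip to POD1 of host 3, so each host serves exactly one category.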
Step S104: Send the second data to the terminal device.
In some embodiments, the terminal device includes, for example, a terminal device of a GIS system, which can present a geographic image directly to the user according to the second data, thereby improving the user experience.
In an exemplary embodiment, the data processing method further includes: detecting whether a host change has occurred; and, when a host change is detected, adjusting the data in each host and in the PODs corresponding to each host according to how the data identification information after the host change matches the data identification information before the host change.
In an exemplary embodiment, the host change category is any one of adding a host, removing a host, and replacing a host. In some embodiments, host changes can be tracked. In an exemplary embodiment, the host change category can be determined from the host changes observed within an adjacent first time threshold and second time threshold, where the first time threshold is earlier than the second time threshold. For example, if a host increase is detected within the first time threshold and a host decrease is detected within the second time threshold, the host change category can be determined to be replacing a host; if a host decrease is detected within the first time threshold and a host increase is detected within the second time threshold, the host change category can likewise be determined to be replacing a host; if a host increase is detected within the first time threshold and no host change is detected within the second time threshold, the host change category can be determined to be adding a host; and if a host decrease is detected within the first time threshold and no host change is detected within the second time threshold, the host change category can be determined to be removing a host.
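The four cases above amount to a small lookup table. The sketch below is illustrative only: the names `HostChange`, `observe` and `classify_host_change` are hypothetical, the two time thresholds are treated as two adjacent observation windows, and combinations the text does not describe are returned as `None`.

```python
from enum import Enum
from typing import Optional

class HostChange(Enum):
    ADD = "add host"
    REMOVE = "remove host"
    REPLACE = "replace host"

# (observation in first window, observation in second window) -> category
_RULES = {
    ("increase", "decrease"): HostChange.REPLACE,
    ("decrease", "increase"): HostChange.REPLACE,
    ("increase", None): HostChange.ADD,
    ("decrease", None): HostChange.REMOVE,
}

def observe(hosts_before: int, hosts_after: int) -> Optional[str]:
    """Summarize one window as 'increase', 'decrease', or None (no change)."""
    if hosts_after > hosts_before:
        return "increase"
    if hosts_after < hosts_before:
        return "decrease"
    return None

def classify_host_change(first: Optional[str],
                         second: Optional[str]) -> Optional[HostChange]:
    """Return the change category, or None for combinations not covered."""
    return _RULES.get((first, second))
```

For instance, a host count going from 3 to 4 in the first window and back to 3 in the second would be classified as a replacement.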
In some embodiments, when the host change is a host replacement, the data identification information that is missing after the change can be determined by matching the data identification information after the host change against the data identification information before the host change, and the lost data can thereby be identified. In an exemplary embodiment, the lost data can be downloaded to the replacement host, and/or the POD in the replacement host that is to store each category of lost data can be determined according to a preset storage rule. For example, the lost data can be stored in the corresponding POD of the replacement host according to the storage time and acquisition frequency of the data: the closer the storage time of the data is to the time of the host change, the higher the priority with which that data is stored; likewise, the higher the acquisition frequency of the data, the higher the priority with which that data is stored, so as to avoid data loss.
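A minimal sketch of this recovery ordering follows. The record layout (`DataRecord`), the function name `lost_data_in_priority_order`, and the way the two criteria are combined (storage time first, acquisition frequency as tie-breaker) are assumptions; the text presents the two criteria as separate examples and does not say how they are combined.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataRecord:
    data_id: str
    stored_at: float   # storage time, e.g. seconds since epoch
    fetch_count: int   # how often the data has been requested

def lost_data_in_priority_order(before, after_ids, change_time):
    """before: {data_id: DataRecord} known before the host change.
    after_ids: set of data IDs still reported after the change.
    Returns the lost records, most urgent first."""
    lost = [rec for data_id, rec in before.items() if data_id not in after_ids]
    # Closer to the change time first; higher fetch count breaks ties.
    return sorted(
        lost,
        key=lambda r: (abs(change_time - r.stored_at), -r.fetch_count),
    )

before = {
    "A.zip": DataRecord("A.zip", stored_at=100.0, fetch_count=5),
    "B.zip": DataRecord("B.zip", stored_at=190.0, fetch_count=1),
    "C.zip": DataRecord("C.zip", stored_at=190.0, fetch_count=9),
}
order = lost_data_in_priority_order(before, {"A.zip"}, change_time=200.0)
```

Here B.zip and C.zip are lost; both were stored equally close to the change, so the more frequently requested C.zip is restored first.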
In some embodiments, when the host change is a host addition, the difference in the number of data categories stored in each host and in the PODs corresponding to each host is determined according to how the data identification information after the host change matches the data identification information before the host change. According to that difference, the number of copies of data of the corresponding category is adjusted and/or data of the corresponding category is migrated to the corresponding POD of the added host, so as to balance the number of data categories stored across the hosts and their PODs.
In some embodiments, when the host change is a host removal, the lost data is determined according to how the data identification information after the host change matches the data identification information before the host change, and the difference in the number of data categories stored in each host and in the PODs corresponding to each host is determined. In an exemplary embodiment, according to the storage time and acquisition frequency of the data, the lost data can be stored in the corresponding hosts and their PODs, and/or the number of copies of data of the corresponding category can be adjusted, and/or data of the corresponding category can be migrated to the POD of the corresponding host, so as to balance the number of data categories stored across the hosts and their PODs.
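One way to balance the number of categories per host after an addition or removal is a simple greedy migration, sketched below. This is an illustrative assumption, not the disclosed algorithm: the function name `rebalance`, the host-level granularity (PODs are omitted for brevity), and the stopping rule (category counts differ by at most one) are all hypothetical.

```python
def rebalance(host_categories):
    """host_categories: {host: set of category names stored on that host}.
    Returns (new_state, moves), where moves is a list of
    (category, source_host, target_host) migrations."""
    state = {host: set(cats) for host, cats in host_categories.items()}
    moves = []
    if not state:
        return state, moves
    while True:
        busiest = max(state, key=lambda h: (len(state[h]), h))
        idlest = min(state, key=lambda h: (len(state[h]), h))
        if len(state[busiest]) - len(state[idlest]) <= 1:
            break  # balanced: counts differ by at most one
        # Move one category the idle host does not already hold.
        category = sorted(state[busiest] - state[idlest])[0]
        state[busiest].remove(category)
        state[idlest].add(category)
        moves.append((category, busiest, idlest))
    return state, moves

# host3 has just been added and stores nothing yet.
state, moves = rebalance({
    "host1": {"A", "B", "C", "D"},
    "host2": {"E"},
    "host3": set(),
})
```

In this example two categories are migrated away from host1, after which no host stores more than one category above any other.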
In an exemplary embodiment, after the data in each host and in the PODs corresponding to each host is adjusted according to how the data identification information after the host change matches the data identification information before the host change, the association among the data identification information, the host identification information and the POD identification information is updated.
In some embodiments, each POD in each host reports its data identification information, host identification information and POD identification information to the Master, so that the Master updates the association among the data identification information, the host identification information and the POD identification information.
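The association maintained by the Master can be pictured as a mapping from a data identifier to the (host, POD) pairs that store it. The class below is a hedged sketch: `AssociationTable`, `replace_host_reports` and `locate` are hypothetical names, and replacing all of a host's entries on each report is an assumed update policy.

```python
from collections import defaultdict

class AssociationTable:
    """Maps data_id -> {(host_id, pod_id), ...} on the Master side."""

    def __init__(self):
        self._by_data = defaultdict(set)

    def replace_host_reports(self, host_id, reports):
        """reports: iterable of (pod_id, data_id) reported by one host.
        Old entries for that host are dropped before the new ones are added."""
        for locations in self._by_data.values():
            locations -= {loc for loc in locations if loc[0] == host_id}
        for pod_id, data_id in reports:
            self._by_data[data_id].add((host_id, pod_id))

    def locate(self, data_id):
        """Return the sorted (host_id, pod_id) pairs storing data_id."""
        return sorted(self._by_data.get(data_id, ()))

table = AssociationTable()
table.replace_host_reports("host1", [("POD1", "A.zip"), ("POD2", "C.zip")])
table.replace_host_reports("host3", [("POD1", "C.zip")])
```

A lookup such as `table.locate("C.zip")` then yields the hosts and PODs from which the second data could be requested.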
In an exemplary embodiment, the data processing method can also monitor the storage time and acquisition frequency of data. For example, when the storage time of data exceeds a preset storage time threshold and the acquisition frequency of the data is below a preset acquisition frequency threshold, the data can be deleted, so as to reduce the redundant data stored in each host and in the PODs corresponding to each host.
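The deletion condition combines both thresholds with a logical AND, as the following sketch shows. The function name `select_stale`, the record layout, and the interpretation of storage time as the age of the data relative to `now` are assumptions for illustration.

```python
def select_stale(records, now, max_age, min_frequency):
    """records: {data_id: (stored_at, frequency)}.
    Returns the IDs whose age exceeds max_age AND whose acquisition
    frequency is below min_frequency."""
    return [
        data_id
        for data_id, (stored_at, frequency) in records.items()
        if now - stored_at > max_age and frequency < min_frequency
    ]

records = {
    "old_cold.zip": (0.0, 1),     # old and rarely requested: delete
    "old_hot.zip": (0.0, 50),     # old but popular: keep
    "new_cold.zip": (990.0, 1),   # rarely requested but recent: keep
}
stale = select_stale(records, now=1000.0, max_age=500.0, min_frequency=10)
```

Only data that is both old and rarely requested is selected, so frequently used data is retained regardless of its age.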
By determining the identification information of the first data based on a data acquisition request in response to the data acquisition request sent by the terminal device; determining each host and the POD corresponding to each host according to the identification information of the first data; obtaining second data from at least one POD according to a preset data acquisition rule, where the second data is obtained by the host processing the first data in the at least one POD according to a preset data processing rule; and sending the second data to the terminal device, the method effectively solves the problem that a complex data processing flow leads to low data processing efficiency and degrades the user experience. It simplifies the data processing flow, improves data processing efficiency, and improves the user experience.
Referring to FIG. 5, FIG. 5 is a schematic structural block diagram of a data processing device provided by the present disclosure.
As shown in FIG. 5, the data processing device 300 includes a processor 301 and a memory 302, which are connected through a bus 303, for example an I2C (Inter-Integrated Circuit) bus.
In an exemplary embodiment, the processor 301 provides computing and control capabilities and supports the operation of the entire data processing device. The processor 301 may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.
In an exemplary embodiment, the memory 302 may be a Flash chip, a read-only memory (ROM), a magnetic disk, an optical disc, a USB flash drive, a removable hard disk, or the like.
Those skilled in the art will understand that the structure shown in FIG. 5 is only a block diagram of the part of the structure related to the solution of the present disclosure and does not limit the data processing device to which the solution is applied. A specific server may include more or fewer components than shown in the figure, may combine certain components, or may have a different arrangement of components.
The processor is configured to run a computer program stored in the memory and, when executing the computer program, to implement any of the data processing methods provided by the present disclosure.
In an embodiment, the processor is configured to run a computer program stored in the memory and, when executing the computer program, to implement the following steps: in response to a data acquisition request sent by a terminal device, determining identification information of first data based on the data acquisition request; determining each host and the POD corresponding to each host according to the identification information of the first data; obtaining second data from at least one POD according to a preset data acquisition rule, where the second data is obtained by the host processing the first data in the at least one POD according to a preset data processing rule; and sending the second data to the terminal device.
In an embodiment, when determining each host and the POD corresponding to each host according to the identification information of the first data, the processor is configured to: determine each host and the POD corresponding to each host from a preset association among data identification information, host identification information and POD identification information, according to the identification information of the first data.
In an embodiment, when obtaining the second data from at least one POD according to the preset data acquisition rule, the processor is configured to: when the data acquisition request indicates that data of the same category is to be obtained, determine, according to the amount of data of that category to be obtained, the amount of data to be fed back by each POD that stores data of the corresponding category; and obtain the second data of the corresponding amount from each corresponding POD.
In an embodiment, when obtaining the second data from at least one POD according to the preset data acquisition rule, the processor is configured to: when the data acquisition request indicates that data of different categories is to be obtained, determine, according to the amount of data to be obtained for any one category, the amount of data to be fed back by any POD that stores data of that category; and obtain the second data of the corresponding amount from that POD.
In an embodiment, when implementing the data processing method, the processor is configured to: detect whether third data exists; if third data exists, determine, based on a preset data storage rule, each host that is to store each category of the third data and the POD corresponding to each host; and instruct each corresponding host to store the third data of the corresponding category in the corresponding POD.
In an embodiment, when determining, based on the preset data storage rule, each host that is to store each category of the third data and the POD corresponding to each host, the processor is configured to: determine the number of categories of the third data; determine the number of copies for each category of the third data according to the number of categories of the third data and the number of hosts; determine, according to the number of copies for each category of the third data, the number of hosts that are to store the third data of the corresponding category; and determine, according to the preset association among data identification information, host identification information and POD identification information, each host that is to store each category of the third data and the POD corresponding to each host.
In some embodiments, before determining the identification information of the first data based on the data acquisition request in response to the data acquisition request sent by the terminal device, the processor is configured to: send an identification information reporting instruction to each host, to instruct each POD in each host to report its data identification information, host identification information and POD identification information; and determine the association among the data identification information, the host identification information and the POD identification information according to the reported data identification information, host identification information and POD identification information.
In an embodiment, when implementing the data processing method, the processor is configured to: detect whether a host change has occurred; and, when a host change is detected, adjust the data in each host and in the PODs corresponding to each host according to how the data identification information after the host change matches the data identification information before the host change.
It should be noted that, as those skilled in the art will clearly understand, for convenience and brevity of description, the specific working process of the data processing device described above may be understood by reference to the corresponding process in the foregoing data processing method embodiments, and is not repeated here.
The present disclosure further provides a storage medium for computer-readable storage. The storage medium stores one or more programs, and the one or more programs can be executed by one or more processors to implement the steps of any of the data processing methods provided in the specification of the present disclosure.
The storage medium may be an internal storage unit of the data processing device described in the foregoing embodiments, such as a hard disk or memory of the data processing device. The storage medium may also be an external storage device of the data processing device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the data processing device.
The present disclosure provides a data processing method, a device and a storage medium. In response to a data acquisition request sent by a terminal device, the present disclosure determines identification information of first data based on the data acquisition request; determines each host and the POD corresponding to each host according to the identification information of the first data; obtains second data from at least one POD according to a preset data acquisition rule, where the second data is obtained by the host processing the first data in the at least one POD according to a preset data processing rule; and sends the second data to the terminal device. This solves the problem of a complex data processing flow, simplifies the data processing flow, improves data processing efficiency, and improves the user experience. The technical solution of the present disclosure aims to simplify the data processing flow, improve data processing efficiency, and improve the user experience.
Those of ordinary skill in the art will understand that all or some of the steps of the methods disclosed above, and the functional modules/units of the systems and devices disclosed above, may be implemented as software, firmware, hardware, or an appropriate combination thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed cooperatively by several physical components. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, a digital signal processor or a microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disc (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media typically contain computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
It should be understood that the term "and/or" as used in the specification and the appended claims of the present disclosure refers to any combination and all possible combinations of one or more of the associated listed items, and includes these combinations. It should be noted that, as used herein, the terms "include", "comprise" and any other variants thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or system that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or system. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or system that includes that element.
The serial numbers of the above embodiments of the present disclosure are for description only and do not indicate the relative merits of the embodiments. The above are merely specific embodiments of the present disclosure, but the protection scope of the present disclosure is not limited thereto. Any person skilled in the art can readily conceive of various equivalent modifications or replacements within the technical scope disclosed herein, and such modifications or replacements shall fall within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (10)

  1. A data processing method, comprising:
    in response to a data acquisition request sent by a terminal device, determining identification information of first data based on the data acquisition request;
    determining, according to the identification information of the first data, each host and the POD corresponding to each host;
    obtaining second data from at least one of the PODs according to a preset data acquisition rule, wherein the second data is obtained by the host processing the first data in at least one of the PODs according to a preset data processing rule; and
    sending the second data to the terminal device.
  2. The data processing method according to claim 1, wherein determining, according to the identification information of the first data, each host and the POD corresponding to each host comprises:
    determining each host and the POD corresponding to each host from a preset association among data identification information, host identification information and POD identification information, according to the identification information of the first data.
  3. The data processing method according to claim 1, wherein obtaining the second data from at least one of the PODs according to the preset data acquisition rule comprises:
    when the data acquisition request indicates that data of the same category is to be obtained, determining, according to the amount of data of that category to be obtained, the amount of data to be fed back by each of the PODs that stores data of the corresponding category; and
    obtaining the second data of the corresponding amount from each corresponding POD.
  4. The data processing method according to claim 1, wherein obtaining the second data from at least one of the PODs according to the preset data acquisition rule comprises:
    when the data acquisition request indicates that data of different categories is to be obtained, determining, according to the amount of data to be obtained for any one category, the amount of data to be fed back by any POD that stores data of that category; and
    obtaining the second data of the corresponding amount from that POD.
  5. The data processing method according to any one of claims 1 to 4, further comprising:
    detecting whether third data exists;
    if third data exists, determining, based on a preset data storage rule, each host that is to store each category of the third data and the POD corresponding to each host; and
    instructing each corresponding host to store the third data of the corresponding category in the corresponding POD.
  6. The data processing method according to claim 5, wherein determining, based on the preset data storage rule, each host that is to store each category of the third data and the POD corresponding to each host comprises:
    determining the number of categories of the third data;
    determining the number of copies for each category of the third data according to the number of categories of the third data and the number of hosts;
    determining, according to the number of copies for each category of the third data, the number of hosts that are to store the third data of the corresponding category; and
    determining, according to a preset association among data identification information, host identification information and POD identification information, each host that is to store each category of the third data and the POD corresponding to each host.
  7. The data processing method according to any one of claims 1 to 4, further comprising, before determining the identification information of the first data based on the data acquisition request in response to the data acquisition request sent by the terminal device:
    sending an identification information reporting instruction to each of the hosts, to instruct each POD in each of the hosts to report its data identification information, host identification information and POD identification information; and
    determining the association among the data identification information, the host identification information and the POD identification information according to the reported data identification information, host identification information and POD identification information.
  8. The data processing method according to any one of claims 1 to 4, further comprising:
    detecting whether a host change has occurred; and
    when a host change is detected, adjusting the data in each host and in the PODs corresponding to each host according to how the data identification information after the host change matches the data identification information before the host change.
  9. A data processing device, comprising a processor, a memory, a computer program stored in the memory and executable by the processor, and a data bus for connection and communication between the processor and the memory, wherein the computer program, when executed by the processor, implements the steps of the data processing method according to any one of claims 1 to 8.
  10. A storage medium for computer-readable storage, wherein the storage medium stores one or more programs, and the one or more programs are executable by one or more processors to implement the steps of the data processing method according to any one of claims 1 to 8.
PCT/CN2023/074616 2022-07-15 2023-02-06 Data processing method and device and storage medium WO2024011896A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210833724.9A CN117439854A (en) 2022-07-15 2022-07-15 Data processing method, device and storage medium
CN202210833724.9 2022-07-15

Publications (1)

Publication Number Publication Date
WO2024011896A1 true WO2024011896A1 (en) 2024-01-18

Family

ID=89535353

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/074616 WO2024011896A1 (en) 2022-07-15 2023-02-06 Data processing method and device and storage medium

Country Status (2)

Country Link
CN (1) CN117439854A (en)
WO (1) WO2024011896A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110543537A (en) * 2019-08-22 2019-12-06 广东省城乡规划设计研究院 Intelligent planning space-time cloud GIS platform based on Docker container and micro-service architecture
CN111966305A (en) * 2020-10-22 2020-11-20 腾讯科技(深圳)有限公司 Persistent volume allocation method and device, computer equipment and storage medium
WO2021003921A1 (en) * 2019-07-10 2021-01-14 平安科技(深圳)有限公司 Data processing method, and terminal device
CN112989330A (en) * 2021-02-08 2021-06-18 网宿科技股份有限公司 Container intrusion detection method and device, electronic equipment and storage medium
CN113127526A (en) * 2019-12-30 2021-07-16 中科星图股份有限公司 Distributed data storage and retrieval system based on Kubernetes
CN113301174A (en) * 2020-07-14 2021-08-24 阿里巴巴集团控股有限公司 Data processing and conversion rule deployment method and device

Also Published As

Publication number Publication date
CN117439854A (en) 2024-01-23

Similar Documents

Publication Publication Date Title
US11226847B2 (en) Implementing an application manifest in a node-specific manner using an intent-based orchestrator
US9934263B1 (en) Big-fast data connector between in-memory database system and data warehouse system
US11113158B2 (en) Rolling back kubernetes applications
WO2018149221A1 (en) Device management method and network management system
JP5705869B2 (en) Apparatus and method for loading and updating code of a cluster-based JAVA application system
US20160292249A1 (en) Dynamic replica failure detection and healing
US11948014B2 (en) Multi-tenant control plane management on computing platform
US11347684B2 (en) Rolling back KUBERNETES applications including custom resources
US8843581B2 (en) Live object pattern for use with a distributed cache
WO2021139224A1 (en) Method and apparatus for file backup in cloud scenario, and medium and electronic device
US9836516B2 (en) Parallel scanners for log based replication
US20220075757A1 (en) Data read method, data write method, and server
US20210240544A1 (en) Collaboration service to support cross-process coordination between active instances of a microservice
CN112463290A (en) Method, system, apparatus and storage medium for dynamically adjusting the number of computing containers
CN110633046A (en) Storage method and device of distributed system, storage equipment and storage medium
US11886225B2 (en) Message processing method and apparatus in distributed system
CN109923533B (en) Method and apparatus for separating computation and storage in a database
WO2017157111A1 (en) Method, device and system for preventing memory data loss
US11134121B2 (en) Method and system for recovering data in distributed computing system
US10127270B1 (en) Transaction processing using a key-value store
US11386153B1 (en) Flexible tagging and searching system
WO2024011896A1 (en) Data processing method and device and storage medium
CN113760522A (en) Task processing method and device
CN111767126A (en) System and method for distributed batch processing
CN111444148A (en) Data transmission method and device based on MapReduce

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 23838398; Country of ref document: EP; Kind code of ref document: A1)