CN112748879A - Data acquisition method, system, device, computer equipment and storage medium - Google Patents


Info

Publication number
CN112748879A
Authority
CN
China
Prior art keywords
mirror image
data
client
data packet
cache node
Legal status
Granted
Application number
CN202011622404.6A
Other languages
Chinese (zh)
Other versions
CN112748879B (en)
Inventor
原帅
张晋锋
吕灼恒
贾冬冬
Current Assignee
Zhongke Shuguang International Information Industry Co ltd
Original Assignee
Zhongke Shuguang International Information Industry Co ltd
Application filed by Zhongke Shuguang International Information Industry Co ltd
Priority to CN202011622404.6A
Publication of CN112748879A
Application granted
Publication of CN112748879B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G06F3/0613 Improving I/O performance in relation to throughput
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655 Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0656 Data buffering arrangements
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Abstract

The application relates to a data acquisition method, system, apparatus, computer device, and storage medium. The method includes the following steps: a scheduler receives a data acquisition request sent by a first client; if idle image cache nodes exist in a distributed image repository, at least one image cache node is selected from the idle ones; the selected image cache node(s) are called to obtain, from a distributed storage system in the distributed image repository, at least one data block corresponding to the data packet identifier in the request; and the data block(s) are assembled into a data packet that is sent to the first client. Because each image is stored in blocks on the distributed storage system, service capacity and storage capacity can scale horizontally and the capacity limit of single-machine storage is removed. Deploying multiple image cache nodes in front of the distributed storage system greatly reduces the bandwidth pressure on the storage system and shortens the read latency of data packets.

Description

Data acquisition method, system, device, computer equipment and storage medium
Technical Field
The present application relates to the field of internet technologies, and in particular, to a data acquisition method, system, apparatus, computer device, and storage medium.
Background
Linux Containers (LXC) is a kernel-level virtualization technology that provides lightweight virtualization to isolate processes and resources. A more mature approach today uses Docker containers to simplify software deployment and distribution. The core of container technology is packaging the runtime environment into an image, and in container image distribution management, an image registry is responsible for storing and distributing images.
In the prior art, whole image files are stored and distributed using the open-source Docker Registry private image registry on top of a disk file system, and Docker Registry has been extended and enhanced by image repositories such as VMware Harbor, Sonatype Nexus, and SUSE Portus.
However, because these disk-file-system-based image registries store and distribute each image file as a single unit, the registry's read latency is high and it cannot meet the demands of high-concurrency, high-throughput image requests.
Disclosure of Invention
In view of the above, it is necessary to provide a data acquisition method, system, apparatus, computer device, and storage medium capable of reducing read latency.
In a first aspect, a data acquisition method is provided, and the method includes:
receiving a data acquisition request sent by a first client, where the data acquisition request includes a data packet identifier;
if idle image cache nodes exist in a distributed image repository, determining at least one image cache node from the idle image cache nodes;
calling the at least one image cache node to obtain at least one data block corresponding to the data packet identifier from a distributed storage system in the distributed image repository, where the distributed storage system stores each image's data in blocks; and
assembling the at least one data block into a data packet and sending the data packet to the first client.
In this embodiment, the distributed storage system stores each image's data in blocks, which allows service capacity and storage capacity to scale horizontally and removes the single-machine storage capacity limit. Deploying multiple image cache nodes in front of the distributed storage system greatly reduces its bandwidth pressure and shortens the read latency of data packets.
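The four steps of the first aspect can be sketched as follows. This is a minimal in-memory illustration, not the patented implementation: the names `DataAcquisitionRequest`, `ImageCacheNode`, and `acquire_packet`, the `busy` flag, the reuse of idle nodes when there are fewer nodes than blocks, and the use of a plain dict in place of the distributed storage system are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class DataAcquisitionRequest:
    client_id: str
    packet_id: str
    block_ids: list[str]  # hypothetical: packet identifier resolved to its block identifiers


@dataclass
class ImageCacheNode:
    name: str
    busy: bool = False  # True while the node's resources are being scheduled

    def fetch_block(self, storage: dict[str, bytes], block_id: str) -> bytes:
        # A real cache node would consult its local cache first and fall back
        # to the distributed storage system; here storage is a plain dict.
        return storage[block_id]


def acquire_packet(request, nodes, storage):
    """Sketch of S201-S204: pick idle nodes, fetch blocks, assemble a packet."""
    idle = [n for n in nodes if not n.busy]
    if not idle:
        raise LookupError("no idle image cache node")  # fallback path not shown
    blocks = []
    for i, block_id in enumerate(request.block_ids):
        node = idle[i % len(idle)]  # spread blocks over the idle nodes (assumed)
        blocks.append(node.fetch_block(storage, block_id))
    return b"".join(blocks)  # compose the blocks into one data packet
```

For example, with storage `{"b0": b"hello ", "b1": b"world"}`, one idle node and one busy node, a request for blocks `["b0", "b1"]` returns `b"hello world"`, with both blocks served by the single idle node.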
In one embodiment, the method further includes:
if no idle image cache node exists in the distributed image repository, determining at least one second client according to the data packet identifier, where a second client is another client that stores data blocks corresponding to the data packet identifier;
obtaining at least one data block corresponding to the data packet identifier from the at least one second client; and
assembling the at least one data block into a data packet and sending the data packet to the first client.
In this embodiment, when the image cache nodes are occupied, the scheduler can still obtain the data packet from other client nodes, which shortens the read latency of the data packet and supports image operations in high-concurrency scenarios.
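The choice between the cache-node path and the second-client fallback might look like the sketch below. The `plan_sources` name, the `peer_inventory` mapping (client name to the set of block identifiers it stores), and the rule of taking the alphabetically first holder are illustrative assumptions; the text only requires that a second client store the needed blocks.

```python
def plan_sources(block_ids, idle_nodes, peer_inventory):
    """Return a mapping block_id -> source name.

    idle_nodes: names of idle image cache nodes (may be empty).
    peer_inventory: hypothetical dict client_name -> set of stored block ids.
    """
    if idle_nodes:
        # Normal path: every block comes from an idle image cache node.
        return {b: idle_nodes[i % len(idle_nodes)] for i, b in enumerate(block_ids)}
    # Fallback path: no idle cache node, so look for second clients.
    plan = {}
    for block_id in block_ids:
        holders = sorted(c for c, held in peer_inventory.items() if block_id in held)
        if not holders:
            raise LookupError(f"no second client stores block {block_id}")
        plan[block_id] = holders[0]  # any holder qualifies as a second client
    return plan
```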
In one embodiment, the method further includes:
determining whether the resources of each image cache node are being scheduled; and
if the resources of an image cache node are not being scheduled, determining that image cache node to be an idle image cache node.
In this embodiment, the scheduler identifies idle image cache nodes from the resource scheduling status of each node and uses them to perform the packet acquisition. This prevents a packet acquisition task from being suspended indefinitely because a cache node's resources are occupied; the check is simple and effective and improves packet acquisition efficiency.
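One way to realize the "not being scheduled" test, so that a request never waits on an occupied node, is to let the scheduler claim nodes atomically. The sketch assumes a single scheduler process using threads; `CacheNodeState`, `try_claim`, and `claim_idle_nodes` are hypothetical names, and a multi-scheduler deployment would need a shared coordination service instead of a local lock.

```python
import threading


class CacheNodeState:
    """Tracks whether a cache node's resources are currently being scheduled."""

    def __init__(self, name):
        self.name = name
        self._scheduled = False
        self._lock = threading.Lock()

    def is_idle(self):
        return not self._scheduled

    def try_claim(self):
        # Atomically mark the node as scheduled; returns False if it already
        # is, so the caller moves on instead of waiting on a busy node.
        with self._lock:
            if self._scheduled:
                return False
            self._scheduled = True
            return True

    def release(self):
        with self._lock:
            self._scheduled = False


def claim_idle_nodes(nodes, wanted):
    """Claim up to `wanted` idle nodes without blocking on busy ones."""
    claimed = []
    for node in nodes:
        if len(claimed) == wanted:
            break
        if node.try_claim():
            claimed.append(node)
    return claimed
```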
In one embodiment, calling the image cache node to obtain the data block corresponding to the data packet identifier from the distributed storage system includes:
sending the data acquisition request to each image cache node, so that each image cache node obtains, from the distributed storage system, the data block corresponding to the data packet identifier in the request; and
receiving the data blocks returned by the image cache nodes.
In this embodiment, the scheduler calls the image cache nodes to obtain data blocks from the distributed storage system, so the data packet is obtained from the storage system through the cache nodes, which greatly reduces the bandwidth pressure on the distributed storage system.
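Sending the request to each selected cache node and collecting what comes back can be done concurrently. The sketch below uses Python's standard `concurrent.futures.ThreadPoolExecutor`; the `fetch_from_cache_nodes` name, the `fetch` callable standing in for the network call to a cache node, and the shape of `assignments` are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_from_cache_nodes(assignments, fetch):
    """Send one request per cache node and gather the returned blocks.

    assignments: list of (node_name, block_id) pairs chosen by the scheduler.
    fetch: callable(node_name, block_id) -> bytes; a stand-in for the call to
           a cache node, which in turn reads the distributed storage system.
    Returns a dict block_id -> bytes.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(assignments))) as pool:
        futures = {
            block_id: pool.submit(fetch, node, block_id)
            for node, block_id in assignments
        }
        # result() re-raises any exception raised by a failed node fetch.
        return {block_id: f.result() for block_id, f in futures.items()}
```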
In a second aspect, a data acquisition system is provided. The system includes a distributed storage system, a plurality of image cache nodes, a scheduler, and a first client;
the first client is configured to send a first data acquisition request to the scheduler, where the first data acquisition request includes a data packet identifier;
the scheduler is configured to determine at least one image cache node from the idle image cache nodes when idle image cache nodes exist in the distributed image repository;
the scheduler is further configured to call the image cache node to obtain at least one data block corresponding to the data packet identifier from the distributed storage system in the distributed image repository, and to transmit the data packet composed of the at least one data block to the first client, where the distributed storage system stores each image's data in blocks.
In this embodiment, the distributed storage system stores each image's data in blocks, which allows service capacity and storage capacity to scale horizontally and removes the single-machine storage capacity limit. With multiple image cache nodes deployed in front of the distributed storage system, the scheduler selects and calls an idle cache node to obtain the data packet from storage, which greatly reduces the bandwidth pressure on the distributed storage system.
In one embodiment, the system further includes a plurality of second clients;
the scheduler is further configured to, when no idle image cache node exists in the distributed image repository, determine at least one second target client according to the data packet identifier and send the first data acquisition request to the second target client, where a second target client is a second client that stores a data block corresponding to the data packet identifier;
the second target client is configured to send at least one data block corresponding to the data packet identifier to the scheduler; and
the scheduler is further configured to transmit the data packet composed of the at least one data block to the first client.
In this embodiment, when the image cache nodes are occupied, the scheduler can still obtain the data packet from other client nodes, which shortens read latency and supports image operations in high-concurrency scenarios.
In one embodiment, the first data acquisition request further includes a client identifier of the first client;
the image cache node is further configured to generate a user identifier according to the client identifier in the first data acquisition request, and to send a second data acquisition request including the user identifier and the data packet identifier to the distributed storage system; and
the distributed storage system is configured to verify the legitimacy of the image cache node according to the user identifier and, if the node is legitimate, return the corresponding data block to it according to the data packet identifier.
In this embodiment, the distributed storage system authenticates the image cache nodes, which improves the security and confidentiality of the data it stores.
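A minimal sketch of the user-identifier exchange, under stated assumptions: `make_user_id` produces a random string tagged with the client identifier (the text says only that the identifier may be a random string generated from the client identifier), and `serve_block` applies a presence check on the user identifier, treating a request without one as coming from an illegal node. Both names, the request dict keys, and the optional `known_user_ids` allow-list are hypothetical; the allow-list is an extra hardening step that the text does not describe.

```python
import secrets


def make_user_id(client_id):
    """Cache-node side: random user identifier tied to a client identifier."""
    return f"{client_id}-{secrets.token_hex(8)}"  # 16 random hex characters


def serve_block(request, blocks, known_user_ids=None):
    """Storage side: validate the requester, then return the requested block.

    request: dict with "packet_id" and, for legal nodes, "user_id".
    blocks: dict packet_id -> bytes (stand-in for the distributed storage).
    """
    user_id = request.get("user_id")
    if not user_id:
        raise PermissionError("illegal node: no user identifier")
    if known_user_ids is not None and user_id not in known_user_ids:
        raise PermissionError("illegal node: unknown user identifier")
    return blocks[request["packet_id"]]
```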
In a third aspect, a data acquisition apparatus is provided. The apparatus includes:
a receiving module, configured to receive a data acquisition request sent by a first client, where the data acquisition request includes a data packet identifier;
a determining module, configured to determine at least one image cache node from the idle image cache nodes if idle image cache nodes exist in the distributed image repository;
an acquisition module, configured to call the at least one image cache node to obtain at least one data block corresponding to the data packet identifier from the distributed storage system in the distributed image repository, where the distributed storage system stores each image's data in blocks; and
a sending module, configured to assemble the at least one data block into a data packet and send it to the first client.
In a fourth aspect, a computer device is provided, including a memory and a processor. The memory stores a computer program, and the processor, when executing the computer program, implements the data acquisition method according to any embodiment of the first aspect.
In a fifth aspect, a computer-readable storage medium is provided, storing a computer program that, when executed by a processor, implements the data acquisition method according to any embodiment of the first aspect.
With the above data acquisition method, system, apparatus, computer device, and storage medium, the scheduler receives a data acquisition request from a first client; if idle image cache nodes exist in the distributed image repository, it selects at least one of them, calls it to obtain from the distributed storage system at least one data block corresponding to the data packet identifier in the request, and assembles the block(s) into a data packet that is sent to the first client. Storing each image in blocks on the distributed storage system lets service capacity and storage capacity scale horizontally and removes the single-machine storage capacity limit, while deploying multiple image cache nodes in front of the storage system greatly reduces its bandwidth pressure and shortens data packet read latency.
Drawings
FIG. 1 is a diagram of an application environment of a data acquisition method in one embodiment;
FIG. 2 is a schematic diagram of the data acquisition system in one embodiment;
FIG. 3 is a schematic flowchart of a data acquisition method in one embodiment;
FIG. 4 is a schematic flowchart of a data acquisition method in one embodiment;
FIG. 5 is a schematic flowchart of a data acquisition method in one embodiment;
FIG. 6 is a schematic flowchart of a data acquisition method in one embodiment;
FIG. 7 is a schematic flowchart of a data acquisition method in one embodiment;
FIG. 8 is a block diagram showing the structure of a data acquisition apparatus according to an embodiment;
FIG. 9 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The data acquisition method provided by the application can be applied in the application environment shown in FIG. 1, which is a schematic structural diagram of a data acquisition system. A first client 101 communicates with a scheduler 102 over a network, the scheduler 102 communicates with image cache nodes 103 over the network, and the image cache nodes 103 communicate with a distributed storage system 104 over the network. The first client 101 may be, but is not limited to, a personal computer, laptop, smartphone, tablet, or portable wearable device; the scheduler 102 may be a processor on which a load balancer is deployed; an image cache node 103 is a cache node of a distributed image repository built on container technology; and the distributed storage system 104 is the back-end distributed storage, which enables horizontal scaling of service capability and storage capacity and removes the single-machine storage capacity limit. Optionally, the distributed storage system 104 may be a ParaStor 300S.
In one embodiment, as shown in FIG. 1, a data acquisition system is provided, including a distributed storage system, a plurality of image cache nodes, a scheduler, and a first client;
the first client is configured to send a first data acquisition request to the scheduler, where the first data acquisition request includes a data packet identifier;
the scheduler is configured to determine at least one image cache node from the idle image cache nodes when idle image cache nodes exist in the distributed image repository;
the scheduler is further configured to call the image cache node to obtain at least one data block corresponding to the data packet identifier from the distributed storage system in the distributed image repository, and to transmit the data packet composed of the at least one data block to the first client, where the distributed storage system stores each image's data in blocks.
In this embodiment, the first client sends the scheduler a first data acquisition request containing a data packet identifier and the first client's identifier. On receiving the request, and provided that idle image cache nodes exist in the distributed image repository, the scheduler selects at least one of them, calls it to obtain at least one data block corresponding to the data packet identifier from the distributed storage system, receives the data blocks returned by the selected cache node(s), and returns the data packet composed of those blocks to the first client, completing one data acquisition for the first client.
Optionally, in a large-scale cluster, a DNS load-balancing processor may also be deployed in the scheduler to balance traffic across the image cache nodes; it obtains the resource scheduling status of each image cache node and determines which nodes are idle.
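In the spirit of round-robin DNS, the balancer can rotate through cache node addresses and skip nodes reported as busy. This is a sketch under assumptions: `RoundRobinBalancer` and the `busy_check` callback are hypothetical, the callback stands for whatever component reports each node's resource scheduling status, and the addresses used in the usage note are placeholders.

```python
import itertools


class RoundRobinBalancer:
    """Rotates through cache node addresses, skipping nodes that are busy.

    busy_check: callable(address) -> bool reporting whether the node at
    that address is currently being scheduled (assumed interface).
    """

    def __init__(self, addresses, busy_check):
        self._addresses = list(addresses)
        self._cycle = itertools.cycle(self._addresses)
        self._busy = busy_check

    def next_idle(self):
        # Inspect each address at most once per call.
        for _ in range(len(self._addresses)):
            addr = next(self._cycle)
            if not self._busy(addr):
                return addr
        return None  # every node is busy
```

With `10.0.0.2` marked busy among `10.0.0.1`, `10.0.0.2`, `10.0.0.3`, three successive calls return `10.0.0.1`, `10.0.0.3`, `10.0.0.1`; when every node is busy the call returns `None`, which is where the second-client fallback would take over.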
In this embodiment, the distributed storage system stores each image's data in blocks, enabling horizontal scaling of service capacity and storage capacity and removing the single-machine storage capacity limit. With multiple image cache nodes deployed in front of the distributed storage system, the scheduler selects and calls an idle cache node to obtain the data packet from storage, greatly reducing the bandwidth pressure on the distributed storage system.
In one embodiment, as shown in FIG. 2, the system further includes a plurality of second clients 105;
the scheduler is further configured to, when no idle image cache node exists in the distributed image repository, determine at least one second target client according to the data packet identifier and send the first data acquisition request to the second target client, where a second target client is a second client that stores a data block corresponding to the data packet identifier;
the second target client is configured to send at least one data block corresponding to the data packet identifier to the scheduler; and
the scheduler is further configured to transmit the data packet composed of the at least one data block to the first client.
In this embodiment, when the scheduler determines from each image cache node's resource occupancy that no cache node is currently idle, it can query the data held by other clients on the same network as the first client, that is, obtain the identifiers of the data blocks each of them stores, and treat a client as a second client when it holds a data block identifier required by the first client. Optionally, the number of second clients may be determined according to the number of data blocks the first client requires.
Optionally, the scheduler may also fetch the data blocks corresponding to the data packet identifier directly from other clients as soon as it receives the first client's request. Because those blocks already exist on the other clients, fetching them there reduces data acquisition latency and reduces the concurrent load on the image cache nodes. Technically, this embodiment can pull images in P2P mode based on the Dragonfly framework.
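For the P2P variant, the scheduler can spread block requests over the second clients so that no single peer serves everything. The greedy least-loaded rule below is an illustrative choice, not the Dragonfly framework's actual scheduling algorithm; `assign_blocks_to_peers` and the `peer_inventory` shape are assumptions.

```python
from collections import Counter


def assign_blocks_to_peers(block_ids, peer_inventory):
    """Greedy P2P-style plan: fetch each block from the least-loaded holder.

    peer_inventory: dict peer_name -> set of block ids that peer already stores.
    Returns dict block_id -> peer_name; raises if some block has no holder.
    """
    load = Counter()  # number of blocks assigned to each peer so far
    plan = {}
    for block_id in block_ids:
        holders = [p for p, held in peer_inventory.items() if block_id in held]
        if not holders:
            raise LookupError(f"block {block_id} is not stored by any peer")
        # Prefer the peer with the fewest assignments; break ties by name.
        peer = min(holders, key=lambda p: (load[p], p))
        plan[block_id] = peer
        load[peer] += 1
    return plan
```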
In this embodiment, when the image cache nodes are occupied, the scheduler can still obtain the data packet from other client nodes, which shortens read latency and supports image operations in high-concurrency scenarios.
In one embodiment, the data acquisition request further includes a client identifier of the first client;
the image cache node is further configured to generate a user identifier according to the client identifier in the first data acquisition request, and to send a second data acquisition request including the user identifier and the data packet identifier to the distributed storage system; and
the distributed storage system is configured to verify the legitimacy of the image cache node according to the user identifier and, if the node is legitimate, return the corresponding data block to it according to the data packet identifier.
The user identifier may be a random character string generated from the client identifier. In this embodiment, when the distributed storage system receives a second data acquisition request, it does not yet know whether the sender is an image cache node. It therefore parses the request and checks whether it contains a user identifier: if not, the accessing node is deemed illegitimate; if so, the accessing node is deemed legitimate. In terms of implementation, this embodiment may use the base64-encoded basic authentication (basic auth) scheme officially supported by Docker Registry for legitimacy authentication.
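HTTP basic authentication (RFC 7617) sends `Basic ` followed by the base64 encoding of `username:password`. The sketch below builds and parses such a header with Python's standard `base64` module; the function names are hypothetical, and how the cache node's user identifier maps onto the username and password is an assumption, since the text does not specify it.

```python
import base64


def basic_auth_header(username, password):
    """Build an HTTP Basic authentication header value (RFC 7617)."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def parse_basic_auth(header):
    """Return (username, password), or None if missing or malformed."""
    if not header or not header.startswith("Basic "):
        return None  # no credentials: treat the requester as illegal
    try:
        decoded = base64.b64decode(header[len("Basic "):], validate=True).decode("utf-8")
    except (ValueError, UnicodeDecodeError):  # binascii.Error is a ValueError
        return None
    username, sep, password = decoded.partition(":")
    return (username, password) if sep else None
```

For example, `basic_auth_header("user", "pass")` returns `"Basic dXNlcjpwYXNz"`. Because base64 is an encoding rather than encryption, such headers are normally sent only over TLS.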
In this embodiment, after determining that the accessing node is legitimate, the distributed storage system returns the corresponding data block to it according to the data packet identifier or data block identifier in the second data acquisition request. Here the accessing node is an image cache node, which returns the data block to the scheduler once it has obtained it.
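The storage side's lookup by either a data packet identifier or a data block identifier can be sketched as below. The `resolve_blocks` name, the `packet_index` mapping from a packet to its ordered block identifiers, and the request keys `packet_id` / `block_id` are hypothetical.

```python
def resolve_blocks(request, packet_index, block_store):
    """Return the blocks named by a block identifier or a packet identifier.

    packet_index: dict packet_id -> ordered list of block ids (hypothetical).
    block_store: dict block_id -> bytes (stand-in for the storage back end).
    """
    if "block_id" in request:
        # A single block was requested directly.
        return [block_store[request["block_id"]]]
    # Otherwise expand the packet identifier into its blocks, in order.
    return [block_store[b] for b in packet_index[request["packet_id"]]]
```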
In this embodiment, the distributed storage system authenticates the image cache nodes, which improves the security and confidentiality of the data it stores.
The technical solutions of the application, and how they solve the technical problems above, are described in detail below through embodiments and with reference to the drawings. The following specific embodiments may be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments. It should be noted that the data acquisition method provided in the embodiments of FIG. 3 to FIG. 7 is mainly executed by the scheduler in the data acquisition system; it may also be executed by a data acquisition apparatus, implemented in software, hardware, or a combination of both as part or all of the scheduler. In the following method embodiments, the scheduler is taken as the execution subject by way of example.
In one embodiment, as shown in FIG. 3, a data acquisition method is provided and applied to the data acquisition system of the foregoing embodiments. The method concerns the process in which the scheduler receives a data acquisition request sent by a first client; if idle image cache nodes exist in the distributed image repository, determines at least one image cache node from them; calls the at least one image cache node to obtain, from the distributed storage system in the distributed image repository, at least one data block corresponding to the data packet identifier in the request; and sends the at least one data block to the first client as a data packet. The method includes the following steps:
S201, receiving a data acquisition request sent by a first client, where the data acquisition request includes a data packet identifier.
The data acquisition request is sent by the first client to obtain a required data packet. Optionally, it includes the data packet identifier and the client identifier of the first client; because the method is applied to a distributed image repository, the data packet identifier may include the identifier of at least one data block.
In this embodiment, the scheduler receives the data acquisition request sent by the first client, where the first client is any client currently initiating a data acquisition request; by default, the first client may be taken as the client that first sends a data acquisition request to the scheduler in the current time period, which is not limited in this embodiment.
S202, if idle image cache nodes exist in the distributed image repository, determining at least one image cache node from the idle image cache nodes.
The distributed image repository includes the distributed storage system and a plurality of image cache nodes; each image cache node receives data acquisition requests sent by the scheduler and performs the corresponding data packet acquisition operation.
In this embodiment, after receiving the data acquisition request from the first client, the scheduler needs to determine at least one image cache node to obtain the data packet. Before doing so, it can check each image cache node's resource status to determine whether any node is idle and able to perform the packet acquisition. If idle image cache nodes exist in the distributed image repository, the scheduler selects at least one of them to perform the packet acquisition operation, which is not limited in this embodiment.
S203, calling the at least one image cache node to obtain at least one data block corresponding to the data packet identifier from the distributed storage system in the distributed image repository, where the distributed storage system stores each image's data in blocks.
Because image data is stored in blocks in the distributed storage system, the scheduler determines the number of data blocks to be obtained from the first client's data packet identifier and calls a corresponding number of idle image cache nodes to obtain them.
In this embodiment, the scheduler calls a number of image cache nodes matching the number of data blocks to obtain the blocks from the distributed storage system. For example, if the scheduler determines that the data packet identifier contains two data block identifiers, it calls two image cache nodes to obtain the two corresponding data blocks from the distributed storage system, which is not limited in this embodiment.
S204, assembling the at least one data block into a data packet and sending it to the first client.
In this embodiment, after receiving the data blocks returned by the at least one image cache node, the scheduler packages them into a data packet and returns it to the first client. For example, the scheduler receives data blocks from two image cache nodes, packages the two blocks into a data packet, and returns the packet to the first client, which is not limited in this embodiment.
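Because blocks may come back from different cache nodes in any order, assembly should follow the block order given by the packet identifier and check that nothing is missing. The sketch assumes a data packet is simply the concatenation of its blocks and uses a hypothetical `compose_packet` name; the actual packaging format is not specified in the text.

```python
def compose_packet(expected_block_ids, received):
    """Assemble returned blocks into one data packet in identifier order.

    expected_block_ids: block ids listed in the packet identifier, in order.
    received: dict block_id -> bytes as returned by the cache nodes, which
              may have arrived in any order.
    """
    missing = [b for b in expected_block_ids if b not in received]
    if missing:
        raise ValueError(f"missing blocks: {missing}")
    # Concatenate in the order the packet identifier lists the blocks.
    return b"".join(received[b] for b in expected_block_ids)
```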
In the data acquisition method, a scheduler receives a data acquisition request sent by a first client, if an idle mirror image cache node exists in a distributed mirror image warehouse, at least one mirror image cache node is determined from the idle mirror image cache node, at least one mirror image cache node is called to acquire at least one data block corresponding to a data packet identifier in the data acquisition request from a distributed storage system in the distributed mirror image warehouse, and the at least one data block is composed into a data packet to be sent to the first client. In the method, each mirror image data is stored in blocks based on the distributed storage system, the horizontal expansion of service capacity and storage capacity is realized, the problem of single-machine storage capacity is solved, and the bandwidth pressure of the distributed storage system is greatly reduced and the reading delay of the data packet is shortened by deploying a plurality of mirror image cache nodes based on the distributed storage system.
In a high-concurrency scenario, the mirror cache nodes may all be occupied. For this case, in one embodiment, as shown in fig. 4, the method further includes:
S301, if no idle mirror cache node exists in the distributed mirror warehouse, determining at least one second client according to the data packet identifier, where a second client is another client that stores the data blocks corresponding to the data packet identifier.
A second client is selected from the other clients on the same network as the first client and stores data blocks that the first client needs.
In this embodiment, the scheduler determines from the resource occupation of each mirror cache node that no mirror cache node is currently idle. The scheduler may then obtain the data storage status of the other clients on the same network as the first client, that is, the data block identifiers each of those clients stores, and designate a client as a second client when it holds a data block identifier required by the first client. Optionally, at least one second client may be determined according to the data blocks required by the first client. Optionally, the scheduler may also fetch the data blocks corresponding to the data packet identifier directly from other clients as soon as it receives the first client's data acquisition request. Because those data blocks already exist on the other clients, obtaining them there can reduce acquisition latency and the number of concurrent operations on the mirror cache nodes.
S302, obtaining at least one data block corresponding to the data packet identifier from the at least one second client.
Optionally, in this embodiment, the data packet identifier includes at least one data block identifier; after determining the second clients, the scheduler obtains each data block from the second client that holds the corresponding data block identifier.
S303, composing the at least one data block into a data packet and sending the data packet to the first client.
In this embodiment, after acquiring the at least one data block, the scheduler packages the blocks into a data packet and returns the data packet to the first client. For example, the scheduler receives two data blocks, packages them into one data packet, and returns it to the first client. This embodiment is not limited to this example.
In this embodiment, when the mirror cache nodes are occupied, the scheduler can instead obtain the data blocks from other client nodes. This reduces the access load on the mirror cache nodes and shortens the read latency of the data packet, so that mirror operations can still be served in high-concurrency scenarios.
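The peer-client fallback (S301 to S303) can be illustrated with a short sketch. It assumes, hypothetically, that the scheduler keeps a `peer_inventory` map from client identifier to the set of block identifiers that client stores; the function names and the "fewest blocks assigned so far" spreading rule are illustrative choices, not taken from the patent.

```python
from typing import Dict, List, Optional, Set


def select_second_clients(first_client: str,
                          peer_inventory: Dict[str, Set[str]],
                          block_ids: List[str]) -> Optional[Dict[str, str]]:
    """S301 (sketch): map each required block identifier to a peer that already stores it.

    `peer_inventory` (client id -> stored block ids) is a hypothetical view the
    scheduler keeps of the clients on the first client's network.
    Returns None if some block is held by no peer.
    """
    plan: Dict[str, str] = {}
    load: Dict[str, int] = {}
    for block_id in block_ids:
        holders = sorted(c for c, held in peer_inventory.items()
                         if c != first_client and block_id in held)
        if not holders:
            return None
        # Spread the work: choose the holder with the fewest blocks assigned so far (assumed policy).
        chosen = min(holders, key=lambda c: (load.get(c, 0), c))
        plan[block_id] = chosen
        load[chosen] = load.get(chosen, 0) + 1
    return plan


def fetch_from_peers(plan: Dict[str, str],
                     peer_blocks: Dict[str, Dict[str, bytes]]) -> bytes:
    """S302/S303 (sketch): collect each block from its assigned peer and package them in request order."""
    return b"".join(peer_blocks[client][block_id] for block_id, client in plan.items())
```

When two peers both hold both blocks, the spreading rule sends one block to each, which matches the stated goal of avoiding extra load on any single node.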
To identify idle mirror cache nodes, the scheduler may check the resource status of each mirror cache node to decide whether it is occupied. In one embodiment, as shown in fig. 5, the method further includes:
S401, determining whether the resources of each mirror cache node are being scheduled.
In this embodiment, the scheduler may determine whether a mirror cache node's resources are occupied by sending it a task request. For example, after receiving the task request, the mirror cache node returns a request response reflecting its resource scheduling status, and the scheduler determines from that response whether the node's resources are being scheduled. This embodiment is not limited to this example.
S402, if the resources of a mirror cache node are not being scheduled, determining that the mirror cache node is an idle mirror cache node.
Continuing the example of step S401, if the request response received by the scheduler is a first value indicating that the node's resources are idle, the scheduler may determine that the mirror cache node is an idle mirror cache node. Optionally, the first value may be a designated character string such as "0"; this embodiment is not limited in this respect.
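The probe-and-compare check of S401 and S402 might look like the sketch below. Only the idle value "0" comes from the text; the busy value "1", the `running_tasks`/`capacity` model of "resources being scheduled", and the rule that an unreachable node counts as not idle are all hypothetical.

```python
from typing import List

IDLE = "0"  # the "first value"; "0" is the example string given in the text
BUSY = "1"  # hypothetical value for "resources are being scheduled"


class MirrorCacheNode:
    """Hypothetical node that answers a scheduler's task request with its resource status."""

    def __init__(self, name: str, running_tasks: int = 0, capacity: int = 1, reachable: bool = True):
        self.name = name
        self.running_tasks = running_tasks
        self.capacity = capacity
        self.reachable = reachable

    def handle_task_request(self) -> str:
        if not self.reachable:
            raise ConnectionError(f"{self.name} did not answer")
        # Idle while fewer tasks are running than the node's (assumed) capacity.
        return IDLE if self.running_tasks < self.capacity else BUSY


def find_idle_nodes(nodes: List[MirrorCacheNode]) -> List[MirrorCacheNode]:
    """S401/S402 (sketch): probe every node and keep those whose response equals the first value."""
    idle = []
    for node in nodes:
        try:
            response = node.handle_task_request()
        except ConnectionError:
            continue  # an unreachable node is treated as not idle (assumption)
        if response == IDLE:
            idle.append(node)
    return idle
```

Treating a failed probe as "not idle" keeps a silent node from being handed a task that would then hang, which is the failure mode the next paragraph describes.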
In this embodiment, the scheduler selects idle mirror cache nodes according to each node's resource scheduling status before assigning a data packet acquisition operation. This prevents a data packet acquisition task from being suspended indefinitely because a mirror cache node's resources are occupied; the check is simple and effective and improves the efficiency of obtaining data packets.
The scheduler calls mirror cache nodes to obtain the data blocks corresponding to the data packet identifier from the distributed storage system. In one embodiment, as shown in fig. 6, calling the mirror cache nodes to obtain these data blocks includes:
S501, sending a data acquisition request to each mirror cache node, so that each mirror cache node acquires, from the distributed storage system, the data blocks corresponding to the data packet identifier in the data acquisition request.
S502, receiving the data blocks returned by the mirror cache nodes.
In this embodiment, the scheduler sends a data acquisition request to each mirror cache node, and the mirror cache node obtains from the distributed storage system the data blocks corresponding to the data packet identifier in the request. Optionally, the data packet identifier includes at least one data block identifier; the mirror cache node obtains the data block corresponding to each data block identifier from the distributed storage system and returns it to the scheduler, which receives the data blocks returned by every mirror cache node. This embodiment is not limited to this example.
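The text credits the mirror cache nodes with reducing bandwidth pressure on the distributed storage system but does not say how they cache. The sketch below assumes a per-node least-recently-used (LRU) cache, a swapped-in policy chosen only for illustration; `CountingStorage`, `CachingMirrorNode` and `handle_request` are hypothetical names, and the request is assumed to carry the list of data block identifiers already split out of the data packet identifier.

```python
from collections import OrderedDict
from typing import Dict, List


class CountingStorage:
    """Stand-in for the distributed storage system; counts block reads that reach it."""

    def __init__(self, blocks: Dict[str, bytes]):
        self.blocks = dict(blocks)
        self.reads = 0

    def read_block(self, block_id: str) -> bytes:
        self.reads += 1
        return self.blocks[block_id]


class CachingMirrorNode:
    """Hypothetical mirror cache node with an LRU cache in front of distributed storage."""

    def __init__(self, storage: CountingStorage, capacity: int = 128):
        self.storage = storage
        self.capacity = capacity
        self.cache: "OrderedDict[str, bytes]" = OrderedDict()

    def get_block(self, block_id: str) -> bytes:
        if block_id in self.cache:
            self.cache.move_to_end(block_id)  # mark as most recently used
            return self.cache[block_id]
        data = self.storage.read_block(block_id)  # cache miss: go to distributed storage
        self.cache[block_id] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used block
        return data

    def handle_request(self, block_ids: List[str]) -> List[bytes]:
        """S501 (node side, sketch): return the blocks named in the request, in order."""
        return [self.get_block(b) for b in block_ids]
```

Repeated requests for a popular block are answered from the node's cache, so only the first request reaches the distributed storage system; that is one plausible reading of how the bandwidth saving arises.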
In this embodiment, the scheduler obtains data blocks from the distributed storage system through the mirror cache nodes rather than directly, which greatly reduces the bandwidth pressure on the distributed storage system.
To further illustrate the method above, this embodiment provides a data acquisition method, shown in fig. 7, which specifically includes:
S101, a first client sends a first data acquisition request to the scheduler, the first data acquisition request including a client identifier and a data packet identifier;
S102, the scheduler determines whether the resources of each mirror cache node are being scheduled and, if the resources of a mirror cache node are not being scheduled, determines that the mirror cache node is an idle mirror cache node;
S103, the scheduler determines at least one mirror cache node from the idle mirror cache nodes and sends it a second data acquisition request, the second data acquisition request including the client identifier and the data packet identifier;
S104, the mirror cache node generates a user identifier according to the client identifier in the second data acquisition request and sends a third data acquisition request, including the user identifier and the data packet identifier, to the distributed storage system;
S105, the distributed storage system verifies the validity of the mirror cache node according to the user identifier;
S106, if the mirror cache node is legitimate, the distributed storage system returns the corresponding data block to the mirror cache node according to the data packet identifier;
S107, the mirror cache node returns the data block to the scheduler;
S108, the scheduler determines whether the resources of each mirror cache node are being scheduled and, if no idle mirror cache node exists in the distributed mirror warehouse, determines at least one second client according to the data packet identifier;
S109, the scheduler sends the second data acquisition request to the at least one second client;
S110, the second client returns at least one data block to the scheduler according to the data packet identifier in the second data acquisition request;
S111, the scheduler sends the data packet composed of the at least one data block to the first client.
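The text does not specify how the user identifier is generated in S104 or how the distributed storage system checks it in S105 and S106. The sketch below swaps in one common technique, an HMAC tag over the client identifier computed with a key shared by the mirror cache nodes and the storage system; `SHARED_KEY`, `make_user_id` and `DistributedStorage` are hypothetical, and a real deployment would manage the key securely rather than hard-coding it.

```python
import hashlib
import hmac
from typing import Dict, List

SHARED_KEY = b"example-key"  # hypothetical secret shared by mirror cache nodes and storage


def make_user_id(client_id: str, key: bytes = SHARED_KEY) -> str:
    """S104 (assumed scheme): derive a user identifier by tagging the client id with an HMAC."""
    tag = hmac.new(key, client_id.encode(), hashlib.sha256).hexdigest()
    return f"{client_id}:{tag}"


class DistributedStorage:
    """Stand-in for the distributed storage system's validity check (S105/S106)."""

    def __init__(self, blocks: Dict[str, bytes], key: bytes = SHARED_KEY):
        self.blocks = blocks
        self.key = key

    def is_legitimate(self, user_id: str) -> bool:
        client_id, _, tag = user_id.rpartition(":")
        expected = hmac.new(self.key, client_id.encode(), hashlib.sha256).hexdigest()
        # compare_digest on bytes avoids timing leaks and accepts any characters in `tag`.
        return bool(client_id) and hmac.compare_digest(tag.encode(), expected.encode())

    def serve(self, user_id: str, block_ids: List[str]) -> List[bytes]:
        if not self.is_legitimate(user_id):
            raise PermissionError("validity verification failed")
        return [self.blocks[b] for b in block_ids]
```

Under this assumption only a node holding the shared key can produce an identifier the storage system accepts, so the check authenticates the mirror cache node rather than the end client.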
In this embodiment, the distributed storage system stores each piece of mirror data in blocks, which allows service capacity and storage capacity to scale horizontally and removes the capacity limit of single-machine storage. Multiple mirror cache nodes are deployed in front of the distributed storage system, and data packets are obtained from the distributed storage system through these nodes, which greatly reduces the bandwidth pressure on the distributed storage system.
The data acquisition method provided by this embodiment has an implementation principle and technical effect similar to those of the method embodiments above, which are not repeated here.
It should be understood that although the steps in the flowcharts of figs. 3-7 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, there is no strict restriction on the order in which these steps are performed, and they may be performed in other orders. Moreover, at least some of the steps in figs. 3-7 may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 8, a data acquisition apparatus is provided, including a receiving module 01, a determining module 02, an obtaining module 03, and a sending module 04, wherein:
the receiving module 01 is configured to receive a data acquisition request sent by a first client, the data acquisition request including a data packet identifier;
the determining module 02 is configured to determine at least one mirror cache node from the idle mirror cache nodes if idle mirror cache nodes exist in the distributed mirror warehouse;
the obtaining module 03 is configured to call the at least one mirror cache node to obtain at least one data block corresponding to the data packet identifier from the distributed storage system in the distributed mirror warehouse, the distributed storage system storing each piece of mirror data in blocks;
and the sending module 04 is configured to send a data packet composed of the at least one data block to the first client.
In one embodiment, the determining module 02 is further configured to: if no idle mirror cache node exists in the distributed mirror warehouse, determine at least one second client according to the data packet identifier, a second client being another client that stores the data blocks corresponding to the data packet identifier; obtain at least one data block corresponding to the data packet identifier from the at least one second client; and compose the at least one data block into a data packet and send the data packet to the first client.
In one embodiment, the determining module 02 is further configured to determine whether the resources of each mirror cache node are being scheduled and, if the resources of a mirror cache node are not being scheduled, to determine that the mirror cache node is an idle mirror cache node.
In one embodiment, the obtaining module 03 is configured to send a data acquisition request to each mirror cache node, so that each mirror cache node obtains, from the distributed storage system, the data blocks corresponding to the data packet identifier in the data acquisition request, and to receive the data blocks returned by the mirror cache nodes.
For the specific limitations of the data acquisition apparatus, refer to the limitations of the data acquisition method above; they are not repeated here. Each module in the data acquisition apparatus may be implemented wholly or partly in software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke them and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a server and whose internal structure may be as shown in fig. 9. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The communication interface of the computer device is used for wired or wireless communication with an external terminal; wireless communication can be implemented through Wi-Fi, an operator network, NFC (near field communication), or other technologies. When executed by the processor, the computer program implements a data acquisition method. The display screen of the computer device may be a liquid crystal display or an electronic ink display, and the input device may be a touch layer covering the display screen; a key, trackball, or touchpad on the housing of the computer device; or an external keyboard, touchpad, mouse, or the like.
Those skilled in the art will appreciate that the structure shown in fig. 9 is merely a block diagram of part of the structure related to the solution of this application and does not limit the computer devices to which the solution applies; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:
receiving a data acquisition request sent by a first client, the data acquisition request including a data packet identifier;
if idle mirror cache nodes exist in the distributed mirror warehouse, determining at least one mirror cache node from the idle mirror cache nodes;
calling the at least one mirror cache node to obtain at least one data block corresponding to the data packet identifier from the distributed storage system in the distributed mirror warehouse, the distributed storage system storing each piece of mirror data in blocks;
and composing the at least one data block into a data packet and sending the data packet to the first client.
The implementation principle and technical effect of the computer device provided by the above embodiment are similar to those of the above method embodiment, and are not described herein again.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
receiving a data acquisition request sent by a first client, the data acquisition request including a data packet identifier;
if idle mirror cache nodes exist in the distributed mirror warehouse, determining at least one mirror cache node from the idle mirror cache nodes;
calling the at least one mirror cache node to obtain at least one data block corresponding to the data packet identifier from the distributed storage system in the distributed mirror warehouse, the distributed storage system storing each piece of mirror data in blocks;
and composing the at least one data block into a data packet and sending the data packet to the first client.
The implementation principle and technical effect of the computer-readable storage medium provided by the above embodiments are similar to those of the above method embodiments, and are not described herein again.
It will be understood by those skilled in the art that all or part of the processes of the method embodiments above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments above. Any reference to a memory, storage, database, or other medium in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, or the like. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination of these technical features is described; however, any combination of them should be considered within the scope of this specification as long as it involves no contradiction.
The above embodiments express only several implementations of the present application. Although they are described in specific detail, they are not to be construed as limiting the scope of the invention. It should be noted that a person of ordinary skill in the art can make several variations and improvements without departing from the concept of the present application, all of which fall within its scope of protection. Therefore, the scope of protection of this patent shall be defined by the appended claims.

Claims (10)

1. A method for data acquisition, the method comprising:
receiving a data acquisition request sent by a first client, the data acquisition request comprising a data packet identifier;
if idle mirror cache nodes exist in a distributed mirror warehouse, determining at least one mirror cache node from the idle mirror cache nodes;
calling the at least one mirror cache node to acquire at least one data block corresponding to the data packet identifier from a distributed storage system in the distributed mirror warehouse, the distributed storage system being used to store each piece of mirror data in blocks; and
composing the at least one data block into a data packet and sending the data packet to the first client.
2. The method of claim 1, further comprising:
if no idle mirror cache node exists in the distributed mirror warehouse, determining at least one second client according to the data packet identifier, a second client being another client that stores the data blocks corresponding to the data packet identifier;
acquiring at least one data block corresponding to the data packet identifier from the at least one second client; and
composing the at least one data block into a data packet and sending the data packet to the first client.
3. The method according to claim 1 or 2, characterized in that the method further comprises:
determining whether the resources of each mirror cache node are being scheduled; and
if the resources of a mirror cache node are not being scheduled, determining that the mirror cache node is an idle mirror cache node.
4. The method of claim 1, wherein calling the mirror cache node to obtain the data block corresponding to the data packet identifier from the distributed storage system comprises:
sending the data acquisition request to each mirror cache node, so that each mirror cache node acquires, from the distributed storage system, the data blocks corresponding to the data packet identifier in the data acquisition request; and
receiving the data blocks returned by the mirror cache nodes.
5. A data acquisition system, characterized in that the system comprises a distributed storage system, a plurality of mirror cache nodes, a scheduler, and a first client;
the first client is configured to send a first data acquisition request to the scheduler, the first data acquisition request comprising a data packet identifier;
the scheduler is configured to determine at least one mirror cache node from the idle mirror cache nodes when idle mirror cache nodes exist in the distributed mirror warehouse;
the scheduler is further configured to invoke the mirror cache node to obtain at least one data block corresponding to the data packet identifier from the distributed storage system in the distributed mirror warehouse, and to compose the at least one data block into a data packet and send the data packet to the first client; the distributed storage system is used to store each piece of mirror data in blocks.
6. The system of claim 5, further comprising a plurality of second clients;
the scheduler is further configured to, when no idle mirror cache node exists in the distributed mirror warehouse, determine at least one second target client according to the data packet identifier and send the first data acquisition request to the second target client, a second target client being a second client that stores the data block corresponding to the data packet identifier;
the second target client is configured to send at least one data block corresponding to the data packet identifier to the scheduler; and
the scheduler is further configured to compose the at least one data block into a data packet and send the data packet to the first client.
7. The system according to claim 5, wherein the first data acquisition request further includes a client identifier of the first client;
the mirror cache node is further configured to generate a user identifier according to the client identifier in the first data acquisition request, and to send a second data acquisition request including the user identifier and the data packet identifier to the distributed storage system; and
the distributed storage system is configured to verify the validity of the mirror cache node according to the user identifier and, if the mirror cache node is legitimate, to return the corresponding data block to the mirror cache node according to the data packet identifier.
8. A data acquisition apparatus, characterized in that the apparatus comprises:
a receiving module, configured to receive a data acquisition request sent by a first client, the data acquisition request comprising a data packet identifier;
a determining module, configured to determine at least one mirror cache node from the idle mirror cache nodes if idle mirror cache nodes exist in a distributed mirror warehouse;
an acquisition module, configured to call the at least one mirror cache node to acquire at least one data block corresponding to the data packet identifier from a distributed storage system in the distributed mirror warehouse, the distributed storage system being used to store each piece of mirror data in blocks; and
a sending module, configured to compose the at least one data block into a data packet and send the data packet to the first client.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 4.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 4.
CN202011622404.6A 2020-12-30 2020-12-30 Data acquisition method, system, device, computer equipment and storage medium Active CN112748879B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011622404.6A CN112748879B (en) 2020-12-30 2020-12-30 Data acquisition method, system, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112748879A true CN112748879A (en) 2021-05-04
CN112748879B CN112748879B (en) 2023-03-10

Family

ID=75650370

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011622404.6A Active CN112748879B (en) 2020-12-30 2020-12-30 Data acquisition method, system, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112748879B (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130227100A1 (en) * 2012-02-27 2013-08-29 Jason Edward Dobies Method and system for load balancing content delivery servers
CN103455577A (en) * 2013-08-23 2013-12-18 中国科学院计算机网络信息中心 Multi-backup nearby storage and reading method and system of cloud host mirror image file
CN103905503A (en) * 2012-12-27 2014-07-02 中国移动通信集团公司 Data storage method, data scheduling method, device and system
US20160142487A1 (en) * 2014-11-13 2016-05-19 Tenoware R&D Limited Grid Distributed Cache
CN105740048A (en) * 2016-01-26 2016-07-06 华为技术有限公司 Image management method, device and system
CN106371889A (en) * 2016-08-22 2017-02-01 浪潮(北京)电子信息产业有限公司 Method and device for realizing high-performance cluster system for scheduling mirror images
CN107733977A (en) * 2017-08-31 2018-02-23 北京百度网讯科技有限公司 A kind of cluster management method and device based on Docker
US20180364915A1 (en) * 2017-06-16 2018-12-20 Alibaba Group Holding Limited Method and system for distributed storage using client-side global persistent cache
CN109314653A (en) * 2016-06-06 2019-02-05 讯宝科技有限责任公司 The client device and method of the associated predefined parameter collection of radio for analyzing with being coupled to WLAN
CN110046901A (en) * 2018-12-28 2019-07-23 阿里巴巴集团控股有限公司 Reliability verification method, system, device and the equipment of alliance's chain
CN110427270A (en) * 2019-08-09 2019-11-08 华东师范大学 The dynamic load balancing method of distributed connection operator under a kind of network towards RDMA
CN111190547A (en) * 2019-12-30 2020-05-22 中国电子科技集团公司信息科学研究院 Distributed container mirror image storage and distribution system and method
CN111399764A (en) * 2019-12-25 2020-07-10 杭州海康威视系统技术有限公司 Data storage method, data reading device, data storage equipment and data storage medium

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113568586A (en) * 2021-09-17 2021-10-29 支付宝(杭州)信息技术有限公司 Data access method and device for distributed image learning architecture
CN113568586B (en) * 2021-09-17 2021-12-17 支付宝(杭州)信息技术有限公司 Data access method and device for distributed image learning architecture
CN114760116A (en) * 2022-03-30 2022-07-15 北京奇艺世纪科技有限公司 Verification method, verification device, electronic equipment and storage medium
CN114760116B (en) * 2022-03-30 2024-04-12 北京奇艺世纪科技有限公司 Verification method, verification device, electronic equipment and storage medium
CN114785770A (en) * 2022-04-01 2022-07-22 京东科技信息技术有限公司 Mirror layer file sending method and device, electronic equipment and computer readable medium

Also Published As

Publication number Publication date
CN112748879B (en) 2023-03-10

Similar Documents

Publication Publication Date Title
CN112748879B (en) Data acquisition method, system, device, computer equipment and storage medium
Wu et al. A cooperative computing strategy for blockchain-secured fog computing
JP6332766B2 (en) Trusted Service Manager Trusted Security Zone Container for data protection and confidentiality
KR20190020073A (en) Acceleration resource processing method and apparatus, and network function virtualization system
KR20130142961A (en) Automatic application updates
EP3293969A1 (en) Method of terminal-based conference load-balancing, and device and system utilizing same
CN112099979B (en) Access control method, device, computer equipment and storage medium
US20200228572A1 (en) Event-restricted credentials for resource allocation
US11818576B2 (en) Systems and methods for low latency cloud computing for mobile applications
US20220075890A1 (en) Secure storage access through rate limitation
WO2021126329A1 (en) Context-aware obfuscation and unobfuscation of sensitive content
WO2023131058A1 (en) System and method for scheduling resource service application in digital middle office of enterprise
TW202301118A (en) Dynamic microservices allocation mechanism
CN113553178A (en) Task processing method and device and electronic equipment
CN111163140A (en) Method, apparatus and computer readable storage medium for resource acquisition and allocation
US20180101485A1 (en) Method and apparatus for accessing private data in physical memory of electronic device
CN115269198A (en) Access request processing method based on server cluster and related equipment
CN114924888A (en) Resource allocation method, data processing method, device, equipment and storage medium
US9184996B2 (en) Thin client system, management server, client environment management method and program
US9774640B2 (en) Method and system for sharing applications among a plurality of electronic devices
CN113254150A (en) Load balancing method, system, device, computer equipment and storage medium
CN113271228A (en) Bandwidth resource scheduling method, device, equipment and computer readable storage medium
US20210173724A1 (en) System and method to securely broadcast a message to accelerators using virtual channels
CN112543194A (en) Mobile terminal login method and device, computer equipment and storage medium
US11431826B2 (en) Secure demand-driven file distribution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant