CN116450657B - Data fragment scheduling method, device, equipment and storage medium - Google Patents


Info

Publication number
CN116450657B
CN116450657B
Authority
CN
China
Prior art keywords
data
target
determining
slicing
query
Prior art date
Legal status
Active
Application number
CN202310721024.5A
Other languages
Chinese (zh)
Other versions
CN116450657A (en)
Inventor
陈冠伟
徐锋
黄一鹏
唐杰
Current Assignee
Good Feeling Health Industry Group Co ltd
Original Assignee
Beijing Haoxin Internet Hospital Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Haoxin Internet Hospital Co ltd filed Critical Beijing Haoxin Internet Hospital Co ltd
Priority to CN202310721024.5A priority Critical patent/CN116450657B/en
Publication of CN116450657A publication Critical patent/CN116450657A/en
Application granted granted Critical
Publication of CN116450657B publication Critical patent/CN116450657B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval of structured data, e.g. relational data
    • G06F16/22: Indexing; Data structures therefor; Storage structures
    • G06F16/2282: Tablespace storage structures; Management thereof
    • G06F16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/278: Data partitioning, e.g. horizontal or vertical partitioning
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT]
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiments of this specification provide a data slicing scheduling method, apparatus, device, and storage medium. The data slicing scheduling method includes: acquiring a data request and determining initial query data according to the data request; performing data processing on the initial query data to determine target query data; inputting the target query data into a deep learning model to determine slicing information; and determining target data based on the slicing information and performing slicing scheduling on the target data. Because slicing decisions are driven by a model trained on actual query and resource-usage data rather than operator experience, the method saves hardware resources and improves query efficiency.

Description

Data fragment scheduling method, device, equipment and storage medium
Technical Field
The embodiment of the specification relates to the technical field of data processing, in particular to a data slicing scheduling method.
Background
When processing user data requests, a typical data processing system provisions and deploys hardware resources according to peak user traffic in order to cope with sudden traffic surges. In addition, to accelerate data processing, the same data is usually deployed across multiple systems so that data requests can be processed in parallel on multiple machines, which reduces user queuing and improves data processing efficiency.
In existing practice, data slicing and the addition or removal of hardware resources are generally carried out according to operators' experience, without the support of data or data-driven validation, and therefore lack flexibility and scientific rigor. Uniformly adding resources wastes them, while removing resources may leave user data requests unsupported, destabilizing the system and degrading the user experience.
Disclosure of Invention
In view of this, the present embodiments provide a data slicing scheduling method. One or more embodiments of the present specification relate to a data slicing scheduling apparatus, a computing device, a computer-readable storage medium, and a computer program, which solve the technical drawbacks existing in the prior art.
According to a first aspect of embodiments of the present disclosure, there is provided a data slicing scheduling method, including:
acquiring a data request, and determining initial query data according to the data request;
performing data processing on the initial query data to determine target query data;
inputting target query data into a deep learning model, and determining fragment information;
and determining target data based on the slicing information, and carrying out slicing scheduling on the target data.
In one possible implementation, obtaining a data request, determining initial query data from the data request includes:
acquiring a data request, and determining a data query statement according to the data request;
determining user information and resource use information based on the data query statement;
initial query data is determined based on the user information and the resource usage information.
In one possible implementation, the data processing is performed on the initial query data to determine target query data, including:
determining a data processing rule, performing data cleaning, data conversion and data filtering on the initial query data based on the data processing rule, and determining target query data.
In one possible implementation, inputting the target query data into a deep learning model, determining the shard information, includes:
carrying out numerical processing on the target query data to obtain numerical data;
converting the numeric data into a data matrix, and carrying out normalization processing on the data matrix to determine a target data set;
model training is carried out based on the target data set to determine a deep learning model;
the fragmentation information is determined based on a deep learning model.
In one possible implementation, model training based on a target dataset determines a deep learning model, comprising:
dividing the target data set into a training data set and a verification data set;
and determining model parameters of the deep learning model based on the training data set, the verification data set and the cross entropy loss function to obtain the trained deep learning model.
In one possible implementation, determining the target data based on the fragmentation information includes:
determining a target corresponding relation according to the fragment information; the target corresponding relation comprises a corresponding relation between target data and a target database and a corresponding relation between the target data and an initial database;
target data is determined from an initial database.
In one possible implementation, performing the slice scheduling on the target data includes:
carrying out capacity reduction processing on the data fragments corresponding to the target data of the initial database;
and performing capacity expansion processing on the target database based on the target data.
According to a second aspect of embodiments of the present specification, there is provided a data slicing scheduling apparatus, including:
the data acquisition module is configured to acquire a data request and determine initial query data according to the data request;
the data processing module is configured to perform data processing on the initial query data and determine target query data;
the slicing determination module is configured to input target query data into the deep learning model and determine slicing information;
and the slicing scheduling module is configured to determine target data based on the slicing information and perform slicing scheduling on the target data.
According to a third aspect of embodiments of the present specification, there is provided a computing device comprising:
a memory and a processor;
the memory is configured to store computer-executable instructions that, when executed by the processor, perform the steps of the data slicing scheduling method described above.
According to a fourth aspect of embodiments of the present specification, there is provided a computer readable storage medium storing computer executable instructions which, when executed by a processor, implement the steps of the data slicing scheduling method described above.
According to a fifth aspect of embodiments of the present specification, there is provided a computer program, wherein the computer program, when executed in a computer, causes the computer to perform the steps of the data slicing scheduling method described above.
The embodiments of this specification provide a data slicing scheduling method, apparatus, device, and storage medium. The data slicing scheduling method includes: acquiring a data request and determining initial query data according to the data request; performing data processing on the initial query data to determine target query data; inputting the target query data into a deep learning model to determine slicing information; and determining target data based on the slicing information and performing slicing scheduling on the target data. Because slicing decisions are driven by a model trained on actual query and resource-usage data rather than operator experience, the method saves hardware resources and improves query efficiency.
Drawings
Fig. 1 is a schematic view of a scenario of a data slicing scheduling method according to an embodiment of the present disclosure;
FIG. 2 is a flow chart of a method for data slicing scheduling according to one embodiment of the present disclosure;
FIG. 3 is a schematic diagram illustrating a processing procedure of a data slicing scheduling method according to an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of a data slicing scheduling apparatus according to an embodiment of the present disclosure;
FIG. 5 is a block diagram of a computing device provided in one embodiment of the present description.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of this specification. However, this specification can be implemented in many other forms than those described here, and those skilled in the art can make similar generalizations without departing from its substance; therefore, this specification is not limited to the specific implementations disclosed below.
The terminology used in the one or more embodiments of the specification is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the specification. As used in this specification, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that, although the terms first, second, etc. may be used in one or more embodiments of this specification to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, "first" may also be referred to as "second," and similarly, "second" may also be referred to as "first," without departing from the scope of one or more embodiments of this specification. The word "if" as used herein may be interpreted as "when," "upon," or "in response to determining," depending on the context.
In the present specification, a data slicing scheduling method is provided, and the present specification relates to a data slicing scheduling apparatus, a computing device, and a computer-readable storage medium, which are described in detail in the following embodiments one by one.
Referring to fig. 1, fig. 1 is a schematic diagram of a scenario of a data slicing scheduling method according to an embodiment of the present disclosure.
In the application scenario of fig. 1, the computing device 101 may obtain a data request 102 and determine initial query data 103 according to the data request. The computing device 101 may then perform data processing on the initial query data 103 to determine target query data 104, input the target query data 104 into a deep learning model 105 to determine slicing information 106, and finally determine target data 107 based on the slicing information 106 and perform slicing scheduling on the target data 107.
The computing device 101 may be hardware or software. When the computing device 101 is hardware, it may be implemented as a distributed cluster of multiple servers or terminal devices, or as a single server or single terminal device. When the computing device 101 is embodied as software, it may be installed in the hardware devices listed above. It may be implemented as a plurality of software or software modules, for example, for providing distributed services, or as a single software or software module. The present invention is not particularly limited herein.
Referring to fig. 2, fig. 2 shows a flowchart of a data slicing scheduling method according to an embodiment of the present disclosure, which specifically includes the following steps.
Step 201: acquiring a data request, and determining initial query data according to the data request.
In one possible implementation, obtaining a data request, determining initial query data from the data request includes: acquiring a data request, and determining a data query statement according to the data request; determining user information and resource use information based on the data query statement; initial query data is determined based on the user information and the resource usage information.
In practical applications, a user submits a data request from a user terminal (s100), including but not limited to a PC, an app, an H5 page, or various mini-programs. The query statement (SQL) and the user information associated with the query (including the user ID, user permissions, and desensitized user information) are reported through the sample collection system (s101); the reported information is organized as JSON, and a globally unique ID (queryId) is generated from the query content and the reporting time. The data processing system (s104) reports, through its resource reporting module, the queryId of the user data request along with the CPU, disk I/O, network latency, slicing, and local hardware information consumed by the query, as well as other computing-resource conditions (including the number of threads used, thread waiting time, memory usage of the data query, and system load information) to the sample collection system (s101). The specific data formats reported by s101 and s104 are shown in the drawings, namely the report information of the user terminal and the report information of the data processing system.
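The user-side report described above can be sketched as follows. This is a minimal illustration: the JSON field names and the hash-based queryId derivation are assumptions made for the example, not details given in the patent.

```python
import hashlib
import json
import time

def build_query_report(sql: str, user_id: str, permissions: list) -> dict:
    """Assemble a user-side query report as a JSON-serializable dict
    (field names are illustrative assumptions)."""
    reported_at = int(time.time() * 1000)
    # Globally unique ID derived from the query content and the reporting time
    query_id = hashlib.sha256(f"{sql}|{reported_at}".encode()).hexdigest()
    return {
        "queryId": query_id,
        "sql": sql,
        "user": {"userId": user_id, "permissions": permissions},
        "reportedAt": reported_at,
    }

report = build_query_report("SELECT * FROM t_order WHERE uid = 42", "u-001", ["read"])
print(json.dumps(sorted(report)))  # keys: queryId, reportedAt, sql, user
```

Any scheme that makes queryId unique per (content, time) pair would serve; a hash simply avoids coordinating a counter across terminals.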
Step 202: and carrying out data processing on the initial query data to determine target query data.
In one possible implementation, the data processing is performed on the initial query data to determine target query data, including: determining a data processing rule, performing data cleaning, data conversion and data filtering on the initial query data based on the data processing rule, and determining target query data.
In practical applications, the sample collection system (s101) collects information from both the user side (s100) and the data processing side (s104). After data cleaning, conversion, and filtering, the data integration module in the sample collection system (s101) consolidates the preliminarily processed information into a data set of samples along the queryId and time dimensions for storage, and the collected data set is reported to the deep learning system (s102) after time T+1.
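The integration step can be sketched in a few lines; the specific cleaning and filtering rules shown here (non-negative CPU time, non-empty SQL) are illustrative assumptions:

```python
def integrate_samples(user_reports, resource_reports):
    """Join user-side and processing-side records on queryId, dropping
    records that fail basic cleaning rules (a sketch; the patent does
    not enumerate the actual rules)."""
    # Index resource reports by queryId, discarding implausible records
    by_id = {r["queryId"]: r for r in resource_reports if r.get("cpuMs", 0) >= 0}
    samples = []
    for u in user_reports:
        r = by_id.get(u["queryId"])
        if r is None or not u.get("sql"):  # filter unmatched or empty queries
            continue
        samples.append({**u, **r})         # one integrated sample per queryId
    return samples

users = [{"queryId": "q1", "sql": "SELECT 1"}, {"queryId": "q2", "sql": ""}]
resources = [{"queryId": "q1", "cpuMs": 12}, {"queryId": "q2", "cpuMs": 5}]
print(len(integrate_samples(users, resources)))  # q2 is filtered out -> 1
```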
Step 203: inputting the target query data into a deep learning model, and determining the fragment information.
In one possible implementation, inputting the target query data into a deep learning model, determining the shard information, includes: carrying out numerical processing on the target query data to obtain numerical data; converting the numeric data into a data matrix, and carrying out normalization processing on the data matrix to determine a target data set; model training is carried out based on the target data set to determine a deep learning model; the fragmentation information is determined based on a deep learning model.
In practical applications, the deep learning system (s102) numericalizes the samples in the T+1 incremental data reported by the sample collection system (s101), and converts the incremental data into a data matrix with the queryId as the row dimension (see the drawing: data normalization processing). Because the values in the data matrix differ greatly in scale, the data must be normalized to facilitate the subsequent model matrix calculations. The normalization formula is:

x' = (x - mean(X)) / std(X)

where the mean function computes the average of the matrix and the std function computes its standard deviation.
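A minimal sketch of this z-score normalization applied over a whole matrix:

```python
def normalize(matrix):
    """Z-score normalize a matrix: subtract the global mean and divide by
    the global standard deviation, per the formula in the text."""
    values = [v for row in matrix for v in row]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    std = variance ** 0.5
    return [[(v - mean) / std for v in row] for row in matrix]

m = normalize([[1, 2], [3, 4]])
print(round(m[0][0], 3))  # -1.342
```

After normalization every entry is expressed in units of standard deviations from the mean, so columns measured in milliseconds and columns measured in bytes no longer dominate the matrix arithmetic.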
In one possible implementation, model training based on a target dataset determines a deep learning model, comprising: dividing the target data set into a training data set and a verification data set; and determining model parameters of the deep learning model based on the training data set, the verification data set and the cross entropy loss function to obtain the trained deep learning model.
In practical applications, the normalized data set is divided into a training data set (80%) and a verification data set (20%). Deep learning training is performed on the training set to extract model features and train the model; the predictions obtained during training are compared with the real data using a cross-entropy loss function to evaluate whether the model parameters need adjustment, the learning rate needs tuning, and the model needs retraining. The retrained model is then evaluated on the verification data set; if the deviation of the results is large, this step is repeated until the evaluation reaches the expected threshold and a usable data model is produced.
The learning-rate formula is:

η_i^(k) = δ_i / G_i^(k)

where η_i^(k) is the learning rate of the ith node at the kth iteration, δ_i is the constant of the ith node, and G_i^(k) is the adaptive learning-efficiency gradient value of the ith node.
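The split, the cross-entropy comparison, and the per-node learning-rate rule can be sketched as follows. The helper names, and the reading of G as a precomputed per-node gradient value, are assumptions; the patent gives only the formula's shape.

```python
import math
import random

def train_val_split(samples, train_frac=0.8, seed=7):
    """Shuffle and split the normalized samples 80/20, as in the text."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

def cross_entropy(p_pred, y_true, eps=1e-12):
    """Cross-entropy loss between predicted class probabilities and a
    one-hot ground-truth vector; eps guards against log(0)."""
    return -sum(y * math.log(p + eps) for p, y in zip(p_pred, y_true))

def adaptive_lr(delta, grad_value):
    """Per-node learning rate eta = delta / G from the formula above."""
    return delta / grad_value

train, val = train_val_split(list(range(10)))
print(len(train), len(val))                         # 8 2
print(round(cross_entropy([0.7, 0.3], [1, 0]), 3))  # -ln(0.7) -> 0.357
```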
the model results are exemplified as follows:
wherein, the mapping relation between the number serial number and the table name is pre-established;
1:t_order
2:t_user
3:t_account
4:t_msg
the fragmentation data for each library table is as follows:
[[1,3],[2,1],[3,1],[4,1]]
where, in the pair [1, 3], the 1 denotes the t_order table and 3 is its number of slices.
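Decoding such a result set into a per-table slicing plan can be sketched as:

```python
# Pre-established serial-number -> table-name mapping from the text
TABLE_NAMES = {1: "t_order", 2: "t_user", 3: "t_account", 4: "t_msg"}

def decode_shard_plan(result):
    """Translate the model's [[table_id, slice_count], ...] output into a
    table-name -> slice-count mapping."""
    return {TABLE_NAMES[table_id]: slices for table_id, slices in result}

plan = decode_shard_plan([[1, 3], [2, 1], [3, 1], [4, 1]])
print(plan["t_order"])  # 3
```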
Step 204: determining target data based on the slicing information, and performing slicing scheduling on the target data.

In one possible implementation, determining the target data based on the slicing information includes: determining a target correspondence according to the slicing information, where the target correspondence includes the correspondence between the target data and the target database and the correspondence between the target data and the initial database; and determining the target data from the initial database.

In practical applications, deployment is carried out through the slicing scheduling system (s103) according to the result set obtained from the model training system. The slicing scheduling system (s103) distributes the data slicing information associated with the model result set to the data processing system (s104) through its slicing scheduling module, and waits for the data slicing scheduling to be executed.

In one possible implementation, performing slicing scheduling on the target data includes: performing capacity-reduction processing on the data slices of the initial database that correspond to the target data; and performing capacity-expansion processing on the target database based on the target data.

In practical applications, each machine in the data processing system (s104) performs capacity expansion or reduction on its data slices according to the slice scheduling information from the slicing scheduling system (s103), and reports the slice scheduling results, local resources, and other information to both the sample collection system (s101) and the slicing scheduling system (s103), in preparation for subsequent sample data processing and slice scheduling.
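The expand/shrink decision each machine faces can be sketched as follows. This is a simplified illustration: the actual system also migrates the target data between the initial and target databases, which is omitted here.

```python
def plan_rebalance(current, target):
    """Compare current per-table slice counts with the model's target
    counts and decide, per table, whether to expand or shrink."""
    actions = {}
    for table, want in target.items():
        have = current.get(table, 0)
        if want > have:
            actions[table] = ("expand", want - have)   # add slices
        elif want < have:
            actions[table] = ("shrink", have - want)   # reclaim slices
    return actions

current = {"t_order": 1, "t_user": 1, "t_account": 2}
target = {"t_order": 3, "t_user": 1, "t_account": 1}
print(plan_rebalance(current, target))
# {'t_order': ('expand', 2), 't_account': ('shrink', 1)}
```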
The embodiments of this specification provide a data slicing scheduling method, apparatus, device, and storage medium. The data slicing scheduling method includes: acquiring a data request and determining initial query data according to the data request; performing data processing on the initial query data to determine target query data; inputting the target query data into a deep learning model to determine slicing information; and determining target data based on the slicing information and performing slicing scheduling on the target data. Because slicing decisions are driven by a model trained on actual query and resource-usage data rather than operator experience, the method saves hardware resources and improves query efficiency.
Corresponding to the method embodiment, the present disclosure further provides an embodiment of a data slicing scheduling device, and fig. 4 shows a schematic structural diagram of the data slicing scheduling device provided in one embodiment of the present disclosure. As shown in fig. 4, the apparatus includes:
according to a second aspect of embodiments of the present specification, there is provided a data slicing scheduling apparatus, including:
a data acquisition module 401 configured to acquire a data request, determine initial query data according to the data request;
a data processing module 402 configured to perform data processing on the initial query data to determine target query data;
a shard determination module 403 configured to input the target query data into a deep learning model, determining shard information;
the slicing scheduling module 404 is configured to determine target data based on the slicing information, and perform slicing scheduling on the target data.
In one possible implementation, the data acquisition module 401 is further configured to:
acquiring a data request, and determining a data query statement according to the data request;
determining user information and resource use information based on the data query statement;
initial query data is determined based on the user information and the resource usage information.
In one possible implementation, the data processing module 402 is further configured to:
determining a data processing rule, performing data cleaning, data conversion and data filtering on the initial query data based on the data processing rule, and determining target query data.
In one possible implementation, the fragmentation determination module 403 is further configured to:
carrying out numerical processing on the target query data to obtain numerical data;
converting the numeric data into a data matrix, and carrying out normalization processing on the data matrix to determine a target data set;
model training is carried out based on the target data set to determine a deep learning model;
the fragmentation information is determined based on a deep learning model.
In one possible implementation, the fragmentation determination module 403 is further configured to:
dividing the target data set into a training data set and a verification data set;
and determining model parameters of the deep learning model based on the training data set, the verification data set and the cross entropy loss function to obtain the trained deep learning model.
In one possible implementation, the tile scheduling module 404 is further configured to:
determining a target corresponding relation according to the fragment information; the target corresponding relation comprises a corresponding relation between target data and a target database and a corresponding relation between the target data and an initial database;
target data is determined from an initial database.
In one possible implementation, the tile scheduling module 404 is further configured to:
carrying out capacity reduction processing on the data fragments corresponding to the target data of the initial database;
and performing capacity expansion processing on the target database based on the target data.
The embodiments of this specification provide a data slicing scheduling apparatus. The apparatus acquires a data request and determines initial query data according to the data request; performs data processing on the initial query data to determine target query data; inputs the target query data into a deep learning model to determine slicing information; and determines target data based on the slicing information and performs slicing scheduling on the target data, thereby saving hardware resources and improving query efficiency.
The foregoing is a schematic scheme of a data slicing scheduling apparatus of this embodiment. It should be noted that, the technical solution of the data slicing scheduling apparatus and the technical solution of the data slicing scheduling method belong to the same concept, and details of the technical solution of the data slicing scheduling apparatus, which are not described in detail, can be referred to the description of the technical solution of the data slicing scheduling method.
Fig. 5 illustrates a block diagram of a computing device 500 provided in accordance with one embodiment of the present description. The components of the computing device 500 include, but are not limited to, a memory 510 and a processor 520. Processor 520 is coupled to memory 510 via bus 530 and database 550 is used to hold data.
Computing device 500 also includes an access device 540 that enables computing device 500 to communicate via one or more networks 560. Examples of such networks include the public switched telephone network (PSTN), a local area network (LAN), a wide area network (WAN), a personal area network (PAN), or a combination of communication networks such as the Internet. The access device 540 may include one or more of any type of wired or wireless network interface (e.g., a network interface card (NIC)), such as an IEEE 802.11 wireless local area network (WLAN) interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, or a Near Field Communication (NFC) interface.
In one embodiment of the present description, the above-described components of computing device 500, as well as other components not shown in FIG. 5, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device shown in FIG. 5 is for exemplary purposes only and is not intended to limit the scope of the present description. Those skilled in the art may add or replace other components as desired.
Computing device 500 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), a mobile phone (e.g., smart phone), a wearable computing device (e.g., smart watch, smart glasses, etc.), or another type of mobile device, or a stationary computing device such as a desktop computer or personal computer (PC). Computing device 500 may also be a mobile or stationary server.
Wherein the processor 520 is configured to execute computer-executable instructions that, when executed by the processor, perform the steps of the data slicing scheduling method described above. The foregoing is a schematic illustration of a computing device of this embodiment. It should be noted that, the technical solution of the computing device and the technical solution of the data slicing scheduling method belong to the same concept, and details of the technical solution of the computing device, which are not described in detail, can be referred to the description of the technical solution of the data slicing scheduling method.
An embodiment of the present disclosure also provides a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of the data slicing scheduling method described above.
The foregoing is an illustrative description of a computer-readable storage medium of this embodiment. It should be noted that the technical solution of the storage medium and the technical solution of the data slicing scheduling method belong to the same concept; for details of the technical solution of the storage medium that are not described in detail, reference may be made to the description of the data slicing scheduling method.
An embodiment of the present disclosure further provides a computer program, where the computer program, when executed in a computer, causes the computer to perform the steps of the data slicing scheduling method described above.
The foregoing is an illustrative description of a computer program of this embodiment. It should be noted that the technical solution of the computer program and the technical solution of the data slicing scheduling method belong to the same concept; for details of the technical solution of the computer program that are not described in detail, reference may be made to the description of the data slicing scheduling method.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The computer instructions include computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content contained in the computer-readable medium may be adjusted as appropriate to the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunications signals.
It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of action combinations; however, those skilled in the art should understand that the embodiments are not limited by the order of the actions described, since according to the embodiments of the present disclosure some steps may be performed in another order or simultaneously. Further, those skilled in the art will appreciate that the embodiments described in the specification are all preferred embodiments, and that the actions and modules involved are not necessarily all required by the specification.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
The preferred embodiments of the present specification disclosed above are merely intended to help clarify the present specification. The alternative embodiments are not described exhaustively, nor is the invention limited to the precise forms disclosed. Obviously, many modifications and variations are possible in light of the teachings of the embodiments. The embodiments were chosen and described in order to best explain their principles and practical application, thereby enabling others skilled in the art to understand and utilize the invention. This specification is limited only by the claims and their full scope and equivalents.

Claims (8)

1. A data slicing scheduling method, comprising:
acquiring a data request, and determining initial query data according to the data request;
performing data processing on the initial query data to determine target query data;
inputting the target query data into a deep learning model, and determining slicing information;
determining target data based on the slicing information, and carrying out slicing scheduling on the target data;
the determining target data based on the slice information includes:
determining a target corresponding relation according to the slicing information; the target corresponding relation comprises a corresponding relation between the target data and a target database and a corresponding relation between the target data and an initial database;
determining the target data from the initial database; the initial database is a database in which the target data are located, and the target database is a database to which the target data are scheduled;
the performing the slice scheduling on the target data includes:
and carrying out capacity reduction processing on the data fragments corresponding to the target data of the initial database, and carrying out capacity expansion processing on the target database based on the target data.
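Purely as an illustrative sketch (the claim prescribes no implementation), the scheduling step above can be pictured in Python; `schedule_shards`, the dictionary-of-lists databases, and the fields of `shard_info` are all hypothetical names invented for this example:

```python
def schedule_shards(shard_info, databases):
    """Move the target data named by the slicing information out of the
    initial database and into the target database: capacity reduction on
    the source shard, then capacity expansion on the target."""
    source = databases[shard_info["initial_db"]]
    target = databases[shard_info["target_db"]]
    # The target corresponding relation: which rows belong in the target database.
    moved = [row for row in source if row["key"] in shard_info["keys"]]
    # Capacity reduction: drop the moved rows from the initial database.
    source[:] = [row for row in source if row["key"] not in shard_info["keys"]]
    # Capacity expansion: append the moved rows to the target database.
    target.extend(moved)
    return moved
```

In a real system the two databases would be separate servers and the move would be transactional; the in-memory lists here only trace the claimed reduce-then-expand order.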
2. The method of claim 1, wherein the obtaining the data request, determining initial query data based on the data request, comprises:
acquiring a data request, and determining a data query statement according to the data request;
determining user information and resource use information based on the data query statement;
and determining initial query data according to the user information and the resource use information.
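The request-parsing path of claim 2 might look like the following sketch; the request layout, the `user_id` pattern, and treating the FROM table as the resource in use are assumptions made only for illustration:

```python
import re

def initial_query_data(request):
    # Step 1: take the data query statement from the request.
    stmt = request["query"]
    # Step 2: derive user information and resource-use information from it.
    user = re.search(r"user_id\s*=\s*'(\w+)'", stmt)
    table = re.search(r"FROM\s+(\w+)", stmt, re.IGNORECASE)
    # Step 3: the pair forms the initial query data.
    return {
        "user": user.group(1) if user else None,
        "resource": table.group(1) if table else None,
    }
```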
3. The method of claim 1, wherein the data processing the initial query data to determine target query data comprises:
determining a data processing rule, performing data cleaning, data conversion and data filtering on the initial query data based on the data processing rule, and determining target query data.
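A minimal sketch of such a rule-driven pipeline, assuming (hypothetically) that the rule carries a converter and a filter predicate; the cleaning, conversion, and filtering stages run in the order the claim lists them:

```python
def apply_rules(initial_query_data, rules):
    # Data cleaning: drop missing entries and trim whitespace.
    cleaned = [item.strip() for item in initial_query_data if item is not None]
    # Data conversion: map each entry through the configured converter.
    converted = [rules["convert"](item) for item in cleaned]
    # Data filtering: keep only the entries the rule accepts.
    return [item for item in converted if rules["keep"](item)]
```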
4. The method of claim 1, wherein the inputting the target query data into a deep learning model, and determining slicing information, comprises:
performing numerical processing on the target query data to obtain numerical data;
converting the numerical data into a data matrix, and carrying out normalization processing on the data matrix to determine a target data set;
model training is carried out based on the target data set to determine the deep learning model;
determining the slicing information based on the deep learning model.
5. The method of claim 4, wherein the model training based on the target dataset to determine the deep learning model comprises:
dividing the target data set into a training data set and a verification data set;
and determining model parameters of the deep learning model based on the training data set, the verification data set and the cross entropy loss function to obtain a trained deep learning model.
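Claims 4 and 5 together describe a conventional supervised pipeline: numericalize, normalize, split into training and validation sets, then fit with a cross-entropy loss. A compact NumPy sketch under those assumptions follows; the softmax classifier, the min-max normalization, and all hyper-parameters are illustrative choices, not mandated by the claims:

```python
import numpy as np

def normalize(matrix):
    # Min-max normalization per column (the claims do not fix the scheme).
    lo, hi = matrix.min(axis=0), matrix.max(axis=0)
    return (matrix - lo) / np.where(hi > lo, hi - lo, 1.0)

def train_softmax(X, y, n_classes, epochs=500, lr=0.5):
    # Softmax regression fitted by gradient descent on the cross-entropy
    # loss; a bias column is appended to X.
    Xb = np.hstack([X, np.ones((len(X), 1))])
    W = np.zeros((Xb.shape[1], n_classes))
    onehot = np.eye(n_classes)[y]
    for _ in range(epochs):
        logits = Xb @ W
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        W -= lr * Xb.T @ (p - onehot) / len(Xb)  # cross-entropy gradient
    return W

def predict(W, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return (Xb @ W).argmax(axis=1)
```

The target data set would be shuffled and partitioned into training and validation subsets before `train_softmax` is called, matching the split recited in claim 5.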
6. A data slicing scheduling apparatus, comprising:
the data acquisition module is configured to acquire a data request and determine initial query data according to the data request;
the data processing module is configured to perform data processing on the initial query data and determine target query data;
the slicing determination module is configured to input the target query data into a deep learning model and determine slicing information;
the slicing scheduling module is configured to determine target data based on the slicing information and perform slicing scheduling on the target data;
the determining target data based on the slice information includes:
determining a target corresponding relation according to the slicing information; the target corresponding relation comprises a corresponding relation between the target data and a target database and a corresponding relation between the target data and an initial database;
determining the target data from the initial database; the initial database is a database in which the target data are located, and the target database is a database to which the target data are scheduled;
the performing the slice scheduling on the target data includes:
and carrying out capacity reduction processing on the data fragments corresponding to the target data of the initial database, and carrying out capacity expansion processing on the target database based on the target data.
7. A computing device, comprising:
a memory and a processor;
the memory is configured to store computer executable instructions, and the processor is configured to execute the computer executable instructions, which when executed by the processor, implement the steps of the data slicing scheduling method of any one of claims 1 to 5.
8. A computer readable storage medium storing computer executable instructions which when executed by a processor implement the steps of the data slicing scheduling method of any one of claims 1 to 5.
CN202310721024.5A 2023-06-19 2023-06-19 Data fragment scheduling method, device, equipment and storage medium Active CN116450657B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310721024.5A CN116450657B (en) 2023-06-19 2023-06-19 Data fragment scheduling method, device, equipment and storage medium


Publications (2)

Publication Number Publication Date
CN116450657A CN116450657A (en) 2023-07-18
CN116450657B (en) 2023-08-29

Family

ID=87124154

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310721024.5A Active CN116450657B (en) 2023-06-19 2023-06-19 Data fragment scheduling method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116450657B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107888669A (en) * 2017-10-31 2018-04-06 Wuhan University of Technology A large-scale resource scheduling system and method based on a deep learning neural network
CN111382156A (en) * 2020-02-14 2020-07-07 Petro-CyberWorks Information Technology Co., Ltd. Data acquisition method, system, device, electronic equipment and storage medium
CN113760968A (en) * 2020-09-24 2021-12-07 Beijing Wodong Tianjun Information Technology Co., Ltd. Data query method, device, system, electronic equipment and storage medium
CN114416875A (en) * 2022-01-18 2022-04-29 Ping An International Smart City Technology Co., Ltd. Block chain-based task processing method, device, equipment and storage medium
CN116185578A (en) * 2022-12-14 2023-05-30 Beijing Baidu Netcom Science and Technology Co., Ltd. Scheduling method of computing task and executing method of computing task

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7593939B2 (en) * 2006-04-07 2009-09-22 Google Inc. Generating specialized search results in response to patterned queries




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240123

Address after: Room 1502, 13th Floor, No. 52 North Fourth Ring West Road, Haidian District, Beijing, 102200

Patentee after: Good Feeling Health Industry Group Co.,Ltd.

Country or region after: China

Address before: Room 1101-2, Building 1, Yard 22, Longshui Road, Changping District, Beijing 102200

Patentee before: Beijing Haoxin Internet Hospital Co.,Ltd.

Country or region before: China
