CN115907031A - Service processing method, device, equipment and storage medium - Google Patents

Service processing method, device, equipment and storage medium

Info

Publication number
CN115907031A
Authority
CN
China
Prior art keywords
deep learning
data
learning model
address
thread
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211439090.5A
Other languages
Chinese (zh)
Inventor
郑达明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bigo Technology Singapore Pte Ltd
Original Assignee
Bigo Technology Singapore Pte Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bigo Technology Singapore Pte Ltd filed Critical Bigo Technology Singapore Pte Ltd
Priority to CN202211439090.5A priority Critical patent/CN115907031A/en
Publication of CN115907031A publication Critical patent/CN115907031A/en
Pending legal-status Critical Current

Abstract

The application discloses a service processing method, apparatus, device and storage medium. In the method, each deep learning model is allocated at least one first process, all deep learning models share one second process, and each deep learning model is loaded into at least one third process. The first process receives a request for invoking the deep learning model and transmits the service data in the request to the second process; when the accumulated service data meet a preset condition, the second process merges all the accumulated service data into one batch of source data and transmits the source data to the third process; and the third process calls the deep learning model to process the source data in batch to obtain target data. The division of labor among the processes is clear, repeated occupation of resources is reduced, and the deep learning model is called through a parallel computing architecture for batch processing, so that operating efficiency can be significantly improved when resources are limited.

Description

Service processing method, device, equipment and storage medium
Technical Field
The present application relates to the field of deep learning technologies, and in particular, to a method, an apparatus, a device, and a storage medium for processing a service.
Background
With the continuous growth of resources such as processors and memory and of data volume, deep learning models such as CNN (Convolutional Neural Network) and RNN (Recurrent Neural Network) have developed rapidly and are applied to more and more services, such as security, automatic driving, and speech synthesis.
At present, each newly deployed deep learning model has an independent set of code written specifically for it to perform inference on business data. As the number of businesses grows, the number of deployed deep learning models increases, the consumption of resources rises, and the operating efficiency of the deep learning models becomes low when resources are limited.
Disclosure of Invention
The application provides a business processing method, a business processing device, business processing equipment and a storage medium, and aims to solve the problem of how to improve the operation efficiency of a deep learning model under the condition of resource limitation.
According to an aspect of the present application, a business processing method is provided, where each deep learning model is allocated to at least one first process, all the deep learning models are allocated to one second process, and each deep learning model is loaded into at least one third process, the method including:
the first process receives a request for calling the deep learning model and transmits service data in the request to the second process;
when the second process accumulates the service data until a preset condition is met, merging all the accumulated service data into source data of one batch, and transmitting the source data to the third process;
and calling the deep learning model to process the source data in batch by the third process to obtain target data.
According to another aspect of the present application, there is provided a service processing apparatus, the apparatus including at least one first process, one second process, and at least one third process; each deep learning model is allocated at least one first process, all the deep learning models share one second process, and each deep learning model is loaded into at least one third process;
the first process is used for receiving a request for calling the deep learning model and transmitting the service data in the request to the second process;
the second process is configured to, when the service data are accumulated until a preset condition is met, merge all the accumulated service data into a batch of source data, and transmit the source data to the third process;
and the third process is used for calling the deep learning model to process the source data in batch to obtain target data.
According to another aspect of the present application, there is provided a service processing apparatus, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to enable the at least one processor to perform a business process method as described in any of the embodiments of the present application.
According to another aspect of the present application, there is provided a computer-readable storage medium storing a computer program for causing a processor to implement a business processing method according to any one of the embodiments of the present application when the computer program is executed.
According to another aspect of the present application, a computer program product is provided, the computer program product comprising a computer program which, when executed by a processor, implements the business processing method of any of the embodiments of the present application.
In this embodiment, each deep learning model is allocated at least one first process, all the deep learning models share one second process, and each deep learning model is loaded into at least one third process. The first process receives a request for invoking the deep learning model and transmits the service data in the request to the second process; when the accumulated service data meet a preset condition, the second process merges all the accumulated service data into one batch of source data and transmits the source data to the third process; and the third process calls the deep learning model to process the source data in batch to obtain target data. In this embodiment, a scheduling system is constructed from structures such as the first process, the second process and the third process; the division of labor among the processes is clear, repetitive operations and repeated occupation of resources are reduced, the extensibility of the scheduling system is fully preserved, and the deep learning model is called through a parallel computing architecture for batch processing, so that operating efficiency can be significantly improved when resources are limited.
It should be understood that the statements in this section are not intended to identify key or critical features of the embodiments of the present application, nor are they intended to limit the scope of the present application. Other features of the present application will become apparent from the following description.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings required for describing the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present application, and other drawings can be obtained from them by those skilled in the art without creative effort.
Fig. 1 is a flowchart of a service processing method according to an embodiment of the present application;
fig. 2 is an architecture diagram of a scheduling system according to an embodiment of the present application;
fig. 3 is a flowchart of a service processing method according to the second embodiment of the present application;
fig. 4 is a schematic structural diagram of a service processing apparatus according to a third embodiment of the present application;
fig. 5 is a schematic structural diagram of a service processing device for implementing the fourth embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the accompanying drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example one
Fig. 1 is a flowchart of a service processing method according to an embodiment of the present application, where the present embodiment is applicable to a case where a plurality of deep learning models are scheduled in a unified architecture, and the method may be executed by a service processing apparatus, where the service processing apparatus may be implemented in a form of hardware and/or software, and the service processing apparatus may be configured in a service processing device. Business processing devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The business processing apparatus may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices.
In addition to a processor that supports batch processing, the service processing device may also be configured with a processor having a general logic processing function, such as a Central Processing Unit (CPU).
A plurality of deep learning models may be deployed in the service processing device according to the service requirements. The structure of a deep learning model is not limited to manually designed architectures such as ResNet (Residual Network), LSTM (Long Short-Term Memory network) and GAN (Generative Adversarial Network); a deep learning model may also be optimized by model quantization, or searched for according to the service characteristics by a NAS (Neural Architecture Search) method, and the like, which is not limited in this embodiment.
In some designs, a deep learning model independently provides a complete business service, such as face recognition, TTS (Text To Speech), and the like.
In some designs, at least two deep learning models provide a complete business service simultaneously, e.g., one deep learning model identifies traffic lights in the image data, another deep learning model identifies countdown in traffic lights, and so on.
Further, for the same deep learning model, the provided service can be used as a complete business service in some cases, and can also be used as a part of a complete business service in some cases.
In this embodiment, in order to improve the scheduling efficiency, one or more deep learning models providing a complete business service may be divided into the same set for scheduling, that is, one or more deep learning models in the same set provide a complete business service.
In the same set, as shown in fig. 2, each deep learning model is allocated to at least one first process, all the deep learning models are allocated to one second process, and each deep learning model is loaded to at least one third process.
In this embodiment, a framework for scheduling a plurality of deep learning models may be compiled into a scheduling system (program), and the scheduling system creates a first process, a second process, a third process, and a fourth process, respectively, when being started.
The first process, the second process, the third process and the fourth process all belong to processes (Process). A program is a description of instructions, data and their organization, while a process is a running activity of a program on a certain data set and is the basic unit of resource allocation and scheduling in a system; that is, a process is a running instance of a program.
When the second process is started, a first thread, a second thread and a third thread are respectively created, and the first thread, the second thread and the third thread are cooperatively matched to complete the function of the second process.
The first thread, the second thread and the third thread all belong to threads (Thread). A thread is the smallest unit of execution scheduling; it is contained in a process and is the actual unit of execution within the process. A thread is a single sequential control flow in a process; multiple threads can execute concurrently in one process, each performing a different task in parallel, and threads belonging to the same process share all the resources owned by that process, including most of its data structures.
In the same set, each deep learning model is distributed with at least one first process, all the deep learning models are distributed with one second process, and each deep learning model is loaded into at least one third process.
The first process, also called the data process, serves as the gateway of the whole program. It handles business-related logic, receives the service data, and feeds back the target data inferred by the deep learning model from the service data. It may also configure the initialization parameters of the program, such as the name of the deep learning model, the path of the deep learning model, the maximum batch size, the input meta information (i.e. the first meta information), the output meta information (i.e. the second meta information), and so on.
The second process is also called a back-end process and is used as an entrance and exit of the deep learning model and is responsible for scheduling and integrating the business data to the deep learning model and target data inferred by the deep learning model.
The third process is also called a model process and is responsible for loading and starting the deep learning model and calling the deep learning model to perform reasoning.
The fourth process, also called context process, is responsible for maintaining the communication between the first process and the third process in each set.
Further, the first process and the fourth process communicate in a queue manner, and the second process and the fourth process communicate in a queue manner, so that the first process and the second process communicate with each other through the relay of the fourth process; the first process and the third process exchange the service data and the target data in a memory sharing manner.
When the formats of the service data and the target data are image data, video data, audio data and the like, the service data and the target data are large, and exchanging them in a queue manner would introduce high latency. Instead, the service data and the target data are exchanged through shared memory, while only data such as addresses and meta information are transmitted through the queues. Because the amount of data transmitted through the queues is small, queue transmission performance is not affected, the shared memory is fully utilized, the resource waste of memory that is occupied but stores no data is avoided, and latency is reduced.
The first process, the second process and the fourth process can be deployed in a processor (such as a CPU) with a logic processing function, and the third process can be deployed in a processor (such as a GPU) supporting batch processing, so that the running efficiency of each process is improved.
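As a minimal illustrative sketch only (not the implementation of this application), the process layout described above could be arranged with Python's multiprocessing module; the function names, queue wiring and placeholder bodies below are assumptions introduced for illustration.

```python
# Illustrative sketch: one possible layout of the first/second/third/fourth
# processes using Python multiprocessing. All names are hypothetical and the
# bodies are placeholders for the logic described in the text.
import multiprocessing as mp

def data_process(first_queue):                   # "first process": business gateway
    # would receive requests, write payloads to shared memory,
    # and push (address, meta information) descriptors toward the fourth process
    pass

def context_process(first_queue, second_queue):  # "fourth process": relay
    # would route descriptors between the first processes and the second process
    pass

def backend_process(second_queue, third_queue):  # "second process": scheduler
    # its first/second/third threads would accumulate, batch, and return results
    pass

def model_process(third_queue, model_path):      # "third process": inference
    # would load the deep learning model and run batched inference (e.g. on a GPU)
    pass

if __name__ == "__main__":
    first_q = mp.Queue()    # first queue: first process <-> fourth process
    second_q = mp.Queue()   # second queue: fourth process <-> second process
    third_q = mp.Queue()    # third queue: second process <-> third process(es)

    procs = [
        mp.Process(target=data_process,    args=(first_q,)),
        mp.Process(target=context_process, args=(first_q, second_q)),
        mp.Process(target=backend_process, args=(second_q, third_q)),
        mp.Process(target=model_process,   args=(third_q, "model/path")),
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```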
The first process registers its own information, e.g., ID, name of responsible deep learning model, path of responsible deep learning model, etc., at the fourth process after it is created.
As shown in fig. 1, the method includes:
step 101, the first process receives a request for calling a deep learning model, and transmits service data in the request to the second process.
Depending on the service, the scheduling system may provide business services to users on the intranet or to users on the extranet, so the first process may receive requests for invoking a business service sent by service modules located on the intranet and/or clients located on the extranet. Since the business service is provided by one or more deep learning models, this is equivalent to receiving requests for invoking the one or more deep learning models.
In practical applications, the first processes and the third processes allocated to a deep learning model are generally in one-to-one correspondence, and their number is positively correlated with how busy the business service handled by the deep learning model is. The number of first processes and third processes allocated to a deep learning model may be set manually by technical personnel according to how busy the business service is; alternatively, the busy degree may be represented in real time by indexes such as the resource occupancy and the call frequency of the deep learning model, so that the number of first processes and third processes allocated to the deep learning model is dynamically adjusted according to the busy degree (index): new first processes and third processes are dynamically allocated to the deep learning model when it is busy, and first processes and third processes allocated to the deep learning model are released when it is idle.
For a deep learning model whose business service is idle, one first process and one third process may be allocated to it.
For a deep learning model whose business service is busy, at least two first processes and at least two third processes may be allocated to it.
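As a hedged illustration of the dynamic adjustment described above, the sketch below derives a process count from a real-time busy index such as call frequency; the thresholds and the function name are hypothetical and are not taken from this application.

```python
# Illustrative sketch: scale the number of first/third processes allocated to a
# model with its "busy degree" (here, calls per second). Thresholds are hypothetical.
def target_process_count(calls_per_second: float,
                         min_procs: int = 1, max_procs: int = 4,
                         calls_per_proc: float = 50.0) -> int:
    # the count is positively correlated with how busy the business service is
    wanted = int(calls_per_second // calls_per_proc) + 1
    return max(min_procs, min(max_procs, wanted))

print(target_process_count(10.0))   # idle service  -> 1 process pair
print(target_process_count(180.0))  # busy service  -> 4 process pairs (capped)
```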
If the deep learning model is assigned a first process, the first process may directly receive a request to invoke the deep learning model.
If at least two first processes are allocated to the same deep learning model, the at least two first processes may receive requests for invoking the deep learning model in a load balancing manner.
In a load balancing mode, a sequencing order is set between at least two first processes, and the at least two first processes can receive requests for calling the deep learning model in turn according to the sequencing order.
For example, a deep learning model provides a content auditing service, and a service module responsible for content publishing logic is provided in the intranet; the deep learning model is allocated first process_1, first process_2 and first process_3. Client A publishes a short video, the service module sends request_1 for invoking content auditing for client A, and first process_1 receives request_1; client B publishes a short video, the service module sends request_2 for invoking content auditing for client B, and first process_2 receives request_2; client C publishes a short video, the service module sends request_3 for invoking content auditing for client C, and first process_3 receives request_3; client D publishes a short video, the service module sends request_4 for invoking content auditing for client D, and first process_1 receives request_4, and so on.
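The round-robin reception in this example can be sketched as follows (illustrative only; the dispatcher class and the process names are hypothetical):

```python
# Illustrative sketch: round-robin assignment of incoming requests to the first
# processes allocated to one deep learning model.
from itertools import cycle

class RoundRobinDispatcher:
    def __init__(self, first_process_ids):
        self._order = cycle(first_process_ids)  # fixed ordering among the first processes

    def next_process(self):
        # each call returns the next first process in turn
        return next(self._order)

dispatcher = RoundRobinDispatcher(["first_process_1", "first_process_2", "first_process_3"])
for request_id in ["request_1", "request_2", "request_3", "request_4"]:
    print(request_id, "->", dispatcher.next_process())
# request_4 wraps back around to first_process_1, as in the example above
```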
The request for invoking the deep learning model contains business data, and for different business scenarios, the business data is a data object with characteristics in the business scenario.
Further, the format of the service data may include text data, image data, video data, audio data, point cloud data, and the like, the form of presentation of the service data is related to the service scene, and the form of presentation of the service data in the same format in different service scenes is different.
In addition, for the same service data, the service data may be service data in a single format, or service data in multiple formats may be combined, which is not limited in this embodiment.
For example, in an automatic driving service scenario, the service data may be video data, audio data, and point cloud data acquired by a vehicle for its surrounding environment, the point cloud data may be projected into the video data to infer the semantics of the surrounding environment, and the audio data and the video data may infer the semantics of the surrounding environment separately or jointly, and so on.
For another example, in a business scene of photo entertainment, the business data may be a photo (i.e., image data) taken by a user using a mobile terminal such as a mobile phone, and the photo may be individually subjected to style migration.
For another example, in a business scenario of e-book reading and short video production, the business data may be a novel or a documentary (i.e., text data), and the novel or the documentary may be converted into voice data separately.
Then, as shown in fig. 2, the first process may read the business data from the request that invokes the deep learning model and transmit the business data to the second process.
In a specific implementation, as shown in fig. 2, a fourth process is commonly allocated to all sets (deep learning models), in the second process, at least one first thread is allocated to each deep learning model in the same set, and the first thread loads a part of code in charge of data transceiving logic in the second process, which is also called a data receiving processor.
Generally, the first threads and the first processes are in a one-to-one correspondence relationship, and therefore, the number of the first threads and the number of the first processes are generally the same.
The first process receives a request for calling the deep learning model, writes service data in the request into a Shared Memory (Shared Memory) established with the third process, and obtains a first address and first meta information.
Each process (e.g., the first process and the third process) has its own Process Control Block (PCB) and Address Space, as well as a corresponding page table, which is responsible for mapping the virtual addresses of the process to physical addresses and is managed through a Memory Management Unit (MMU). When two different virtual addresses are mapped through page tables to the same region of physical space, the region they point to is shared memory.
Shared memory allows two unrelated processes (i.e., the first process, the third process) to access the same logical memory, and thus, shared memory is a way to share and transfer data (e.g., business data, target data) between two running processes (i.e., the first process, the third process).
The first meta information is data describing the service data stored at the first address, such as a directory tree structure, the mapping relationships among files, data blocks and copy storage locations, and the like.
The first process writes the first address and the first meta-information into a first queue established by the first process and a fourth process, wherein the first queue is a bidirectional queue, namely the first process and the fourth process can operate.
And the fourth process reads the first address and the first meta information from the first queue and queries the second process distributed for the deep learning model.
If the second process is inquired, the fourth process writes the first address and the first meta-information into a second queue established by the fourth process and the first thread in the second process, wherein the second queue is a bidirectional queue, namely the fourth process and the first thread in the second process can both operate.
If the second process is not found, the fourth process creates the second process for the first process, and writes the first address and the first meta information into a second queue established by the fourth process and the first thread in the second process.
The fourth process is provided for managing communication between the first process and the second process, and the scheduling flexibility can be improved under the condition that a plurality of deep learning models are deployed.
At this time, the first thread in the second process reads the first address and the first meta information from the second queue.
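A possible sketch of this hand-off, assuming Python's multiprocessing.shared_memory and NumPy for the payload; the descriptor field names (first_address, first_meta) are assumptions mirroring the first address and first meta information above.

```python
# Illustrative sketch: write the service data into shared memory and hand only
# a small descriptor (address + meta information) to the queue, as described above.
import numpy as np
from multiprocessing import shared_memory

def publish_business_data(data: np.ndarray, first_queue):
    shm = shared_memory.SharedMemory(create=True, size=data.nbytes)
    buf = np.ndarray(data.shape, dtype=data.dtype, buffer=shm.buf)
    buf[:] = data  # copy the payload into the shared segment

    descriptor = {
        "first_address": shm.name,            # where the payload lives
        "first_meta": {"shape": data.shape,   # how to interpret it
                       "dtype": str(data.dtype)},
    }
    first_queue.put(descriptor)               # only the descriptor travels through the queue
    return shm                                # keep a handle so the segment is not released

def read_business_data(descriptor):
    # used on the receiving side to recover the service data itself
    meta = descriptor["first_meta"]
    shm = shared_memory.SharedMemory(name=descriptor["first_address"])
    return np.ndarray(tuple(meta["shape"]), dtype=meta["dtype"], buffer=shm.buf)
```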
And 102, when the business data are accumulated to meet the preset condition, the second process merges all the accumulated business data into source data of one batch and transmits the source data to the third process.
Generally, as shown in fig. 2, the second process faces multiple first processes, and after receiving the service data transmitted by the first process, the second process does not immediately forward to the third process, but caches and accumulates the service data, and when the accumulated service data meets a preset condition, merges all the accumulated service data into a batch of source data, and transmits the source data to the third process.
Illustratively, the condition may include the number of accumulations reaching a threshold, the time of accumulation reaching a threshold, and so forth.
In a specific implementation, as shown in fig. 2, a second thread is also allocated to all deep learning models in the same set in the second process, and the second thread loads part of the code in the second process, which is responsible for merging the batch logic, also called a data batch processor.
Since the first process transmits the first address and the first meta information of the service data to the second process, rather than the service data itself, the first thread in the second process may configure an identifier for the service data it receives, establish a mapping relation for the service data, and transmit the mapping relation to the second thread.
The mapping relation comprises a first address, first meta information and an identifier of the same service data, and the identifier is used for marking unique information of the service data so as to distinguish the service data in the subsequent processing process and distinguish target data obtained by a deep learning model according to inference of the service data.
Generally, in the second process, the second thread faces a plurality of first threads. When the second thread receives a mapping relation transmitted by a first thread, it accumulates the mapping relation; when the number of mapping relations accumulates to a preset threshold, all the currently accumulated mapping relations are packed into one batch of source data, and the source data is written into a third queue established between the second process and the third process. The second thread then starts accumulating mapping relations again.
The third queue is a bidirectional queue, that is, a queue in which both the second process and the at least one third process are operable.
At this time, each third process reads the source data from the third queue, reads the mapping relationship (the first address, the first meta information, and the identifier) responsible for the source data from the source data, and reads the data itself of the service data from the first address of the shared memory according to the first meta information.
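A sketch of the data batch processor's accumulation loop, assuming Python queues; the threshold value and the sentinel convention are hypothetical (a fuller version could also flush when an accumulation-time threshold is reached, per the conditions above).

```python
# Illustrative sketch of the second thread's batching loop: mapping relations
# (first address, first meta information, identifier) are accumulated and flushed
# to the third queue as one batch of source data once a threshold is reached.
import queue

MAX_BATCH = 8  # assumed "preset threshold"

def batch_loop(mapping_queue: queue.Queue, third_queue: queue.Queue):
    pending = []
    while True:
        mapping = mapping_queue.get()          # (first_address, first_meta, identifier)
        if mapping is None:                    # sentinel: flush what is left and stop
            if pending:
                third_queue.put({"source_data": pending})
            break
        pending.append(mapping)
        if len(pending) >= MAX_BATCH:
            third_queue.put({"source_data": pending})  # one batch of source data
            pending = []                               # start accumulating again
```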
And 103, calling a deep learning model to process the source data in batch by the third process to obtain target data.
As shown in fig. 2, after the third process receives the source data, the deep learning model loaded to the third process is called in parallel according to a mechanism of parallel computing of the processor to process the source data in batch, so as to obtain the target data.
Taking GPU as an example of a processor, parallel computation is a computation mode that decomposes a specific computation into independent smaller computations that can be performed simultaneously, and recombines or synchronizes the results of the smaller computations to form the result of the original larger computation.
The number of tasks into which a larger task may be broken down depends on the number of cores contained in the particular hardware. Cores are the units that actually perform computations in a given processor; a CPU typically has 4, 8, or 16 cores, while a GPU may have thousands, which makes the GPU well suited to performing computations in parallel.
Since the deep learning model is embarrassingly parallel, it is suitable for parallel computation. In parallel computing, an embarrassingly parallel task is one whose overall work can be divided into a set of smaller tasks that are computed in parallel and are independent of one another. Because the deep learning model is embarrassingly parallel, the computations it performs in many business scenarios can be decomposed into smaller computations that do not depend on each other.
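As an illustrative sketch of batch processing on a parallel-computing processor, the snippet below stacks the accumulated samples into one batch and runs a single forward pass; PyTorch and the toy model are assumptions, since this application does not prescribe a framework.

```python
# Illustrative sketch: merge N service-data samples into one batch and run a
# single forward pass, letting the GPU process the whole batch in parallel.
import torch

def batch_infer(model: torch.nn.Module, samples: list) -> torch.Tensor:
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device).eval()
    batch = torch.stack(samples).to(device)   # N samples -> one batch tensor
    with torch.no_grad():
        return model(batch).cpu()             # target data for the whole batch

# usage sketch with a stand-in model
if __name__ == "__main__":
    toy_model = torch.nn.Linear(4, 2)
    outputs = batch_infer(toy_model, [torch.randn(4) for _ in range(8)])
    print(outputs.shape)  # torch.Size([8, 2])
```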
In one case, one deep learning model is provided in one set, that is, one deep learning model independently provides a complete business service, and at this time, each third process independently inputs business data into the loaded deep learning model for processing, and outputs target data.
In another case, a plurality of deep learning models are provided in one set, that is, the deep learning models cooperate to provide a complete business service, at this time, the dependency relationship existing among the deep learning models is determined in each third process in the same set, and the deep learning models are sequentially called according to the dependency relationship to process the source data in batch to obtain the target data.
In the dependency relationship, if the current deep learning model is depended on by other deep learning models, that is, other deep learning models depend on the current deep learning model, data output by the current deep learning model is data input into other deep learning models, and initially, business data is input into a deep learning model which does not depend on any deep learning model for processing.
For example, in an automatic driving business scene, two deep learning models jointly provide a complete traffic light detection service: one deep learning model identifies traffic lights in the image data and is denoted the traffic light identification network, and the other identifies the countdown on the traffic light and is denoted the countdown identification network. The traffic light identification network is depended on by the countdown identification network, i.e. the countdown identification network depends on the traffic light identification network. Initially, the image data is input into the traffic light identification network for processing and the region where the traffic light is located is output; that region is then input into the countdown identification network for processing, and the countdown on the traffic light (the target data) is output.
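A sketch of invoking cooperating models according to their dependency relationship, using a topological ordering; graphlib, the model names and the stand-in callables are illustrative assumptions.

```python
# Illustrative sketch: order cooperating models by their dependency relationship
# and feed each model the output of the model it depends on.
from graphlib import TopologicalSorter

def run_pipeline(models: dict, depends_on: dict, business_data):
    # depends_on maps model name -> name of the model it depends on (or None)
    graph = {name: {dep} if dep else set() for name, dep in depends_on.items()}
    outputs = {}
    for name in TopologicalSorter(graph).static_order():
        dep = depends_on[name]
        # a model that depends on no other model receives the business data directly
        inputs = business_data if dep is None else outputs[dep]
        outputs[name] = models[name](inputs)
    return outputs  # the entry for the model at the end of the chain is the target data

# usage sketch with stand-in callables instead of real networks
models = {
    "traffic_light_net": lambda img: f"light_region({img})",
    "countdown_net": lambda region: f"countdown({region})",
}
depends_on = {"traffic_light_net": None, "countdown_net": "traffic_light_net"}
print(run_pipeline(models, depends_on, "image_data"))
```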
In this embodiment, each deep learning model is allocated at least one first process, all the deep learning models share one second process, and each deep learning model is loaded into at least one third process. The first process receives a request for invoking the deep learning model and transmits the service data in the request to the second process; when the accumulated service data meet a preset condition, the second process merges all the accumulated service data into one batch of source data and transmits the source data to the third process; and the third process calls the deep learning model to process the source data in batch to obtain target data. In this embodiment, a scheduling system is constructed from structures such as the first process, the second process and the third process; the division of labor among the processes is clear, repetitive operations and repeated occupation of resources are reduced, the extensibility of the scheduling system is fully preserved, and the deep learning model is called through a parallel computing architecture for batch processing, so that operating efficiency can be significantly improved when resources are limited.
Example two
Fig. 3 is a flowchart of a service processing method provided in the second embodiment of the present application; this embodiment adds a result returning procedure on the basis of the first embodiment. As shown in fig. 3, the method includes:
and 301, the first process receives a request for calling the deep learning model and transmits service data in the request to the second process.
Step 302, when the service data are accumulated until the preset condition is met, the second process merges all the accumulated service data into a batch of source data, and transmits the source data to the third process.
And step 303, calling a deep learning model to process the source data in batch by the third process to obtain target data.
Step 304, the third process transmits the target data to the second process.
As shown in fig. 2, when the deep learning model completes the processing and outputs the target data, the third process transfers the target data to the second process.
In one case, there is one deep learning model in a set, i.e., one deep learning model independently provides one complete business service, at which time each third process independently transfers the target data to the second process.
In another case, there are multiple deep learning models in a set, that is, multiple deep learning models cooperate to provide a complete business service, in this case, each third process in the same set determines the dependency existing among the multiple deep learning models, and the third process at the end of the dependency transmits the target data to the second process.
In a specific implementation, as shown in fig. 2, in the second process, a third thread is allocated to all deep learning models in the same set, and the third thread loads a part of codes in charge of result processing logic in the second process, which is also called a data result processor.
Then, the third process may write the target data into the shared memory established by the third process and the first process, and obtain the second address and the second meta information.
The second meta information is data describing the target data stored at the second address, such as a directory tree structure, the mapping relationships among files, data blocks and copy storage locations, and the like.
The third process queries the identifier corresponding to the locally cached service data, and writes the second address, the second meta information and the identifier into a third queue established by the third process and the second process.
At this time, the third thread in the second process reads the second address, the second meta information, and the identifier from the third queue.
And 305, the second process transmits the target data to the first process for outputting.
As shown in fig. 2, the second process transmits the target data to the first process providing the service data, and the first process may output the target data to a service module of the intranet or a client of the extranet when receiving the target data.
In a specific implementation, as shown in fig. 2, in the second process, at least one first thread is also allocated to each deep learning model in the same set.
The third thread in the second process transmits the second address and the second meta information to the first thread according to the identifier: that is, the third thread queries which first thread provided the identifier and transmits the second address and the second meta information to it, and the first thread writes the second address and the second meta information into the second queue established by the first thread in the second process and the fourth process.
At this time, the fourth process reads the second address and the second meta information from the second queue, and writes them into the first queue established by the fourth process and the first process.
The first process reads the second address and the second meta information from the first queue, reads the target data at the second address of the shared memory according to the second meta information, and outputs it to a service module of the intranet or a client of the extranet.
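The return path can be sketched symmetrically to the request path (illustrative only; the field names second_address, second_meta and identifier mirror the second address, second meta information and identifier above, and the helper functions are hypothetical).

```python
# Illustrative sketch of the result return path: the target data is written to
# shared memory and only (second address, second meta information, identifier)
# travels back through the queues; the first process then uses the identifier
# to match the result to the original request.
import numpy as np
from multiprocessing import shared_memory

def publish_target_data(target: np.ndarray, identifier: str, third_queue):
    shm = shared_memory.SharedMemory(create=True, size=target.nbytes)
    np.ndarray(target.shape, dtype=target.dtype, buffer=shm.buf)[:] = target
    third_queue.put({
        "second_address": shm.name,
        "second_meta": {"shape": target.shape, "dtype": str(target.dtype)},
        "identifier": identifier,             # ties the result to the original request
    })
    return shm                                # keep a handle until the reader is done

def collect_target_data(result):
    meta = result["second_meta"]
    shm = shared_memory.SharedMemory(name=result["second_address"])
    target = np.ndarray(tuple(meta["shape"]), dtype=meta["dtype"], buffer=shm.buf).copy()
    shm.close()
    return result["identifier"], target
```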
In this embodiment, the third process transmits the target data inferred by the deep learning model to the second process, and the second process transmits the target data to the first process for output, so as to ensure normal execution of the service.
EXAMPLE III
Fig. 4 is a schematic structural diagram of a service processing apparatus according to a third embodiment of the present application. As shown in fig. 4, the apparatus includes at least one first process 401, one second process 402, and at least one third process 403; each deep learning model is allocated at least one first process, all the deep learning models share one second process, and each deep learning model is loaded into at least one third process;
the first process 401 is configured to receive a request for invoking the deep learning model, and transmit service data in the request to the second process 402;
the second process 402 is configured to, when the service data are accumulated until a preset condition is met, merge all the accumulated service data into a batch of source data, and transmit the source data to the third process 403;
the third process 403 is configured to invoke the deep learning model to batch process the source data, so as to obtain target data.
In an embodiment of the present application, the apparatus further includes a fourth process commonly assigned to all the deep learning models, and the second process 402 assigns at least one first thread to each of the deep learning models;
the first process 401 is further configured to receive a request for calling the deep learning model, write service data in the request into a shared memory established with the third process, obtain a first address and first meta-information, and write the first address and the first meta-information into a first queue established with the fourth process;
the fourth process is configured to read the first address and the first meta-information from the first queue, query the second process allocated to the deep learning model, and write the first address and the first meta-information into a second queue established with the first thread in the second process;
the first thread is used for reading the first address and the first meta-information from the second queue.
In an embodiment of the application, the first process 401 is further configured to receive a request for invoking the deep learning model in a load balancing manner if the deep learning model allocates at least two first processes.
In an embodiment of the present application, the second process 402 further assigns a second thread to all the deep learning models;
the first thread is further configured to configure an identifier for the service data, and transmit a mapping relationship to the second thread, where the mapping relationship includes the first address, the first meta information, and the identifier;
the second thread is configured to, when the number of the mapping relationships is accumulated to a preset threshold, pack all the currently accumulated mapping relationships into a batch of source data, and write the source data into a third queue established with the third thread;
the third process 403 is further configured to read the source data from the third queue, and read the service data in the first address of the shared memory according to the first meta information.
In an embodiment of the present application, each of the third processes 403 is further configured to independently input the service data into the deep learning model for processing, and output target data;
or
Each third process 403 is further configured to determine a dependency relationship existing between the multiple deep learning models, and sequentially invoke the multiple deep learning models to batch process the source data according to the dependency relationship, so as to obtain target data;
in the dependency relationship, if the current deep learning model is depended on by other deep learning models, the data output by the current deep learning model is the data input into other deep learning models.
In an embodiment of the present application, the third process 403 is further configured to transmit the target data to the second process 402;
the second process 402 is further configured to transmit the target data to the first process for output.
In an embodiment of the present application, the second process 402 assigns a third thread to all the deep learning models;
the third process 403 is further configured to write the target data into a shared memory established with the first process, obtain a second address and second meta information, query an identifier corresponding to the service data, and write the second address, the second meta information, and the identifier into a third queue established with the second process;
the third thread is configured to read the second address, the second meta information, and the identifier from the third queue.
In an embodiment of the present application, the apparatus further includes a fourth process commonly assigned to all the deep learning models, and at least one first thread is further assigned to each of the deep learning models in the second process;
the third thread is further configured to transmit the second address and the second meta information to the first thread according to the identifier;
the first thread is used for writing the second address and the second element information into a second queue established with the fourth process;
the fourth process is configured to read the second address and the second meta-information from the second queue, and write the second address and the second meta-information into a first queue established with the first process;
the first process 401 is further configured to read the second address and the second meta information from the first queue, read the target data in the second address of the shared memory according to the second meta information, and output the target data.
The service processing device provided by the embodiment of the application can execute the service processing method provided by any embodiment of the application, and has the corresponding functional module and beneficial effect of executing the service processing method.
Example four
Fig. 5 shows a schematic structural diagram of a service processing device 10 that can be used to implement an embodiment of the present application. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 5, the service processing device 10 includes at least one processor 11, and a memory communicatively connected to the at least one processor 11, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, and the like, wherein the memory stores a computer program executable by the at least one processor, and the processor 11 can perform various suitable actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data necessary for the operation of the service processing device 10 can also be stored. The processor 11, the ROM 12, and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
A number of components in the service processing device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, or the like; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the service processing device 10 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
Processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, or the like. Processor 11 performs the various methods and processes described above, such as the business process methods.
In some embodiments, the business process method may be implemented as a computer program tangibly embodied in a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed on the business processing device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more steps of the business process method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the traffic processing method by any other suitable means (e.g. by means of firmware).
EXAMPLE five
An embodiment of the present application further provides a computer program product, which includes a computer program, and when the computer program is executed by a processor, the computer program implements the service processing method provided in any embodiment of the present application.
In implementing the computer program product of the present application, the computer program code for carrying out the operations of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It should be understood that various forms of the flows shown above, reordering, adding or deleting steps, may be used. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solution of the present application can be achieved, and the present invention is not limited thereto.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (12)

1. A business processing method, wherein each deep learning model is assigned to at least one first process, all deep learning models are assigned to a second process, and each deep learning model is loaded to at least one third process, the method comprising:
the first process receives a request for calling the deep learning model and transmits service data in the request to the second process;
when the second process accumulates the service data until a preset condition is met, merging all the accumulated service data into source data of one batch, and transmitting the source data to the third process;
and calling the deep learning model to process the source data in batch by the third process to obtain target data.
2. The method according to claim 1, wherein at least one first thread is assigned to each deep learning model in the second process, and a fourth process is assigned to all the deep learning models;
the first process receives a request for calling the deep learning model, and transmits service data in the request to the second process, wherein the process comprises the following steps:
the first process receives a request for calling the deep learning model, writes service data in the request into a shared memory established with the third process to obtain a first address and first meta-information, and writes the first address and the first meta-information into a first queue established with the fourth process;
a fourth process reads the first address and the first meta-information from the first queue, queries the second process distributed for the deep learning model, and writes the first address and the first meta-information into a second queue established with the first thread in the second process;
the first thread reads the first address and the first meta-information from the second queue.
3. The method of claim 1, wherein the first process receives a request to invoke the deep learning model, comprising:
and if the deep learning model is distributed with at least two first processes, the first processes receive a request for calling the deep learning model in a load balancing mode.
4. The method of claim 2, wherein the second process further assigns a second thread to all of the deep learning models; when the second process accumulates the service data until a preset condition is met, merging all the accumulated service data into source data of one batch, and transmitting the source data to the third process, where the merging includes:
the first thread configures an identifier for the service data, and transmits a mapping relation to the second thread, wherein the mapping relation comprises the first address, the first meta information and the identifier;
when the number of the mapping relations is accumulated to a preset threshold value, the second thread packs all the currently accumulated mapping relations into a batch of source data and writes the source data into a third queue established with the third thread;
and the third process reads the source data from the third queue, and reads the service data in the first address of the shared memory according to the first meta-information.
5. The method of claim 4, wherein the calling the deep learning model by the respective third processes batches the source data to obtain target data, comprises:
each third process independently inputs the service data into the deep learning model for processing and outputs target data;
or alternatively
Determining a dependency relationship existing among the deep learning models by each third process, and calling the deep learning models in sequence according to the dependency relationship to process the source data in batch to obtain target data;
in the dependency relationship, if the current deep learning model is depended on by other deep learning models, the data output by the current deep learning model is the data input into other deep learning models.
6. The method according to any one of claims 1-5, further comprising:
the third process transmitting the target data to the second process;
and the second process transmits the target data to the first process for outputting.
7. The method of claim 6, wherein a third thread is assigned to all the deep learning models in the second process;
the third process transmitting the target data to the second process, including:
the third process writes the target data into a shared memory established with the first process to obtain a second address and second element information, inquires an identifier corresponding to the service data, and writes the second address, the second element information and the identifier into a third queue established with the second process;
the third thread reads the second address, the second meta information, and the identification from the third queue.
8. The method according to claim 7, wherein a fourth process is commonly assigned to all the deep learning models, and at least one first thread is further assigned to each of the deep learning models in the second process; the second process transmits the target data to the first process for outputting, including:
the third thread transmits the second address and the second element information to the first thread according to the identification;
the first thread writes the second address and the second element information into a second queue established with the fourth process;
the fourth process reads the second address and the second element information from the second queue and writes the second address and the second element information into a first queue established with the first process;
and the first process reads the second address and the second element information from the first queue, reads the target data in the second address of the shared memory according to the second element information, and outputs the target data.
9. A service processing apparatus, comprising at least one first process, one second process and at least one third process; each deep learning model is allocated at least one first process, all the deep learning models share one second process, and each deep learning model is loaded into at least one third process;
the first process is used for receiving a request for calling the deep learning model and transmitting the service data in the request to the second process;
the second process is configured to, when the service data are accumulated until a preset condition is met, merge all the accumulated service data into a batch of source data, and transmit the source data to the third process;
and the third process is used for calling the deep learning model to process the source data in batch to obtain target data.
10. A service processing device, characterized in that the service processing device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to enable the at least one processor to perform the traffic processing method of any of claims 1-8.
11. A computer-readable storage medium, characterized in that it stores a computer program for causing a processor to implement a method of traffic processing according to any of claims 1-8 when executed.
12. A computer program product, characterized in that the computer program product comprises a computer program which, when being executed by a processor, implements a business process method according to any one of claims 1-8.
CN202211439090.5A 2022-11-17 2022-11-17 Service processing method, device, equipment and storage medium Pending CN115907031A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211439090.5A CN115907031A (en) 2022-11-17 2022-11-17 Service processing method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211439090.5A CN115907031A (en) 2022-11-17 2022-11-17 Service processing method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115907031A true CN115907031A (en) 2023-04-04

Family

ID=86478928

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211439090.5A Pending CN115907031A (en) 2022-11-17 2022-11-17 Service processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115907031A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination