CN116974654A - Image data processing method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN116974654A
Authority
CN
China
Prior art keywords
network layer, data, computing, node, nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311228987.8A
Other languages
Chinese (zh)
Other versions
CN116974654B (en)
Inventor
殷俊
黄鹏
岑鑫
虞响
钱康
李琦
郭佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202311228987.8A priority Critical patent/CN116974654B/en
Publication of CN116974654A publication Critical patent/CN116974654A/en
Application granted granted Critical
Publication of CN116974654B publication Critical patent/CN116974654B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44 Arrangements for executing specific programs
    • G06F 9/445 Program loading or initiating
    • G06F 9/44505 Configuring for program initiating, e.g. using registry, configuration files
    • G06F 9/4451 User profiles; Roaming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5061 Partitioning or combining of resources
    • G06F 9/5072 Grid computing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 5/00 Computing arrangements using knowledge-based models
    • G06N 5/04 Inference or reasoning models
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application discloses an image data processing method and apparatus, an electronic device, and a storage medium, belonging to the technical field of image processing. A configuration file describes the data flow relations among the multiple computing nodes onto which a large model has been distributed; these relations are encapsulated into relatively independent policy nodes, and the policy nodes schedule the computing nodes to complete the model's inference computation. In this way, no matter how the large model is split and no matter what its image processing task is, the large model can be rapidly deployed and applied at the device end.

Description

Image data processing method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method and apparatus for processing image data, an electronic device, and a storage medium.
Background
Large models are widely used in the field of image processing. Because a large model has a huge number of parameters, it cannot simply be deployed on a single card for inference computation, so distributed inference is usually adopted, i.e., multiple cards jointly perform the inference computation in the inference stage.
Compared with single-card inference, distributed inference requires splitting a large model into smaller network structures, deploying each split structure on a computing node, and letting that computing node carry the computing task of the corresponding structure. For all computing nodes to compute correctly and realize the large model's image processing task, a deployment program must be developed to specify the data transfer relations among the computing nodes. Since each splitting mode yields different data transfer relations, the management and transfer of data differ for every split of every large model; consequently, whenever the splitting mode or the image processing task of a large model changes, the corresponding deployment program must be redeveloped and the model redeployed. In the related art, large models are thus repeatedly deployed and matching deployment programs repeatedly developed, so model deployment and the deployment program are strongly coupled. The time cost of deployment is therefore high, and rapid, flexible deployment of different large models at the device end cannot be supported.
Disclosure of Invention
The embodiments of the application provide an image data processing method and apparatus, an electronic device, and a storage medium, which are used for solving the problem that large models in the related art cannot be rapidly and flexibly deployed and applied at the device end.
In a first aspect, an embodiment of the application provides an image data processing method, where each network layer of a large model is distributed and deployed on a plurality of computing nodes, the method being applied to a master control device and comprising:
acquiring a configuration file of the large model, wherein the configuration file is preconfigured according to the data flow requirements among the computing nodes;
parsing the configuration file to obtain the data flow relations among the computing nodes, wherein a data flow relation comprises a data flow rule and a data flow timing;
generating, based on the data flow relations, a plurality of policy nodes for connecting the network layers;
and scheduling the plurality of computing nodes by means of the plurality of policy nodes, and performing inference computation on the input image data to obtain a model output result.
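For illustration only, such a configuration file might take the following shape. This is a minimal sketch; the schema and field names (e.g. flow_rule, flow_timing, merge_all) are assumptions, not the file format actually used by the patent:

```python
# Hypothetical configuration describing the two components of a data flow
# relation: a flow rule (who receives which results, whether they are merged)
# and a flow timing (synchronous or asynchronous hand-off). All field names
# are illustrative assumptions.
config = {
    "layers": {
        "A": {"nodes": [1, 2, 3]},  # network layer A on computing nodes 1-3
        "B": {"nodes": [4, 5, 6]},  # network layer B on computing nodes 4-6
    },
    "policies": [
        {
            "from_layer": "A",
            "to_layer": "B",
            "flow_rule": "merge_all",      # integrate all of A's outputs first
            "flow_timing": "synchronous",  # or "asynchronous"
        },
    ],
}
```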
In some embodiments, if any data flow rule specifies that the computing nodes corresponding to a first network layer and the computing nodes corresponding to a second network layer process different batches of data in parallel, the corresponding computing nodes are scheduled according to the following steps:
instructing the pre-policy node of the first network layer to distribute the current batch of data to each computing node corresponding to the first network layer for computation;
after the computation on the current batch of data is completed, instructing the pre-policy node of the first network layer to distribute the next batch of data to each computing node corresponding to the first network layer, and instructing the pre-policy node of the second network layer to process the computation result of the current batch of data.
In some embodiments, if any data flow rule specifies that the computing nodes corresponding to a first network layer process different batches of data in parallel, the corresponding computing nodes are scheduled according to the following steps:
instructing the pre-policy node of the first network layer to distribute the current batch of data to each computing node corresponding to the first network layer for computation;
and, whenever any computing node corresponding to the first network layer completes its computation, instructing the pre-policy node of the first network layer to distribute the next batch of data to that computing node, and instructing the pre-policy node of the second network layer to process that computing node's result.
In some embodiments, if the data flow timing between the first network layer and the second network layer is synchronous flow, the computation result sent by each computing node corresponding to the first network layer is cached until it is determined that the results of all computing nodes corresponding to the first network layer have been received, whereupon the pre-policy node of the second network layer is instructed to process the results of all those computing nodes.
In some embodiments, if the data flow timing between the first network layer and the second network layer is asynchronous flow, each time a computing node corresponding to the first network layer completes its computation, the pre-policy node of the second network layer is instructed to process that computing node's result.
In a second aspect, an embodiment of the application provides an image data processing apparatus, where each network layer of a large model is distributed and deployed on a plurality of computing nodes, the apparatus being applied to a master control device and comprising:
an acquisition module, configured to acquire a configuration file of the large model, wherein the configuration file is preconfigured according to the data flow requirements among the computing nodes;
a parsing module, configured to parse the configuration file to obtain the data flow relations among the computing nodes, wherein a data flow relation comprises a data flow rule and a data flow timing;
a generation module, configured to generate, based on the data flow relations, a plurality of policy nodes for connecting the network layers;
and a scheduling module, configured to schedule the plurality of computing nodes by means of the plurality of policy nodes, and perform inference computation on the input image data to obtain a model output result.
In some embodiments, if any data flow rule specifies that the computing nodes corresponding to a first network layer and the computing nodes corresponding to a second network layer process different batches of data in parallel, the scheduling module is specifically configured to schedule the corresponding computing nodes according to the following steps:
instructing the pre-policy node of the first network layer to distribute the current batch of data to each computing node corresponding to the first network layer for computation;
after the computation on the current batch of data is completed, instructing the pre-policy node of the first network layer to distribute the next batch of data to each computing node corresponding to the first network layer, and instructing the pre-policy node of the second network layer to process the computation result of the current batch of data.
In some embodiments, if any data flow rule specifies that the computing nodes corresponding to a first network layer process different batches of data in parallel, the scheduling module is specifically configured to schedule the corresponding computing nodes according to the following steps:
instructing the pre-policy node of the first network layer to distribute the current batch of data to each computing node corresponding to the first network layer for computation;
and, whenever any computing node corresponding to the first network layer completes its computation, instructing the pre-policy node of the first network layer to distribute the next batch of data to that computing node, and instructing the pre-policy node of the second network layer to process that computing node's result.
In some embodiments, if the data flow timing between the first network layer and the second network layer is synchronous flow, the computation result sent by each computing node corresponding to the first network layer is cached until it is determined that the results of all computing nodes corresponding to the first network layer have been received, whereupon the pre-policy node of the second network layer is instructed to process the results of all those computing nodes.
In some embodiments, if the data flow timing between the first network layer and the second network layer is asynchronous flow, each time a computing node corresponding to the first network layer completes its computation, the pre-policy node of the second network layer is instructed to process that computing node's result.
In a third aspect, an embodiment of the application provides an electronic device, comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein:
the memory stores a computer program executable by the at least one processor, so that the at least one processor is able to perform the above image data processing method.
In a fourth aspect, an embodiment of the application provides a storage medium storing a computer program which, when executed by a processor of an electronic device, enables the electronic device to perform the above image data processing method.
In the embodiments of the application, each network layer of the large model is distributed and deployed on a plurality of computing nodes, and a configuration file is preconfigured according to the data flow requirements among the computing nodes. The configuration file is then parsed to obtain the data flow relations among the computing nodes, where a data flow relation comprises a data flow rule and a data flow timing; a plurality of policy nodes for connecting the network layers are generated based on these relations; and the computing nodes are scheduled by the policy nodes to perform inference computation on the input image data and obtain a model output result. In this way, the data flow relations among the computing nodes of a distributed large model are obtained through a configuration file and encapsulated into relatively independent policy nodes, and the policy nodes schedule the computing nodes to complete the model's inference computation. No matter how the large model is split and no matter what its image processing task is, the large model can be deployed rapidly and flexibly and applied at the device end.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a schematic diagram of a data path according to an embodiment of the application;
FIG. 2 is a schematic diagram of an image data processing architecture according to an embodiment of the application;
FIG. 3 is a schematic diagram of parallel processing of a large model according to an embodiment of the application;
FIG. 4 is a schematic diagram of parallel processing of another large model according to an embodiment of the application;
FIG. 5 is a schematic diagram of data result flow and acquisition between two network layers according to an embodiment of the application;
FIG. 6 is a schematic flow chart of each policy node executing a data result acquisition strategy according to an embodiment of the application;
FIG. 7 is a flowchart of an image data processing method according to an embodiment of the application;
FIG. 8 is a schematic structural diagram of an image data processing apparatus according to an embodiment of the application;
FIG. 9 is a schematic diagram of the hardware structure of an electronic device for implementing an image data processing method according to an embodiment of the application.
Detailed Description
In order to solve the problem that different large models in the related art cannot be rapidly and flexibly deployed at the device end, the embodiments of the application provide an image data processing method and apparatus, an electronic device, and a storage medium.
The preferred embodiments of the present application will be described below with reference to the accompanying drawings of the specification, it being understood that the preferred embodiments described herein are for illustration and explanation only, and not for limitation of the present application, and embodiments of the present application and features of the embodiments may be combined with each other without conflict.
To facilitate understanding of the application, the technical terms involved are explained first:
Card: a chip with a certain computing capability.
Computing node: after a large model is split into a plurality of small network structures, the unit that carries the computing task of each small network structure is a computing node; one computing node is an instantiated computing object. Because the computational complexity of the small network structures differs, one small network structure may be served by one, two, or more computing nodes; and because the computing capability of a single card differs, the computing nodes of one small network structure may be deployed on one, two, or more cards.
Large model: an image network model whose network layers are distributed and deployed on a plurality of computing nodes, the computing task of the large model being jointly completed by those computing nodes. The data flow relations among the computing nodes (comprising data flow rules and data flow timings) are determined by the large model's image processing task and splitting mode; that is, the relations may differ with the image processing task of the large model and may also differ with the way its structure is split.
Policy node: a policy object generated according to the data flow relations among the computing nodes after the large model has been deployed in a distributed manner. There is typically a policy node between adjacent network layers; except for the first network layer of the large model, each network layer has a pre-policy node and a post-policy node, and the post-policy node of one network layer is the pre-policy node of the next adjacent network layer. The first network layer of the large model has only a pre-policy node.
In general, the inference network structure of a large model can be abstracted as a plurality of interconnected computing nodes. From the input end that receives the initial image data to the output end that produces the final model inference result, the lines connecting the computing nodes can be called a data path, and both the image data and the computation results inferred by the computing nodes in the large model network circulate through this data path.
FIG. 1 is a schematic diagram of a data path, where the directional arrows mark the data path from the input to the output. Generally, policy nodes are inserted between different network layers. The input requirements of the rear computing nodes can be written into the policy nodes; after performing policy operations such as integration, circulation, and waiting on the data of the front computing nodes according to those requirements, the policy node notifies and controls the rear computing nodes to acquire the data, and it can also store and manage the output results of the front computing nodes on behalf of the rear computing nodes. In addition, policy nodes can communicate with each other, for example to exchange data results, so that a computing node placed later on a data path can obtain the computation results of earlier computing nodes.
FIG. 2 is a schematic diagram of an image data processing architecture according to an embodiment of the application, comprising a main control device and three cards: card 1, card 2, and card 3. The network layers of the large model are distributed across multiple computing nodes running on cards 1, 2, and 3. At run time, the main program of the main control device parses the large model's configuration file to initialize and generate the policy nodes. (Generally, each computing node has requirements on its input data, such as format requirements and synchronization requirements; these requirements, including each node's data synchronization requirements on the various front nodes, can all be written into the configuration file.) The policy nodes can then schedule the computing nodes to perform inference computation on the image data input to the large model, producing the model output result.
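As a rough sketch of this start-up sequence, the main program could parse such a configuration once and instantiate one policy object per adjacent layer pair. The class and function names below are assumptions layered on the hypothetical schema shown earlier, not the actual implementation:

```python
class PolicyNode:
    """Encapsulates the data flow relation between two adjacent network layers."""

    def __init__(self, from_layer, to_layer, flow_rule, flow_timing):
        self.from_layer = from_layer
        self.to_layer = to_layer
        self.flow_rule = flow_rule      # e.g. "merge_all" or "pass_through"
        self.flow_timing = flow_timing  # "synchronous" or "asynchronous"

def build_policy_nodes(config):
    # Initialization step: one policy node per configured layer-to-layer
    # relation, generated from the parsed configuration file.
    return [
        PolicyNode(p["from_layer"], p["to_layer"], p["flow_rule"], p["flow_timing"])
        for p in config["policies"]
    ]
```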
In the embodiments of the application, the data management requirements and data circulation requirements of the large model can be configured in the configuration file. When the large model changes, only the relevant information in the configuration file changes, and the deployment of the large model across the cards does not have to be readjusted each time. This provides a general model deployment scheme insensitive to the deployment mode and model structure, so that large models can be deployed and put into use quickly.
To adapt flexibly to different scenario services and improve the inference efficiency of large models, the embodiments of the application further provide two parallel strategies for distributed large model inference; and to synchronize data flexibly between different network layers, they also provide two data result acquisition strategies.
The two parallel strategies for distributed large model inference are introduced first.
1. Pipeline parallelism
As shown in FIG. 3, the network of the whole large model is divided longitudinally into three network layers A, B, and C, where network layer A is deployed on computing nodes 1-3, network layer B on computing nodes 4-6, and network layer C on computing nodes 7-9, and a, b, c, d are policy nodes. The three network layers A, B, and C can be deployed on different inference cards; each has its own input data requirements and output data formats, which can be written into the configuration file. Policy nodes a, b, c, d are generated according to the configuration information, and they drive the corresponding inference work by feeding input data to the corresponding network layers. Specifically, if two batches of data are fed in at the input end, the card of network layer A can start processing the second batch as soon as it has finished the first batch, without waiting for the whole large model to complete inference on the first batch. This parallel strategy realizes pipeline parallelism of different network layers over different batches of data.
For example, image data is input from the main control device. The main control device instructs policy node a to distribute the first batch of image data to computing nodes 1-3 (assuming computing nodes 1-3 are all deployed on card 1) for inference computation; card 1 sends a signal to the main control device after finishing the first batch, and the main control device instructs policy node b to carry out subsequent processing. At that moment the second batch of image data can already be sent from the main control device to computing nodes 1-3 through policy node a for inference computation, without waiting for the first batch to finish inference through the whole model (i.e., all three network layers A, B, and C). This achieves pipeline parallelism of different network layers over different batches of data.
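A minimal sketch of this pipeline schedule, using one thread and one queue per network layer; the thread/queue mechanics are an illustrative stand-in for cards exchanging signals through the main control device, not the patented implementation:

```python
import queue
import threading

def layer_worker(name, inbox, outbox):
    # Each layer consumes batches from its pre-policy node's queue and hands
    # results downstream, so layer A can start batch 2 while B and C are
    # still busy with batch 1.
    while True:
        batch = inbox.get()
        if batch is None:               # sentinel: propagate shutdown downstream
            outbox.put(None)
            break
        outbox.put(f"{name}({batch})")  # placeholder for real inference

q_in, q_ab, q_bc, q_out = (queue.Queue() for _ in range(4))
for name, src, dst in [("A", q_in, q_ab), ("B", q_ab, q_bc), ("C", q_bc, q_out)]:
    threading.Thread(target=layer_worker, args=(name, src, dst), daemon=True).start()

q_in.put("batch1")
q_in.put("batch2")   # enters the pipeline as soon as layer A is free
q_in.put(None)
print(q_out.get())   # C(B(A(batch1)))
print(q_out.get())   # C(B(A(batch2)))
```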
In addition, if the network structures of different large models share an identical network layer (or intermediate sub-graph structure) after splitting, resources can be shared: a network layer on one inference card can support the inference work of two different large models.
2. Computing node parallelism
As shown in FIG. 4, the large model comprises three network layers A, B, and C; network layer A is deployed on computing nodes 1-3, network layer B on computing nodes 4-6, network layer C on computing nodes 7-9, and a, b, c, d are policy nodes. The whole large model network can be divided along the dimension of computing nodes, and the computing nodes can be deployed on different inference cards. After the computing nodes corresponding to the same network layer finish their respective computations, their results can be integrated through policy node communication to obtain the network layer's final result, or the result of a single computing node can be passed directly downstream. In other words, the computing nodes corresponding to a network layer can process different batches of data in parallel. Under this parallel strategy, one computing node can be instantiated into several computing nodes according to its computational complexity, i.e., a high-complexity node is deployed on several inference cards and the data is split evenly across those inference cards for parallel processing, which mitigates the inefficiency caused by insufficient inference resources on a single card.
Specifically, image data is input from the main control device, and the main control device instructs policy node a to distribute the data to computing nodes 1-3 corresponding to network layer A for processing. Assuming computing nodes 1-3 are deployed on different cards (i.e., they are mutually independent), each computing node can send a message to the main control device as soon as it finishes processing the first batch, and the main control device can then instruct policy node b to carry out subsequent data circulation and distribution. At that moment the next batch of data can be distributed from policy node a to whichever computing node has finished the first batch, without waiting for the other computing nodes of network layer A to finish the previous batch. That is, once computing node 1 has processed the first batch it can proceed directly to the next batch without regard to the progress of computing nodes 2 and 3. This is computing node parallelism.
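A sketch of this per-node dispatch, assuming the three computing nodes of network layer A run independently; a thread pool stands in for separate cards, and all names are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def node_infer(node_id, batch):
    # Placeholder for one computing node's inference on one batch.
    return f"node {node_id} processed {batch}"

batches = ["batch1", "batch2", "batch3", "batch4", "batch5"]
with ThreadPoolExecutor(max_workers=3) as pool:  # computing nodes 1-3 of layer A
    pending = {pool.submit(node_infer, i, batches[i - 1]): i for i in (1, 2, 3)}
    next_batch = 3
    while pending:
        done = next(as_completed(list(pending)))  # whichever node finishes first
        node_id = pending.pop(done)
        print("policy node b receives:", done.result())
        if next_batch < len(batches):
            # The finished node takes the next batch at once, without waiting
            # for its sibling nodes: computing node parallelism.
            pending[pool.submit(node_infer, node_id, batches[next_batch])] = node_id
            next_batch += 1
```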
Next, two data result acquisition strategies are described.
FIG. 5 is a schematic diagram of data result flow and acquisition between two network layers, on the basis of which the two data result acquisition strategies are described. A and B are two network layers of a large model; network layer A is deployed on computing nodes 1-3 and network layer B on computing nodes 4-6.
1. Node data result synchronization strategy
The input data of each computing node corresponding to network layer B is the data result produced after network layer A has completely finished its inference. That is, computing nodes 1 to 3 of network layer A must all have processed the same batch of data before their outputs are sent to the downstream policy node; the downstream policy node recombines the output data of the three computing nodes according to the network structure to obtain the final result of layer A, and then notifies the computing nodes of layer B to acquire that result.
For example, if network layer B needs the complete result of network layer A as input, policy node b waits for computing nodes 1 to 3 to output their results, merges them, and finally sends the merged result data to computing nodes 4 to 6 of network layer B for processing.
2. Node data result asynchronous strategy
The input data of a computing node corresponding to network layer B does not require the complete inference of network layer A; as soon as some of layer A's computing nodes finish their inference, their results can directly serve as input for some of layer B's computing nodes. For example, if computing node 4 of network layer B only needs the result of computing node 1 of network layer A, policy node b can notify computing node 4 to acquire that result as soon as it receives computing node 1's inference result. The need of computing nodes 5 and 6 for the results of computing nodes 1 and 2 is independent of computing node 4's need for the result of computing node 1; this is the asynchronous strategy for node data.
For example, if computing node 4 of network layer B only needs the result of computing node 1 of network layer A, policy node b can pass computing node 1's result to computing node 4 for processing as soon as it is received, without waiting for the results of computing nodes 2 and 3.
In the embodiments of the application, each computing node acquires its data result after being notified by its pre-policy node and then performs its own processing, and a policy node can execute different data result acquisition strategies according to the configuration information of the rear computing nodes. Referring to FIG. 6, FIG. 6 is a schematic flow chart of each policy node executing a data result acquisition strategy, with the following specific steps:
Step 601: receive the computation result of a single front-layer computing node;
Step 602: determine the data result acquisition strategy between each rear-layer computing node and this computing node; if it is the asynchronous strategy, jump directly to step 607; if it is the synchronous strategy, jump to step 603;
Step 603: judge whether all the front-layer computation results required by the rear-layer computing node have arrived; if not, jump to step 604; if all have arrived, jump to step 605;
Step 604: wait for the remaining required front-layer computation results, and return to step 601;
Step 605: integrate and recombine the computation results of all front-layer computing nodes to generate the data result of the whole front layer;
Step 606: notify the relevant rear-layer computing nodes of the whole front layer's data result;
Step 607: notify the relevant rear-layer computing nodes of this computing node's result.
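One way to express steps 601 to 607 in code; this is a sketch under the assumption that each policy node knows which front-layer results the rear layer requires, with illustrative data structures:

```python
class AcquisitionPolicy:
    # Follows the flow of FIG. 6: under the synchronous strategy, front-layer
    # results are cached until complete (steps 603-606); under the asynchronous
    # strategy, each result is forwarded at once (step 607).
    def __init__(self, mode, required_nodes):
        self.mode = mode                     # "synchronous" or "asynchronous"
        self.required = set(required_nodes)  # front-layer nodes the rear layer needs
        self.cache = {}                      # node_id -> cached result

    def on_result(self, node_id, result, notify_rear):
        if self.mode == "asynchronous":
            notify_rear({node_id: result})   # step 607: notify immediately
            return
        self.cache[node_id] = result         # step 601: result received and buffered
        if self.required.issubset(self.cache):
            # Steps 605-606: all required results arrived; integrate and notify.
            notify_rear({n: self.cache[n] for n in self.required})
            self.cache.clear()
        # Otherwise step 604: keep waiting for the remaining front-layer results.

# Usage sketch: layer B synchronously waits for all of layer A's nodes 1-3.
policy_b = AcquisitionPolicy("synchronous", [1, 2, 3])
for nid in (2, 1, 3):                        # results may arrive in any order
    policy_b.on_result(nid, f"result {nid}", notify_rear=print)
```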
The embodiments of the application provide a data path suitable for data management and data transfer in distributed large model inference, where data management includes converting data formats, integrating computation results, and the like, and data transfer includes indicating the data transfer path, the data transfer timing, and the like. Two parallel strategies for distributed large model inference and two data result acquisition strategies are also supported. These capabilities can be adjusted and supported adaptively through the large model's configuration file, so the multi-card deployment of the large model does not have to be readjusted for every difference in its split structure. Rapid, efficient deployment of large models with different network structures is thereby supported, the application landing cost of large models with different structures is reduced, and the two different parallel strategies can improve the distributed inference efficiency of the large model.
FIG. 7 is a flowchart of an image data processing method, where each network layer of a large model is distributed and deployed on a plurality of computing nodes. The method is applied to the master control device of FIG. 2 and comprises the following steps.
In step 701, a configuration file of a large model is obtained, wherein the configuration file is preconfigured according to data flow requirements between computing nodes.
Generally, large models serve different purposes (such as pedestrian detection or vehicle detection), and after a large model is deployed on a plurality of computing nodes, the data flow requirements among the computing nodes will differ. Whatever the requirements are, they can be configured in the configuration file in advance.
In step 702, the configuration file is parsed to obtain the data flow relations among the computing nodes, where a data flow relation comprises a data flow rule and a data flow timing.
The data flow rule indicates to whom the input image data, or a computation result passed between computing nodes, is transmitted, whether it is integrated, and whether and into which format it is converted; the data flow timing indicates when the input image data or the inter-node computation results are transmitted.
In step 703, a plurality of policy nodes for connecting the network layers are generated based on the data flow relations.
Here, a policy node located between adjacent network layers can be generated according to the data flow relations of the computing nodes corresponding to those adjacent layers, while the pre-policy node of the first network layer is generated according to the data flow relations of the computing nodes corresponding to the first network layer.
In step 704, the plurality of computing nodes are scheduled by means of the plurality of policy nodes, and inference computation is performed on the input image data to obtain a model output result.
When the large model is used, image data is input from the master control device, the inference computation on the image data is completed on the distributed computing nodes, the data circulation and management work in this process is borne by the policy nodes, and finally the master control device outputs the model processing result.
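Putting steps 701 to 704 together, the master control flow might look like the following sketch, reusing the hypothetical config and build_policy_nodes from the earlier sketches; run_inference is a placeholder standing in for the scheduling actually performed by the policy nodes:

```python
def run_inference(policies, batch):
    # Placeholder: in the real system each policy node schedules the computing
    # nodes of its layers according to its flow rule and flow timing.
    for p in policies:
        batch = f"{p.to_layer}({batch})"
    return batch

def process_image_data(config, image_batches):
    policies = build_policy_nodes(config)  # steps 701-703: obtain, parse, generate
    # Step 704: schedule the computing nodes through the policy nodes and
    # perform inference computation on the input image data.
    return [run_inference(policies, batch) for batch in image_batches]
```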
In the embodiments of the application, each network layer of the large model is distributed and deployed on a plurality of computing nodes, the data flow relations among the computing nodes are obtained through a configuration file, the data flow rules are encapsulated into relatively independent policy nodes, and the policy nodes schedule the computing nodes to complete the model's inference computation. No matter how the large model is split and no matter what its image processing task is, the large model can be deployed rapidly and flexibly and applied quickly at the device end.
In some embodiments, if any data flow rule specifies that the computing nodes corresponding to a first network layer and the computing nodes corresponding to a second network layer process different batches of data in parallel, the corresponding computing nodes may be scheduled according to the following steps:
instructing the pre-policy node of the first network layer to distribute the current batch of data to each computing node corresponding to the first network layer for computation; and, after the computation on the current batch of data is completed, instructing the pre-policy node of the first network layer to distribute the next batch of data to each computing node corresponding to the first network layer, and instructing the pre-policy node of the second network layer to process the computation result of the current batch of data.
When the first network layer is the model's first layer, the current batch of data is part of the image data; when it is not the first layer, the current batch of data is intermediate data already processed by at least one network layer. Typically the first network layer and the second network layer are adjacent, though they need not be.
That is, one network layer can begin processing the second batch of data after finishing the first batch, without waiting for all network layers of the whole large model to complete inference on the first batch. This parallel strategy realizes pipeline parallelism of different network layers over different batches of data and can improve the inference efficiency of the large model.
In some embodiments, if any data flow rule specifies that the computing nodes corresponding to a first network layer process different batches of data in parallel, the corresponding computing nodes may be scheduled according to the following steps:
instructing the pre-policy node of the first network layer to distribute the current batch of data to each computing node corresponding to the first network layer for computation; and, whenever any computing node corresponding to the first network layer completes its computation, instructing the pre-policy node of the first network layer to distribute the next batch of data to that computing node, and instructing the pre-policy node of the second network layer to process that computing node's result.
Similarly, when the first network layer is the model's first layer, the current batch of data is part of the image data; when it is not the first layer, the current batch of data is intermediate data, i.e., part of the image data already processed by at least one network layer. Typically the first network layer and the second network layer are adjacent, though they need not be.
That is, the computing nodes corresponding to the same network layer can process different batches of data in parallel, so that each node's computing capability is exploited as fully as possible and the inference efficiency of the large model is improved.
Regardless of the data flow rule, the data flow timing between the first network layer and the second network layer may be synchronous flow or asynchronous flow. When the data flow timing between the first network layer and the second network layer is synchronous flow, the computation result sent by each computing node corresponding to the first network layer can be cached until the results of all computing nodes of the first network layer have been received, whereupon the pre-policy node of the second network layer is instructed to process all those results. When the data flow timing between the first network layer and the second network layer is asynchronous flow, each time a computing node of the first network layer completes its computation, the pre-policy node of the second network layer can be instructed to process that node's result. Different data flow timings are thus supported, which suits different service scenarios and maximizes model inference efficiency while guaranteeing correct inference.
Based on the same technical conception, an embodiment of the application further provides an image data processing apparatus. Since the principle by which the apparatus solves the problem is similar to that of the image data processing method, the implementation of the apparatus may refer to the implementation of the method, and repeated description is omitted.
FIG. 8 is a schematic structural diagram of an image data processing apparatus according to an embodiment of the application, comprising an acquisition module 801, a parsing module 802, a generation module 803, and a scheduling module 804.
The acquisition module 801 is configured to acquire a configuration file of the large model, where the configuration file is preconfigured according to the data flow requirements among the computing nodes;
the parsing module 802 is configured to parse the configuration file to obtain the data flow relations among the computing nodes, where a data flow relation comprises a data flow rule and a data flow timing;
the generation module 803 is configured to generate, based on the data flow relations, a plurality of policy nodes for connecting the network layers;
and the scheduling module 804 is configured to schedule the plurality of computing nodes by means of the plurality of policy nodes, and perform inference computation on the input image data to obtain a model output result.
In some embodiments, if any data flow rule specifies that the computing nodes corresponding to a first network layer and the computing nodes corresponding to a second network layer process different batches of data in parallel, the scheduling module 804 is specifically configured to schedule the corresponding computing nodes according to the following steps:
instructing the pre-policy node of the first network layer to distribute the current batch of data to each computing node corresponding to the first network layer for computation;
after the computation on the current batch of data is completed, instructing the pre-policy node of the first network layer to distribute the next batch of data to each computing node corresponding to the first network layer, and instructing the pre-policy node of the second network layer to process the computation result of the current batch of data.
In some embodiments, if any data flow rule specifies that the computing nodes corresponding to a first network layer process different batches of data in parallel, the scheduling module 804 is specifically configured to schedule the corresponding computing nodes according to the following steps:
instructing the pre-policy node of the first network layer to distribute the current batch of data to each computing node corresponding to the first network layer for computation;
and, whenever any computing node corresponding to the first network layer completes its computation, instructing the pre-policy node of the first network layer to distribute the next batch of data to that computing node, and instructing the pre-policy node of the second network layer to process that computing node's result.
In some embodiments, if the data flow timing between the first network layer and the second network layer is synchronous flow, the computation result sent by each computing node corresponding to the first network layer is cached until it is determined that the results of all computing nodes corresponding to the first network layer have been received, whereupon the pre-policy node of the second network layer is instructed to process the results of all those computing nodes.
In some embodiments, if the data flow timing between the first network layer and the second network layer is asynchronous flow, each time a computing node corresponding to the first network layer completes its computation, the pre-policy node of the second network layer is instructed to process that computing node's result.
The division into modules in the embodiments of the application is schematic and is merely a division by logical function; another division manner is possible in actual implementation. In addition, the functional modules in the embodiments of the application may be integrated into one processor, may exist separately and physically, or two or more modules may be integrated into one module. The coupling between modules may be realized through interfaces, which are typically electrical communication interfaces, though mechanical or other forms of interface are not excluded. Thus, modules illustrated as separate components may or may not be physically separate, and may be located in one place or distributed across different locations on the same or different devices. The integrated modules may be implemented in hardware or as software functional modules.
Having described the method and apparatus for processing image data according to an exemplary embodiment of the present application, next, an electronic device according to another exemplary embodiment of the present application is described.
An electronic device 130 according to this embodiment of the application is described below with reference to FIG. 9. The electronic device 130 shown in FIG. 9 is merely an example and should not impose any limitation on the functionality or scope of use of the embodiments of the application.
As shown in fig. 9, the electronic device 130 is embodied in the form of a general-purpose electronic device. Components of electronic device 130 may include, but are not limited to: the at least one processor 131, the at least one memory 132, and a bus 133 connecting the various system components, including the memory 132 and the processor 131.
Bus 133 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a processor, and a local bus using any of a variety of bus architectures.
Memory 132 may include readable media in the form of volatile memory such as Random Access Memory (RAM) 1321 and/or cache memory 1322, and may further include Read Only Memory (ROM) 1323.
Memory 132 may also include a program/utility 1325 having a set (at least one) of program modules 1324. Such program modules 1324 include, but are not limited to: an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.
The electronic device 130 may also communicate with one or more external devices 134 (e.g., keyboard, pointing device, etc.), one or more devices that enable a user to interact with the electronic device 130, and/or any device (e.g., router, modem, etc.) that enables the electronic device 130 to communicate with one or more other electronic devices. Such communication may occur through an input/output (I/O) interface 135. Also, electronic device 130 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through network adapter 136. As shown, network adapter 136 communicates with other modules for electronic device 130 over bus 133. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with electronic device 130, including, but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
In an exemplary embodiment, a storage medium is also provided; when the computer program in the storage medium is executed by a processor of the electronic device, the electronic device is able to perform the above image data processing method. Optionally, the storage medium may be a non-transitory computer-readable storage medium, for example a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
In an exemplary embodiment, the electronic device of the present application may include at least one processor, and a memory communicatively connected to the at least one processor, where the memory stores a computer program executable by the at least one processor, and the computer program when executed by the at least one processor may cause the at least one processor to perform the steps of any of the image data processing methods provided by the embodiments of the present application.
In an exemplary embodiment, a computer program product is also provided, which, when executed by an electronic device, is capable of carrying out any one of the exemplary methods provided by the application.
Also, a computer program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include the following: an electrical connection having one or more wires, a portable disk, a hard disk, a RAM, a ROM, an erasable programmable read-only memory (EPROM), a flash memory, an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product for processing image data in embodiments of the present application may take the form of a CD-ROM and include program code that can run on a computing device. However, the program product of the present application is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, radio Frequency (RF), etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server. In cases involving remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, such as a local area network (Local Area Network, LAN) or wide area network (Wide Area Network, WAN), or may be connected to an external computing device (e.g., connected over the internet using an internet service provider).
It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, the features and functions of two or more of the elements described above may be embodied in one element in accordance with embodiments of the present application. Conversely, the features and functions of one unit described above may be further divided into a plurality of units to be embodied.
Furthermore, although the operations of the methods of the present application are depicted in the drawings in a particular order, this is not required to either imply that the operations must be performed in that particular order or that all of the illustrated operations be performed to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, the present application also includes such modifications and variations provided they come within the scope of the claims and their equivalents.

Claims (10)

1. A method for processing image data, wherein each network layer of a large model is deployed in a distributed manner across a plurality of computing nodes, the method being applied to a master control device and comprising:
acquiring a configuration file of the large model, wherein the configuration file is preconfigured according to the data flow requirements among the computing nodes;
parsing the configuration file to obtain a data flow relation among the computing nodes, wherein the data flow relation comprises a data flow rule and a data flow timing;
generating, based on the data flow relation, a plurality of policy nodes for connecting the network layers; and
scheduling the plurality of computing nodes by means of the plurality of policy nodes to perform inference computation on input image data and obtain a model output result.
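The claims leave the concrete configuration schema and node representation open; as a non-authoritative sketch, a master control device could parse a JSON configuration and generate pre-policy nodes roughly as follows. Every name in this sketch (FlowRelation, PolicyNode, the flow_relations key) is a hypothetical illustration, not something specified by the patent.

import json
from dataclasses import dataclass

@dataclass
class FlowRelation:
    # One edge of the data-flow graph between two network layers.
    src_layer: str   # layer whose computing nodes produce the data
    dst_layer: str   # layer whose computing nodes consume the data
    rule: str        # e.g. "inter_layer_parallel" or "intra_layer_parallel"
    timing: str      # "sync" or "async" (the data flow timing)

@dataclass
class PolicyNode:
    # Connects a network layer to its computing nodes under one flow relation.
    layer: str
    computing_nodes: list
    relation: FlowRelation

def parse_config(path):
    # Parse the preconfigured data flow requirements among computing nodes.
    with open(path) as f:
        cfg = json.load(f)
    return [FlowRelation(**edge) for edge in cfg["flow_relations"]]

def build_policy_nodes(relations, layer_to_nodes):
    # Generate one pre-policy node per destination layer, as in claim 1.
    return [PolicyNode(r.dst_layer, layer_to_nodes[r.dst_layer], r)
            for r in relations]

Under these assumptions, an entry such as {"src_layer": "backbone", "dst_layer": "head", "rule": "inter_layer_parallel", "timing": "sync"} would yield a pre-policy node feeding the head layer's computing nodes.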
2. The method of claim 1, wherein, if any data flow rule specifies that the computing nodes corresponding to a first network layer and the computing nodes corresponding to a second network layer process different batches of data in parallel, the corresponding computing nodes are scheduled as follows:
instructing the pre-policy node of the first network layer to distribute the current batch of data to each computing node corresponding to the first network layer for computation; and
after computation on the current batch of data is completed, instructing the pre-policy node of the first network layer to distribute the next batch of data to each computing node corresponding to the first network layer, and instructing the pre-policy node of the second network layer to process the computation results of the current batch of data.
3. The method of claim 1, wherein, if any data flow rule specifies that the computing nodes corresponding to a first network layer process different batches of data in parallel, the corresponding computing nodes are scheduled as follows:
instructing the pre-policy node of the first network layer to distribute the current batch of data to each computing node corresponding to the first network layer for computation; and
whenever any computing node corresponding to the first network layer completes its computation, instructing the pre-policy node of the first network layer to distribute the next batch of data to that computing node, and instructing the pre-policy node of the second network layer to process that computing node's computation result.
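Claims 2 and 3 prescribe scheduling rules but no execution mechanism; a rough thread-pool sketch of the two rules is given below. Here node.compute and layer2_policy.process are hypothetical stand-ins for a computing node's inference step and the second network layer's pre-policy node; nothing about this sketch is mandated by the claims.

import queue
from concurrent.futures import ThreadPoolExecutor

def split(batch, n):
    # Shard one batch of samples evenly across n computing nodes.
    return [batch[i::n] for i in range(n)]

def schedule_pipelined(batches, layer1_nodes, layer2_policy):
    # Claim-2 rule: once layer 1 finishes batch k, the next batch is
    # dispatched to layer 1 while layer 2 processes batch k's results,
    # so the two layers work on different batches in parallel.
    l1 = ThreadPoolExecutor(max_workers=len(layer1_nodes))
    l2 = ThreadPoolExecutor(max_workers=1)
    for batch in batches:
        shards = split(batch, len(layer1_nodes))
        futures = [l1.submit(nd.compute, s)
                   for nd, s in zip(layer1_nodes, shards)]
        results = [f.result() for f in futures]    # wait: batch done on layer 1
        l2.submit(layer2_policy.process, results)  # layer 2 overlaps next batch
    l1.shutdown()
    l2.shutdown()

def schedule_greedy(batches, layer1_nodes, layer2_policy):
    # Claim-3 rule: whichever layer-1 node finishes first immediately
    # pulls the next batch, and its result is forwarded to layer 2 at once.
    work = queue.Queue()
    for b in batches:
        work.put(b)

    def worker(node):
        while True:
            try:
                data = work.get_nowait()
            except queue.Empty:
                return  # no batches left for this node
            layer2_policy.process(node.compute(data))

    with ThreadPoolExecutor(max_workers=len(layer1_nodes)) as pool:
        for node in layer1_nodes:
            pool.submit(worker, node)

The pipelined variant keeps whole batches aligned across layers, whereas the greedy variant maximizes per-node utilization when computing nodes run at different speeds.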
4. The method of claim 2 or 3, wherein, if the data flow timing between the first network layer and the second network layer is synchronous flow, the computation result sent by each computing node corresponding to the first network layer is cached until the computation results of all computing nodes corresponding to the first network layer are determined to have been received, whereupon the pre-policy node of the second network layer is instructed to process the computation results of all the computing nodes.
5. The method of claim 2 or 3, wherein, if the data flow timing between the first network layer and the second network layer is asynchronous flow, each time a computing node corresponding to the first network layer completes its computation, the pre-policy node of the second network layer is instructed to process that computing node's computation result.
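The synchronous and asynchronous timing modes of claims 4 and 5 can be pictured as two forwarding policies between layers. The sketch below is an assumption-laden illustration: SyncForwarder, AsyncForwarder, and next_policy.process are invented names, and the real inter-node transport (RPC, shared memory, and so on) is abstracted away.

class SyncForwarder:
    # Claim-4 timing: cache each node's result until every computing node
    # of the first layer has reported, then hand the complete set to the
    # second layer's pre-policy node.
    def __init__(self, expected_node_ids, next_policy):
        self.expected = set(expected_node_ids)
        self.cache = {}
        self.next_policy = next_policy

    def on_result(self, node_id, result):
        self.cache[node_id] = result
        if self.expected.issubset(self.cache):  # all nodes have reported
            self.next_policy.process(list(self.cache.values()))
            self.cache.clear()

class AsyncForwarder:
    # Claim-5 timing: forward each node's result downstream immediately,
    # without waiting for the other computing nodes.
    def __init__(self, next_policy):
        self.next_policy = next_policy

    def on_result(self, node_id, result):
        self.next_policy.process(result)

Synchronous flow suits layers that need the whole batch reassembled (for example, concatenating feature shards), while asynchronous flow lowers latency when downstream computation can start on partial results.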
6. An image data processing apparatus, wherein each network layer of a large model is deployed in a distributed manner across a plurality of computing nodes, the apparatus being applied to a master control device and comprising:
an acquisition module, configured to acquire a configuration file of the large model, wherein the configuration file is preconfigured according to the data flow requirements among the computing nodes;
a parsing module, configured to parse the configuration file to obtain a data flow relation among the computing nodes, wherein the data flow relation comprises a data flow rule and a data flow timing;
a generation module, configured to generate, based on the data flow relation, a plurality of policy nodes for connecting the network layers; and
a scheduling module, configured to schedule the plurality of computing nodes by means of the plurality of policy nodes to perform inference computation on input image data and obtain a model output result.
7. The apparatus of claim 6, wherein, if any data flow rule specifies that the computing nodes corresponding to a first network layer and the computing nodes corresponding to a second network layer process different batches of data in parallel, the scheduling module is specifically configured to schedule the corresponding computing nodes by:
instructing the pre-policy node of the first network layer to distribute the current batch of data to each computing node corresponding to the first network layer for computation; and
after computation on the current batch of data is completed, instructing the pre-policy node of the first network layer to distribute the next batch of data to each computing node corresponding to the first network layer, and instructing the pre-policy node of the second network layer to process the computation results of the current batch of data.
8. The apparatus of claim 6, wherein, if any data flow rule specifies that the computing nodes corresponding to a first network layer process different batches of data in parallel, the scheduling module is specifically configured to schedule the corresponding computing nodes by:
instructing the pre-policy node of the first network layer to distribute the current batch of data to each computing node corresponding to the first network layer for computation; and
whenever any computing node corresponding to the first network layer completes its computation, instructing the pre-policy node of the first network layer to distribute the next batch of data to that computing node, and instructing the pre-policy node of the second network layer to process that computing node's computation result.
9. An electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein:
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
10. A storage medium, wherein a computer program stored in the storage medium, when executed by a processor of an electronic device, enables the processor to perform the method of any one of claims 1-5.
CN202311228987.8A 2023-09-21 2023-09-21 Image data processing method and device, electronic equipment and storage medium Active CN116974654B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311228987.8A CN116974654B (en) 2023-09-21 2023-09-21 Image data processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311228987.8A CN116974654B (en) 2023-09-21 2023-09-21 Image data processing method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN116974654A true CN116974654A (en) 2023-10-31
CN116974654B CN116974654B (en) 2023-12-19

Family

ID=88473409

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311228987.8A Active CN116974654B (en) 2023-09-21 2023-09-21 Image data processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116974654B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017064554A1 (en) * 2015-10-13 2017-04-20 Schneider Electric Industries Sas Method for arranging workloads in a software defined automation system
CN108460457A (en) * 2018-03-30 2018-08-28 苏州纳智天地智能科技有限公司 A kind of more asynchronous training methods of card hybrid parallel of multimachine towards convolutional neural networks
CN110046704A (en) * 2019-04-09 2019-07-23 深圳鲲云信息科技有限公司 Depth network accelerating method, device, equipment and storage medium based on data flow
CN111131379A (en) * 2019-11-08 2020-05-08 西安电子科技大学 Distributed flow acquisition system and edge calculation method
CN111930364A (en) * 2020-08-11 2020-11-13 上海亿锎智能科技有限公司 Method for realizing conditional flow of process nodes through dynamic configuration rule device
CN113158243A (en) * 2021-04-16 2021-07-23 苏州大学 Distributed image recognition model reasoning method and system
CN113515370A (en) * 2021-04-28 2021-10-19 之江实验室 Distributed training method for large-scale deep neural network
CN113641447A (en) * 2021-07-16 2021-11-12 北京师范大学珠海校区 Online learning type scheduling method based on container layer dependency relationship in edge calculation
CN114662661A (en) * 2022-03-22 2022-06-24 东南大学 Method for accelerating multi-outlet DNN reasoning of heterogeneous processor under edge calculation
CN115062784A (en) * 2022-06-13 2022-09-16 中国科学院软件研究所 End cloud collaborative reasoning method and device for neural network operator fusion
CN116450312A (en) * 2023-03-02 2023-07-18 阿里巴巴(中国)有限公司 Scheduling strategy determination method and system for pipeline parallel training
CN116450246A (en) * 2022-10-15 2023-07-18 中电万维信息技术有限责任公司 Event stream configurable method based on state machine

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Daniel Hass et al.: "Workload Deployment and Configuration Reconciliation at Scale in Kubernetes-Based Edge-Cloud Continuums", 2022 21st International Symposium on Parallel and Distributed Computing (ISPDC) *
Ren Jie; Gao Ling; Yu Jialong; Yuan Lu: "Energy-Efficient Deep Learning Task Scheduling Strategy for Edge Devices", Chinese Journal of Computers, no. 03 *
Cai Fangbo; He Jingsha; Zhu Nafei; Han Song: "Research on Node Cascading Failure in Distributed Access Control Models", Netinfo Security, no. 12 *

Also Published As

Publication number Publication date
CN116974654B (en) 2023-12-19

Similar Documents

Publication Publication Date Title
CN109614102A (en) Code automatic generation method, device, electronic equipment and storage medium
CN110955734B (en) Distributed signature decision system and method for logic node
EP4177752A1 (en) Task processing method, edge computing device, computer device, and medium
US10942771B2 (en) Method, apparatus and system for multi-module scheduling
CN111860853B (en) Online prediction system, device, method and electronic device
CN114841327A (en) Processing method and device of computation graph, readable medium and electronic equipment
CN110503179B (en) Calculation method and related product
CN111611622A (en) Block chain-based file storage method and electronic equipment
CN115600676A (en) Deep learning model reasoning method, device, equipment and storage medium
US20090158263A1 (en) Device and method for automatically optimizing composite applications having orchestrated activities
CN115576699A (en) Data processing method, data processing device, AI chip, electronic device and storage medium
CN116661978B (en) Distributed flow processing method and device and distributed business flow engine
CN116974654B (en) Image data processing method and device, electronic equipment and storage medium
US9723043B2 (en) Streaming data on data processes
CN112825525B (en) Method and apparatus for processing transactions
CN111176624B (en) Method and device for generating stream type calculation index
CN113141407A (en) Page resource loading method and device and electronic equipment
CN111860851A (en) Method and system for recognizing images
CN112995067B (en) Coarse-grained reconfigurable data processing architecture and data processing method thereof
CN116108492B (en) Laterally expandable data leakage prevention system
CN108596181B (en) Text recognition method, system, machine device and computer-readable storage medium
CN114866808A (en) High-performance video processing system and method and electronic equipment
CN113377440A (en) FPGA-based instruction processing method and device, electronic equipment and medium
CN116880983A (en) Task distribution method and device, storage medium and electronic equipment
CN116596205A (en) Task flow arranging method and device based on BPMN2.0 and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant