CN117852573A - Computing force execution system, operator computing flow management method, device, equipment and medium - Google Patents

Computing force execution system, operator computing flow management method, device, equipment and medium

Info

Publication number
CN117852573A
Authority
CN
China
Prior art keywords
operator
computing
flow
stream
computation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410258729.2A
Other languages
Chinese (zh)
Other versions
CN117852573B (en)
Inventor
Sun Deshuai (孙德帅)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co Ltd
Original Assignee
Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co Ltd filed Critical Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co Ltd
Priority to CN202410258729.2A priority Critical patent/CN117852573B/en
Publication of CN117852573A publication Critical patent/CN117852573A/en
Application granted granted Critical
Publication of CN117852573B publication Critical patent/CN117852573B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application relates to a computing power execution system, an operator computation flow management method, an apparatus, a device and a medium, belonging to the technical field of artificial intelligence. The computing power execution system comprises a deep learning module, an artificial intelligence computing power execution module and a hardware device module. The artificial intelligence computing power execution module is used for determining a second operator computation flow when a first operator computation flow is detected; the hardware device module is used for executing a task instruction corresponding to the second operator computation flow; the artificial intelligence computing power execution module comprises an initialization unit and an operator computation flow management unit; the initialization unit is used for initializing the artificial intelligence computing power execution module; and the operator computation flow management unit is used for managing the second operator computation flow based on the first operator computation flow. The method and the device can ensure the consistency of the real operator execution time with the deep learning module, thereby avoiding the training gradient dispersion phenomenon of the whole model, effectively reducing system overhead, and improving operator computation efficiency.

Description

Computing force execution system, operator computing flow management method, device, equipment and medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a computing power execution system, an operator computation flow management method and apparatus, a computer device, and a storage medium.
Background
With the continuous advancement of technical innovation in the artificial intelligence field, the application scenarios faced by deep learning algorithm models are increasingly complex and the hardware devices they run on increasingly diverse. Deployment schemes in the deep learning field have evolved from the original single device with a single deep learning framework to much richer combinations, so supporting multiple deep learning framework back ends and multiple computing chips within the same application scenario has become a key requirement in current industrial production.
Based on the above requirements, as shown in fig. 1, an intermediate AI (Artificial Intelligence) computing power execution framework is constructed that supports a plurality of deep learning frameworks upward and is compatible with a plurality of hardware devices downward. The resulting problem is how to ensure that every hardware device under every AI framework can operate correctly through the AI computing power execution framework, and unification of the operator computation flows is one of the technical difficulties that must be solved. Under the principle of not intruding into the deep learning frameworks, the AI computing power execution framework must maintain its own set of operator computation flows in order to be compatible with multiple deep learning frameworks: different deep learning frameworks achieve accelerated computation through the operator computations implemented by the AI computing power execution framework, which in turn invokes the acceleration computing library of the back-end chip, so the framework must maintain its own operator computation flows to ensure that the back-end acceleration library is invoked correctly. Because the computation flows inside the deep learning frameworks and the operator computation flows inside the AI computing power execution framework are mutually independent and each managed by its own side, when the same computing power execution framework faces multiple deep learning frameworks the two sides cannot synchronize computation information with each other. As a result, when the deep learning framework retrieves a computation result, the computation in the computing power execution framework may not yet have finished, or may have finished prematurely, so that an incorrect or incomplete result is retrieved.
Therefore, there is a need for a computing power execution system, an operator computation flow management method, apparatus, device and medium capable of improving the accuracy of operator computation results and the computation efficiency of the computing power execution framework.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a computing force execution system, an operator computing flow management method, an apparatus, a device, and a medium that can improve the accuracy of an operator computing result and the computing efficiency of a computing force execution framework.
In a first aspect, a computing force execution system is provided, the computing force execution system comprising a deep learning module, an artificial intelligence computing force execution module, and a hardware device module;
the artificial intelligence computing power execution module is connected with the deep learning module and is used for determining a second operator computing stream when the first operator computing stream is generated by the deep learning module;
the hardware equipment module is connected with the artificial intelligence computing power execution module and is used for executing a task instruction corresponding to the second operator computing flow;
the artificial intelligence computing power execution module comprises an initialization unit and an operator computing flow management unit;
the initialization unit is used for initializing the artificial intelligence computing power execution module;
The operator computation flow management unit is used for managing the second operator computation flow based on the first operator computation flow generated by the deep learning module.
Optionally, the operator computation flow management unit includes a flow pool initialization subunit, an operator computation flow acquisition subunit, an operator computation flow processing subunit, an operator computation flow destruction subunit and a flow pool;
the flow pool initialization subunit is used for initializing the flow pool;
the operator computation flow obtaining subunit is used for obtaining a second operator computation flow in the artificial intelligence computation power execution module;
the operator computation flow processing subunit is used for carrying out unified processing on a second operator computation flow in the artificial intelligence computing power execution module based on the first operator computation flow;
the operator computation flow destroying subunit is used for unbinding and destroying the operator computation flow;
the flow pool is used for storing flows corresponding to hardware equipment resources in the artificial intelligence computing power execution module.
Optionally, the flow pool includes flows corresponding to at least one hardware device resource in the artificial intelligence computing power execution module;
the preset size of the flow pool is N, and the first identification numbers of a plurality of flows in the flow pool are sequentially M to N-1, wherein N-1 is greater than or equal to M;
And the first identification number is consistent with a second identification number corresponding to the operator calculation flow in the operator calculation flow management unit.
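As a concrete illustration of the flow pool described above, the following Python sketch models a pool of preset size N whose flows carry first identification numbers M to N-1, each bound to one hardware device resource. All class, attribute and resource names are hypothetical, not from the patent.

```python
# Hypothetical sketch of the flow pool: N flows with identification
# numbers M .. N-1, each bound to one hardware device resource.
class Flow:
    def __init__(self, flow_id, device_resource):
        self.flow_id = flow_id              # first identification number
        self.device_resource = device_resource

class FlowPool:
    def __init__(self, preset_size_n, start_m=0):
        # one flow per hardware device resource, ids M .. N-1
        self.flows = {i: Flow(i, f"hw-resource-{i}")
                      for i in range(start_m, preset_size_n)}

    def get(self, flow_id):
        return self.flows[flow_id]

pool = FlowPool(preset_size_n=16)           # preferred preset size N = 16
first_ids = sorted(pool.flows)              # ids 0 .. 15
```

With M = 0 and N = 16 this yields the sixteen identifiers the description mentions as the preferred configuration.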
In a second aspect, there is provided an operator computation flow management method, the method comprising:
acquiring a first operator computation flow in a deep learning module in response to detection of an operator computation task execution instruction of the deep learning module;
in response to detecting the first operator computing stream, acquiring a second operator computing stream in an artificial intelligence computing power execution module, and carrying out unified processing on the first operator computing stream and the second operator computing stream;
based on the unified processing result, whether to execute the operator calculation task is determined, so that management of the operator calculation flow is realized.
Optionally, the method for acquiring the first operator computation flow in the deep learning module includes:
obtaining a model file of a deep learning model in the deep learning module in response to detection of an operator calculation task execution instruction of the deep learning module;
and analyzing the model file of the deep learning model to obtain a first operator computation flow in the deep learning module.
Optionally, analyzing the model file of the deep learning model, and obtaining the first operator computation flow in the deep learning module includes:
Analyzing the model file of the deep learning model based on a compiler to obtain a target operator sequence;
sequencing the target operator sequences according to the execution sequence of the deep learning model corresponding to the target operator sequences to obtain the operator execution sequence of the operator calculation flow in the deep learning module;
and acquiring the first operator computation flow based on the operator execution sequence of the operator computation flow.
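The three steps above can be sketched minimally in Python, assuming (purely for illustration) that the compiler yields (operator name, execution index) pairs; the pair format and all names are hypothetical.

```python
# Toy stand-in for compiler output: (operator name, execution index) pairs.
def build_first_flow(target_operator_sequence):
    # sort operators by the model's execution order to obtain the
    # operator execution sequence of the first operator computation flow
    ordered = sorted(target_operator_sequence, key=lambda op: op[1])
    return [name for name, _ in ordered]

parsed = [("matmul", 1), ("conv2d", 0), ("relu", 2)]
first_flow = build_first_flow(parsed)       # conv2d runs first
```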
Optionally, before the obtaining the second operator computation flow in the artificial intelligence computation power execution module, the method further includes:
in response to detecting the first operator computation flow, detecting whether a target operator computation flow exists in the computing force execution module through an operator computation flow acquisition subunit;
in response to detecting that the target operator computation flow exists, defining the target operator computation flow as the second operator computation flow;
and in response to detecting that the target operator computing stream does not exist, constructing a stream pool by utilizing a stream pool initialization subunit, and initializing the stream pool to determine the second operator computing stream.
Optionally, the initializing the subunit with the flow pool to construct the flow pool includes:
Determining that the preset size of the flow pool is N based on a preset mapping table;
according to the preset size of the flow pool, applying for hardware equipment resources in a hardware equipment module, and distributing the hardware equipment resources to corresponding flows to generate the flow pool.
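The pool-construction step above can be sketched as follows; the mapping-table contents and the allocator callback are assumptions for illustration (the patent only says the size comes from a preset mapping table based on expert prior knowledge, with 16 preferred).

```python
# Hypothetical preset mapping table; 16 is the preferred value per the text.
PRESET_SIZE_TABLE = {"default": 16}

def build_flow_pool(allocate_resource, key="default"):
    n = PRESET_SIZE_TABLE[key]
    # apply for n hardware device resources and bind each one to a flow
    return [{"id": i, "resource": allocate_resource(i)} for i in range(n)]

flow_pool = build_flow_pool(lambda i: f"device-resource-{i}")
```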
Optionally, after the flow pool is built by using the flow pool initialization subunit, initializing the flow pool includes:
assigning identification numbers to the streams corresponding to the hardware equipment resources to generate first identification numbers, wherein the first identification numbers are sequentially M to N-1, N-1 is greater than or equal to M, and the first identification numbers are defined to be consistent with the second identification numbers corresponding to operator calculation streams in the operator calculation stream management unit;
generating a mapping relation based on the target first identification number and the stream corresponding to the hardware equipment resource, and pre-storing the mapping relation in a constructed stream pool to finish the initialization processing of the stream pool.
Optionally, determining the second operator computing stream based on the stream pool initialization processing result includes:
obtaining mapping relations between a plurality of first identification numbers and streams of corresponding hardware equipment resources;
and selecting a stream of the target hardware equipment resource corresponding to the first identification number M in the mapping relation, and defining the stream of the target hardware equipment resource as the second operator computing stream.
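Selecting the second operator computation flow from the stored mapping can be sketched as below; the dict-based mapping is a stand-in for the pool's pre-stored id-to-flow relation, and the names are illustrative.

```python
def select_second_flow(id_to_flow, start_m=0):
    # the second operator computation flow is the flow whose first
    # identification number equals M (the smallest id in the pool)
    return id_to_flow[start_m]

mapping = {i: f"flow-{i}" for i in range(16)}   # ids 0..15 -> flows
second_flow = select_second_flow(mapping)
```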
Optionally, the obtaining the second operator computation flow in the artificial intelligence computing power execution module includes:
in response to detecting the first operator computation flow, detecting, through the operator computation flow acquisition subunit, whether the flow pool has been successfully initialized;
in response to detecting that the flow pool initialization succeeded, acquiring the second operator computation flow based on the flow pool initialization processing result;
and in response to detecting that the flow pool initialization did not succeed, initializing the flow pool by utilizing the flow pool initialization subunit to determine the second operator computation flow.
Optionally, before unifying the first operator computation flow and the second operator computation flow, the method further includes:
transmitting a second operator computation flow in the artificial intelligence computation power execution module to the deep learning module, and comparing the first operator computation flow with the second operator computation flow in the deep learning module;
based on the comparison result, determining whether to unify the first operator computation flow and the second operator computation flow.
Optionally, comparing the first operator computation flow with the second operator computation flow in the deep learning module includes:
Respectively acquiring a first memory address corresponding to the first operator computation flow and a second memory address corresponding to the second operator computation flow;
comparing the similarity between the first memory address and the second memory address;
and determining a target comparison result based on the similarity.
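A hedged sketch of this comparison: the patent compares the two flows' memory addresses, and here "similarity" is modelled in the simplest possible way, 1.0 for identical addresses and 0.0 otherwise; a real implementation may use a different metric.

```python
# Similarity between two memory addresses: exact match -> 1.0, else 0.0.
def flow_similarity(first_addr, second_addr):
    return 1.0 if first_addr == second_addr else 0.0

shared = object()
same = flow_similarity(id(shared), id(shared))
different = flow_similarity(id(shared), id(object()))
```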
Optionally, determining whether to unify the first operator computation flow and the second operator computation flow based on the target comparison result includes:
in response to detecting that the similarity is greater than or equal to a first preset value, not performing unified processing on the first operator computing stream and the second operator computing stream;
and in response to detecting that the similarity is smaller than a first preset value, unifying the first operator computation flow and the second operator computation flow.
Optionally, the unifying the first operator computation flow and the second operator computation flow includes:
and in response to detecting that the similarity is smaller than a first preset value, replacing the second operator computing stream with the first operator computing stream by using an operator computing stream processing subunit as a target operator computing stream of the artificial intelligence computing force executing module.
Optionally, determining whether to execute the operator computing task based on the unified processing result includes:
and executing an operator computing task in response to detecting that the similarity between the first operator computing stream and the second operator computing stream is greater than or equal to a first preset value.
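The unify-then-execute decision described in the preceding clauses can be sketched as one function; identity comparison stands in for the memory-address similarity check, and the names and return convention are assumptions.

```python
def unify_and_decide(first_flow, second_flow, threshold=1.0):
    # similarity check: identical flows -> 1.0, otherwise 0.0
    similarity = 1.0 if first_flow is second_flow else 0.0
    if similarity >= threshold:
        return second_flow, True      # already unified: run the task
    # otherwise replace the second flow with the first (unified
    # processing), after which the task runs on the single unified flow
    return first_flow, True

a = object()
flow, run = unify_and_decide(a, object())   # flows differ -> replaced
```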
Optionally, after the operator computing task is performed, the method further includes:
the operator computation flow destruction subunit unbinds the deep learning module from the artificial intelligence computing power execution module, and destroys the operator computation flow and the flow pool of the artificial intelligence computing power execution module.
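The teardown step can be sketched as below, with the binding between modules and the flow pool both modelled as dicts; all names are hypothetical.

```python
def destroy_flows(flow_pool, bindings):
    # unbind the deep learning module from the execution module ...
    bindings.clear()
    # ... then destroy the operator computation flows and the pool itself
    flow_pool.clear()
    return not flow_pool and not bindings

bindings = {"deep_learning_module": "execution_module"}
pool = {i: f"flow-{i}" for i in range(16)}
released = destroy_flows(pool, bindings)
```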
In another aspect, there is provided an operator computation flow management apparatus, the apparatus comprising:
the first operator computation flow acquisition module is used for acquiring a first operator computation flow in the deep learning module when an operator computation task execution instruction of the deep learning module is detected;
the unification processing module is used for acquiring a second operator computing stream in the artificial intelligence computing power execution module when the first operator computing stream is detected, and unifying the first operator computing stream and the second operator computing stream;
and the computation flow management module is used for determining whether to execute the operator computation task based on the unified processing result so as to realize the management of the operator computation flow.
In yet another aspect, a computer device is provided comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of:
acquiring a first operator computation flow in a deep learning module in response to detection of an operator computation task execution instruction of the deep learning module;
in response to detecting the first operator computing stream, acquiring a second operator computing stream in an artificial intelligence computing power execution module, and carrying out unified processing on the first operator computing stream and the second operator computing stream;
based on the unified processing result, whether to execute the operator calculation task is determined, so that management of the operator calculation flow is realized.
In yet another aspect, a computer readable storage medium is provided, having stored thereon a computer program which when executed by a processor performs the steps of:
acquiring a first operator computation flow in a deep learning module in response to detection of an operator computation task execution instruction of the deep learning module;
in response to detecting the first operator computing stream, acquiring a second operator computing stream in an artificial intelligence computing power execution module, and carrying out unified processing on the first operator computing stream and the second operator computing stream;
Based on the unified processing result, whether to execute the operator calculation task is determined, so that management of the operator calculation flow is realized.
With the computing power execution system, the operator computation flow management method, the apparatus, the device and the medium described above, the computing power execution system comprises: a deep learning module, an artificial intelligence computing power execution module and a hardware device module; the artificial intelligence computing power execution module is connected with the deep learning module and is used for determining a second operator computation flow when the first operator computation flow is generated by the deep learning module; the hardware device module is connected with the artificial intelligence computing power execution module and is used for executing a task instruction corresponding to the second operator computation flow; the artificial intelligence computing power execution module comprises an initialization unit and an operator computation flow management unit; the initialization unit is used for initializing the artificial intelligence computing power execution module; and the operator computation flow management unit is used for managing the second operator computation flow based on the first operator computation flow generated by the deep learning module, and unifies the operator computation flow of the deep learning module with that of the artificial intelligence computing power execution module. The consistency of real operator execution time with the deep learning module is thereby well ensured, the problem that the whole model exhibits a training gradient dispersion phenomenon because asynchronous operator execution times prevent accurate computation results from being obtained is solved, system overhead is effectively reduced, and operator computation efficiency is improved.
Drawings
FIG. 1 is a diagram illustrating an example of conventional AI computing force execution framework flow management in the background;
FIG. 2 is a schematic diagram of an existing operator computation flow execution flow in a specific embodiment;
FIG. 3 is a schematic diagram of the overall structure of a computing force execution system in one embodiment;
FIG. 4 is a flow diagram of an operator computing flow management method in one embodiment;
FIG. 5 is a schematic diagram of a unified operator workflow execution flow for an operator workflow management method in one embodiment;
FIG. 6 is a block diagram of an operator computational flow management apparatus in one embodiment;
fig. 7 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the present application more apparent, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
It should be understood that throughout this description, unless the context clearly requires otherwise, the words "comprise," "comprising," and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is, it is the meaning of "including but not limited to".
It should also be appreciated that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Furthermore, in the description of the present application, unless otherwise indicated, the meaning of "a plurality" is two or more.
It should be noted that the terms "S1", "S2", and the like are used only for describing steps and are not intended to limit the order or sequence of steps or to limit the present application; they are merely for convenience in describing the method of the present application and are not to be construed as indicating a necessary sequence of steps. In addition, the technical solutions of the embodiments may be combined with each other, but only on the basis that they can be realized by those skilled in the art; when a combination of technical solutions is contradictory or cannot be realized, such a combination should be regarded as non-existent and outside the protection scope of the present application.
As stated in the background, with the continuous advancement of technical innovation in the artificial intelligence field, the application scenarios faced by deep learning algorithm models are increasingly complex and the hardware devices increasingly diverse. Deep learning frameworks have passed through sprouting, growth and stabilization stages to reach the current deepening stage. With their continuous development, the international market has gradually settled into a duopoly represented by Google-TensorFlow (an open-source software library for numerical computation using data flow graphs) and Meta-PyTorch (an open-source machine learning framework based on Torch and programmed in Python; Meta is an internet technology company whose products span social networking services, virtual reality and the metaverse), while the AI framework market is developing toward plurality, with domestic frameworks such as PaddlePaddle and MindSpore advancing and computing chips likewise diversifying. Supporting multiple deep learning framework back ends and multiple computing chips in the same application scenario has therefore become a key requirement in current industrial production.
Based on the above requirement, the prior art cannot synchronize computation information between the two sides, which ultimately leads to inaccurate computation results. As shown in fig. 2, when performing operator computation the deep learning framework issues computation task one and task two to StreamA (stream A) and retrieves the result of task two at an appropriate time; task one and task two invoke the back-end hardware device through the computing power execution framework, which issues the task one and task two it has received to StreamB (stream B), and finally the hardware device threads execute the tasks in StreamB. As shown in fig. 2, StreamA is the stream maintained by the deep learning framework, and StreamB is the stream maintained by the computing power execution framework. When the deep learning framework needs the result of task two, StreamB, which actually executes task two, cannot yet return a correct result, so the deep learning framework obtains an erroneous result. More generally, when the deep learning framework retrieves a computation result, the computation in the computing power execution framework may not yet have ended or may have ended prematurely, so an incorrect or incomplete result is retrieved, causing gradient dispersion in the training process or wasted computing power resources.
In order to solve the above technical problems, the present application provides an operator computation flow management method, apparatus, device and medium, in which the operator computation flow of the deep learning module and the operator computation flow of the artificial intelligence computing power execution module are subjected to unified processing, so that the consistency of real operator execution time with the deep learning module is well ensured, the problem that the whole model exhibits a training gradient dispersion phenomenon because asynchronous operator execution times prevent accurate computation results from being obtained is solved, system overhead is effectively reduced, and operator computation efficiency is improved.
In one embodiment, as shown in FIG. 3, a computing force execution system is provided that includes a deep learning module, an artificial intelligence computing force execution module, and a hardware device module;
the artificial intelligence computing power execution module is connected with the deep learning module and is used for determining a second operator computing stream when the first operator computing stream is generated by the deep learning module;
the hardware equipment module is connected with the artificial intelligence computing power execution module and is used for executing a task instruction corresponding to the second operator computing flow;
the artificial intelligence computing power execution module comprises an initialization unit and an operator computing flow management unit;
The initialization unit is used for initializing the artificial intelligence computing power execution module;
the operator computation flow management unit is used for managing the second operator computation flow based on the first operator computation flow generated by the deep learning module.
Specifically, the deep learning module is a deep learning framework, i.e., a software tool for implementing deep learning algorithms, such as PaddlePaddle, TensorFlow or Caffe. The artificial intelligence computing power execution module is an AI computing power execution framework, which also includes an event management unit, an operator scheduling unit, a memory management unit and the like; it is a set of standard interfaces, feature libraries and toolkits for designing, training and verifying AI algorithm models, integrating the encapsulation of algorithms, data invocation and the use of computing resources, while providing developers with a development interface and an efficient execution platform; it is an essential tool for AI algorithm development at the present stage and the core of the whole computing power execution system. The AI computing power execution framework follows the principles of non-intrusiveness, modularization and detachability: the non-intrusive principle indicates that development of the AI computing power execution framework should not intrude into the deep learning framework, so as to avoid errors caused by updates of the deep learning framework; the modularization principle indicates that each part of the computing power execution framework should appear as small, modular groups so that the parts are decoupled; and the detachable principle indicates that the three parties, namely the deep learning framework, the computing power execution framework and the hardware devices, can be detached from one another freely without affecting normal use. Therefore, in the present application, the artificial intelligence computing power execution module accepts a plurality of deep learning modules upward and is compatible downward with a plurality of hardware devices for executing operator computation tasks. Further, in order to ensure the consistency of operator computation flows in different modules, thereby improving the operation efficiency of the whole system and reducing system overhead, the operator computation flow management unit is arranged in the artificial intelligence computing power execution module, and specifically comprises a flow pool initialization subunit, an operator computation flow acquisition subunit, an operator computation flow processing subunit, an operator computation flow destruction subunit and a flow pool, wherein:
The flow pool initialization subunit is used for initializing the flow pool;
the operator computation flow obtaining subunit is used for obtaining a second operator computation flow in the artificial intelligence computation power execution module;
the operator computation flow processing subunit is used for carrying out unified processing on a second operator computation flow in the artificial intelligent computation force execution module based on the first operator computation flow;
the operator computation flow destroying subunit is used for unbinding and destroying the operator computation flow;
the flow pool is used for storing the flow corresponding to the hardware equipment resource in the artificial intelligence computing power execution module, the hardware equipment resource is the hardware equipment resource applied from the hardware equipment module when the flow pool is constructed, and the hardware equipment resource is distributed to the flow, so that the flow corresponding to the hardware equipment resource is obtained and stored in the flow pool.
In some embodiments, the flow pool includes flows corresponding to at least one hardware device resource in the artificial intelligence computing power execution module;
the preset size of the flow pool is N, and the first identification numbers of the multiple flows in the flow pool are sequentially M to N-1, where N-1 is greater than or equal to M. The size of the flow pool is obtained according to a proprietary method, namely, the preset size of the flow pool is determined according to expert prior knowledge. The preferred value of the preset size of the flow pool is 16, in which case the first identification numbers of the flows in the flow pool are sequentially 0 to 15; that is, StreamM, StreamM+1, …, StreamN-1 in fig. 3 may be Stream0, Stream1, …, Stream15;
And the first identification number is consistent with a second identification number corresponding to the operator calculation flow in the operator calculation flow management unit.
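The flow pool described above can be sketched as a fixed-size container of streams, each carrying a first identification number from M to N-1. The following Python sketch is purely illustrative: the names `Stream` and `StreamPool` and the string resources are assumptions, standing in for real vendor stream handles and hardware device resources.

```python
class Stream:
    """Stand-in for a hardware computation stream bound to a device resource."""
    def __init__(self, stream_id, resource):
        self.stream_id = stream_id   # first identification number (Stream ID)
        self.resource = resource     # hardware device resource allocated to this stream


class StreamPool:
    """Fixed-size pool of streams with identification numbers M .. N-1."""
    def __init__(self, size=16, first_id=0):
        # One hardware resource is applied for per stream (simulated as a string)
        # and bound to it; the bound streams together form the pool.
        self.streams = {
            first_id + i: Stream(first_id + i, resource=f"device-slot-{i}")
            for i in range(size)
        }


pool = StreamPool()                  # preset size N = 16, IDs 0..15
default_stream = pool.streams[0]     # Stream0: the default operator computation stream
```

With the preferred values M = 0 and N = 16, the pool holds Stream0 through Stream15, and the stream with identification number 0 plays the role of the default computation stream returned on first use.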
In the above embodiment, by means of the operator computation flow management unit provided in the artificial intelligence computing power execution module, the operator computation flows in multiple deep learning modules and the operator computation flow in the artificial intelligence computing power execution module can be unified. The introduced flow pool structure effectively reduces the performance loss caused by system overhead and improves the computation and operation efficiency of the artificial intelligence computing power execution module; meanwhile, using the flow pool allows resources to be called more safely and effectively, avoiding resource waste.
The various modules in the computing power execution system described above may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one embodiment, as shown in fig. 4, there is provided an operator computing stream management method, including the steps of:
s1: and in response to detecting an operator calculation task execution instruction of the deep learning module, acquiring a first operator calculation flow in the deep learning module.
It should be noted that the first operator computation flow is the operator computation flow of the deep learning framework. When it is detected that an operator computation task needs to be executed, the computation flow of the deep learning module is obtained; here, a computation flow refers to the stream used for operator computation, and generally only one such computation flow exists.
In some embodiments, the method for obtaining the first operator computation flow in the deep learning module includes:
obtaining a model file of a deep learning model in the deep learning module in response to detection of an operator calculation task execution instruction of the deep learning module;
and analyzing the model file of the deep learning model to obtain a first operator computation flow in the deep learning module.
In some embodiments, parsing the model file of the deep learning model, the obtaining a first operator computation flow in the deep learning module includes:
analyzing the model file of the deep learning model based on a compiler to obtain a target operator sequence, wherein the compiler is generally arranged on a terminal device and can parse deep learning model files in different formats (such as .pb, .tflite, .onnx and the like) into a target operator sequence (such as a one-dimensional operator sequence);
Sequencing the target operator sequences according to the execution sequence of the deep learning model corresponding to the target operator sequences to obtain the operator execution sequence of the operator calculation flow in the deep learning module;
and acquiring the first operator computing stream based on the operator execution sequence of the operator computing stream, wherein the first operator computing stream is determined according to the sequencing result, and if the deep learning module executes the operator computing task for the first time, the operator computing stream sequenced at the first position is selected as the first operator computing stream.
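As a rough illustration of the ordering step, assume the compiler emits a flat list of (execution_index, stream) pairs; sorting by execution index and taking the stream of the operator ranked first yields the first operator computation stream. All names and the pair layout here are hypothetical, not the patent's actual data format.

```python
def first_operator_stream(parsed_ops):
    """Sort a parsed operator sequence by the model's execution order and
    return the computation stream of the operator ranked first.
    parsed_ops: list of (execution_index, stream_name) pairs."""
    ordered = sorted(parsed_ops, key=lambda op: op[0])
    return ordered[0][1]


# Example: three operators parsed out of order from a model file.
ops = [(2, "stream_b"), (0, "stream_a"), (1, "stream_a")]
```

Calling `first_operator_stream(ops)` on this example would return `"stream_a"`, the stream of the operator that executes first.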
In the above embodiment, the first operator computation flow is obtained by analyzing the deep learning model and sorting the analysis result, so as to be used for performing unified processing on the second operator computation flow of the artificial intelligence computation power execution module.
S2: and responding to the detection of the first operator computing stream, acquiring a second operator computing stream in the artificial intelligent computing power execution module, and carrying out unified processing on the first operator computing stream and the second operator computing stream.
It should be noted that, the second operator computing flow is an operator computing flow of the AI computing force execution framework, which exists in the flow pool of the artificial intelligence computing force execution module, when the first operator computing flow is detected, the second operator computing flow needs to be called in the flow pool, and the unification processing refers to that the two computing flows are kept consistent by utilizing the unification processing rule.
In some embodiments, prior to the obtaining the second operator computational flow in the artificial intelligence computational power execution module, the method further comprises:
in response to detecting the first operator computation flow, detecting whether a target operator computation flow exists in the computing force execution module through an operator computation flow acquisition subunit;
in response to detecting that the target operator computing stream exists, defining the target operator computing stream as the second operator computing stream, wherein if the target operator computing stream exists, the fact that the stream pool is constructed at a certain time before is indicated, the operator computing stream in the constructed stream pool can be directly extracted as the second operator computing stream without constructing the stream pool again;
and in response to detecting that the target operator computing stream does not exist, constructing a stream pool by utilizing a stream pool initialization subunit, and initializing the stream pool to determine the second operator computing stream.
In some embodiments, the constructing the flow pool using the flow pool initialization subunit includes:
determining that the preset size of the flow pool is N based on a preset mapping table, wherein the preset mapping table comprises a mapping relation between at least one flow pool size and the number of hardware devices, the mapping relation can determine the preset size N of the flow pool corresponding to the number of different hardware devices according to a proprietary method, namely expert priori knowledge, and in the embodiment, the preferred value of N can be 16;
According to the preset size of the flow pool, hardware device resources are applied for in a hardware device module and allocated to the corresponding flows to generate the flow pool. Hardware device resources generally refer to the memory, CPU, video memory, disk, and the like in a physical machine; in this embodiment, the hardware device resource may be video memory, such as 32 GB of GPU video memory, and occupation of the CPU may also serve as a hardware device resource. Once a hardware device resource has been applied for and allocated to a corresponding flow, that flow becomes part of the flow pool; thus, when hardware device resources of the preset size have been applied for and allocated to the corresponding flows, the flow pool can be generated.
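A minimal sketch of determining the pool size from a preset mapping table and then allocating one resource per stream follows; the table contents and resource strings are invented for illustration and do not come from the patent.

```python
# Hypothetical preset mapping table: hardware device count -> pool size N.
SIZE_TABLE = {1: 8, 2: 16, 4: 32}


def preset_pool_size(device_count, default=16):
    """Look up N for the device count, falling back to the preferred value 16."""
    return SIZE_TABLE.get(device_count, default)


def build_flow_pool(device_count):
    """Apply for one hardware device resource per stream (simulated as a
    string) and allocate it, so the N bound streams form the flow pool."""
    n = preset_pool_size(device_count)
    return {i: {"resource": f"vram-slice-{i}"} for i in range(n)}
```

In a real system the resource strings would be replaced by actual allocations (for example, slices of GPU video memory), but the shape of the lookup-then-allocate flow is the same.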
In some embodiments, after constructing the flow pool using the flow pool initialization subunit, initializing the flow pool comprises:
assigning identification numbers to the streams corresponding to the hardware device resources to generate first identification numbers, that is, generating corresponding Stream IDs. The Stream ID is an attribute of the corresponding interface in the setting module; only through the Stream ID can it be accurately known which stream in the AI computing power execution framework is being used as the computation stream, thereby ensuring the accuracy of the unified processing of operator computation streams. The first identification numbers are sequentially M to N-1, where N-1 is greater than or equal to M, and the first identification numbers are defined to be consistent with the second identification numbers corresponding to the operator computation streams in the operator computation flow management unit; that is, the first identification number of a stream remains unchanged while that stream is being called from the stream pool. For example, if Stream0 in the stream pool is used to perform operator computation, i.e., serves as the so-called operator computation stream, the ID of Stream0 does not change during that process. The preferred value of M is 0, so the streams in the pool are Stream0 to Stream15, assigned first identification numbers 0 to 15 in sequence, as shown in fig. 3;
Generating a mapping relation based on the target first identification number and the stream corresponding to the hardware equipment resource, and pre-storing the mapping relation in a constructed stream pool to finish the initialization processing of the stream pool.
In some embodiments, determining the second operator computation flow based on a flow pool initialization process result includes:
obtaining mapping relations between a plurality of first identification numbers and streams of corresponding hardware equipment resources;
selecting the stream of the target hardware device resource corresponding to the first identification number M in the mapping relation, and defining that stream as the second operator computation stream; that is, if a stream pool needs to be constructed, this indicates that the deep learning module is calling the artificial intelligence computing power execution module for the first time, and the default stream number M (namely, stream number 0) is returned to the deep learning module as the operator computation stream.
In some embodiments, the obtaining the second operator computation flow in the artificial intelligence computation power execution module comprises:
in response to detecting the first operator computation flow, acquiring a subunit through the operator computation flow to detect whether the flow pool is successfully initialized;
acquiring the second operator computing stream based on a stream pool initialization processing result in response to the fact that the stream pool initialization is successful;
And in response to detecting that the initialization of the flow pool is unsuccessful, initializing the flow pool by utilizing a flow pool initialization subunit to determine the second operator computing flow.
In some embodiments, before unifying the first operator computation flow and the second operator computation flow, the method further includes:
transmitting a second operator computation flow in the artificial intelligence computation power execution module to the deep learning module, and comparing the first operator computation flow with the second operator computation flow in the deep learning module;
based on the comparison result, determining whether to unify the first operator computation flow and the second operator computation flow.
In some embodiments, comparing the first operator computation flow with the second operator computation flow in the deep learning module comprises:
respectively obtaining a first memory address corresponding to the first operator computing stream and a second memory address corresponding to the second operator computing stream, wherein the operator computing stream allocates a memory space when being generated, the memory space contains address information, and a judgment criterion of whether the two computing streams are equal is to judge whether the two computing streams point to the same memory address;
Comparing the similarity between the first memory address and the second memory address;
and determining a target comparison result based on the similarity.
In some embodiments, determining whether to unify the first operator computation flow and the second operator computation flow based on the target comparison result includes:
in response to detecting that the similarity is greater than or equal to a first preset value, performing no unified processing on the first operator computing stream and the second operator computing stream, wherein the first preset value can be set according to actual requirements and is generally 100%, namely, the first memory address and the second memory address are identical, and at the moment, the two computing streams are identical, and no unified processing is needed;
and in response to detecting that the similarity is less than the first preset value, unifying the first operator computation stream and the second operator computation stream; that is, when the first memory address and the second memory address are not identical, the two computation streams are inconsistent, and in order to solve the gradient dispersion problem caused by inconsistent operator computation streams, the two operator computation streams need to be unified.
In some embodiments, unifying the first operator computation flow and the second operator computation flow comprises:
and in response to detecting that the similarity is less than the first preset value, using the operator computation flow processing subunit to replace the second operator computation stream with the first operator computation stream as the target operator computation stream of the artificial intelligence computing power execution module; that is, when the two computation streams are inconsistent, the deep learning module gives its own first operator computation stream to the artificial intelligence computing power execution module to replace the original second operator computation stream there, so that the computation streams in the two modules are consistent.
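The replacement step can be sketched as a single assignment: when the two streams differ, the executor adopts the framework's stream. The dictionary-based executor below is an assumption made only for illustration.

```python
def unify_streams(first_stream, executor):
    """If the executor's second stream differs from the deep learning
    framework's first stream, replace it so both modules share one
    target operator computation stream."""
    if executor["stream"] is not first_stream:   # similarity below the preset value
        executor["stream"] = first_stream        # first stream replaces the second
    return executor["stream"]
```

After unification the call is idempotent: invoking it again with the same pair leaves the executor's stream untouched, since the identity check already passes.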
In this embodiment, the operator computation flows of different deep learning modules are unified into the artificial intelligence computing power execution module, so that different deep learning framework modules obtain correct computation results when facing different computing chips. This effectively avoids failures in obtaining operator computation results caused by inconsistent execution times of operator computation tasks when the operator computation flows of the modules are independent of one another. Meanwhile, the operator computation flows are placed in a flow pool structure and assigned identification numbers (that is, Stream IDs are set), achieving unified management while avoiding the system overhead, and resulting performance loss, that a large number of read and write operations from frequent interaction between different modules would otherwise introduce, thereby improving the computation efficiency of the artificial intelligence computing power execution module.
S3: based on the unified processing result, whether to execute the operator calculation task is determined, so that management of the operator calculation flow is realized.
It should be noted that, in some embodiments, based on the unified processing result, determining whether to execute the operator computing task includes:
in response to detecting that the similarity between the first operator computation stream and the second operator computation stream is less than the first preset value, not executing the operator computation task;
in response to detecting that the similarity between the first operator computation stream and the second operator computation stream is greater than or equal to the first preset value, executing the operator computation task, namely starting the artificial intelligence computing power execution module and the deep learning module to execute the operator computation task in a multithreaded manner. As shown in fig. 5, when the operator computation stream in the deep learning module is consistent with the operator computation stream in the AI computing power execution module, the start and end times of task one and task two are the same, in contrast to the case where the streams are inconsistent. When the deep learning module calls for the result of task two, the AI computing power execution module has already completed the computation of task two and returned the result to the deep learning module, so the deep learning module obtains the correct result of task two. By analogy, the operator computation tasks of the whole model and the deep learning module obtain correct results, thereby solving the gradient dispersion problem caused by inconsistent operator computation streams during model training.
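The gating of step S3 can be sketched as: run the task only when the two streams already match after unified processing. The function and parameter names are hypothetical.

```python
def maybe_execute(first_stream, second_stream, run_task):
    """Execute the operator computation task only when the two streams are
    consistent (similarity at or above the first preset value)."""
    if first_stream is second_stream:
        return run_task()          # both modules observe the same start/end times
    return None                    # inconsistent streams: skip execution
```

In practice the skipped branch would be followed by unification and a retry rather than a silent no-op; the sketch only shows the decision itself.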
In some embodiments, after the operator computing task execution is complete, the method further comprises:
after the operator computing tasks are executed, the operator computing flow destroying subunit is utilized to unbind the deep learning module and the artificial intelligent computing force executing module, and the operator computing flows and the flow pools of the artificial intelligent computing force executing module are destroyed so as to release occupied resources and avoid waste of the resources.
In the embodiment, after the operator computing task is executed, occupied resources are recovered, so that the resources can be more safely and effectively called, and the waste of the resources is avoided.
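Resource reclamation after task completion can be sketched as unbinding the modules and then clearing the pool; the class below is illustrative only and its names are assumptions.

```python
class FlowPoolLifecycle:
    """Minimal sketch of the destruction subunit's cleanup duties."""

    def __init__(self, size=16):
        self.pool = {i: f"stream-{i}" for i in range(size)}
        self.bound = True            # deep learning module and executor are bound

    def destroy(self):
        """Unbind the modules and release every stream in the pool so the
        occupied hardware resources are returned rather than wasted."""
        self.bound = False
        self.pool.clear()
```

A real implementation would additionally release the underlying device handles (for example, destroying vendor stream objects) before dropping the pool entries.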
In the operator computation flow management method described above, the method includes: in response to detecting an operator computation task execution instruction of the deep learning module, acquiring a first operator computation flow in the deep learning module; in response to detecting the first operator computation flow, acquiring a second operator computation flow in the artificial intelligence computing power execution module, and unifying the first operator computation flow and the second operator computation flow; and, based on the unified processing result, determining whether to execute the operator computation task, so as to realize management of the operator computation flows. By unifying the operator computation flow of the deep learning module with that of the artificial intelligence computing power execution module, consistency between the real operator execution time and the deep learning module is well ensured, the problem of training gradient dispersion of the whole model (caused by failing to obtain correct computation results when operator execution times are not synchronized) is solved, system overhead is effectively reduced, and operator computation efficiency is improved.
It should be understood that, although the steps in the flowcharts of figs. 4-5 are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the order of execution of the steps is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in figs. 4-5 may include multiple sub-steps or stages that are not necessarily performed at the same moment but may be performed at different moments; the order of execution of these sub-steps or stages is not necessarily sequential, and they may be performed in turn or alternately with at least a portion of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 6, there is provided an operator computation flow management apparatus including: the system comprises a first operator computation flow acquisition module, a unified processing module and a computation flow management module, wherein:
the first operator computation flow acquisition module is used for acquiring a first operator computation flow in the deep learning module when an operator computation task execution instruction of the deep learning module is detected;
The unification processing module is used for acquiring a second operator computing stream in the artificial intelligent computing force execution module when the first operator computing stream is detected, and unifying the first operator computing stream and the second operator computing stream;
and the computation flow management module is used for determining whether to execute the operator computation task based on the unified processing result so as to realize the management of the operator computation flow.
As a preferred implementation manner, in the embodiment of the present invention, the first operator computation flow obtaining module is specifically configured to:
obtaining a model file of a deep learning model in the deep learning module in response to detection of an operator calculation task execution instruction of the deep learning module;
and analyzing the model file of the deep learning model to obtain a first operator computation flow in the deep learning module.
As a preferred implementation manner, in the embodiment of the present invention, the first operator computation flow obtaining module is specifically further configured to:
analyzing the model file of the deep learning model based on a compiler to obtain a target operator sequence;
sequencing the target operator sequences according to the execution sequence of the deep learning model corresponding to the target operator sequences to obtain the operator execution sequence of the operator calculation flow in the deep learning module;
And acquiring the first operator computation flow based on the operator execution sequence of the operator computation flow.
As a preferred implementation manner, in the embodiment of the present invention, the apparatus further includes a judging module, where the judging module is specifically configured to:
in response to detecting the first operator computation flow, detecting whether a target operator computation flow exists in the computing force execution module through an operator computation flow acquisition subunit;
in response to detecting that the target operator computation flow exists, defining the target operator computation flow as the second operator computation flow;
and in response to detecting that the target operator computing stream does not exist, constructing a stream pool by utilizing a stream pool initialization subunit, and initializing the stream pool to determine the second operator computing stream.
In an embodiment of the present invention, the determining module is specifically configured to:
determining that the preset size of the flow pool is N based on a preset mapping table;
according to the preset size of the flow pool, applying for hardware equipment resources in a hardware equipment module, and distributing the hardware equipment resources to corresponding flows to generate the flow pool.
As a preferred implementation manner, in the embodiment of the present invention, the determining module is specifically further configured to:
Carrying out identification numbers on the streams corresponding to the hardware equipment resources to generate first identification numbers, wherein the first identification numbers are sequentially M to N-1, N-1 is larger than or equal to M, and the first identification numbers are defined to be consistent with second identification numbers corresponding to operator calculation streams in the operator calculation stream management unit;
generating a mapping relation based on the target first identification number and the stream corresponding to the hardware equipment resource, and pre-storing the mapping relation in a constructed stream pool to finish the initialization processing of the stream pool.
As a preferred implementation manner, in the embodiment of the present invention, the determining module is specifically further configured to:
obtaining mapping relations between a plurality of first identification numbers and streams of corresponding hardware equipment resources;
and selecting a stream of the target hardware equipment resource corresponding to the first identification number M in the mapping relation, and defining the stream of the target hardware equipment resource as the second operator computing stream.
As a preferred implementation manner, in the embodiment of the present invention, the unified processing module is specifically configured to:
in response to detecting the first operator computation flow, acquiring a subunit through the operator computation flow to detect whether the flow pool is successfully initialized;
Acquiring the second operator computing stream based on a stream pool initialization processing result in response to the fact that the stream pool initialization is successful;
and in response to detecting that the initialization of the flow pool is unsuccessful, initializing the flow pool by utilizing a flow pool initialization subunit to determine the second operator computing flow.
As a preferred implementation manner, in the embodiment of the present invention, the apparatus further includes a comparison module, where the comparison module is specifically configured to:
transmitting a second operator computation flow in the artificial intelligence computation power execution module to the deep learning module, and comparing the first operator computation flow with the second operator computation flow in the deep learning module;
based on the comparison result, determining whether to unify the first operator computation flow and the second operator computation flow.
As a preferred implementation manner, in the embodiment of the present invention, the comparison module is specifically further configured to:
respectively acquiring a first memory address corresponding to the first operator computation flow and a second memory address corresponding to the second operator computation flow;
comparing the similarity between the first memory address and the second memory address;
And determining a target comparison result based on the similarity.
As a preferred implementation manner, in the embodiment of the present invention, the comparison module is specifically further configured to:
in response to detecting that the similarity is greater than or equal to a first preset value, not performing unified processing on the first operator computing stream and the second operator computing stream;
and in response to detecting that the similarity is smaller than a first preset value, unifying the first operator computation flow and the second operator computation flow.
As a preferred implementation manner, in the embodiment of the present invention, the unified processing module is specifically further configured to:
and in response to detecting that the similarity is smaller than a first preset value, replacing the second operator computing stream with the first operator computing stream by using an operator computing stream processing subunit as a target operator computing stream of the artificial intelligence computing force executing module.
As a preferred implementation manner, in the embodiment of the present invention, the computing flow management module is specifically configured to:
and executing an operator computing task in response to detecting that the similarity between the first operator computing stream and the second operator computing stream is greater than or equal to a first preset value.
As a preferred implementation manner, in the embodiment of the present invention, the apparatus further includes an unbinding destruction module, where the unbinding destruction module is specifically configured to:
the operator computing stream destroying subunit unbinds the deep learning module and the artificial intelligent computing force executing module, and destroys the operator computing stream and the stream pool of the artificial intelligent computing force executing module.
For specific limitations on the operator computation flow management apparatus, reference may be made to the above limitation on the operator computation flow management method, which is not described herein. The respective modules in the operator computation flow management apparatus described above may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a terminal, and the internal structure of which may be as shown in fig. 7. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements an operator computational flow management method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, can also be keys, a track ball or a touch pad arranged on the shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.
It will be appreciated by those skilled in the art that the structure shown in fig. 7 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the following steps:
S1: acquiring a first operator computation flow in a deep learning module in response to detecting an operator computation task execution instruction of the deep learning module;
S2: in response to detecting the first operator computation flow, acquiring a second operator computation flow in an artificial intelligence computing power execution module, and performing unified processing on the first operator computation flow and the second operator computation flow;
S3: determining, based on the unified processing result, whether to execute the operator computation task, thereby realizing management of the operator computation flow.
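The three-step flow above can be outlined in code. This is an illustrative sketch only; the class and attribute names (`DeepLearningModule`, `AIPowerModule`, `stream`) are assumptions, since the patent does not define a concrete API.

```python
class DeepLearningModule:
    def __init__(self, stream):
        self.stream = stream          # first operator computation flow

class AIPowerModule:
    def __init__(self, stream):
        self.stream = stream          # second operator computation flow

def manage_streams(dl, ai):
    # S1 + S2: acquire both streams; unify by adopting the framework's stream.
    if ai.stream != dl.stream:
        ai.stream = dl.stream
    # S3: the operator computation task runs only once the two streams agree.
    return ai.stream == dl.stream

executed = manage_streams(DeepLearningModule("s0"), AIPowerModule("s1"))
```

After unification, `executed` is true and the task may be dispatched.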
In one embodiment, the processor when executing the computer program further performs the steps of:
Obtaining a model file of a deep learning model in the deep learning module in response to detection of an operator calculation task execution instruction of the deep learning module;
and analyzing the model file of the deep learning model to obtain a first operator computation flow in the deep learning module.
In one embodiment, the processor when executing the computer program further performs the steps of:
analyzing the model file of the deep learning model based on a compiler to obtain a target operator sequence;
sequencing the target operator sequences according to the execution sequence of the deep learning model corresponding to the target operator sequences to obtain the operator execution sequence of the operator calculation flow in the deep learning module;
and acquiring the first operator computation flow based on the operator execution sequence of the operator computation flow.
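As a rough illustration of this parsing path, the sketch below treats a model file as an already-parsed list of (operator, execution-order) pairs; the dictionary layout and field names are hypothetical stand-ins for real compiler output.

```python
model_file = {  # assumed already-parsed compiler output
    "ops": [("relu", 2), ("conv", 1), ("softmax", 3)],  # (operator, exec order)
}

def first_operator_stream(model):
    ordered = sorted(model["ops"], key=lambda op: op[1])  # sort by execution order
    return [name for name, _ in ordered]                  # operator execution sequence

stream = first_operator_stream(model_file)
```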
In one embodiment, the processor when executing the computer program further performs the steps of:
in response to detecting the first operator computation flow, detecting, through an operator computation flow acquisition subunit, whether a target operator computation flow exists in the artificial intelligence computing power execution module;
in response to detecting that the target operator computation flow exists, defining the target operator computation flow as the second operator computation flow;
And in response to detecting that the target operator computing stream does not exist, constructing a stream pool by utilizing a stream pool initialization subunit, and initializing the stream pool to determine the second operator computing stream.
In one embodiment, the processor when executing the computer program further performs the steps of:
determining that the preset size of the flow pool is N based on a preset mapping table;
according to the preset size of the flow pool, applying for hardware equipment resources in a hardware equipment module, and distributing the hardware equipment resources to corresponding flows to generate the flow pool.
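A minimal sketch of this pool-construction step, assuming a configuration table that supplies the preset size N and a stand-in for the driver call that applies for a hardware stream (both names are hypothetical):

```python
PRESET_TABLE = {"default_pool_size": 4}   # assumed preset mapping table

def alloc_hardware_stream(i):
    # Stand-in for the real driver call that applies for a hardware device stream.
    return f"hw_stream_{i}"

def build_stream_pool():
    n = PRESET_TABLE["default_pool_size"]               # preset size N
    return [alloc_hardware_stream(i) for i in range(n)]

pool = build_stream_pool()
```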
In one embodiment, the processor when executing the computer program further performs the steps of:
assigning identification numbers to the streams corresponding to the hardware device resources to generate first identification numbers, wherein the first identification numbers run sequentially from M to N-1, N-1 being greater than or equal to M, and the first identification numbers are defined to be consistent with the second identification numbers corresponding to the operator computation flows in the operator computation flow management unit;
generating a mapping relation between each target first identification number and the stream of the corresponding hardware device resource, and pre-storing the mapping relation in the constructed stream pool to complete the initialization processing of the stream pool.
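The numbering step can be sketched as building an ID-to-stream mapping for identifiers M through N-1; the concrete values of M and N here are assumptions for illustration only.

```python
M, N = 0, 4   # assumed bounds; the patent only requires N-1 >= M

def init_stream_pool(streams):
    # Number the pooled streams M..N-1; these first identification numbers
    # stay consistent with the management unit's second identification numbers.
    assert len(streams) == N - M
    return {ident: s for ident, s in zip(range(M, N), streams)}

mapping = init_stream_pool([f"hw_stream_{i}" for i in range(4)])
```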
In one embodiment, the processor when executing the computer program further performs the steps of:
obtaining mapping relations between a plurality of first identification numbers and streams of corresponding hardware equipment resources;
and selecting a stream of the target hardware equipment resource corresponding to the first identification number M in the mapping relation, and defining the stream of the target hardware equipment resource as the second operator computing stream.
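Selecting the default second operator computation flow then reduces to a lookup of the pool entry whose first identification number equals M, as in this hypothetical sketch:

```python
M = 0   # assumed lowest first identification number
pool_mapping = {0: "hw_stream_0", 1: "hw_stream_1", 2: "hw_stream_2"}

def select_second_stream(mapping, default_id=M):
    # The second operator computation flow defaults to the pooled stream
    # whose first identification number equals M.
    return mapping[default_id]

second_stream = select_second_stream(pool_mapping)
```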
In one embodiment, the processor when executing the computer program further performs the steps of:
in response to detecting the first operator computation flow, detecting, through the operator computation flow acquisition subunit, whether the stream pool has been successfully initialized;
in response to detecting that the stream pool initialization is successful, acquiring the second operator computation flow based on the stream pool initialization processing result;
and in response to detecting that the stream pool initialization is unsuccessful, initializing the stream pool by means of the stream pool initialization subunit to determine the second operator computation flow.
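A compact sketch of this acquisition path, assuming an empty mapping stands for an uninitialized pool (all names are illustrative):

```python
def get_second_stream(pool_mapping):
    # If the pool was never initialized (modeled as an empty mapping),
    # re-run initialization before selecting a stream.
    if not pool_mapping:
        pool_mapping = {0: "hw_stream_0"}   # re-initialization sketch
    return pool_mapping[min(pool_mapping)]  # stream with the lowest ID (M)

from_ready = get_second_stream({0: "a", 1: "b"})
from_reinit = get_second_stream({})
```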
In one embodiment, the processor when executing the computer program further performs the steps of:
transmitting a second operator computation flow in the artificial intelligence computation power execution module to the deep learning module, and comparing the first operator computation flow with the second operator computation flow in the deep learning module;
Based on the comparison result, determining whether to unify the first operator computation flow and the second operator computation flow.
In one embodiment, the processor when executing the computer program further performs the steps of:
respectively acquiring a first memory address corresponding to the first operator computation flow and a second memory address corresponding to the second operator computation flow;
comparing the similarity between the first memory address and the second memory address;
and determining a target comparison result based on the similarity.
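One way to read this comparison is as an equality test on the memory addresses of the two stream handles; the sketch below uses Python's `id()` purely as a stand-in for a real device-side address, which is an assumption about the patent's intent.

```python
def compare_streams(first, second):
    # The patent compares memory addresses; id() stands in for a device address.
    addr1, addr2 = id(first), id(second)
    # "Similarity" collapses to equality for addresses: same object or not.
    return 1.0 if addr1 == addr2 else 0.0

shared = object()
same_result = compare_streams(shared, shared)
diff_result = compare_streams(shared, object())
```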
In one embodiment, the processor when executing the computer program further performs the steps of:
in response to detecting that the similarity is greater than or equal to a first preset value, not performing unified processing on the first operator computing stream and the second operator computing stream;
and in response to detecting that the similarity is smaller than a first preset value, unifying the first operator computation flow and the second operator computation flow.
In one embodiment, the processor when executing the computer program further performs the steps of:
and in response to detecting that the similarity is smaller than the first preset value, replacing, by an operator computation flow processing subunit, the second operator computation flow with the first operator computation flow as the target operator computation flow of the artificial intelligence computing power execution module.
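The replacement step might look like the following sketch, where the threshold value and class name are assumptions:

```python
FIRST_PRESET_VALUE = 1.0   # assumed first preset value (threshold)

class AIPowerModule:
    def __init__(self, stream):
        self.target_stream = stream

def unify(ai, first_stream, similarity):
    # Below the threshold, the processing subunit swaps in the first stream
    # as the execution module's target operator computation flow.
    if similarity < FIRST_PRESET_VALUE:
        ai.target_stream = first_stream
    return ai.target_stream

result = unify(AIPowerModule("s_old"), "s_first", similarity=0.0)
```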
In one embodiment, the processor when executing the computer program further performs the steps of:
and executing an operator computing task in response to detecting that the similarity between the first operator computing stream and the second operator computing stream is greater than or equal to a first preset value.
In one embodiment, the processor when executing the computer program further performs the steps of:
the operator computation flow destruction subunit unbinds the deep learning module from the artificial intelligence computing power execution module, and destroys the operator computation flow and the stream pool of the artificial intelligence computing power execution module.
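The teardown described here can be sketched as follows; the module structure is hypothetical:

```python
class AIPowerModule:
    def __init__(self):
        self.bound_framework = "deep_learning_module"
        self.stream = "hw_stream_0"
        self.pool = {0: "hw_stream_0", 1: "hw_stream_1"}

    def destroy(self):
        self.bound_framework = None   # unbind from the deep learning module
        self.stream = None            # destroy the operator computation flow
        self.pool.clear()             # destroy the stream pool

ai = AIPowerModule()
ai.destroy()
```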
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of:
S1: acquiring a first operator computation flow in a deep learning module in response to detecting an operator computation task execution instruction of the deep learning module;
S2: in response to detecting the first operator computation flow, acquiring a second operator computation flow in an artificial intelligence computing power execution module, and performing unified processing on the first operator computation flow and the second operator computation flow;
S3: determining, based on the unified processing result, whether to execute the operator computation task, thereby realizing management of the operator computation flow.
In one embodiment, the computer program when executed by the processor further performs the steps of:
obtaining a model file of a deep learning model in the deep learning module in response to detection of an operator calculation task execution instruction of the deep learning module;
and analyzing the model file of the deep learning model to obtain a first operator computation flow in the deep learning module.
In one embodiment, the computer program when executed by the processor further performs the steps of:
analyzing the model file of the deep learning model based on a compiler to obtain a target operator sequence;
sequencing the target operator sequences according to the execution sequence of the deep learning model corresponding to the target operator sequences to obtain the operator execution sequence of the operator calculation flow in the deep learning module;
and acquiring the first operator computation flow based on the operator execution sequence of the operator computation flow.
In one embodiment, the computer program when executed by the processor further performs the steps of:
in response to detecting the first operator computation flow, detecting, through an operator computation flow acquisition subunit, whether a target operator computation flow exists in the artificial intelligence computing power execution module;
In response to detecting that the target operator computation flow exists, defining the target operator computation flow as the second operator computation flow;
and in response to detecting that the target operator computing stream does not exist, constructing a stream pool by utilizing a stream pool initialization subunit, and initializing the stream pool to determine the second operator computing stream.
In one embodiment, the computer program when executed by the processor further performs the steps of:
determining that the preset size of the flow pool is N based on a preset mapping table;
according to the preset size of the flow pool, applying for hardware equipment resources in a hardware equipment module, and distributing the hardware equipment resources to corresponding flows to generate the flow pool.
In one embodiment, the computer program when executed by the processor further performs the steps of:
assigning identification numbers to the streams corresponding to the hardware device resources to generate first identification numbers, wherein the first identification numbers run sequentially from M to N-1, N-1 being greater than or equal to M, and the first identification numbers are defined to be consistent with the second identification numbers corresponding to the operator computation flows in the operator computation flow management unit;
generating a mapping relation between each target first identification number and the stream of the corresponding hardware device resource, and pre-storing the mapping relation in the constructed stream pool to complete the initialization processing of the stream pool.
In one embodiment, the computer program when executed by the processor further performs the steps of:
obtaining mapping relations between a plurality of first identification numbers and streams of corresponding hardware equipment resources;
and selecting a stream of the target hardware equipment resource corresponding to the first identification number M in the mapping relation, and defining the stream of the target hardware equipment resource as the second operator computing stream.
In one embodiment, the computer program when executed by the processor further performs the steps of:
in response to detecting the first operator computation flow, detecting, through the operator computation flow acquisition subunit, whether the stream pool has been successfully initialized;
in response to detecting that the stream pool initialization is successful, acquiring the second operator computation flow based on the stream pool initialization processing result;
and in response to detecting that the stream pool initialization is unsuccessful, initializing the stream pool by means of the stream pool initialization subunit to determine the second operator computation flow.
In one embodiment, the computer program when executed by the processor further performs the steps of:
transmitting a second operator computation flow in the artificial intelligence computation power execution module to the deep learning module, and comparing the first operator computation flow with the second operator computation flow in the deep learning module;
Based on the comparison result, determining whether to unify the first operator computation flow and the second operator computation flow.
In one embodiment, the computer program when executed by the processor further performs the steps of:
respectively acquiring a first memory address corresponding to the first operator computation flow and a second memory address corresponding to the second operator computation flow;
comparing the similarity between the first memory address and the second memory address;
and determining a target comparison result based on the similarity.
In one embodiment, the computer program when executed by the processor further performs the steps of:
in response to detecting that the similarity is greater than or equal to a first preset value, not performing unified processing on the first operator computing stream and the second operator computing stream;
and in response to detecting that the similarity is smaller than a first preset value, unifying the first operator computation flow and the second operator computation flow.
In one embodiment, the computer program when executed by the processor further performs the steps of:
and in response to detecting that the similarity is smaller than the first preset value, replacing, by an operator computation flow processing subunit, the second operator computation flow with the first operator computation flow as the target operator computation flow of the artificial intelligence computing power execution module.
In one embodiment, the computer program when executed by the processor further performs the steps of:
and executing an operator computing task in response to detecting that the similarity between the first operator computing stream and the second operator computing stream is greater than or equal to a first preset value.
In one embodiment, the computer program when executed by the processor further performs the steps of:
the operator computation flow destruction subunit unbinds the deep learning module from the artificial intelligence computing power execution module, and destroys the operator computation flow and the stream pool of the artificial intelligence computing power execution module.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by a computer program stored on a non-transitory computer-readable storage medium, which, when executed, may include the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the various embodiments provided herein may include non-volatile and/or volatile memory. The non-volatile memory can include Read-Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM), among others.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The above embodiments merely represent several implementations of the present application, and while they are described in relative detail, they are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art could make various modifications and improvements without departing from the spirit of the present application, all of which fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims (20)

1. A computing force execution system, characterized by comprising a deep learning module, an artificial intelligence computing power execution module, and a hardware device module;
the artificial intelligence computing power execution module is connected with the deep learning module and is used for determining a second operator computing stream when the first operator computing stream is generated by the deep learning module;
The hardware equipment module is connected with the artificial intelligence computing power execution module and is used for executing a task instruction corresponding to the second operator computing flow;
the artificial intelligence computing power execution module comprises an initialization unit and an operator computing flow management unit;
the initialization unit is used for initializing the artificial intelligence computing power execution module;
the operator computation flow management unit is used for managing the second operator computation flow based on the first operator computation flow generated by the deep learning module.
2. The computing power execution system of claim 1, wherein the operator computing stream management unit comprises a stream pool initialization subunit, an operator computing stream acquisition subunit, an operator computing stream processing subunit, an operator computing stream destruction subunit, and a stream pool;
the flow pool initialization subunit is used for initializing the flow pool;
the operator computation flow obtaining subunit is used for obtaining a second operator computation flow in the artificial intelligence computation power execution module;
the operator computation flow processing subunit is used for performing unified processing on the second operator computation flow in the artificial intelligence computing power execution module based on the first operator computation flow;
the operator computation flow destruction subunit is used for unbinding and destroying the operator computation flow;
the flow pool is used for storing flows corresponding to hardware equipment resources in the artificial intelligence computing power execution module.
3. The computing force execution system of claim 2, wherein the flow pool comprises flows corresponding to at least one hardware device resource in the artificial intelligence computing power execution module;
the preset size of the flow pool is N, and the first identification numbers of a plurality of flows in the flow pool are sequentially M to N-1, wherein N-1 is greater than or equal to M;
and the first identification number is consistent with a second identification number corresponding to the operator calculation flow in the operator calculation flow management unit.
4. An operator computation flow management method based on the computation force execution system of any one of claims 1 to 3, characterized in that the method comprises:
acquiring a first operator computation flow in a deep learning module in response to detection of an operator computation task execution instruction of the deep learning module;
in response to detecting the first operator computing stream, acquiring a second operator computing stream in an artificial intelligent computing power execution module, and carrying out unified processing on the first operator computing stream and the second operator computing stream;
Based on the unified processing result, whether to execute the operator calculation task is determined, so that management of the operator calculation flow is realized.
5. The method for managing operator computation flows according to claim 4, wherein the method for acquiring the first operator computation flow in the deep learning module comprises:
obtaining a model file of a deep learning model in the deep learning module in response to detection of an operator calculation task execution instruction of the deep learning module;
and analyzing the model file of the deep learning model to obtain a first operator computation flow in the deep learning module.
6. The operator computation flow management method of claim 5, wherein parsing a model file of the deep learning model to obtain a first operator computation flow in the deep learning module comprises:
analyzing the model file of the deep learning model based on a compiler to obtain a target operator sequence;
sequencing the target operator sequences according to the execution sequence of the deep learning model corresponding to the target operator sequences to obtain the operator execution sequence of the operator calculation flow in the deep learning module;
and acquiring the first operator computation flow based on the operator execution sequence of the operator computation flow.
7. The operator computation flow management method of claim 4, wherein prior to said acquiring a second operator computation flow in an artificial intelligence computing power execution module, the method further comprises:
in response to detecting the first operator computation flow, detecting, through an operator computation flow acquisition subunit, whether a target operator computation flow exists in the artificial intelligence computing power execution module;
in response to detecting that the target operator computation flow exists, defining the target operator computation flow as the second operator computation flow;
and in response to detecting that the target operator computing stream does not exist, constructing a stream pool by utilizing a stream pool initialization subunit, and initializing the stream pool to determine the second operator computing stream.
8. The operator computing stream management method according to claim 7, wherein said constructing a stream pool using a stream pool initialization subunit comprises:
determining that the preset size of the flow pool is N based on a preset mapping table;
according to the preset size of the flow pool, applying for hardware equipment resources in a hardware equipment module, and distributing the hardware equipment resources to corresponding flows to generate the flow pool.
9. The operator computing stream management method according to claim 7, wherein after constructing a stream pool using a stream pool initialization subunit, initializing the stream pool comprises:
assigning identification numbers to the streams corresponding to the hardware device resources to generate first identification numbers, wherein the first identification numbers run sequentially from M to N-1, N-1 being greater than or equal to M, and the first identification numbers are defined to be consistent with the second identification numbers corresponding to the operator computation flows in the operator computation flow management unit;
generating a mapping relation between each target first identification number and the stream of the corresponding hardware device resource, and pre-storing the mapping relation in the constructed stream pool to complete the initialization processing of the stream pool.
10. The operator computation flow management method of claim 7, wherein determining the second operator computation flow based on a flow pool initialization processing result comprises:
obtaining mapping relations between a plurality of first identification numbers and streams of corresponding hardware equipment resources;
and selecting a stream of the target hardware equipment resource corresponding to the first identification number M in the mapping relation, and defining the stream of the target hardware equipment resource as the second operator computing stream.
11. The operator computation flow management method of claim 4, wherein said acquiring a second operator computation flow in an artificial intelligence computing power execution module comprises:
in response to detecting the first operator computation flow, detecting, through the operator computation flow acquisition subunit, whether the stream pool has been successfully initialized;
acquiring the second operator computing stream based on a stream pool initialization processing result in response to the fact that the stream pool initialization is successful;
and in response to detecting that the initialization of the flow pool is unsuccessful, initializing the flow pool by utilizing a flow pool initialization subunit to determine the second operator computing flow.
12. The operator computing stream management method of claim 4, wherein prior to unifying the first operator computing stream and the second operator computing stream, the method further comprises:
transmitting a second operator computation flow in the artificial intelligence computation power execution module to the deep learning module, and comparing the first operator computation flow with the second operator computation flow in the deep learning module;
based on the comparison result, determining whether to unify the first operator computation flow and the second operator computation flow.
13. The operator computation flow management method of claim 12, wherein comparing the first operator computation flow with the second operator computation flow in the deep learning module comprises:
Respectively acquiring a first memory address corresponding to the first operator computation flow and a second memory address corresponding to the second operator computation flow;
comparing the similarity between the first memory address and the second memory address;
and determining a target comparison result based on the similarity.
14. The operator computing stream management method according to claim 13, wherein determining whether to unify the first operator computing stream and the second operator computing stream based on a target comparison result comprises:
in response to detecting that the similarity is greater than or equal to a first preset value, not performing unified processing on the first operator computing stream and the second operator computing stream;
and in response to detecting that the similarity is smaller than a first preset value, unifying the first operator computation flow and the second operator computation flow.
15. The operator computation flow management method of claim 13, wherein unifying the first operator computation flow and the second operator computation flow comprises:
and in response to detecting that the similarity is smaller than the first preset value, replacing, by an operator computation flow processing subunit, the second operator computation flow with the first operator computation flow as the target operator computation flow of the artificial intelligence computing power execution module.
16. The operator computing stream management method according to claim 4, wherein determining whether to execute the operator computing task based on the unified processing result comprises:
and executing an operator computing task in response to detecting that the similarity between the first operator computing stream and the second operator computing stream is greater than or equal to a first preset value.
17. The operator computing stream management method of claim 4, wherein after completion of the operator computing task execution, the method further comprises:
the operator computation flow destruction subunit unbinds the deep learning module from the artificial intelligence computing power execution module, and destroys the operator computation flow and the stream pool of the artificial intelligence computing power execution module.
18. An operator computation flow management apparatus, the apparatus comprising:
the first operator computation flow acquisition module is used for acquiring a first operator computation flow in the deep learning module when an operator computation task execution instruction of the deep learning module is detected;
the unification processing module is used for acquiring a second operator computation flow in the artificial intelligence computing power execution module when the first operator computation flow is detected, and performing unified processing on the first operator computation flow and the second operator computation flow;
And the computation flow management module is used for determining whether to execute the operator computation task based on the unified processing result so as to realize the management of the operator computation flow.
19. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any of claims 4 to 17 when executing the computer program.
20. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the method of any of claims 4 to 17.
CN202410258729.2A 2024-03-07 2024-03-07 Computing force execution system, operator computing flow management method, device, equipment and medium Active CN117852573B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410258729.2A CN117852573B (en) 2024-03-07 2024-03-07 Computing force execution system, operator computing flow management method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410258729.2A CN117852573B (en) 2024-03-07 2024-03-07 Computing force execution system, operator computing flow management method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN117852573A true CN117852573A (en) 2024-04-09
CN117852573B CN117852573B (en) 2024-06-07

Family

ID=90530558

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410258729.2A Active CN117852573B (en) 2024-03-07 2024-03-07 Computing force execution system, operator computing flow management method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN117852573B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112199385A (en) * 2020-09-30 2021-01-08 北京百度网讯科技有限公司 Processing method and device for artificial intelligence AI, electronic equipment and storage medium
CN112766512A (en) * 2021-01-25 2021-05-07 北京大学(天津滨海)新一代信息技术研究院 Deep learning framework diagnosis system, method, device, equipment and medium based on meta-operator
CN112789627A (en) * 2018-09-30 2021-05-11 华为技术有限公司 Neural network processor, data processing method and related equipment
WO2021134231A1 (en) * 2019-12-30 2021-07-08 深圳元戎启行科技有限公司 Computing resource allocation method and apparatus based on inference engine, and computer device
WO2022252839A1 (en) * 2021-06-03 2022-12-08 北京希姆计算科技有限公司 Method and apparatus for generating computation flow graph scheduling scheme, and electronic device and computer-readable storage medium
CN115996173A (en) * 2022-11-14 2023-04-21 中国科学技术大学 Communication optimization method and system for parallel training of distributed deep learning operator
WO2023246801A1 (en) * 2022-06-21 2023-12-28 阿里巴巴(中国)有限公司 Orchestration method and apparatus for algorithm pipeline, and electronic device and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
丁然; 林建文; 朱振华; 刘弋波: "A CPU-like Deep Learning Coprocessor Architecture", China Integrated Circuit, no. 4, 5 August 2020 (2020-08-05) *
朱虎明; 李佩; 焦李成; 杨淑媛; 侯彪: "A Survey of Research on Parallelization of Deep Neural Networks", Chinese Journal of Computers, no. 08, 19 January 2018 (2018-01-19) *
杨帆; 孙大东; 孙丽婷: "Research on Stream Processing Scheduling Techniques for Embedded Applications", Information Technology, no. 10, 25 October 2017 (2017-10-25) *

Also Published As

Publication number Publication date
CN117852573B (en) 2024-06-07

Similar Documents

Publication Publication Date Title
CN109901834B (en) Document page generation method, device, computer equipment and storage medium
US7398514B2 (en) Test automation stack layering
US5961610A (en) Systems, methods and apparatus for generating and controlling display of medical images
US5950002A (en) Learn mode script generation in a medical imaging system
US20050160431A1 (en) Method and mechanism for debugging a series of related events within a computer system
CN111061475B (en) Software code generating method, device, computer equipment and storage medium
CN106991100B (en) Data import method and device
CN110580189A (en) method and device for generating front-end page, computer equipment and storage medium
CN113296786B (en) Data processing method, device, electronic equipment and storage medium
CN111309734B (en) Method and system for automatically generating table data
US10496423B2 (en) Method for opening up data and functions of terminal application based on reconstruction technology
Mani et al. Test case generation for embedded system software using UML interaction diagram
CN112486828A (en) Test case generation method and device, computer equipment and storage medium
US11550553B2 (en) Usage-based software library decomposition
US7502967B1 (en) Identifying an object in a data file that causes an error in an application
CN111739136B (en) Rendering method, computer device, and storage medium
CN100472469C (en) Operation logbook obtaining method
CN113505895A (en) Machine learning engine service system, model training method and configuration method
CN117852573B (en) Computing force execution system, operator computing flow management method, device, equipment and medium
CN110806891B (en) Method and device for generating software version of embedded device
CN113760491A (en) Task scheduling system, method, equipment and storage medium
CN116795377A (en) Code data processing method and device
CN110908644A (en) Configuration method and device of state node, computer equipment and storage medium
CN111488144A (en) Data processing method and equipment
Ziegenhagen et al. Using Developer-tool-Interactions to Expand Tracing Capabilities.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant