CN114090262A - Object processing method and device, electronic equipment and storage medium - Google Patents

Object processing method and device, electronic equipment and storage medium

Info

Publication number
CN114090262A
Authority
CN
China
Prior art keywords
processing
engine
algorithm
type
engines
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111434793.4A
Other languages
Chinese (zh)
Inventor
祝叶华
孙炜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202111434793.4A
Publication of CN114090262A
Legal status: Pending (current)

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 - Multiprogramming arrangements
    • G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5061 - Partitioning or combining of resources
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 1/00 - Details not covered by groups G06F3/00 - G06F13/00 and G06F21/00
    • G06F 1/26 - Power supply means, e.g. regulation thereof
    • G06F 1/32 - Means for saving power
    • G06F 1/3203 - Power management, i.e. event-based initiation of a power-saving mode
    • G06F 1/3234 - Power saving characterised by the action undertaken
    • G06F 1/3243 - Power saving in microcontroller unit
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 - Multiprogramming arrangements
    • G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 - Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F 9/5027 - Allocation of resources to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 - Physical realisation using electronic means
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2209/00 - Indexing scheme relating to G06F9/00
    • G06F 2209/50 - Indexing scheme relating to G06F9/50
    • G06F 2209/5011 - Pool
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2209/00 - Indexing scheme relating to G06F9/00
    • G06F 2209/50 - Indexing scheme relating to G06F9/50
    • G06F 2209/5017 - Task decomposition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Stored Programmes (AREA)

Abstract

Embodiments of the present disclosure relate to an object processing method and apparatus, an electronic device, and a storage medium, and relate to the field of computer technologies. The object processing method includes: acquiring multiple types of processing algorithms for processing an object to be processed; allocating a target computing engine to each type of processing algorithm from candidate computing engines according to attribute information of the multiple types of processing algorithms, the candidate computing engines including a plurality of computing engines in an engine resource pool and a dedicated computing engine; and running the corresponding processing algorithm on the object to be processed based on the target computing engine to perform the processing operation and obtain an operation result corresponding to the object to be processed. The technical solution of the present disclosure can allocate computing resources accurately and improve the balance of resource allocation.

Description

Object processing method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to an object processing method, an object processing apparatus, an electronic device, and a computer-readable storage medium.
Background
With the wide application of artificial intelligence technology, a software algorithm usually has to fuse multiple algorithms in order to serve a single application scenario well.
In the related art, an artificial intelligence algorithm is generally computed by an artificial intelligence acceleration circuit together with a Digital Signal Processor (DSP), while other algorithms are each processed by their own dedicated DSPs.
In this arrangement, resources may be wasted while an algorithm executes, resulting in uneven resource allocation. Moreover, each DSP can execute only one kind of task, which is limiting and makes the allocation inaccurate.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
An object processing method and apparatus, an electronic device, and a storage medium are provided to overcome, at least to some extent, the problem of uneven resource allocation caused by the limitations and drawbacks of the related art.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to an aspect of the present disclosure, there is provided an object processing method including: acquiring multiple types of processing algorithms for processing an object to be processed; allocating a target computing engine to each type of processing algorithm from candidate computing engines according to attribute information of the multiple types of processing algorithms, the candidate computing engines including a plurality of computing engines in an engine resource pool and a dedicated computing engine; and running the corresponding processing algorithm on the object to be processed based on the target computing engine to perform the processing operation and obtain an operation result corresponding to the object to be processed.
According to an aspect of the present disclosure, there is provided an object processing apparatus including: an algorithm acquisition module, configured to acquire multiple types of processing algorithms included in a processing operation performed on an object to be processed; a computing engine allocation module, configured to allocate a target computing engine to each type of processing algorithm from candidate computing engines according to attribute information of the multiple types of processing algorithms, the candidate computing engines including a plurality of computing engines in an engine resource pool and a dedicated computing engine; and a processing operation execution module, configured to run the corresponding processing algorithm on the object to be processed based on the target computing engine so as to perform the processing operation and obtain an operation result corresponding to the object to be processed.
According to an aspect of the present disclosure, there is provided an electronic device including: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform any one of the object processing methods described above via execution of the executable instructions.
According to an aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the object processing method of any one of the above.
In the object processing method, the object processing apparatus, the electronic device, and the computer-readable storage medium provided in the embodiments of the present disclosure, a target computing engine is allocated to each type of processing algorithm according to the attribute information of the multiple types of processing algorithms, the candidate computing engines include a plurality of computing engines in an engine resource pool and a dedicated computing engine, and the corresponding operation is performed on the object to be processed based on the target computing engine. On the one hand, because a suitable target computing engine can be allocated to each type of processing algorithm, according to its attribute information, from candidate computing engines consisting of the computing engines in the engine resource pool and the dedicated computing engine, the engines for all types of processing algorithms are managed as a whole and the target engine is selected from this unified set. This avoids the limitation in the related art that a processing operation can only use one fixed computing engine, allows target computing engines to be allocated precisely, and improves accuracy. On the other hand, all computing engines can be coordinated in a unified manner, so that no engine is tied to a single task; this improves the balance of resource allocation and enables resource sharing among multiple computing engines. The object to be processed can thus be operated on quickly and accurately by the target computing engines, which reduces the computing resources needed during processing, lowers power consumption, and improves the processing efficiency and the operational performance of the network model.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty.
Fig. 1 is a schematic diagram illustrating an application scenario to which an object processing method or an object processing apparatus according to an embodiment of the present disclosure may be applied.
FIG. 2 shows a schematic structural diagram of an electronic device suitable for use in implementing embodiments of the present disclosure.
Fig. 3 shows a schematic diagram of an architecture suitable for use in implementing the related art.
Fig. 4 schematically illustrates a schematic diagram of an object processing method in an embodiment of the present disclosure.
Fig. 5 schematically illustrates an architecture diagram in an embodiment of the present disclosure.
FIG. 6 schematically illustrates a flow diagram for assigning a target compute engine in an embodiment of the present disclosure.
FIG. 7 schematically illustrates a flow diagram for assigning a target compute engine for a first type of processing algorithm in an embodiment of the disclosure.
Fig. 8 schematically illustrates an application diagram of a target computing engine in an embodiment of the present disclosure.
Fig. 9 schematically illustrates an overall flow diagram of a distribution calculation engine in an embodiment of the present disclosure.
Fig. 10 schematically illustrates a block diagram of an object processing apparatus in an embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and the like. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
An object processing method is provided in the embodiments of the present disclosure, and fig. 1 is a schematic diagram illustrating an application scenario to which the object processing method or the object processing apparatus according to the embodiments of the present disclosure may be applied.
Referring to fig. 1, the client 101 may be various types of devices with computing capabilities, such as a smartphone, a tablet, a desktop computer, an in-vehicle device, a wearable device, and so on. The object to be processed 102 may be a video, an image, or data, etc. The client 101 may obtain a plurality of types of processing algorithms included in the processing operation performed on the object to be processed; distributing a target calculation engine for each type of processing algorithm from the calculation engines to be selected according to the attribute information of the multiple types of processing algorithms; and then, performing corresponding operation on the object to be processed 102 based on the target computing engine to execute the processing operation, and acquiring an operation result corresponding to the object to be processed 102.
It should be noted that the object processing method provided by the embodiment of the present disclosure may be completely executed by the client. Accordingly, the object processing apparatus may be provided in the client.
FIG. 2 shows a schematic diagram of an electronic device suitable for use in implementing exemplary embodiments of the present disclosure. The terminal of the present disclosure may be configured in the form of an electronic device as shown in fig. 2, however, it should be noted that the electronic device shown in fig. 2 is only one example, and should not bring any limitation to the functions and the use range of the embodiment of the present disclosure.
The electronic device of the present disclosure includes at least a processor and a memory for storing one or more programs, which when executed by the processor, cause the processor to implement the method of the exemplary embodiments of the present disclosure.
Specifically, as shown in fig. 2, the electronic device 200 may include: a processor 210, an internal memory 221, an external memory interface 222, a Universal Serial Bus (USB) interface 230, a charging management module 240, a power management module 241, a battery 242, an antenna 1, an antenna 2, a mobile communication module 250, a wireless communication module 260, an audio module 270, a speaker 271, a receiver 272, a microphone 273, an earphone interface 274, a sensor module 280, a display 290, a camera module 291, an indicator 292, a motor 293, a button 294, and a Subscriber Identity Module (SIM) card interface 295. The sensor module 280 may include a depth sensor, a pressure sensor, a gyroscope sensor, an air pressure sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a proximity light sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, a bone conduction sensor, and the like.
It is to be understood that the illustrated structure of the embodiment of the present application does not specifically limit the electronic device 200. In other embodiments of the present application, the electronic device 200 may include more or fewer components than shown, or combine certain components, or split certain components, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Processor 210 may include one or more processing units, for example: the processor 210 may include an application processor, a modem processor, a graphics processor, an image signal processor, a controller, a video codec, a digital signal processor, a baseband processor, and/or a Neural-Network Processing Unit (NPU), etc. The different processing units may be separate devices or may be integrated into one or more processors. Additionally, a memory may be provided in processor 210 for storing instructions and data. The object processing method in the present exemplary embodiment may be performed by an application processor, a graphics processor, or an image signal processor, and may be performed by the NPU when the method involves neural-network-related processing.
Internal memory 221 may be used to store computer-executable program code, including instructions. The internal memory 221 may include a program storage area and a data storage area. The external memory interface 222 may be used to connect an external memory card, such as a Micro SD card, to extend the memory capability of the electronic device 200.
The communication function of the electronic device 200 may be implemented by the mobile communication module, the antenna 1, the wireless communication module, the antenna 2, the modem processor, the baseband processor, and the like. The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. The mobile communication module may provide mobile communication solutions such as 2G, 3G, 4G, and 5G applied to the electronic device 200. The wireless communication module may provide wireless communication solutions such as wireless LAN, Bluetooth, and near field communication applied to the electronic device 200.
The display screen is used for realizing display functions, such as displaying user interfaces, images, videos and the like. The camera module is used for realizing shooting functions, such as shooting images, videos and the like. The audio module is used for realizing audio functions, such as audio playing, voice acquisition and the like. The power module is used for realizing power management functions, such as charging a battery, supplying power to equipment, monitoring the state of the battery and the like.
The present application also provides a computer-readable storage medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device.
A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable storage medium may transmit, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable storage medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The computer-readable storage medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the method as described in the embodiments below.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software, or may be implemented by hardware, and the described units may also be disposed in a processor. Wherein the names of the elements do not in some way constitute a limitation on the elements themselves.
Referring to the architecture shown in diagram A of fig. 3, the overall acceleration unit for the artificial intelligence algorithm is a general-purpose NPU (Neural Network Processing Unit), shown as the Generic NPU, which mainly accelerates the artificial intelligence network. The artificial intelligence acceleration circuit module is an acceleration module dedicated to artificial intelligence operators; it mainly accelerates the operators that are relatively fixed in an algorithm network, such as convolution operators and pooling operators. To adapt to different network algorithms while keeping sufficient flexibility, a DSP module is added to the Generic NPU to handle the tasks in an artificial intelligence algorithm network that are not suited to hardware acceleration. Traditional image processing algorithms and audio processing algorithms, because of their diversity and variety, are generally executed on DSPs, with the Video DSP handling visual algorithm processing and the Audio DSP handling audio data processing. Referring to diagram B in fig. 3, a processing task comprises multiple types of processing algorithms, and when tasks are distributed, a whole computing engine is allocated per type of processing algorithm; for example, only the DSP computing engine is allocated, or only the artificial intelligence acceleration circuit computing engine is allocated.
With the architecture of the related art, algorithms cannot share resources during execution, which wastes resources to some extent. For example, if an algorithm currently requiring Video DSP processing is complex while the Audio DSP and the DSP inside the Generic NPU are lightly loaded, the hardware resources of the individual algorithms cannot be shared, so the split-engine design as a whole distributes resources unevenly. The NPU is mainly used for hardware acceleration of artificial intelligence network algorithms, so as to process massive multimedia data such as videos and images.
In order to solve the above technical problem, in the embodiment of the present disclosure, an object processing method is provided. Next, an object processing method in the embodiment of the present disclosure is explained in detail with reference to fig. 4.
In step S410, a plurality of types of processing algorithms included in the processing operation performed on the object to be processed are acquired.
The embodiment of the disclosure can be particularly applied to an application process of a neural network model. The object to be processed may be a video, an image, or other type of object, and the like, and is determined according to an application scenario. The processing operation may be determined according to an actual application scenario. For example, when the actual application scenario is an identification scenario, the processing operation may be an identification operation; when the actual application scenario is a classification scenario, the processing operation may be a classification operation or the like.
When a processing operation is performed on an object to be processed, multiple types of processing algorithms may be involved; that is, multiple types of processing algorithms are combined to complete the processing operation. For example, when identifying the object to be processed, the processing algorithms may include, but are not limited to, an artificial intelligence algorithm, an image processing algorithm, an audio processing algorithm, and the like. The artificial intelligence algorithm may be identified as a first type of processing algorithm, while the image processing algorithm and the audio processing algorithm may be identified as a second type of processing algorithm.
In step S420, a target calculation engine is allocated for each type of processing algorithm from the calculation engines to be selected according to the attribute information of the plurality of types of processing algorithms; the candidate compute engines include a plurality of compute engines in an engine resource pool and a dedicated compute engine.
In the embodiment of the present disclosure, after the multiple types of processing algorithms are obtained, they may be segmented by operator type into a fixed algorithm part and a non-fixed algorithm part. An operator type is either fixed or non-fixed. A fixed algorithm is one that can be accelerated in hardware, such as the fixed operators in an artificial intelligence algorithm, e.g., convolution and pooling. A non-fixed algorithm is one that cannot be accelerated in hardware; this covers the non-fixed operators in artificial intelligence algorithms and all second-type processing algorithms, such as nonlinear activation functions in artificial intelligence algorithms as well as audio algorithms, visual algorithms, and so on.
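By way of a non-limiting illustration, the segmentation described above can be sketched in Python as follows; the operator names and the set of fixed operators are assumptions made for the example and are not enumerated by the disclosure:

    # Illustrative sketch only: the operator names and the FIXED_OPERATORS set
    # are assumptions; the disclosure does not enumerate a concrete list.
    FIXED_OPERATORS = {"convolution", "pooling", "relu"}

    def segment_algorithm(operators):
        """Split an algorithm's operator list into a fixed part (suitable for
        hardware acceleration) and a non-fixed part (reference type operators)."""
        fixed = [op for op in operators if op in FIXED_OPERATORS]
        non_fixed = [op for op in operators if op not in FIXED_OPERATORS]
        return fixed, non_fixed

    # An AI network whose activation uses a nonlinear sigmoid that the
    # dedicated circuit cannot accelerate:
    fixed, non_fixed = segment_algorithm(["convolution", "pooling", "sigmoid"])
    # fixed == ["convolution", "pooling"], non_fixed == ["sigmoid"]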
When the processing algorithm is an artificial intelligence algorithm, it can be divided into a plurality of operators. The plurality of operators may include a fixed operator or a reference type operator (i.e., a non-fixed operator). For example, when the processing algorithm is an artificial intelligence algorithm, the artificial intelligence algorithm may be split into fixed operators, such as convolution, pooling, activation, and the like. However, there are some differences between operators used in different artificial intelligence algorithms, for example, a simpler Relu may be used for the activation function, and a nonlinear sigmoid or tanh function may also be used for the activation function.
After the artificial intelligence algorithm is divided into a plurality of operators, the commonalities of the multiple processing algorithms can be extracted; in particular, the non-fixed operators, which represent the reference type operators in the multiple processing algorithms, can be extracted.
The attribute information describes the state of the operators contained in each type of processing algorithm. For the first type of processing algorithm, the attribute information may be determined by its complexity degree. The complexity degree is determined by the operators the algorithm contains, specifically by the number of reference type operators it includes. A reference type operator is an operator that cannot be accelerated by dedicated hardware, i.e., a variable, non-fixed operator, such as a nonlinear activation function or another operator that the dedicated hardware of the artificial intelligence acceleration circuit module cannot accelerate. Complexity is positively correlated with the number of reference type operators in the first type of processing algorithm: the more reference type operators it contains, the higher its complexity; the fewer, the lower. For the second type of processing algorithm, the attribute information is determined by the corresponding algorithm type; its complexity, for example, does not change.
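The disclosure states only that complexity is positively correlated with the number of reference type operators; a minimal sketch of such a mapping, with purely assumed thresholds, might look like this:

    def complexity_degree(num_reference_ops, few_threshold=2):
        """Map the number of reference type operators in a first-type processing
        algorithm to a complexity degree. The threshold is an assumption; the
        disclosure only states a positive correlation."""
        if num_reference_ops == 0:
            return "first degree"    # no reference type operators
        if num_reference_ops <= few_threshold:
            return "second degree"   # a small number of reference type operators
        return "third degree"        # a large number of reference type operators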
After obtaining the attribute information of the processing algorithms of multiple types, a corresponding target computing engine may be allocated to each type of processing algorithm from the candidate computing engines composed of the multiple computing engines in the engine resource pool and the dedicated computing engine according to the attribute information.
In the embodiment of the disclosure, the different DSP engine resources can be gathered for unified management to form a DSP engine resource pool. The engine resource pool can handle the algorithms that cannot be hardware-accelerated among visual algorithms, audio algorithms, and artificial intelligence algorithms. The part of the artificial intelligence algorithm that can be hardware-accelerated is handled separately by a dedicated computing engine implemented as a hardware circuit, which sits on the bus as an engine resource alongside the DSP engine resource pool.
Referring to the engine resource structure shown in fig. 5, the engine resource pool 501 includes a plurality of DSP engine resources of different types (i.e., a plurality of DSPs), and the number of DSP computing engines can be set according to actual requirements; for example, if there are many second-type processing algorithms, more computing engines are provided. The engine resource pool 501 may be connected with the dedicated computing engine 502 to enable direct communication between them. In addition, the engine resource pool 501 and the dedicated computing engine 502 may be connected to an OCM (On-Chip Memory) 503, so that a computing engine's processing result can be stored to the on-chip memory through the bus, and required data can likewise be fetched from the on-chip memory. A pass-through channel is added between the DSP resource pool and the hardware acceleration unit of the artificial intelligence acceleration circuit module to let the dedicated computing engine communicate directly with the computing engines in the engine resource pool. Intermediate results can therefore be processed in a pipelined fashion, and data need not be moved back and forth between the computing units and the storage unit. The pass-through path also makes the fusion of some operators possible. For example, if the AI algorithm has finished its computation and some post-processing still has to be done in the DSP pool, the result can be sent to the DSP pool over the pass-through path and computed directly, without the artificial intelligence acceleration circuit module first storing the result into the on-chip memory OCM over the bus and the DSP then reading the data back from the OCM. This shortens latency, reduces system power consumption, and improves system performance.
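The benefit of the pass-through channel can be made concrete with a small sketch; the classes and method names below are hypothetical stand-ins for hardware behaviour, not an API defined by the disclosure:

    # Illustrative sketch only: all names are hypothetical.
    class DedicatedEngine:
        """Artificial intelligence acceleration circuit (dedicated computing engine)."""
        def run_fixed(self, data):
            return ("accelerated", data)

    class DspPool:
        """Pooled DSP computing engines handling non-fixed algorithms."""
        def post_process(self, data):
            return ("post_processed", data)

    def run_with_pass_through(npu, pool, data):
        # The intermediate result flows straight to the DSP pool over the
        # pass-through channel, skipping the bus/OCM round trip.
        return pool.post_process(npu.run_fixed(data))

    def run_via_ocm(npu, pool, ocm, data):
        # Without the channel: store to on-chip memory over the bus,
        # then the DSP pool fetches the data back from the OCM.
        ocm["intermediate"] = npu.run_fixed(data)
        return pool.post_process(ocm["intermediate"])

Comparing the two call paths shows the design choice: the pass-through variant removes the store-and-reload round trip through the OCM 503 that the bus path requires.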
In the engine resource structure shown in fig. 5, the engine resource pool 501 may be used for processing non-fixed algorithms, while the dedicated computing engine 502, an artificial intelligence acceleration circuit, may be used for processing the fixed algorithms within artificial intelligence algorithms. By segmenting the first type of processing algorithm and accelerating its fixed part with a dedicated circuit, the DSP computing engines in the system are coordinated in a unified manner, which avoids each DSP being tied to a single task and makes the overall hardware resource allocation more balanced.
Based on this, the candidate computing engines include the multiple computing engines in the engine resource pool and the dedicated computing engine, and a suitable target computing engine may be assigned to each type of processing algorithm from among them. In other words, rather than directly fixing a certain computing engine as the target for each type of processing algorithm, the multiple computing engines and the dedicated computing engine together serve as the engine resources, and the target computing engine for each type of processing algorithm is chosen from all of these resources. Selecting from all engine resources on the basis of the overall engine design avoids the situation where a task can be handled only by its own corresponding engine, removes that limitation and singularity, and allows unified coordination, so that engine resources are shared and the overall hardware resource allocation is more balanced and reasonable.
FIG. 6 shows a flow diagram of a distributed target computing engine. As shown in fig. 6, mainly includes the following steps:
in step S610, the target computing engine is assigned to a first type of processing algorithm according to the complexity of the first type of processing algorithm.
In the embodiment of the present disclosure, the corresponding target calculation engine may be determined according to the number of reference type operators included in the first type of processing algorithm. It should be noted that, since the first type of processing algorithm is segmented, the corresponding target computing engine may be at least one computing engine.
A flow chart for assigning a target computing engine for a first type of processing algorithm is schematically shown in fig. 7, and with reference to fig. 7, mainly comprises steps S710 to S730, wherein:
in step S710, if the complexity degree is a first degree, the dedicated computing engine is used as the target engine;
in step S720, if the complexity degree is a second degree, the dedicated computing engine and one computing engine in the engine resource pool are used as the target engines;
in step S730, if the complexity degree is a third degree, the dedicated computing engine and multiple computing engines in the engine resource pool are used as the target engines.
In the embodiment of the present disclosure, in descending order of complexity the degrees are: the third degree, the second degree, and the first degree. If no reference type operator is contained, the target computing engine is the dedicated computing engine; if a small number of reference type operators are contained, the target computing engines are the dedicated computing engine plus one computing engine in the engine resource pool; if a large number of reference type operators are contained, the target computing engines are the dedicated computing engine plus multiple computing engines in the engine resource pool.
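Continuing the earlier sketches, this degree-to-engine mapping could be expressed as follows; the choice of exactly two pool engines for the third degree follows the example of diagram C in fig. 8 and is otherwise an assumption:

    def allocate_first_type(degree, pool):
        """Assign target engines to a first-type processing algorithm.
        `pool` is the mutable list of free DSP engines in the resource pool;
        engines taken here are no longer available to the second type."""
        if degree == "first degree":
            return ["dedicated"]
        if degree == "second degree":
            return ["dedicated", pool.pop(0)]
        # Third degree: assumes at least two free engines remain in the pool.
        return ["dedicated", pool.pop(0), pool.pop(0)]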
Referring to diagram A in fig. 8, for an AI algorithm of moderate complexity (that is, besides the hardened operator part of the AI network algorithm, there is a small amount of computation that the dedicated hardware of the artificial intelligence acceleration circuit module cannot accelerate), one DSP computing engine is assigned to execute that part. Referring to diagram B in fig. 8, for a simpler AI algorithm whose operators can be fully covered by the hardware acceleration unit, no DSP computing engine needs to be allocated. As shown in diagram C in fig. 8, a complex AI algorithm contains, in addition to the part that can be accelerated by the dedicated circuit, a large number of operators unsuitable for direct hardware implementation, such as nonlinear activation functions; two DSP computing engines and one dedicated neural network accelerator (the dedicated computing engine) are allocated to run this AI algorithm.
In step S620, the computing engine in the engine resource pool is used as the target computing engine corresponding to the second type of processing algorithm.
In the embodiment of the present disclosure, after the target computing engine is allocated to the first type of processing algorithm, all remaining engines among the candidate computing engines may serve as the target computing engines for the second type of processing algorithm. Referring to diagram A in fig. 8, when the AI algorithm is of moderate complexity, the remaining two DSP computing engines in the engine resource pool are assigned to the second type of processing algorithm as target computing engines. Referring to diagram B in fig. 8, for the simpler AI algorithm, all DSP computing engines in the engine resource pool are assigned to the second type of processing algorithm as target computing engines. Referring to diagram C in fig. 8, for the more complex AI algorithm, the one DSP computing engine remaining in the engine resource pool is assigned to the second type of processing algorithm as the target computing engine. It should be noted that the combinations of computing engines are not limited to those shown in fig. 8 and are determined by the actual situation. For example, a multi-core artificial intelligence acceleration circuit module can serve as the dedicated computing engine and, together with multi-core DSP computing engines, form the candidate computing engines, from which a target computing engine suited to each type of processing algorithm is selected.
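A short usage sketch of the functions above illustrates how the remaining pool engines fall to the second type of processing algorithm (the engine names are invented placeholders):

    # Continuing the sketch: whatever remains in the pool after the
    # first-type allocation serves the second type of processing algorithm.
    pool = ["dsp0", "dsp1", "dsp2"]
    first_type_engines = allocate_first_type("second degree", pool)
    second_type_engines = list(pool)
    # first_type_engines == ["dedicated", "dsp0"]
    # second_type_engines == ["dsp1", "dsp2"]   (cf. diagram A in fig. 8)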
In the embodiment of the disclosure, the multiple DSP computing engines in the engine resource pool and the dedicated computing engine may be fused into one whole, and the first type of processing algorithm is segmented by operator, so that target computing engines matching the first and second types of processing algorithms are selected from the candidate computing engines according to the types of operators the first type of processing algorithm contains. With the computing engines in the engine resource pool and the dedicated computing engine together serving as the candidate computing engines, a target computing engine suited to each type of processing algorithm is selected from them. All computing engines can thus be managed as a whole, avoiding the related-art situation where a task is handled only by its own corresponding engine; limitation and singularity are avoided and unified coordination becomes possible, making the overall hardware resource allocation more balanced and reasonable.
Next, with continuing reference to fig. 4, in step S430, a corresponding processing algorithm is performed on the object to be processed based on the target computing engine to perform a processing operation, so as to obtain an operation result corresponding to the object to be processed.
In the embodiment of the present disclosure, after the target computing engine is selected, the corresponding processing algorithm may be run on the object to be processed by the target computing engine, so as to execute the processing operation associated with the application scenario and obtain the operation result for the object to be processed. For example, the dedicated computing engine processes the artificial intelligence algorithm while DSP computing engines process the audio algorithm and the image algorithm, yielding the recognition result for the object to be processed.
The overall flow diagram of the allocation calculation engine is schematically shown in fig. 9, and with reference to fig. 9, mainly includes the following steps:
in step S910, a plurality of types of processing algorithms included in the processing operation are acquired; specifically, step S911 is included to obtain a first type of processing algorithm. Step S912 may be included to obtain a second type of processing algorithm.
In step S920, the first type of processing algorithm is segmented to obtain a plurality of operators;
in step S930, according to the attribute information corresponding to the plurality of operators, allocating a target calculation engine to the first type of processing algorithm;
in step S940, a target calculation engine is allocated for the second type of processing algorithm in combination with the target calculation engine corresponding to the first type of processing algorithm.
For example, take a video processing scene. During video processing, the overall processing operation is divided into traditional processing algorithms and artificial-intelligence-based processing. The traditional processing algorithms include audio algorithms and image algorithms. The artificial intelligence algorithm, once split, yields fixed operators such as convolution, pooling, and activation. However, the operators used differ somewhat between artificial intelligence algorithms; for example, the activation function may be a simple Relu, or it may be a nonlinear sigmoid or tanh function. On this basis, the fixed operators may be hardware-accelerated by the dedicated computing engine, while the reference type operators included in the first type of processing algorithm may be processed by one or more computing engines in the engine resource pool, the number depending on how many reference type operators the first type of processing algorithm contains, and so on. That is, the target computing engines for the first type of processing algorithm are determined jointly by the dedicated computing engine and computing engines in the engine resource pool, and the remaining computing engines in the engine resource pool are the target computing engines for the second type of processing algorithm.
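Tying the earlier sketches together for this video scenario (all operator names, engine names, and thresholds remain illustrative assumptions):

    # End-to-end illustration reusing segment_algorithm, complexity_degree
    # and allocate_first_type from the sketches above.
    ai_ops = ["convolution", "pooling", "sigmoid", "tanh", "custom_nms"]
    fixed, non_fixed = segment_algorithm(ai_ops)      # 2 fixed, 3 non-fixed
    degree = complexity_degree(len(non_fixed))        # "third degree"
    pool = ["dsp0", "dsp1", "dsp2"]
    ai_engines = allocate_first_type(degree, pool)    # dedicated engine + two DSPs
    audio_image_engines = list(pool)                  # ["dsp2"] handles the
    # traditional audio and image algorithms (cf. diagram C in fig. 8)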
On this basis, the fixed algorithms can be hardware-accelerated on the dedicated computing engine corresponding to the first type of processing algorithm, and the non-fixed algorithms can be processed on the computing engines in the engine resource pool that correspond to the first type of processing algorithm. The second type of processing algorithm is processed by the multiple computing engines in the engine resource pool that correspond to it.
According to the technical scheme in the embodiment of the disclosure, after the calculation result is obtained, the corresponding operation can be executed on the object to be processed on the basis of the calculation result. The corresponding operation is specifically determined according to the application scenario. For example, the object to be processed may be identified or the like according to the calculation result.
An object processing apparatus provided in an embodiment of the present disclosure, referring to fig. 10, the object processing apparatus 1000 may include:
an algorithm obtaining module 1001, configured to obtain multiple types of processing algorithms included in a processing operation performed on an object to be processed;
a computing engine allocation module 1002, configured to allocate a target computing engine to each type of processing algorithm from candidate computing engines according to the attribute information of the multiple types of processing algorithms; the candidate computing engines comprise a plurality of computing engines in an engine resource pool and a dedicated computing engine;
a processing operation executing module 1003, configured to perform a corresponding processing algorithm on the object to be processed based on the target computing engine to execute a processing operation, and obtain an operation result corresponding to the object to be processed.
In an exemplary embodiment of the present disclosure, the computing engine allocation module includes: a first allocation module, configured to allocate the target computing engine to the first type of processing algorithm according to the complexity degree of the first type of processing algorithm; and a second allocation module, configured to take the computing engines in the engine resource pool as the target computing engines corresponding to the second type of processing algorithm.
In an exemplary embodiment of the present disclosure, the first allocation module includes: a first degree module, configured to take the dedicated computing engine as the target engine if the complexity degree is a first degree; a second degree module, configured to take the dedicated computing engine and one computing engine in the engine resource pool as the target engines if the complexity degree is a second degree; and a third degree module, configured to take the dedicated computing engine and multiple computing engines in the engine resource pool as the target engines if the complexity degree is a third degree.
In an exemplary embodiment of the present disclosure, the apparatus further includes: and the complexity determining module is used for determining the complexity of the first type of processing algorithm according to a reference type operator contained in the first type of processing algorithm.
In an exemplary embodiment of the present disclosure, the apparatus further includes: and the algorithm dividing module is used for dividing the processing algorithms of the multiple types into a fixed algorithm and a non-fixed algorithm according to the types of the algorithms.
In an exemplary embodiment of the disclosure, the algorithm partitioning module is configured to: and determining a fixed operator in the first type of processing algorithm as the fixed algorithm, and determining a non-fixed operator in the first type of processing algorithm and an operator corresponding to the second type of processing algorithm as the non-fixed algorithm.
In an exemplary embodiment of the present disclosure, the apparatus further includes: a communication control module, configured to control the dedicated computing engine to communicate directly with the multiple computing engines in the engine resource pool through a pass-through channel.
It should be noted that, the specific details of each module in the object processing apparatus have been described in detail in the corresponding object processing method, and therefore are not described herein again.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
Furthermore, the above-described figures are merely schematic illustrations of processes included in methods according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is to be limited only by the terms of the appended claims.

Claims (10)

1. An object processing method, comprising:
acquiring a plurality of types of processing algorithms for processing an object to be processed;
allocating a target computing engine to each type of processing algorithm from candidate computing engines according to attribute information of the plurality of types of processing algorithms; wherein the candidate computing engines comprise a plurality of computing engines in an engine resource pool and a dedicated computing engine;
and running the corresponding processing algorithm on the object to be processed based on the target computing engine to perform a processing operation, and obtaining an operation result corresponding to the object to be processed.
2. The object processing method according to claim 1, wherein the allocating a target computing engine to each type of processing algorithm from candidate computing engines according to the attribute information of the plurality of types of processing algorithms comprises:
allocating the target computing engine to a first type of processing algorithm according to the complexity degree of the first type of processing algorithm;
and taking the computing engines in the engine resource pool as the target computing engines corresponding to a second type of processing algorithm.
3. The object processing method according to claim 2, wherein the allocating the target computing engine to the first type of processing algorithm according to the complexity degree of the first type of processing algorithm comprises:
if the complexity degree is a first degree, taking the dedicated computing engine as the target engine;
if the complexity degree is a second degree, taking the dedicated computing engine and one computing engine in the engine resource pool as the target engines;
and if the complexity degree is a third degree, taking the dedicated computing engine and a plurality of computing engines in the engine resource pool as the target engines.
4. The object processing method according to claim 2, characterized in that the method further comprises:
and determining the complexity of the first type of processing algorithm according to a reference type operator contained in the first type of processing algorithm.
5. The object processing method according to claim 1, characterized in that the method further comprises:
the plurality of types of processing algorithms are divided into fixed algorithms and non-fixed algorithms according to the type of the algorithm.
6. The object processing method according to claim 5, wherein said dividing the plurality of types of processing algorithms into fixed algorithms and non-fixed algorithms according to the type of the algorithm comprises:
and determining a fixed operator in the first type of processing algorithm as the fixed algorithm, and determining a non-fixed operator in the first type of processing algorithm and an operator corresponding to the second type of processing algorithm as the non-fixed algorithm.
7. The object processing method according to claim 1, characterized in that the method further comprises:
controlling the dedicated computing engine to communicate through a pass-through path with the plurality of computing engines in the engine resource pool.
8. An object processing apparatus, comprising:
an algorithm acquisition module, configured to acquire a plurality of types of processing algorithms included in a processing operation performed on an object to be processed;
a computing engine allocation module, configured to allocate a target computing engine to each type of processing algorithm from candidate computing engines according to attribute information of the plurality of types of processing algorithms; wherein the candidate computing engines comprise a plurality of computing engines in an engine resource pool and a dedicated computing engine;
and a processing operation execution module, configured to run the corresponding processing algorithm on the object to be processed based on the target computing engine so as to perform the processing operation and obtain an operation result corresponding to the object to be processed.
9. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the object processing method of any of claims 1-7 via execution of the executable instructions.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the object processing method of any one of claims 1 to 7.
CN202111434793.4A (priority date 2021-11-29, filing date 2021-11-29): Object processing method and device, electronic equipment and storage medium. Status: Pending. Published as CN114090262A (en).

Priority Applications (1)

CN202111434793.4A (priority date 2021-11-29, filing date 2021-11-29): Object processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

CN202111434793.4A (priority date 2021-11-29, filing date 2021-11-29): Object processing method and device, electronic equipment and storage medium

Publications (1)

CN114090262A (en), publication date 2022-02-25

Family

ID=80305719

Family Applications (1)

CN202111434793.4A (status: Pending; published as CN114090262A (en)): Object processing method and device, electronic equipment and storage medium

Country Status (1)

CN: CN114090262A (en)

Cited By (2)

* Cited by examiner, † Cited by third party

CN117789740A * (Tencent Technology (Shenzhen) Co., Ltd.; priority date 2024-02-23, publication date 2024-03-29): Audio data processing method, device, medium, equipment and program product
CN117789740B * (Tencent Technology (Shenzhen) Co., Ltd.; priority date 2024-02-23, publication date 2024-04-19): Audio data processing method, device, medium, equipment and program product

Similar Documents

Publication Title
CN110263909B (en) Image recognition method and device
CN109376596B (en) Face matching method, device, equipment and storage medium
CN108416744B (en) Image processing method, device, equipment and computer readable storage medium
JP7096888B2 (en) Network modules, allocation methods and devices, electronic devices and storage media
CN110210501B (en) Virtual object generation method, electronic device and computer-readable storage medium
US20210073577A1 (en) Image processing method and device, electronic device and storage medium
CN114145006A (en) Scheduling method and device of artificial intelligence resources, storage medium and chip
CN109598250A (en) Feature extracting method, device, electronic equipment and computer-readable medium
CN111815666A (en) Image processing method and device, computer readable storage medium and electronic device
CN111161705B (en) Voice conversion method and device
CN114090262A (en) Object processing method and device, electronic equipment and storage medium
CN112906554B (en) Model training optimization method and device based on visual image and related equipment
CN114915753A (en) Architecture of cloud server, data processing method and storage medium
CN111104827A (en) Image processing method and device, electronic equipment and readable storage medium
US20210319281A1 (en) Subtask Assignment for an Artificial Intelligence Task
CN112954401A (en) Model determination method based on video interaction service and big data platform
CN111626035A (en) Layout analysis method and electronic equipment
WO2023071694A1 (en) Image processing method and apparatus, and electronic device and storage medium
CN114254563A (en) Data processing method and device, electronic equipment and storage medium
CN115098262A (en) Multi-neural-network task processing method and device
CN116579380A (en) Data processing method and related equipment
CN110084835B (en) Method and apparatus for processing video
CN116432737A (en) Model compression training method, device and equipment based on deep learning
CN113989121A (en) Normalization processing method and device, electronic equipment and storage medium
CN114187173A (en) Model training method, image processing method and device, electronic device and medium

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination