CN116757259A

CN116757259A - Network model processing method, device, storage medium, and program product

Info

Publication number: CN116757259A
Application number: CN202211172204.4A
Authority: CN
Inventors: 伍国林; 王哲; 廖建文; 陆二伟
Original assignee: Honor Device Co Ltd
Current assignee: Honor Device Co Ltd
Priority date: 2022-09-26
Filing date: 2022-09-26
Publication date: 2023-09-15

Abstract

An embodiment of the application provides a network model processing method, a device, a storage medium, and a program product. The method comprises: acquiring an original computational graph of a model to be processed, wherein the original computational graph comprises at least one operator structure; retrieving a target operator structure of a preset type from the original computational graph; performing a fusion operation on the target operator structure in the original computational graph to generate a new operator structure, wherein the number of operators in the new operator structure is smaller than that in the target operator structure; and constructing a new computational graph of the model to be processed based on the new operator structure and the original computational graph. The application can simplify some operator structures in the model at the level of the computational graph structure, thereby simplifying the computational graph of the model, reducing unnecessary computation in the model, reducing memory access overhead during model inference, and reducing model inference latency.

Description

Network model processing method, device, storage medium, and program product

Technical Field

The present application relates to the field of computer technologies, and in particular, to a network model processing method, device, storage medium, and program product.

Background

Artificial neural networks (Artificial Neural Networks, ANNs) are algorithmic mathematical models that mimic the behavioral characteristics of animal neural networks and perform distributed parallel information processing. They are widely applied in scenarios such as natural language models and image recognition models.

In practice, after training is completed, a neural network model needs to be deployed on a server, and the model is applied through a model inference process. To facilitate the transfer and conversion of neural network models at deployment time, the trained model is typically converted into an ONNX model. For example, in a TTS (Text To Speech) model, convolution and deconvolution generally operate on one-dimensional data. When an ONNX model file is generated from the TTS model, the generated ONNX model contains frequent dimension-increasing and dimension-decreasing operations, and the model itself also includes many operators and excessive computation. This not only increases the inference latency of the model but also causes the model inference process to consume more energy.
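As an illustration of the dimension-increasing and dimension-decreasing pattern mentioned above, the following minimal sketch computes a one-dimensional convolution directly and then again by raising the data to two dimensions, applying a two-dimensional convolution, and lowering the result. The helper names (conv1d, conv2d, unsqueeze, squeeze) are hypothetical, and the code uses plain Python lists rather than any particular framework.

```python
def conv1d(x, w):
    """Valid 1-D cross-correlation of sequence x with kernel w."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) for i in range(len(x) - k + 1)]


def conv2d(x, w):
    """Valid 2-D cross-correlation of matrix x with kernel w (lists of rows)."""
    kh, kw = len(w), len(w[0])
    out_h = len(x) - kh + 1
    out_w = len(x[0]) - kw + 1
    return [
        [
            sum(x[r + i][c + j] * w[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def unsqueeze(x):
    """Dimension-increasing step: length-N vector -> 1 x N matrix."""
    return [x]


def squeeze(x):
    """Dimension-decreasing step: 1 x N matrix -> length-N vector."""
    assert len(x) == 1
    return x[0]


signal = [1.0, 2.0, 3.0, 4.0, 5.0]
kernel = [0.5, -1.0, 0.25]

direct = conv1d(signal, kernel)
via_2d = squeeze(conv2d(unsqueeze(signal), unsqueeze(kernel)))
print(direct == via_2d)  # True: both paths give the same result
```

The two extra reshaping steps do no arithmetic of their own, which is why such structures are natural candidates for simplification.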

Disclosure of Invention

Embodiments of the application provide a network model processing method, a device, a storage medium, and a program product, which can simplify some operator structures in a model at the level of the computational graph structure, thereby simplifying the computational graph of the model, reducing unnecessary computation in the model, reducing memory access overhead during model inference, and reducing model inference latency.

In a first aspect, an embodiment of the present application provides a network model processing method, including: acquiring an original computational graph of a model to be processed, wherein the original computational graph comprises at least one operator structure; retrieving a target operator structure of a preset type from the original computational graph; performing a fusion operation on the target operator structure in the original computational graph to generate a new operator structure, wherein the number of operators in the new operator structure is smaller than that in the target operator structure; and constructing a new computational graph of the model to be processed based on the new operator structure and the original computational graph.

In one embodiment, one of the operator structures includes a plurality of operators and a computational relationship between the plurality of operators.

In an embodiment, retrieving the target operator structure of the preset type from the original computational graph includes: traversing each operator structure in the original computational graph, and taking an operator structure having a computational relationship of the preset type as the target operator structure.
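The traversal described above can be sketched as a simple chain match over graph nodes. This is a minimal illustration under stated assumptions, not the retrieval procedure of the application: the Node class, the find_target_structures function, and the preset pattern are hypothetical, and the operator type names are borrowed from ONNX only for readability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    op_type: str
    inputs: tuple  # names of the nodes (or graph inputs) this node consumes


def find_target_structures(nodes, pattern):
    """Return every linear chain of nodes whose op types equal `pattern`."""
    by_name = {n.name: n for n in nodes}
    consumers = {}
    for n in nodes:
        for src in n.inputs:
            consumers.setdefault(src, []).append(n.name)

    matches = []
    for start in nodes:
        chain = [start]
        for _ in pattern[1:]:
            nxt = consumers.get(chain[-1].name, [])
            if len(nxt) != 1:  # only follow unbranched data flow
                break
            chain.append(by_name[nxt[0]])
        if [n.op_type for n in chain] == list(pattern):
            matches.append([n.name for n in chain])
    return matches


graph = [
    Node("u1", "Unsqueeze", ("x",)),
    Node("c1", "Conv", ("u1",)),
    Node("m1", "Mul", ("c1",)),
    Node("s1", "Squeeze", ("m1",)),
    Node("r1", "Relu", ("s1",)),
    Node("u2", "Unsqueeze", ("r1",)),
    Node("c2", "Conv", ("u2",)),
    Node("s2", "Squeeze", ("c2",)),
]
PRESET_PATTERN = ("Unsqueeze", "Conv", "Mul", "Squeeze")  # assumed preset type

targets = find_target_structures(graph, PRESET_PATTERN)
print(targets)  # [['u1', 'c1', 'm1', 's1']]
```

Only the first chain is reported, because the second chain (u2, c2, s2) contains no extra operator and therefore does not match the assumed pattern.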

In an embodiment, the computational relationship of the preset type includes at least a computational relationship among first-type operators; and performing the fusion operation on the target operator structure in the original computational graph to generate the new operator structure includes: extracting second-type operators in the target operator structure from the original computational graph, wherein the second-type operators are the remaining operators in the target operator structure other than the first-type operators; and fusing parameters of the second-type operators into the first-type operators, deleting the second-type operators, and generating the new operator structure based on the computational relationship among the first-type operators.
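The structural side of this fusion step (deleting the second-type operators and reconnecting what remains) can be sketched as follows. The sketch is illustrative only: the FIRST_TYPE set, the dictionary-based node layout, and the fuse_structure function are assumptions, the nodes are assumed to be listed in topological order, and the merging of parameter values is deliberately left out.

```python
FIRST_TYPE = {"Unsqueeze", "Conv", "Squeeze"}  # assumed first-type operators


def fuse_structure(nodes, target_names):
    """Drop second-type nodes of the target structure and rewire their consumers.

    Assumes `nodes` is topologically ordered and that the first input of a
    removed node is its data input.
    """
    target = set(target_names)
    redirect = {}  # removed node name -> name of the node that fed it
    kept = []
    for node in nodes:
        inputs = [redirect.get(i, i) for i in node["inputs"]]
        if node["name"] in target and node["op"] not in FIRST_TYPE:
            redirect[node["name"]] = inputs[0]
            continue
        kept.append({**node, "inputs": inputs})
    return kept


nodes = [
    {"name": "u1", "op": "Unsqueeze", "inputs": ["x"]},
    {"name": "c1", "op": "Conv", "inputs": ["u1"]},
    {"name": "m1", "op": "Mul", "inputs": ["c1"]},
    {"name": "a1", "op": "Add", "inputs": ["m1"]},
    {"name": "s1", "op": "Squeeze", "inputs": ["a1"]},
]

new_nodes = fuse_structure(nodes, ["u1", "c1", "m1", "a1", "s1"])
print([n["name"] for n in new_nodes])  # ['u1', 'c1', 's1']
print(new_nodes[-1]["inputs"])         # ['c1']
```

The new structure has three operators instead of five, and the dimension-decreasing node now reads directly from the convolution node.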

In an embodiment, the first-type operators comprise: a dimension-increasing operator, a convolution operator, and a dimension-decreasing operator; and fusing the parameters of the second-type operators into the first-type operators, deleting the second-type operators, and generating the new operator structure based on the computational relationship among the first-type operators includes: fusing the parameters of the second-type operators into the convolution operator, deleting the second-type operators, and generating the new operator structure based on the computational relationship among the dimension-increasing operator, the convolution operator, and the dimension-decreasing operator.
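To make the idea of fusing parameters into the convolution operator concrete, the sketch below assumes that the second-type operators are a scale followed by a shift applied to the convolution output. This is an assumption chosen for illustration; the text above does not say which operators are of the second type. Under that assumption, scale * (conv(x, w) + b) + shift equals conv(x, scale * w) + (scale * b + shift), so the scale and shift can be absorbed into the weights and bias. The function names are hypothetical.

```python
def conv1d(x, w, b):
    """Valid single-channel 1-D cross-correlation with bias b."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) + b for i in range(len(x) - k + 1)]


def fold_scale_shift(w, b, scale, shift):
    """Absorb a following `scale * y + shift` into the convolution parameters."""
    return [scale * wj for wj in w], scale * b + shift


x = [1.0, 2.0, 3.0, 4.0]
w, b = [0.5, -1.0], 0.25
scale, shift = 2.0, -0.5

# Original structure: convolution followed by separate scale and shift operators.
unfused = [scale * y + shift for y in conv1d(x, w, b)]

# New structure: a single convolution with fused parameters.
w_fused, b_fused = fold_scale_shift(w, b, scale, shift)
fused = conv1d(x, w_fused, b_fused)

print(all(abs(u - f) < 1e-9 for u, f in zip(unfused, fused)))  # True
```

After folding, the scale and shift operators no longer need to run at inference time, which removes their memory reads and writes from the graph.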

In an embodiment, the first-type operators comprise: a dimension-increasing operator, a deconvolution operator, and a dimension-decreasing operator; and fusing the parameters of the second-type operators into the first-type operators, deleting the second-type operators, and generating the new operator structure based on the computational relationship among the first-type operators includes: fusing the parameters of the second-type operators into the deconvolution operator, deleting the second-type operators, and generating the new operator structure based on the computational relationship among the dimension-increasing operator, the deconvolution operator, and the dimension-decreasing operator.

In one embodiment, the method further comprises: performing inference on the model to be processed based on the new computational graph.

In an embodiment, performing inference on the model to be processed based on the new computational graph includes: during model inference, executing a first type of convolution operator in the new computational graph by using a first instruction, wherein the number of channels of the first type of convolution operator is smaller than a preset value, and the first instruction supports unaligned convolution computation.

In an embodiment, performing inference on the model to be processed based on the new computational graph includes: during model inference, executing a second type of convolution operator in the new computational graph by using a second instruction, wherein the number of channels of the second type of convolution operator is greater than or equal to the preset value, and the second instruction supports aligned convolution computation.
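The channel-based choice between the two instructions can be sketched as a small dispatch function. The threshold of 4 and the names PRESET_CHANNELS and select_instruction are hypothetical; the text above only specifies "a preset value" and does not name concrete instructions.

```python
PRESET_CHANNELS = 4  # hypothetical threshold; the source only says "a preset value"


def select_instruction(num_channels, preset=PRESET_CHANNELS):
    """Pick the instruction kind for a convolution operator by its channel count."""
    return "unaligned" if num_channels < preset else "aligned"


conv_channels = {"conv_a": 1, "conv_b": 4, "conv_c": 16}
plan = {name: select_instruction(ch) for name, ch in conv_channels.items()}
print(plan)  # {'conv_a': 'unaligned', 'conv_b': 'aligned', 'conv_c': 'aligned'}
```

Note that a channel count exactly equal to the preset value falls on the aligned side, matching the "greater than or equal to" condition above.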

In a second aspect, an embodiment of the present application provides a network model processing apparatus, including:

an acquisition module, configured to acquire an original computational graph of a model to be processed, wherein the original computational graph comprises at least one operator structure;

a retrieval module, configured to retrieve a target operator structure of a preset type from the original computational graph;

a fusion module, configured to perform a fusion operation on the target operator structure in the original computational graph to generate a new operator structure, wherein the number of operators in the new operator structure is smaller than that in the target operator structure; and

a construction module, configured to construct a new computational graph of the model to be processed based on the new operator structure and the original computational graph.

In an embodiment, the retrieval module is configured to traverse each operator structure in the original computational graph and take an operator structure having a computational relationship of the preset type as the target operator structure.

In an embodiment, the computational relationship of the preset type includes at least a computational relationship among first-type operators; the fusion module is configured to extract second-type operators in the target operator structure from the original computational graph, wherein the second-type operators are the remaining operators in the target operator structure other than the first-type operators; and to fuse parameters of the second-type operators into the first-type operators, delete the second-type operators, and generate the new operator structure based on the computational relationship among the first-type operators.

In an embodiment, the first-type operators comprise: a dimension-increasing operator, a convolution operator, and a dimension-decreasing operator; the fusion module is specifically configured to fuse the parameters of the second-type operators into the convolution operator, delete the second-type operators, and generate the new operator structure based on the computational relationship among the dimension-increasing operator, the convolution operator, and the dimension-decreasing operator.

In an embodiment, the first-type operators comprise: a dimension-increasing operator, a deconvolution operator, and a dimension-decreasing operator; the fusion module is specifically configured to fuse the parameters of the second-type operators into the deconvolution operator, delete the second-type operators, and generate the new operator structure based on the computational relationship among the dimension-increasing operator, the deconvolution operator, and the dimension-decreasing operator.

In one embodiment, the apparatus further comprises: an inference module, configured to perform inference on the model to be processed based on the new computational graph.

In an embodiment, the inference module is configured to execute, during model inference, a first type of convolution operator in the new computational graph by using a first instruction, wherein the number of channels of the first type of convolution operator is smaller than a preset value, and the first instruction supports unaligned convolution computation.

In an embodiment, the inference module is further configured to execute, during model inference, a second type of convolution operator in the new computational graph by using a second instruction, wherein the number of channels of the second type of convolution operator is greater than or equal to the preset value, and the second instruction supports aligned convolution computation.

In a third aspect, an embodiment of the present application provides an electronic device, including: a processor and a memory for storing code instructions, the processor being for executing the code instructions to perform the method described in the first aspect of the embodiments of the present application or any one of the possible implementations of the first aspect.

In a fourth aspect, embodiments of the present application provide a computer readable storage medium having stored therein a computer program or instructions which, when run on a computer, cause the computer to perform the method described in the first aspect of the embodiments of the present application or any one of the possible implementations of the first aspect.

In a fifth aspect, embodiments of the present application provide a computer program product comprising a computer program which, when run on a computer, causes the computer to perform the network model processing method described in the first aspect or any one of the possible implementations of the first aspect of the embodiments of the present application.

The application provides a network model processing method, a device, a storage medium, and a program product. Operator structures of a preset type that can be fused in the original computational graph are fused to generate a new operator structure with fewer operators, and a simplified new computational graph is reconstructed based on the new operator structure. In this way, some operator structures in the model are simplified at the level of the computational graph structure, which simplifies the computational graph of the model, reduces unnecessary computation in the model, reduces memory access overhead during model inference, and reduces model inference latency.

It should be understood that the description of the application above is not intended to limit key or critical features of embodiments of the application, nor to limit the scope of the application. Other features of the present application will become apparent from the description that follows.

Drawings

To illustrate the technical solutions of the application or of the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the application, and a person skilled in the art can obtain other drawings from them without inventive effort.

FIG. 1 is a schematic diagram of an electronic device according to an embodiment of the present application;

FIG. 2 is a block diagram of a software architecture of an electronic device according to an embodiment of the present application;

FIG. 3 is a schematic diagram of an application scenario of network model processing according to an embodiment of the present application;

FIG. 4 is a flowchart illustrating a network model processing method according to an embodiment of the present application;

FIG. 5 is a schematic diagram illustrating a comparison of four target operator structures according to an embodiment of the present application;

FIG. 6 is a schematic diagram of a new operator structure after the target operator structure fusion is simplified according to an embodiment of the present application;

FIG. 7 is a schematic diagram illustrating a comparison of two target operator structures according to an embodiment of the present application;

FIG. 8 is a schematic diagram of a new operator structure after the target operator structure fusion is simplified according to an embodiment of the present application;

FIG. 9 is a flowchart of a network model processing method according to an embodiment of the present application;

FIG. 10 is a schematic diagram of a calculation process supporting alignment convolution according to an embodiment of the present application;

FIG. 11 is a schematic diagram of a calculation process supporting unaligned convolution according to an embodiment of the present application;

FIG. 12 is a schematic structural diagram of a network model processing device according to still another embodiment of the present application.

Detailed Description

In embodiments of the present application, the words "first," "second," and the like are used to distinguish between identical or similar items that have substantially the same function and effect. For example, the first chip and the second chip are merely for distinguishing different chips, and the order of the different chips is not limited. It will be appreciated by those of skill in the art that the words "first," "second," and the like do not limit the quantity or the order of execution, and that items labeled "first" and "second" are not necessarily different.

It should be noted that, in the embodiments of the present application, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.

In the embodiments of the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist. For example, "A and/or B" may indicate: A alone, both A and B, or B alone, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects before and after it. "At least one of the following" or a similar expression means any combination of the listed items, including any combination of single items or plural items. For example, at least one of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c may each be single or plural.

For clarity in describing aspects of embodiments of the present application, the terms involved are first interpreted:

ANNs: Artificial Neural Networks.

TTS: Text To Speech, a part of man-machine dialogue that allows a machine to speak. It can convert files stored in a computer, such as help files or web pages, into natural speech output.

ONNX: Open Neural Network Exchange, an open file format designed for machine learning and used to store trained models. It allows different deep learning frameworks (e.g., PyTorch, MXNet) to store model data in the same format. In short, ONNX is an intermediate representation format that facilitates migrating models between mainstream deep learning frameworks.

Operator: a mapping O from a function space to a function space, O: X → X. In a broad sense, an operator can be generalized to any space, such as an inner product space. A deep learning algorithm consists of individual computational units, which may be referred to as operators (OP for short). In a network model, an operator corresponds to the computation logic in a layer. For example, a convolution layer (Convolution Layer) is an operator, and the weighted summation process in a fully connected layer (FC Layer) is an operator.

Computational graph: a graph in which both the inputs and the computation functions appear as nodes, and the relationships between the outputs of the nodes are represented by directed line segments. The computational graph of a neural network model takes the operators in the model as nodes and the data flow directions of the computational relationships among the operators as directed line segments.
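As a small illustration of this definition, the sketch below stores a computational graph as a mapping from each operator node to the nodes whose outputs it consumes, and derives a valid execution order with the Python standard library (graphlib, available from Python 3.9). The node names are hypothetical.

```python
from graphlib import TopologicalSorter

# Each key is an operator node; its value is the set of nodes it reads from.
# The directed line segments of the graph run from each listed producer to the key.
graph = {
    "conv": {"input"},
    "relu": {"conv"},
    "fc": {"relu"},
    "output": {"fc"},
}

# A topological order respects every data-flow edge, so it is a valid execution order.
order = list(TopologicalSorter(graph).static_order())
print(order)  # ['input', 'conv', 'relu', 'fc', 'output']
```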

The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.

Fig. 1 shows a schematic configuration of an electronic device 100.

The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (universal serial bus, USB) interface 130, a charge management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, keys 190, a motor 191, an indicator 192, a camera 193, a display 194, and a subscriber identity module (subscriber identification module, SIM) card interface 195, etc. The sensor module 180 may include a pressure sensor 180A, a gyro sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.

It should be understood that the illustrated structure of the embodiment of the present application does not constitute a specific limitation on the electronic device 100. In other embodiments of the application, electronic device 100 may include more or fewer components than shown, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.

The processor 110 may include one or more processing units, such as: the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a neural network processor (neural-network processing unit, NPU), etc. Wherein the different processing units may be separate devices or may be integrated in one or more processors.

The controller can generate operation control signals according to the instruction operation codes and the time sequence signals to finish the control of instruction fetching and instruction execution.

A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that the processor 110 has just used or uses repeatedly. If the processor 110 needs to use the instructions or data again, it can fetch them directly from this memory. This avoids repeated accesses and reduces the waiting time of the processor 110, thereby improving the efficiency of the system.

In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (inter-integrated circuit, I2C) interface, an inter-integrated circuit sound (inter-integrated circuit sound, I2S) interface, a pulse code modulation (pulse code modulation, PCM) interface, a universal asynchronous receiver/transmitter (universal asynchronous receiver/transmitter, UART) interface, a mobile industry processor interface (mobile industry processor interface, MIPI), a general-purpose input/output (general-purpose input/output, GPIO) interface, a subscriber identity module (subscriber identity module, SIM) interface, and/or a universal serial bus (universal serial bus, USB) interface, among others.

The I2C interface is a bi-directional synchronous serial bus comprising a serial data line (serial data line, SDA) and a serial clock line (serial clock line, SCL). In some embodiments, the processor 110 may contain multiple sets of I2C buses. The processor 110 may be coupled to the touch sensor 180K, a charger, a flash, the camera 193, etc., respectively, through different I2C bus interfaces. For example, the processor 110 may be coupled to the touch sensor 180K through an I2C interface, such that the processor 110 communicates with the touch sensor 180K through the I2C bus interface to implement a touch function of the electronic device 100.

The I2S interface may be used for audio communication. In some embodiments, the processor 110 may contain multiple sets of I2S buses. The processor 110 may be coupled to the audio module 170 via an I2S bus to enable communication between the processor 110 and the audio module 170. In some embodiments, the audio module 170 may transmit an audio signal to the wireless communication module 160 through the I2S interface, to implement a function of answering a call through the bluetooth headset.

PCM interfaces may also be used for audio communication to sample, quantize and encode analog signals. In some embodiments, the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface. In some embodiments, the audio module 170 may also transmit audio signals to the wireless communication module 160 through the PCM interface to implement a function of answering a call through the bluetooth headset. Both the I2S interface and the PCM interface may be used for audio communication.

The UART interface is a universal serial data bus for asynchronous communications. The bus may be a bi-directional communication bus. It converts the data to be transmitted between serial communication and parallel communication. In some embodiments, a UART interface is typically used to connect the processor 110 with the wireless communication module 160. For example: the processor 110 communicates with a bluetooth module in the wireless communication module 160 through a UART interface to implement a bluetooth function. In some embodiments, the audio module 170 may transmit an audio signal to the wireless communication module 160 through a UART interface, to implement a function of playing music through a bluetooth headset.

The MIPI interface may be used to connect the processor 110 to peripheral devices such as a display 194, a camera 193, and the like. The MIPI interfaces include camera serial interfaces (camera serial interface, CSI), display serial interfaces (display serial interface, DSI), and the like. In some embodiments, processor 110 and camera 193 communicate through a CSI interface to implement the photographing functions of electronic device 100. The processor 110 and the display 194 communicate via a DSI interface to implement the display functionality of the electronic device 100.

The GPIO interface may be configured by software. The GPIO interface may be configured as a control signal or as a data signal. In some embodiments, a GPIO interface may be used to connect the processor 110 with the camera 193, the display 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like. The GPIO interface may also be configured as an I2C interface, an I2S interface, a UART interface, an MIPI interface, etc.

The USB interface 130 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, or the like. The USB interface 130 may be used to connect a charger to charge the electronic device 100, to transfer data between the electronic device 100 and a peripheral device, or to connect a headset and play audio through it. The interface may also be used to connect other electronic devices, such as AR devices.

It should be understood that the interfacing relationship between the modules illustrated in the embodiments of the present application is only illustrative, and is not meant to limit the structure of the electronic device 100. In other embodiments of the present application, the electronic device 100 may also employ different interfacing manners in the above embodiments, or a combination of multiple interfacing manners.

The charge management module 140 is configured to receive a charge input from a charger. The charger can be a wireless charger or a wired charger. In some wired charging embodiments, the charge management module 140 may receive a charging input of a wired charger through the USB interface 130. In some wireless charging embodiments, the charge management module 140 may receive wireless charging input through a wireless charging coil of the electronic device 100. The charging management module 140 may also supply power to the electronic device through the power management module 141 while charging the battery 142.

The power management module 141 is used to connect the battery 142, the charge management module 140, and the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140 to power the processor 110, the internal memory 121, the display 194, the camera 193, the wireless communication module 160, and the like. The power management module 141 may also be configured to monitor parameters such as battery capacity, battery cycle count, and battery health (leakage, impedance). In other embodiments, the power management module 141 may also be provided in the processor 110. In other embodiments, the power management module 141 and the charge management module 140 may be disposed in the same device.

The wireless communication function of the electronic device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like.

The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in the electronic device 100 may be used to cover a single or multiple communication bands. Different antennas may also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed into a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.

The mobile communication module 150 may provide a solution for wireless communication including 2G/3G/4G/5G, etc., applied to the electronic device 100. The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (low noise amplifier, LNA), etc. The mobile communication module 150 may receive electromagnetic waves from the antenna 1, perform processes such as filtering, amplifying, and the like on the received electromagnetic waves, and transmit the processed electromagnetic waves to the modem processor for demodulation. The mobile communication module 150 can amplify the signal modulated by the modem processor, and convert the signal into electromagnetic waves through the antenna 1 to radiate. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be provided in the same device as at least some of the modules of the processor 110.

The modem processor may include a modulator and a demodulator. The modulator is used for modulating the low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then transmits the demodulated low frequency baseband signal to the baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then transferred to the application processor. The application processor outputs sound signals through an audio device (not limited to the speaker 170A, the receiver 170B, etc.), or displays images or video through the display screen 194. In some embodiments, the modem processor may be a stand-alone device. In other embodiments, the modem processor may be provided in the same device as the mobile communication module 150 or other functional module, independent of the processor 110.

The wireless communication module 160 may provide solutions for wireless communication including wireless local area network (wireless local area networks, WLAN) (e.g., wireless fidelity (wireless fidelity, Wi-Fi) network), Bluetooth (BT), global navigation satellite system (global navigation satellite system, GNSS), frequency modulation (frequency modulation, FM), near field communication (near field communication, NFC), infrared technology (IR), etc., as applied to the electronic device 100. The wireless communication module 160 may be one or more devices that integrate at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, frequency-modulates and filters the electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, frequency-modulate and amplify it, and convert it into electromagnetic waves for radiation via the antenna 2.

In some embodiments, antenna 1 and mobile communication module 150 of electronic device 100 are coupled, and antenna 2 and wireless communication module 160 are coupled, such that electronic device 100 may communicate with a network and other devices through wireless communication techniques. Wireless communication techniques may include global system for mobile communications (global system for mobile communications, GSM), general packet radio service (general packet radio service, GPRS), code division multiple access (code division multiple access, CDMA), wideband code division multiple access (wideband code division multiple access, WCDMA), time division code division multiple access (time-division code division multiple access, TD-SCDMA), long term evolution (long term evolution, LTE), BT, GNSS, WLAN, NFC, FM, and/or IR techniques, among others. The GNSS may include a global satellite positioning system (global positioning system, GPS), a global navigation satellite system (global navigation satellite system, GLONASS), a beidou satellite navigation system (beidou navigation satellite system, BDS), a quasi zenith satellite system (quasi-zenith satellite system, QZSS) and/or a satellite based augmentation system (satellite based augmentation systems, SBAS).

The electronic device 100 implements display functions through a GPU, a display screen 194, an application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.

The display screen 194 is used to display images, videos, and the like. The display screen 194 includes a display panel. The display panel may employ a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini LED, a Micro LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like. In some embodiments, the electronic device 100 may include 1 or N display screens 194, N being a positive integer greater than 1.

The electronic device 100 may implement photographing functions through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.

The ISP is used to process data fed back by the camera 193. For example, when a photograph is taken, the shutter opens, light is transmitted through the lens to the camera's photosensitive element, the optical signal is converted into an electrical signal, and the photosensitive element passes the electrical signal to the ISP for processing, which converts it into an image visible to the naked eye. The ISP can also optimize the noise, brightness, and skin tone of the image, as well as parameters such as the exposure and color temperature of the shooting scene. In some embodiments, the ISP may be provided in the camera 193.

The camera 193 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image onto the photosensitive element. The photosensitive element may be a charge coupled device (charge coupled device, CCD) or a Complementary Metal Oxide Semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, which is then transferred to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard RGB, YUV, or the like format. In some embodiments, electronic device 100 may include 1 or N cameras 193, N being a positive integer greater than 1.

The digital signal processor is used to process digital signals; besides digital image signals, it can also process other digital signals. For example, when the electronic device 100 selects a frequency bin, the digital signal processor is used to perform a Fourier transform on the frequency bin energy.

Video codecs are used to compress or decompress digital video. The electronic device 100 may support one or more video codecs, so that it can play or record video in a variety of encoding formats, such as moving picture experts group (moving picture experts group, MPEG) 1, MPEG2, MPEG3, MPEG4, etc.

The NPU is a neural-network (NN) computing processor, and can rapidly process input information by referencing a biological neural network structure, for example, referencing a transmission mode between human brain neurons, and can also continuously perform self-learning. Applications such as intelligent awareness of the electronic device 100 may be implemented through the NPU, for example: image recognition, face recognition, speech recognition, text understanding, etc.

The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to enable expansion of the memory capabilities of the electronic device 100. The external memory card communicates with the processor 110 through an external memory interface 120 to implement data storage functions. For example, files such as music, video, etc. are stored in an external memory card.

The internal memory 121 may be used to store computer-executable program code that includes instructions. The internal memory 121 may include a program storage area and a data storage area. The program storage area may store an operating system, an application program required for at least one function (such as a sound playing function, an image playing function, etc.), and the like. The data storage area may store data created during use of the electronic device 100 (e.g., audio data, phonebook, etc.), and so on. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory such as at least one magnetic disk storage device, a flash memory device, a universal flash storage (universal flash storage, UFS), and the like. The processor 110 performs various functional applications and data processing of the electronic device 100 by executing instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.

The processor 110 may be adapted to execute any of the methods provided by the embodiments of the present application in accordance with the obtained executable instructions by invoking a computer program stored in the memory 121.

The electronic device 100 may implement audio functions, such as music playing and recording, through an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, an application processor, and the like.

The audio module 170 is used to convert digital audio information into an analog audio signal output and also to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or a portion of the functional modules of the audio module 170 may be disposed in the processor 110.

The speaker 170A, also referred to as a "horn", is used to convert audio electrical signals into sound signals. A user may listen to music or to a hands-free call on the electronic device 100 through the speaker 170A.

The receiver 170B, also referred to as an "earpiece", is used to convert audio electrical signals into sound signals. When the electronic device 100 is used to answer a telephone call or voice message, the voice may be heard by placing the receiver 170B close to the human ear.

The microphone 170C, also referred to as a "mic" or "mike", is used to convert sound signals into electrical signals. When making a call or sending voice information, the user can speak with the mouth close to the microphone 170C, inputting a sound signal to the microphone 170C. The electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, which can implement a noise reduction function in addition to collecting sound signals. In other embodiments, the electronic device 100 may also be provided with three, four, or more microphones 170C to collect sound signals, reduce noise, identify sound sources, implement directional recording, etc.

The earphone interface 170D is used to connect a wired earphone. The earphone interface 170D may be the USB interface 130, a 3.5 mm open mobile terminal platform (open mobile terminal platform, OMTP) standard interface, or a cellular telecommunications industry association of the USA (cellular telecommunications industry association of the USA, CTIA) standard interface.

The pressure sensor 180A is used to sense a pressure signal and may convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor 180A may be disposed on the display screen 194. There are various types of pressure sensor 180A, such as resistive pressure sensors, inductive pressure sensors, and capacitive pressure sensors. A capacitive pressure sensor may comprise at least two parallel plates made of conductive material. When a force is applied to the pressure sensor 180A, the capacitance between the electrodes changes. The electronic device 100 determines the strength of the pressure from the change in capacitance. When a touch operation is applied to the display screen 194, the electronic device 100 detects the intensity of the touch operation by means of the pressure sensor 180A. The electronic device 100 may also calculate the location of the touch based on the detection signal of the pressure sensor 180A. In some embodiments, touch operations that act on the same touch location but with different intensities may correspond to different operation instructions. For example, when a touch operation whose intensity is less than a first pressure threshold acts on the short message application icon, an instruction to view the short message is executed; when a touch operation whose intensity is greater than or equal to the first pressure threshold acts on the short message application icon, an instruction to create a new short message is executed.
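The threshold behavior in the short message example can be sketched as follows. This is only an illustration under stated assumptions: the function name `instruction_for_touch`, the instruction strings, the normalized intensity scale, and the value of `FIRST_PRESSURE_THRESHOLD` are all hypothetical and do not come from the embodiment.

```python
# Hypothetical threshold on an arbitrary normalized intensity scale (assumption).
FIRST_PRESSURE_THRESHOLD = 0.5


def instruction_for_touch(intensity: float,
                          threshold: float = FIRST_PRESSURE_THRESHOLD) -> str:
    """Map the intensity of a touch on the short message icon to an instruction.

    Below the first pressure threshold the message is viewed; at or above it,
    a new message is created, mirroring the example in the text.
    """
    if intensity < threshold:
        return "view_short_message"
    return "create_short_message"
```

Note that the boundary case (intensity equal to the threshold) falls on the "create" side, matching the "greater than or equal to" wording above.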

The gyro sensor 180B may be used to determine the motion posture of the electronic device 100. In some embodiments, the angular velocity of the electronic device 100 about three axes (i.e., the x, y, and z axes) may be determined by the gyro sensor 180B. The gyro sensor 180B may be used for anti-shake during photographing. For example, when the shutter is pressed, the gyro sensor 180B detects the shake angle of the electronic device 100, calculates from that angle the distance the lens module needs to compensate, and lets the lens counteract the shake of the electronic device 100 through reverse motion, thereby achieving anti-shake. The gyro sensor 180B may also be used in navigation and motion-sensing game scenarios.

The air pressure sensor 180C is used to measure air pressure. In some embodiments, electronic device 100 calculates altitude from barometric pressure values measured by barometric pressure sensor 180C, aiding in positioning and navigation.

The magnetic sensor 180D includes a Hall sensor. The electronic device 100 may detect the opening and closing of a flip holster using the magnetic sensor 180D. In some embodiments, when the electronic device 100 is a flip phone, the electronic device 100 may detect the opening and closing of the flip according to the magnetic sensor 180D. Features such as automatic unlocking upon opening of the flip are then set according to the detected opening and closing state of the holster or of the flip.

The acceleration sensor 180E may detect the magnitude of acceleration of the electronic device 100 in various directions (typically three axes). The magnitude and direction of gravity may be detected when the electronic device 100 is stationary. The acceleration sensor 180E may also be used to recognize the posture of the electronic device, and is applied in applications such as landscape/portrait switching and pedometers.

The distance sensor 180F is used to measure distance. The electronic device 100 may measure distance by infrared or laser. In some embodiments, the electronic device 100 may range using the distance sensor 180F to achieve quick focus.

The proximity light sensor 180G may include, for example, a Light Emitting Diode (LED) and a light detector, such as a photodiode. The light emitting diode may be an infrared light emitting diode. The electronic device 100 emits infrared light outward through the light emitting diode. The electronic device 100 detects infrared reflected light from nearby objects using a photodiode. When sufficient reflected light is detected, it may be determined that there is an object in the vicinity of the electronic device 100. When insufficient reflected light is detected, the electronic device 100 may determine that there is no object in the vicinity of the electronic device 100. The electronic device 100 can detect that the user holds the electronic device 100 close to the ear by using the proximity light sensor 180G, so as to automatically extinguish the screen for the purpose of saving power. The proximity light sensor 180G may also be used in holster mode, pocket mode to automatically unlock and lock the screen.

The ambient light sensor 180L is used to sense ambient light level. The electronic device 100 may adaptively adjust the brightness of the display 194 based on the perceived ambient light level. The ambient light sensor 180L may also be used to automatically adjust white balance when taking a photograph. Ambient light sensor 180L may also cooperate with proximity light sensor 180G to detect whether electronic device 100 is in a pocket to prevent false touches.

The fingerprint sensor 180H is used to collect a fingerprint. The electronic device 100 may use the collected fingerprint features to implement fingerprint unlocking, application lock access, fingerprint photographing, fingerprint-based call answering, etc.

The temperature sensor 180J is for detecting temperature. In some embodiments, the electronic device 100 performs a temperature processing strategy using the temperature detected by the temperature sensor 180J. For example, when the temperature reported by temperature sensor 180J exceeds a threshold, electronic device 100 performs a reduction in the performance of a processor located in the vicinity of temperature sensor 180J in order to reduce power consumption to implement thermal protection. In other embodiments, when the temperature is below another threshold, the electronic device 100 heats the battery 142 to avoid the low temperature causing the electronic device 100 to be abnormally shut down. In other embodiments, when the temperature is below a further threshold, the electronic device 100 performs boosting of the output voltage of the battery 142 to avoid abnormal shutdown caused by low temperatures.

The touch sensor 180K is also referred to as a "touch device". The touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 together form a touch screen, also called a "touch-control screen". The touch sensor 180K is used to detect a touch operation acting on or near it. The touch sensor may pass the detected touch operation to the application processor to determine the touch event type. Visual output related to the touch operation may be provided through the display screen 194. In other embodiments, the touch sensor 180K may also be disposed on the surface of the electronic device 100 at a location different from that of the display screen 194.

The bone conduction sensor 180M may acquire a vibration signal. In some embodiments, the bone conduction sensor 180M may acquire the vibration signal of the bone vibrated by the human vocal part. The bone conduction sensor 180M may also contact the human pulse to receive a blood pressure pulsation signal. In some embodiments, the bone conduction sensor 180M may also be provided in a headset to form a bone conduction headset. The audio module 170 may parse out a voice signal from the vibration signal of the vocal-part bone acquired by the bone conduction sensor 180M, to implement a voice function. The application processor may parse heart rate information from the blood pressure pulsation signal acquired by the bone conduction sensor 180M, to implement a heart rate detection function.

The keys 190 include a power key, a volume key, etc. The keys 190 may be mechanical keys or touch keys. The electronic device 100 may receive key inputs and generate key signal inputs related to user settings and function control of the electronic device 100.

The motor 191 may generate a vibration cue. The motor 191 may be used for incoming call vibration alerts as well as for touch vibration feedback. For example, touch operations acting on different applications (e.g., photographing, audio playing, etc.) may correspond to different vibration feedback effects. Touch operations acting on different areas of the display screen 194 may also correspond to different vibration feedback effects of the motor 191. Different application scenarios (such as time reminders, receiving information, alarm clocks, games, etc.) may also correspond to different vibration feedback effects. The touch vibration feedback effect may also be customized.

The indicator 192 may be an indicator light, and may be used to indicate a charging state, a change in battery level, a message, a missed call, a notification, etc.

The SIM card interface 195 is used to connect a SIM card. The SIM card may be inserted into the SIM card interface 195, or removed from the SIM card interface 195, to come into contact with or be separated from the electronic device 100. The electronic device 100 may support 1 or N SIM card interfaces, N being a positive integer greater than 1. The SIM card interface 195 may support Nano SIM cards, Micro SIM cards, and the like. Multiple cards may be inserted into the same SIM card interface 195 simultaneously. The types of the multiple cards may be the same or different. The SIM card interface 195 may also be compatible with different types of SIM cards. The SIM card interface 195 may also be compatible with external memory cards. The electronic device 100 interacts with the network through the SIM card to realize functions such as calls and data communication. In some embodiments, the electronic device 100 employs an eSIM, i.e., an embedded SIM card. The eSIM card can be embedded in the electronic device 100 and cannot be separated from the electronic device 100.

The software system of the electronic device 100 may employ a layered architecture, an event driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture. In the embodiment of the invention, taking an Android system with a layered architecture as an example, a software structure of the electronic device 100 is illustrated.

Fig. 2 is a software configuration block diagram of the electronic device 100 according to the embodiment of the present invention.

The layered architecture divides the software into several layers, each with a distinct role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into four layers, which are, from top to bottom, an application layer, an application framework layer, the Android runtime (Android Runtime) and system libraries, and a kernel layer.

The application layer may include a series of application packages.

As shown in fig. 2, the application package may include applications such as phone, mailbox, calendar, camera, gallery, map, navigation, WLAN (Wireless Local Area Network ), bluetooth, music, video, short message, etc.

The application framework layer provides an application programming interface (application programming interface, API) and programming framework for application programs of the application layer. The application framework layer includes a number of predefined functions.

As shown in FIG. 2, the application framework layer may include a window manager, a content provider, a view system, a telephony manager, a resource manager, a notification manager, and the like.

The window manager is used to manage window programs. The window manager can obtain the size of the display screen, determine whether there is a status bar, lock the screen, take screenshots, and the like.

The content provider is used to store and retrieve data and make such data accessible to applications. The data may include video, images, audio, calls made and received, browsing history and bookmarks, phonebooks, etc.

The view system includes visual controls, such as controls to display text, controls to display pictures, and the like. The view system may be used to build applications. The display interface may be composed of one or more views. For example, a display interface including a text message notification icon may include a view displaying text and a view displaying a picture.

The telephony manager is used to provide the communication functions of the electronic device 100, such as management of call status (including connected, hung up, etc.).

The resource manager provides various resources for the application program, such as localization strings, icons, pictures, layout files, video files, and the like.

The notification manager allows an application to display notification information in the status bar. It can be used to convey notification-type messages that disappear automatically after a short stay without requiring user interaction. For example, the notification manager is used to notify that a download is complete, to give message alerts, etc. The notification manager may also present a notification in the top status bar of the system in the form of a chart or scroll-bar text, such as a notification of an application running in the background, or a notification that appears on the screen in the form of a dialog window. For example, text information is prompted in the status bar, a prompt tone is emitted, the electronic device vibrates, or an indicator light blinks.

The Android runtime (Android Runtime) includes a core library and virtual machines. The Android runtime is responsible for scheduling and management of the Android system.

The core library consists of two parts: one part is the functions that the Java language needs to call, and the other part is the core library of Android.

The application layer and the application framework layer run in a virtual machine. The virtual machine executes the Java files of the application layer and the application framework layer as binary files. The virtual machine is used to perform functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.

The system library may include a plurality of functional modules, for example: a surface manager (surface manager), media libraries (Media Libraries), a three-dimensional graphics processing library (e.g., OpenGL ES), a 2D graphics engine (e.g., SGL), etc.

The surface manager is used to manage the display subsystem and provides a fusion of 2D and 3D layers for multiple applications.

Media libraries support playback and recording of a variety of commonly used audio and video formats, as well as still image files. The media library may support a variety of audio and video encoding formats, such as MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.

The three-dimensional graphic processing library is used for realizing three-dimensional graphic drawing, image rendering, synthesis, layer processing and the like.

The 2D graphics engine is a drawing engine for 2D drawing.

The kernel layer is a layer between hardware and software. The kernel layer comprises at least a display driver, a camera driver, an audio driver, and a sensor driver.

The workflow of the software and hardware of the electronic device 100 is illustrated below in connection with a photo-capturing scenario.

When the touch sensor 180K receives a touch operation, a corresponding hardware interrupt is issued to the kernel layer. The kernel layer processes the touch operation into an original input event (including information such as touch coordinates and the time stamp of the touch operation). The original input event is stored at the kernel layer. The application framework layer acquires the original input event from the kernel layer and identifies the control corresponding to the input event. Taking as an example a touch operation that is a click operation whose corresponding control is the camera application icon: the camera application calls an interface of the application framework layer to start the camera application, which in turn starts the camera driver by calling the kernel layer, and a still image or video is captured by the camera 193.

The network model processing method according to the embodiment of the application is described in detail below through specific embodiments. The following embodiments may be combined with each other or implemented independently, and the same or similar concepts or processes may not be described in detail in some embodiments.

In an actual scenario, after training is completed, a neural network model needs to be deployed on a server or a terminal, and the model is applied through a model inference process. To facilitate the transfer and conversion of neural network models during deployment, the trained model is typically converted into an ONNX model. For example, in a TTS (Text To Speech) model, convolution and deconvolution generally operate on one-dimensional data. When an ONNX model file is generated from the TTS model, the generated ONNX model contains frequent dimension-raising and dimension-reducing operations, and the model itself also contains many operator operations. These excessive computation operations not only increase the inference latency of the model but also increase its memory footprint.

To solve the above problems, an embodiment of the present application provides a network model processing scheme. The scheme fuses the operator structures of a preset type that can be fused in the original calculation graph to generate a new operator structure containing fewer operators, and then constructs a simplified new computational graph based on the new operator structure. In this way, some operator structures in the model are simplified at the level of the computational graph structure, which simplifies the model's computational graph, reduces unnecessary computation operations in the model, reduces the memory access cost during model inference, and reduces model inference latency.

Fig. 3 is a schematic diagram of a scenario architecture for network model processing according to an embodiment of the present application. Taking a TTS model as an example, the network model processing architecture mainly includes a model conversion stage and a model inference stage, wherein:

In the model conversion stage, the TTS model computational graph is first input, and a search for fusible graph structures is then performed on it, where a fusible graph structure is a computation structure that conforms to a preset type. For the fusible graph structures, Conv (convolution) graph fusion and Deconv (deconvolution) graph fusion can be performed. Finally, a new TTS computational graph is constructed from the fused graph structures and the unfused graph structures. Because graph structures have been fused, the new TTS computational graph is simpler, which reduces the amount of computation in the model and in turn reduces the memory access cost and latency of model inference.
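The fuse-and-rebuild step of the conversion stage can be illustrated with a small sketch. It is not the embodiment's implementation: the dictionary-based graph encoding, the function name `fuse_chain`, the fused operator type `FusedConv1D`, and the specific Unsqueeze/Conv/Squeeze chain (chosen to mirror the dimension-raising and dimension-reducing operations mentioned earlier) are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical graph encoding: operator name -> (operator type, input operator names).
Graph = Dict[str, Tuple[str, List[str]]]


def fuse_chain(graph: Graph, chain: List[str], fused_type: str) -> Graph:
    """Return a new graph in which the serial chain is replaced by one operator.

    Operators outside the chain are copied unchanged, except that any operator
    that consumed the chain's last operator now consumes the fused operator.
    """
    head, tail = chain[0], chain[-1]
    fused_name = "fused_" + "_".join(chain)
    removed = set(chain)
    new_graph: Graph = {}
    for name, (op_type, inputs) in graph.items():
        if name == head:
            # The fused operator takes the chain's place and reads the chain's inputs.
            new_graph[fused_name] = (fused_type, list(graph[head][1]))
        if name in removed:
            continue
        rewired = [fused_name if src == tail else src for src in inputs]
        new_graph[name] = (op_type, rewired)
    return new_graph


original: Graph = {
    "in": ("Input", []),
    "unsq": ("Unsqueeze", ["in"]),
    "conv": ("Conv", ["unsq"]),
    "sq": ("Squeeze", ["conv"]),
    "relu": ("Relu", ["sq"]),
}
new = fuse_chain(original, ["unsq", "conv", "sq"], "FusedConv1D")
```

In this toy case the three chained operators collapse into one, so the rebuilt graph has three operators instead of five while the unfused `Relu` keeps its place downstream.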

In the model inference stage, the TTS model is made to support Unaligned Conv (unaligned convolution computation) for the subsequent end-side inference of the TTS model. This reduces invalid computation operations in small-channel convolution, further reducing the amount of computation during model inference and the model inference latency.
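One plausible reading of the invalid computation in small-channel convolution is channel padding: if an inference engine packs channels in fixed-size groups, a convolution with few channels is padded up to the next multiple, and the padded lanes are computed and then discarded. The text above does not state this mechanism or any packing width, so the sketch below is an assumption-laden estimate only; the pack width of 4 and all function names are hypothetical.

```python
import math


def conv1d_macs(c_in: int, c_out: int, kernel: int, out_len: int) -> int:
    """Multiply-accumulate count of a plain 1D convolution."""
    return c_in * c_out * kernel * out_len


def aligned(channels: int, pack: int = 4) -> int:
    """Round a channel count up to the next multiple of the pack width."""
    return math.ceil(channels / pack) * pack


def wasted_fraction(c_in: int, c_out: int, kernel: int, out_len: int,
                    pack: int = 4) -> float:
    """Share of the aligned computation that is spent on padded channels."""
    useful = conv1d_macs(c_in, c_out, kernel, out_len)
    padded = conv1d_macs(aligned(c_in, pack), aligned(c_out, pack), kernel, out_len)
    return 1 - useful / padded
```

Under these assumptions, a convolution with 1 input channel and 2 output channels spends 87.5% of its aligned work on padding, whereas a 64-to-64 channel convolution spends none, which is why the saving matters mainly for small channel counts.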

In this way, when the TTS model is deployed, end-side inference is accelerated by combining graph structure optimization with operator optimization. The graph structure optimization mainly comprises graph fusion of Conv and Deconv, and the operator optimization supports computation of small-channel Unaligned Conv, thereby reducing the overall inference latency of the model.

As shown in fig. 4, an embodiment of the present application provides a network model processing method, which may be executed by the electronic device 100 and applied to the network model processing scenario shown in fig. 3, so as to simplify some operator structures in a model at the level of the computational graph structure, thereby simplifying the model's computational graph, reducing the memory access cost during model inference, and reducing model inference latency. The method comprises the following steps:

Step 401: Obtain an original calculation graph of the model to be processed, wherein the original calculation graph comprises at least one operator structure.

In this step, the model to be processed may be a neural network model based on a deep learning algorithm, such as a TTS model or an image recognition model. The computational graph takes the operators in the model as nodes and the data flow of the computation relationships between operators as directed line segments (edges), and represents the computation process and data flow in the model to be processed. An operator structure is the data flow relationship among operators in the model to be processed. That is, the computational graph of the model to be processed may include one or more operator structures, and the operator structures may be related by parallel or serial data flow. The computational graph can be obtained by reading the ONNX model file of the model to be processed.

In one embodiment, an operator structure includes a plurality of operators and a computational relationship between the plurality of operators. That is, an operator structure may include a plurality of operators, where the operators are connected by a directed line segment, for characterizing a data flow computation relationship between the operators.
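As a minimal sketch of the representation described above, the following Python snippet models operators as nodes and derives the directed edges from the tensors they produce and consume. The `Node` class, the `build_edges` helper, the tensor names and the attribute values are all hypothetical and chosen only for illustration; in practice the nodes would typically be read from the onnx model file rather than written by hand.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One operator in the graph (hypothetical minimal representation)."""
    name: str                                     # unique node name
    op_type: str                                  # e.g. "Conv", "Pad", "Squeeze"
    inputs: list = field(default_factory=list)    # names of tensors consumed
    outputs: list = field(default_factory=list)   # names of tensors produced
    attrs: dict = field(default_factory=dict)     # operator parameters


def build_edges(nodes):
    """Return directed edges (producer, consumer) derived from tensor names."""
    producer = {t: n.name for n in nodes for t in n.outputs}
    edges = []
    for n in nodes:
        for t in n.inputs:
            if t in producer:  # graph inputs have no producing node
                edges.append((producer[t], n.name))
    return edges


# Illustrative operator structure: Pad -> Unsqueeze -> Conv -> Squeeze.
graph = [
    Node("pad0", "Pad", ["x"], ["t0"], {"pads": [3, 3]}),
    Node("unsq0", "Unsqueeze", ["t0"], ["t1"], {"axes": [2]}),
    Node("conv0", "Conv", ["t1"], ["t2"]),
    Node("sq0", "Squeeze", ["t2"], ["y"], {"axes": [2]}),
]
edges = build_edges(graph)
```

Each edge here corresponds to one directed line segment between two operators, so a serial operator structure shows up as a simple chain of edges.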

Step 402: Retrieve a target operator structure of a preset type from the original computational graph.

In this step, the preset type characterizes the types of operator structure on which graph structure fusion can be performed. Operator structure types that can be fused without affecting the final calculation result of the model can be identified in advance from empirical statistical data and used as the preset type. In an actual scenario, the original computational graph can be searched and the types of the operator structures in it compared with the preset type, so as to find the target operator structures that conform to the preset type. There may be multiple target operator structures.

In one embodiment, step 402 may specifically include: traversing each operator structure in the original calculation graph, and taking the operator structure with the preset type of calculation relation as a target operator structure.

In this embodiment, the preset type may characterize a computational relationship between specific operators, that is, the specific operators themselves and the data flow relationships between them. When searching for the target operator structure, each node in the original computational graph can be traversed, and thereby each operator structure, and the operator structures matching the preset type of computational relationship are taken as target operator structures. Matching here means that the target operator structure includes the preset type of computational relationship, that is, it includes the specific operators and their data flow relationships.
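The traversal and matching described above can be sketched as a simple chain search. This is only an illustration under stated assumptions: nodes are plain dictionaries with hypothetical keys (`name`, `op`, `inputs`, `output`), the preset type is given as an ordered list of operator types, and a match additionally requires that each intermediate tensor has exactly one consumer, which the source does not state.

```python
def find_chains(nodes, pattern):
    """Return node-name chains whose op types match `pattern` along the data flow."""
    consumers = {}
    for node in nodes:
        for tensor in node["inputs"]:
            consumers.setdefault(tensor, []).append(node)

    matches = []
    for start in nodes:
        if start["op"] != pattern[0]:
            continue
        chain, current = [start], start
        for wanted in pattern[1:]:
            nxt = consumers.get(current["output"], [])
            # Assumption: require exactly one consumer of the expected type.
            if len(nxt) != 1 or nxt[0]["op"] != wanted:
                chain = None
                break
            current = nxt[0]
            chain.append(current)
        if chain:
            matches.append([n["name"] for n in chain])
    return matches


nodes = [
    {"name": "pad0", "op": "Pad", "inputs": ["x"], "output": "t0"},
    {"name": "unsq0", "op": "Unsqueeze", "inputs": ["t0"], "output": "t1"},
    {"name": "conv0", "op": "Conv", "inputs": ["t1"], "output": "t2"},
    {"name": "sq0", "op": "Squeeze", "inputs": ["t2"], "output": "t3"},
    {"name": "add0", "op": "Add", "inputs": ["t3", "r"], "output": "t4"},
    {"name": "unsq1", "op": "Unsqueeze", "inputs": ["t4"], "output": "t5"},
    {"name": "tanh0", "op": "Tanh", "inputs": ["t5"], "output": "y"},
]
targets = find_chains(nodes, ["Unsqueeze", "Conv", "Squeeze"])
```

The second Unsqueeze node is not reported because it is not followed by a Conv, so it does not contain the preset computational relationship.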

Step 403: In the original computational graph, perform a fusion operation on the target operator structure to generate a new operator structure, wherein the number of operators in the new operator structure is smaller than that in the target operator structure.

In this step, the fusion operation means that operators in the target operator structure are fused with one another without changing the calculation result, so as to reduce the number of memory accesses; as a result, the generated new operator structure contains fewer operators than the target operator structure. In an actual scenario, after an operator finishes its calculation, its result is stored in memory, and when the next operator needs that result, the data is read back from memory. Therefore, the more operators an operator structure contains, the more memory accesses occur during model inference, the more memory resources are consumed, and the more time is spent. Through the fusion operation, the new operator structure contains fewer operators than the target operator structure, which reduces memory consumption and the number of memory accesses, reduces model inference latency, and accelerates model inference.

In one embodiment, step 403 may specifically include: extracting the second type of operators in the target operator structure from the original computational graph, wherein the second type of operators are the remaining operators in the target operator structure other than the first type of operators; and fusing parameters of the second type of operators into the first type of operators, deleting the second type of operators, and generating a new operator structure based on the computational relationship between the first type of operators.

In this embodiment, the preset type of computational relationship includes, but is not limited to, the first type of operators and the computational relationship between them. A first type of operator is an operator that must be retained so that the model calculation result is not affected and that cannot be simplified away, such as a convolution operator or a deconvolution operator. When the first type of operators are retrieved from the original computational graph, their graph structure and the corresponding data flow relationships are preserved. The second type of operators are then extracted from the original computational graph. A second type of operator is an operator that can be simplified through graph structure fusion while ensuring that the model calculation result is not affected, such as the Pad operator (which pads a tensor). The parameters of the second type of operators are then fused into the first type of operators according to the original calculation rules, so that the calculation performed by the second type of operators is carried out within the first type of operators, and the second type of operators are deleted from the original computational graph. The calculation of the second type of operators is thus preserved, but they no longer exist as separate operators, so their results no longer need to be cached and read back. A new operator structure is generated based on the computational relationship between the first type of operators. The number of operators is reduced, which further reduces memory consumption and the number of memory accesses, reduces model inference latency, and accelerates model inference.
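To illustrate why fusing the parameters of a Pad operator into a convolution leaves the result unchanged, the sketch below compares a standalone zero padding followed by an unpadded convolution with a single convolution that carries the padding as its own parameter. It is a pure-Python, one-dimensional toy example; the functions `conv1d` and `pad1d` are hypothetical helpers, and the equivalence is assumed only for constant zero padding with stride 1.

```python
def conv1d(x, w, pad=0):
    """Plain 1-D cross-correlation with symmetric zero padding, stride 1."""
    xp = [0.0] * pad + list(x) + [0.0] * pad
    k = len(w)
    return [sum(xp[i + j] * w[j] for j in range(k)) for i in range(len(xp) - k + 1)]


def pad1d(x, pad):
    """Standalone Pad operator: zero-pad both ends."""
    return [0.0] * pad + list(x) + [0.0] * pad


x = [1.0, 2.0, 3.0, 4.0]
w = [1.0, 0.0, -1.0]

unfused = conv1d(pad1d(x, 1), w, pad=0)  # Pad operator followed by Conv
fused = conv1d(x, w, pad=1)              # Conv with the padding folded in
```

In the unfused path the padded tensor is materialized as an intermediate result, whereas the fused path produces the same values with one operator fewer.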

In one embodiment, for the fusion of a convolution graph structure, step 403 may specifically include: fusing parameters of the second type of operators into the convolution operator, deleting the second type of operators, and generating a new operator structure based on the computational relationship among the dimension-increasing operator, the convolution operator and the dimension-reducing operator.

In this embodiment, the first type of operators include, but are not limited to: a dimension-increasing operator, a convolution operator and a dimension-reducing operator. For a model to be processed such as a TTS model, convolution and deconvolution generally operate on one-dimensional data, and when the TTS model is converted into an onnx model file, the generated onnx model contains a large number of dimension-increasing and dimension-reducing operations, which greatly increases the amount of computation in the model. Statistics can be gathered on the model to determine the first type of operators that must be retained so that the model calculation result is not affected, such as the dimension-increasing operator, the convolution operator and the dimension-reducing operator, and the computational relationship among them can be set as follows: the output of the dimension-increasing operator serves as the input of the convolution operator, and the output of the convolution operator serves as the input of the dimension-reducing operator. The preset type of computational relationship is thereby generated.

Taking a TTS model as an example, fig. 5 is a comparison schematic diagram of four target operator structures (a), (b), (c) and (d) of the TTS model that conform to the preset type and were obtained through the search in step 402, where:

The first target operator structure (a) comprises, in order of data flow: Pad operator, Unsqueeze (dimension-increasing operator), Conv (convolution operator), Squeeze (dimension-reducing operator), Add operator (element-wise addition), and LeakyRelu (activation function operator).

The second target operator structure (b) comprises, in order of data flow: Pad operator, Unsqueeze (dimension-increasing operator), Conv (convolution operator), Squeeze (dimension-reducing operator), Add operator (element-wise addition), and Tanh (activation function operator).

The third target operator structure (c) comprises, in order of data flow: Pad operator, Unsqueeze (dimension-increasing operator), Conv (convolution operator), and Squeeze (dimension-reducing operator).

The fourth target operator structure (d) comprises, in order of data flow: Unsqueeze (dimension-increasing operator), Conv (convolution operator), Squeeze (dimension-reducing operator), and Add operator.

As shown in fig. 5, all four target operator structures contain the first type of operators, that is, Unsqueeze (dimension-increasing operator), Conv (convolution operator) and Squeeze (dimension-reducing operator), and the computational relationships among these first type operators conform to the preset data flow relationship. The remaining operators are the second type of operators. For example, the second type of operators in the first target operator structure (a) comprise: the Pad operator, the Add operator and the LeakyRelu operator. The second type of operators in the second target operator structure (b) comprise: the Pad operator, the Add operator and the Tanh operator. The second type of operators in the third target operator structure (c) comprise: the Pad operator. The second type of operators in the fourth target operator structure (d) comprise: the Add operator.

In the process of operator structure fusion, for a target operator structure, the parameters of the second type of operators can be fused into the convolution operator, the second type of operators deleted, and a new operator structure generated based on the computational relationship among the dimension-increasing operator, the convolution operator and the dimension-reducing operator. For example, for the target operator structure (a), the parameters of the Pad operator, the Add operator and the LeakyRelu operator are added to the corresponding Conv (convolution operator) according to their original calculation rules, so that the calculations of the Pad operator, the Add operator and the LeakyRelu operator are still carried out within the Conv (convolution operator), and the Pad operator, the Add operator and the LeakyRelu operator are deleted from the target operator structure (a). A new operator structure as shown in fig. 6 is thereby obtained. The new operator structure comprises only Unsqueeze (dimension-increasing operator), Conv (convolution operator) and Squeeze (dimension-reducing operator), together with the original computational relationships among these operators. The memory access overhead of the Pad operator, the Add operator and the LeakyRelu operator is thereby reduced, the computational graph is simplified, and the latency of the model during inference is reduced.

The fusion process of target operator structures (b), (c) and (d) is similar to that of target operator structure (a); after fusion and simplification, each of them yields the new operator structure shown in fig. 6, and details are not repeated here.
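The structural side of this fusion can be sketched as follows for target operator structure (a). The sketch assumes a linear chain listed in data flow order and represents each operator as an `(op_type, attrs)` pair; the `fused` attribute on the Conv node, the `other_input` key and all attribute values are hypothetical placeholders for however a real runtime would record the fused parameters.

```python
FIRST_TYPE = ("Unsqueeze", "Conv", "Squeeze")  # operators that must be kept


def fuse_chain(chain, host="Conv"):
    """Fold every second-type operator of a linear chain into the host operator."""
    fused_params = {op: attrs for op, attrs in chain if op not in FIRST_TYPE}
    new_chain = []
    for op, attrs in chain:
        if op not in FIRST_TYPE:
            continue  # second-type operator: deleted as a separate node
        if op == host:
            # Copy the host attributes and record the fused parameters on it.
            attrs = dict(attrs, fused=fused_params)
        new_chain.append((op, attrs))
    return new_chain


# Target operator structure (a), with illustrative attribute values.
structure_a = [
    ("Pad", {"pads": [3, 3]}),
    ("Unsqueeze", {"axes": [2]}),
    ("Conv", {"kernel_shape": [1, 7]}),
    ("Squeeze", {"axes": [2]}),
    ("Add", {"other_input": "residual"}),
    ("LeakyRelu", {"alpha": 0.1}),
]
new_structure = fuse_chain(structure_a)
```

Applied to structures (b), (c) and (d), the same helper would also leave only the Unsqueeze, Conv and Squeeze nodes, differing only in which parameters end up recorded on the Conv node.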

In one embodiment, for the fusion of a deconvolution graph structure, step 403 may specifically include: fusing parameters of the second type of operators into the deconvolution operator, deleting the second type of operators, and generating a new operator structure based on the computational relationship among the dimension-increasing operator, the deconvolution operator and the dimension-reducing operator.

In this embodiment, the first type of operators include: a dimension-increasing operator, a deconvolution operator and a dimension-reducing operator. The process is analogous to the fusion of the convolution graph structure described above. Taking the onnx model of a TTS model as the model to be processed, statistics can be gathered on the model to determine the first type of operators that must be retained so that the model calculation result is not affected, such as the dimension-increasing operator, the deconvolution operator and the dimension-reducing operator, and the computational relationship among them can be set as follows: the output of the dimension-increasing operator serves as the input of the deconvolution operator, and the output of the deconvolution operator, after a series of processing, serves as the input of the dimension-reducing operator. The preset type of computational relationship is thereby generated.

Taking a TTS model as an example, fig. 7 is a comparison schematic diagram of two target operator structures (e) and (f) of the TTS model that conform to the preset type and were obtained through the search in step 402, where:

The target operator structure (e) comprises, in order of data flow: Unsqueeze (dimension-increasing operator), ConvTranspose (deconvolution operator), Slice operator (which extracts a sub-range of a tensor), Pad operator, Add operator, and Squeeze (dimension-reducing operator).

The target operator structure (f) comprises, in order of data flow: Unsqueeze (dimension-increasing operator), ConvTranspose (deconvolution operator), Slice operator, Add operator, and Squeeze (dimension-reducing operator).

As shown in fig. 7, both target operator structures contain the first type of operators, namely Unsqueeze (dimension-increasing operator), ConvTranspose (deconvolution operator) and Squeeze (dimension-reducing operator), and the computational relationship among these first type operators conforms to the preset data flow relationship. In the target operator structure (e), the Slice operator, the Pad operator and the Add operator are therefore the second type of operators. In the target operator structure (f), the Slice operator and the Add operator are the second type of operators.

In the process of operator structure fusion, for a target operator structure, the parameters of the second type of operators can be fused into the deconvolution operator, the second type of operators deleted, and a new operator structure generated based on the computational relationship among the dimension-increasing operator, the deconvolution operator and the dimension-reducing operator. For example, for the target operator structure (e), the parameters of the Slice operator, the Pad operator and the Add operator are added to the corresponding ConvTranspose (deconvolution operator) according to their original calculation rules, so that the calculations of the Slice operator, the Pad operator and the Add operator are still carried out within the ConvTranspose (deconvolution operator), and the Slice operator, the Pad operator and the Add operator are deleted from the target operator structure (e). A new operator structure as shown in fig. 8 is thereby obtained. The new operator structure comprises only Unsqueeze (dimension-increasing operator), ConvTranspose (deconvolution operator) and Squeeze (dimension-reducing operator), together with the original computational relationships among these operators. The memory access overhead of the Slice operator, the Pad operator and the Add operator is thereby reduced, the computational graph is simplified, and the latency of the model during inference is reduced.

The fusion process of target operator structure (f) is similar to that of target operator structure (e); after fusion and simplification, it also yields the new operator structure shown in fig. 8, and details are not repeated here.
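One plausible way a Slice operator can be absorbed by a deconvolution, shown here only as an assumption and not as the method of this application, is to express a symmetric crop of the output as the padding parameter of the transposed convolution, since padding on a transposed convolution removes samples from both ends of the full output. The one-dimensional `conv_transpose1d` helper below is a hypothetical pure-Python implementation used to check that equivalence numerically.

```python
def conv_transpose1d(x, w, stride=1, pad=0):
    """1-D transposed convolution; `pad` crops that many samples from each end."""
    k = len(w)
    full = [0.0] * ((len(x) - 1) * stride + k)
    for i, xv in enumerate(x):
        for j, wv in enumerate(w):
            full[i * stride + j] += xv * wv  # scatter-add each input sample
    return full[pad:len(full) - pad]


x = [1.0, 2.0, 3.0]
w = [1.0, 1.0, 1.0, 1.0]

full = conv_transpose1d(x, w, stride=2)          # length (3 - 1) * 2 + 4 = 8
unfused = full[1:-1]                             # ConvTranspose followed by Slice
fused = conv_transpose1d(x, w, stride=2, pad=1)  # crop folded into the padding
```

The fused call never exposes the uncropped tensor to a separate operator, which is the kind of intermediate read and write that the fusion is meant to avoid.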

Step 404: Construct a new computational graph of the model to be processed based on the new operator structure and the original computational graph.

In this step, the new operator structure generated in step 403 is recombined with the operator structures in the original computational graph that cannot be fused, so as to construct a new computational graph of the model to be processed. While retaining the original calculation result of the model to be processed, the new computational graph contains fewer operators, which reduces unnecessary calculation operations in the model, reduces memory access overhead during model inference, and reduces model inference latency.
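The recombination in this step can be sketched as a splice: the target operator structure is cut out of the original node sequence and the new operator structure is put in its place, while all other operators are left untouched. The sketch assumes the graph is a topologically ordered list of operator types and that the target structure occupies a contiguous run; the surrounding `Gather` and `Sigmoid` operators are illustrative only.

```python
def splice(original, target, replacement):
    """Replace the first contiguous run equal to `target` with `replacement`."""
    n = len(target)
    for i in range(len(original) - n + 1):
        if original[i:i + n] == target:
            return original[:i] + replacement + original[i + n:]
    return list(original)  # nothing matched: the graph is unchanged


original_graph = ["Gather", "Pad", "Unsqueeze", "Conv", "Squeeze",
                  "Add", "LeakyRelu", "Sigmoid"]
target = ["Pad", "Unsqueeze", "Conv", "Squeeze", "Add", "LeakyRelu"]
replacement = ["Unsqueeze", "Conv", "Squeeze"]

new_graph = splice(original_graph, target, replacement)
```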

According to the network model processing method, the operator structures of the preset types which can be fused in the original computational graph are fused to generate the new operator structure, the number of operators in the new operator structure is reduced, and then the simplified new computational graph is reconstructed based on the new operator structure. Therefore, part of operator structures in the model are simplified from the level of the computational graph structure, the computational graph of the model is further simplified, unnecessary computational operations in the model are reduced, access cost in the model reasoning process is reduced, and model reasoning time delay is reduced.

As shown in fig. 9, an embodiment of the present application provides a network model processing method. The method may be executed by the electronic device 100 and may be applied to the network model processing scenario shown in fig. 3. Compared with the foregoing embodiment, this embodiment further includes an optimization of the model inference process, so that the model is simplified at both the computational graph structure level and the operator level, the computational graph of the model is simplified, memory access overhead during model inference is reduced, and model inference latency is reduced. The method comprises the following steps:

Step 901: Obtain an original computational graph of the model to be processed, wherein the original computational graph comprises at least one operator structure. For details, see the description of step 401 in the foregoing embodiment.

Step 902: Retrieve a target operator structure of a preset type from the original computational graph. For details, see the description of step 402 in the foregoing embodiment.

Step 903: In the original computational graph, perform a fusion operation on the target operator structure to generate a new operator structure, wherein the number of operators in the new operator structure is smaller than that in the target operator structure. For details, see the description of step 403 in the foregoing embodiment.

Step 904: Construct a new computational graph of the model to be processed based on the new operator structure and the original computational graph. For details, see the description of step 404 in the foregoing embodiment.

Step 905: Perform inference with the model to be processed based on the new computational graph.

In this step, the new computational graph reduces the number of operators and the unnecessary calculation operations in the model while retaining the original calculation result of the model to be processed. Performing inference with the model to be processed based on the new computational graph therefore reduces memory access overhead during model inference and reduces model inference latency.

In one embodiment, step 905 may specifically include: in the model reasoning process, a first instruction is adopted to execute a first type convolution operator in the new calculation graph, wherein the number of channels of the first type convolution operator is smaller than a preset value, and the first instruction supports unaligned convolution calculation.

In this embodiment, taking the TTS model as an example, if the model contains convolutions with a small number of channels, channel alignment, that is, a channel padding operation, is performed when the arm neon instruction is used for the calculation. If the number of channels of a convolution is small, this produces a large amount of invalid computation and adds extra overhead.

Fig. 10 shows a calculation process that supports only aligned convolution. Assume that the arm neon instruction processes 8 data items at a time during convolution, while some convolutions have far fewer than 8 channels, for example only one. The arm neon instruction used for the calculation then forces channel alignment on the convolution with a small number of channels, that is, a channel padding operation is generated. For example, in fig. 10, the Pad operator outputs data of shape (1,4,1,200062) to the Conv operator. Assuming that the Conv operator has 1 output channel and the convolution is calculated using the arm neon instruction, data of shape (1,8,1,200000) is output. The number of output channels of the convolution is thus forcibly increased from 1 to 8, so that only one channel of data is valid and the other channels are invalid, which produces a large amount of invalid computation and adds extra overhead. A ChannelResize operation is also generated to change the shape of the convolution output to (1,1,1,200000), that is, to restore the single-channel output data. Channel alignment therefore not only adds a large amount of invalid computation but also adds an unnecessary ChannelResize operation, so the computation cost is high and model inference is slowed down.
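The cost of channel alignment in this example can be worked out with a few lines of arithmetic. The helper names below are hypothetical; the vector width of 8 and the shapes come from the fig. 10 example, and the calculation only counts output elements, not actual instruction cycles.

```python
from math import prod


def align_up(channels, lanes=8):
    """Round the channel count up to the next multiple of the vector width."""
    return -(-channels // lanes) * lanes


def wasted_fraction(channels, lanes=8):
    """Fraction of computed output channels that are padding only."""
    aligned = align_up(channels, lanes)
    return (aligned - channels) / aligned


aligned_shape = (1, align_up(1), 1, 200000)   # output shape with channel alignment
unaligned_shape = (1, 1, 1, 200000)           # output shape actually needed

aligned_elements = prod(aligned_shape)        # elements computed when aligned
useful_elements = prod(unaligned_shape)       # elements that carry valid data
```

For a single output channel, seven of every eight computed channels are padding, so the aligned path computes eight times as many output elements as are needed before the ChannelResize operation discards them.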

Therefore, a preset value can be set in advance to classify convolution operators. The preset value is a threshold on the number of channels of a convolution operator and can be set based on the overall data volume of the model to be processed. The guiding principle is that, for convolution operators whose number of channels is smaller than the preset value, unaligned convolution calculation is more beneficial to accelerating the model. For a first type convolution operator, whose number of channels is smaller than the preset value, the calculation is executed using the first instruction to complete the model inference process.

Fig. 11 shows a calculation process that supports unaligned convolution. The Pad operator outputs data of shape (1,4,1,200062) to the Conv operator, whose number of channels is assumed to be 1. The calculation is performed using the first instruction, which supports unaligned convolution calculation, so no channel alignment takes place during the Conv calculation. The Conv operator therefore outputs data of shape (1,1,1,200000), which still has 1 channel, so no ChannelResize operation is required. No unnecessary invalid computation and no additional calculation operations are generated, the consumption of computing resources is reduced, and model inference efficiency is improved.

In one embodiment, step 905 may specifically include: in the model inference process, executing a second type convolution operator in the new computational graph using a second instruction, wherein the number of channels of the second type convolution operator is greater than or equal to the preset value, and the second instruction supports aligned convolution calculation.

In this embodiment, for a second type convolution operator, whose number of channels is greater than or equal to the preset value, the number of channels is large, so aligned convolution calculation can be adopted to increase the calculation rate: the amount of data processed per executed instruction is increased, which accelerates model inference.
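The classification of convolution operators by channel count can be sketched as a small dispatch function. The preset value of 8, the returned labels and the operator names are assumptions made for illustration; the source leaves the concrete threshold and the instruction implementations open.

```python
PRESET_VALUE = 8  # assumed threshold; the source does not fix a value


def choose_instruction(num_channels, preset=PRESET_VALUE):
    """Pick the execution path for one convolution operator."""
    if num_channels < preset:
        return "first_instruction_unaligned"   # first type convolution operator
    return "second_instruction_aligned"        # second type convolution operator


# Hypothetical convolution operators and their channel counts.
conv_channels = {"conv_out": 1, "conv_mid": 8, "conv_big": 64}
plan = {name: choose_instruction(c) for name, c in conv_channels.items()}
```

Note that a channel count exactly equal to the preset value falls on the aligned path, matching the "greater than or equal to" condition for the second type convolution operator.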

According to the network model processing method, convolution operators are classified during model inference and different instructions are adaptively used to execute the corresponding calculations, so that the model inference process is more reasonable, the resource consumption caused by blind channel alignment is avoided, and the speed and effectiveness of model inference are improved.

As shown in fig. 12, an embodiment of the present application provides a network model processing apparatus. The apparatus may be applied to the electronic device 100 described above and to the network model processing scenario shown in fig. 3. It simplifies some of the operator structures in a model at the level of the computational graph structure, thereby simplifying the computational graph of the model, reducing memory access overhead during model inference, and reducing model inference latency. The apparatus comprises an acquisition module, a retrieval module, a fusion module and a construction module, and the functions of the modules are as follows:

The acquisition module is used for acquiring an original calculation graph of the model to be processed, wherein the original calculation graph comprises at least one operator structure.

And the retrieval module is used for retrieving the target operator structure of the preset type from the original calculation graph.

And the fusion module is used for carrying out fusion operation on the target operator structure in the original calculation graph to generate a new operator structure, wherein the number of operators in the new operator structure is smaller than that of operators in the target operator structure.

In one embodiment, an operator structure includes a plurality of operators and a computational relationship between the plurality of operators.

In an embodiment, the search module is configured to traverse each operator structure in the original computation graph, and take an operator structure with a preset type of computation relationship as a target operator structure.

In one embodiment, the preset type of computational relationship at least includes: the first type of operators and the computational relationship between the first type of operators. The fusion module is configured to extract the second type of operators in the target operator structure from the original computational graph, wherein the second type of operators are the remaining operators in the target operator structure other than the first type of operators; and to fuse parameters of the second type of operators into the first type of operators, delete the second type of operators, and generate a new operator structure based on the computational relationship between the first type of operators.

In one embodiment, the first type of operators comprise: a dimension-increasing operator, a convolution operator and a dimension-reducing operator. The fusion module is specifically configured to fuse parameters of the second type of operators into the convolution operator, delete the second type of operators, and generate a new operator structure based on the computational relationship among the dimension-increasing operator, the convolution operator and the dimension-reducing operator.

In one embodiment, the first type of operators comprise: a dimension-increasing operator, a deconvolution operator and a dimension-reducing operator. The fusion module is specifically configured to fuse parameters of the second type of operators into the deconvolution operator, delete the second type of operators, and generate a new operator structure based on the computational relationship among the dimension-increasing operator, the deconvolution operator and the dimension-reducing operator.

In an embodiment, the inference module is further configured to execute a second type of convolution operator in the new computation graph by using a second instruction in the model inference process, where the number of channels of the second type of convolution operator is greater than or equal to a preset value, and the second instruction supports aligned convolution computation.

It should be noted that, the above device provided by the present application can implement all the method steps implemented by the corresponding method embodiments, and can achieve the same technical effects, and detailed descriptions of the same parts and beneficial effects as those of the method embodiments in this embodiment are omitted.

Embodiments of the present application also provide a computer program product comprising one or more computer programs. When the computer program is loaded and executed on a computer, the flows or functions according to embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (e.g., infrared, radio, or microwave) means. The computer-readable storage medium may be, for example, a semiconductor medium (e.g., a solid state disk (SSD)) or the like.

Embodiments of the present application also provide a computer-readable storage medium storing instructions that, when executed, cause a computer to perform a method as described in any of the above embodiments. The methods described in the above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. Computer readable media can include computer storage media and communication media and can include any medium that can transfer a computer program from one place to another. The storage media may be any target media that is accessible by a computer.

As one possible design, the computer-readable medium may include a compact disk read-only memory (CD-ROM), RAM, ROM, EEPROM, or other optical disk storage. The computer readable medium may include disk storage or other disk storage devices. Moreover, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, includes Compact Disc (CD), laser disc, optical disc, digital versatile disc (digital versatile disc, DVD), floppy disk and blu-ray disc where disks usually reproduce data magnetically, while discs reproduce data optically with lasers.

Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), storage media, and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processing unit of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processing unit of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Claims

1. A network model processing method, comprising:

acquiring an original computational graph of a model to be processed, wherein the original computational graph comprises at least one operator structure;

retrieving a target operator structure of a preset type from the original computational graph;

performing, in the original computational graph, a fusion operation on the target operator structure to generate a new operator structure, wherein the number of operators in the new operator structure is smaller than that in the target operator structure;

and constructing a new computational graph of the model to be processed based on the new operator structure and the original computational graph.

2. The method of claim 1, wherein one of the operator structures comprises a plurality of operators and a computational relationship between the plurality of operators.

3. The method of claim 2, wherein retrieving a target operator structure of a preset type from the original computational graph comprises:

traversing each operator structure in the original computational graph, and taking an operator structure having a calculation relation of the preset type as the target operator structure.

4. The method according to claim 1, wherein the preset type of calculation relation at least includes: a calculation relation between first type operators; and the performing, in the original computational graph, a fusion operation on the target operator structure to generate a new operator structure comprises:

extracting second type operators in the target operator structure from the original computational graph, wherein the second type operators are the remaining operators in the target operator structure other than the first type operators;

and fusing parameters of the second type operator into the first type operator, deleting the second type operator, and generating the new operator structure based on the calculation relation between the first type operators.

5. The method of claim 4, wherein the first type of operator comprises: a dimension-increasing operator, a convolution operator and a dimension-decreasing operator; the fusing the parameters of the second type operator into the first type operator, deleting the second type operator, generating the new operator structure based on the calculation relation between the first type operators, including:

and fusing parameters of the second type operator into the convolution operator, deleting the second type operator, and generating the new operator structure based on the calculation relation among the dimension-increasing operator, the convolution operator and the dimension-decreasing operator.

6. The method of claim 4, wherein the first type of operator comprises: a dimension-increasing operator, a deconvolution operator and a dimension-decreasing operator; the fusing the parameters of the second type operator into the first type operator, deleting the second type operator, generating the new operator structure based on the calculation relation between the first type operators, including:

and fusing parameters of the second type operator into the deconvolution operator, deleting the second type operator, and generating the new operator structure based on the calculation relation among the dimension-increasing operator, the deconvolution operator and the dimension-decreasing operator.

7. The method of claim 1, further comprising: performing inference on the model to be processed based on the new computational graph.

8. The method of claim 7, wherein the performing inference on the model to be processed based on the new computational graph comprises:

during model inference, executing a first type of convolution operator in the new computational graph by using a first instruction, wherein the number of channels of the first type of convolution operator is smaller than a preset value, and the first instruction supports unaligned convolution computation.

9. The method of claim 8, wherein the performing inference on the model to be processed based on the new computational graph comprises:

during model inference, executing a second type of convolution operator in the new computational graph by using a second instruction, wherein the number of channels of the second type of convolution operator is larger than or equal to the preset value, and the second instruction supports aligned convolution computation.

10. An electronic device, comprising: a memory for storing a computer program and a processor for executing the computer program to perform the method of any of claims 1-9.

11. A computer readable storage medium storing instructions that, when executed, cause a computer to perform the method of any one of claims 1-9.

12. A computer program product comprising a computer program which, when run, causes an electronic device to perform the method of any one of claims 1-9.
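
For illustration only, the method of claims 1 to 9 can be sketched in Python as follows. This is a minimal sketch under stated assumptions and is not the implementation of the application. The computational graph is assumed to be a linear chain. The second type operator is assumed to be a per-channel scale. The convolution is assumed to be a 1x1 convolution. The operator names (`expand_dims`, `conv1x1`, `scale`, `squeeze`, `relu`), the instruction labels, and the preset value 4 are all hypothetical.

```python
import numpy as np

def op(kind, **params):
    """Hypothetical operator record: a kind name plus constant parameters."""
    return {"kind": kind, "params": params}

# Assumed preset structure: dimension-increasing, convolution, scale, dimension-reducing.
PATTERN = ("expand_dims", "conv1x1", "scale", "squeeze")

def find_targets(graph):
    """Traverse the chain and return the start index of every match of PATTERN."""
    kinds = [o["kind"] for o in graph]
    n = len(PATTERN)
    return [i for i in range(len(kinds) - n + 1) if tuple(kinds[i:i + n]) == PATTERN]

def fuse(structure):
    """Fold the scale parameters into the convolution and delete the scale operator."""
    expand, conv, scale, squeeze = structure
    s = scale["params"]["s"]                  # one factor per output channel
    w = conv["params"]["w"] * s[:, None]      # s * (W x + b) == (s W) x + (s b)
    b = conv["params"]["b"] * s
    return [expand, op("conv1x1", w=w, b=b), squeeze]

def rewrite(graph):
    """Construct a new graph; the original list is not modified."""
    starts = set(find_targets(graph))
    new_graph, i = [], 0
    while i < len(graph):
        if i in starts:
            new_graph.extend(fuse(graph[i:i + len(PATTERN)]))
            i += len(PATTERN)
        else:
            new_graph.append(graph[i])
            i += 1
    return new_graph

def run(graph, x):
    """Reference interpreter, used only to check that fusion preserves the output."""
    for o in graph:
        kind, p = o["kind"], o["params"]
        if kind == "relu":
            x = np.maximum(x, 0.0)
        elif kind == "expand_dims":
            x = x[..., None]                  # (C, L) -> (C, L, 1)
        elif kind == "conv1x1":
            x = np.einsum("oc,clk->olk", p["w"], x) + p["b"][:, None, None]
        elif kind == "scale":
            x = x * p["s"][:, None, None]
        elif kind == "squeeze":
            x = x[..., 0]                     # (C, L, 1) -> (C, L)
        else:
            raise ValueError(f"unknown operator kind: {kind}")
    return x

def pick_instruction(channels, preset=4):
    """Choose an instruction label by channel count (hypothetical labels and preset)."""
    return "unaligned_conv" if channels < preset else "aligned_conv"

rng = np.random.default_rng(0)
graph = [
    op("relu"),
    op("expand_dims"),
    op("conv1x1", w=rng.normal(size=(4, 3)), b=rng.normal(size=4)),
    op("scale", s=rng.normal(size=4)),
    op("squeeze"),
]
new_graph = rewrite(graph)
x = rng.normal(size=(3, 5))
print(len(graph), len(new_graph), np.allclose(run(graph, x), run(new_graph, x)))
```

In this sketch the new graph contains one operator fewer than the original, and both graphs produce the same output up to floating-point rounding, because scaling the output of a convolution is equivalent to scaling its weights and bias. A real computational graph is generally a directed acyclic graph rather than a chain, so the retrieval step would need subgraph matching instead of a sliding window.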