CN113392954B - Data processing method and device of terminal network model, terminal and storage medium


Info

Publication number
CN113392954B
CN113392954B
Authority
CN
China
Prior art keywords
data
quantization
network
precision
network model
Prior art date
Legal status
Active
Application number
CN202010175997.XA
Other languages
Chinese (zh)
Other versions
CN113392954A (en)
Inventor
刘默翰
赵磊
石文元
俞清华
隋志成
周力
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN202010175997.XA
Priority to PCT/CN2021/080453 (WO2021180201A1)
Publication of CN113392954A
Application granted
Publication of CN113392954B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06N 3/08 Learning methods

Abstract

The application is applicable to the technical field of artificial intelligence, and provides a data processing method, a device, a terminal and a storage medium of a terminal network model, wherein the method comprises the following steps: acquiring data to be processed; and inputting the data to be processed into a target network model for processing to obtain a processing result, wherein at least one network level in the target network model is configured with a preprocessing function, and the preprocessing function is used for preprocessing the input data when the data format of the input data of the corresponding network level is not matched with the quantization precision of the network level. The technical scheme provided by the application realizes processing of data with different precisions in the same target network model, solves the compatibility problem of the neural network with mixed precision, and improves the operation efficiency.

Description

Data processing method and device of terminal network model, terminal and storage medium
Technical Field
The invention belongs to the technical field of artificial intelligence, and particularly relates to a data processing method and device of a terminal network model, a terminal and a storage medium.
Background
With the continuous development of artificial intelligence technology, neural networks are widely applied in various fields. In order to handle different data processing scenarios, different levels in a neural network can be configured with corresponding quantization precisions according to the scenario requirements, generating a mixed-precision neural network. However, the existing mixed-precision neural network cannot handle the compatibility problem caused by the different quantization precisions of different network levels, which reduces the operation efficiency of the neural network.
Disclosure of Invention
The embodiments of the application provide a data processing method and device for a terminal network model, a terminal and a storage medium, which can solve the problem that existing artificial intelligence technology cannot handle the compatibility issues caused by the different quantization precisions of network levels in a mixed-precision neural network, which reduces the operation efficiency of the neural network.
In a first aspect, an embodiment of the present application provides a data processing method for a terminal network model, including:
acquiring data to be processed;
and inputting the data to be processed into a target network model for processing to obtain a processing result, wherein at least one network level in the target network model is configured with a preprocessing function, and the preprocessing function is used for preprocessing the input data when the data format of the input data of the corresponding network level is not matched with the quantization precision of the network level.
In a possible implementation manner of the first aspect, the inputting the data to be processed into a target network model for processing, and obtaining a processing result includes:
inputting the data to be processed into the target network model, and performing the following processing for each network level in the target network model:
judging whether the data format of input data of the current network level in the target network model is matched with the quantization precision of the network level, wherein the input data comprises the data to be processed or the data of the data to be processed at the corresponding network level;
if not, configuring a preprocessing function for the current network level, and preprocessing input data of the current network level through the preprocessing function to obtain preprocessed data;
outputting an operation result of the current network level based on the network weight corresponding to the quantization precision in the current network level and the preprocessed data;
and after all network levels in the target network model are processed, taking the operation result of the last network level as the processing result of the data to be processed.
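By way of illustration only, the per-level flow above can be sketched as follows; all names (NetworkLevel, run_model, make_preprocess) are hypothetical and the details are simplified, not the patent's implementation:

```python
# Hypothetical sketch of the per-level processing flow described above.
# All class and function names are illustrative, not from the patent.

class NetworkLevel:
    def __init__(self, quant_bits, weights, op):
        self.quant_bits = quant_bits   # quantization precision of this level
        self.weights = weights         # weights stored at this precision
        self.op = op                   # the level's operation (conv, matmul, ...)
        self.preprocess = None         # configured only when formats mismatch

def run_model(levels, data, data_bits, make_preprocess):
    x, bits = data, data_bits
    for level in levels:
        if bits != level.quant_bits:                 # data format mismatch
            level.preprocess = make_preprocess(bits, level.quant_bits)
            x = level.preprocess(x)                  # convert to the level's format
        x = level.op(x, level.weights)               # operate with level weights
        bits = level.quant_bits                      # output carries this precision
    return x                                         # result of the last level
```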
In a possible implementation manner of the first aspect, if a data format of input data of a current network level in the target network model does not match quantization precision of the network level, configuring a preprocessing function for the current network level includes:
if the data precision of the data format is greater than the quantization precision, determining the number of data blocks obtained by dividing the input data according to the ratio of the data precision to the quantization precision;
acquiring a low-bit quantization function corresponding to the quantization precision, and determining the execution times of the low-bit quantization function according to the number of the data blocks;
and generating the preprocessing function according to the execution times and the low-bit quantization function.
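A minimal sketch of this generation step, under the assumption that "dividing the input data" means an equal split into data_precision/quant_precision blocks (an interpretation, not confirmed by the patent):

```python
import numpy as np

# Sketch only: build a preprocessing function that splits the input into
# data_bits // quant_bits blocks and runs the low-bit quantization function
# once per block. All names are illustrative.
def build_preprocess(data_bits, quant_bits, low_bit_quantize):
    num_blocks = data_bits // quant_bits        # ratio -> number of data blocks
    def preprocess(x):
        blocks = np.array_split(np.asarray(x), num_blocks)
        return [low_bit_quantize(b) for b in blocks]   # executed num_blocks times
    return preprocess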
In a possible implementation manner of the first aspect, if the quantization precision is 1-bit quantization, the low-bit quantization function is specifically:

x_i^b = +1, if x_i ≥ T; x_i^b = −1, if x_i < T

where x_i^b is the output value of the low-bit quantization function for the i-th data block; x_i is the original value of the input data of the i-th data block; and T is a preset threshold value.
In a possible implementation manner of the first aspect, if the quantization precision is 2-bit quantization, the low-bit quantization function is specifically:

x_i^b = +1, if x_i > T; x_i^b = 0, if −T ≤ x_i ≤ T; x_i^b = −1, if x_i < −T

where x_i^b is the output value of the low-bit quantization function for the i-th data block; x_i is the original value of the input data of the i-th data block; and T is a preset threshold value.
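A minimal sketch of the two low-bit quantization functions as reconstructed above (the original formulas are image placeholders, so this follows the reconstruction):

```python
import numpy as np

# Sketch of the reconstructed low-bit quantization functions; T is the preset
# threshold from the description above.
def quantize_1bit(x, T=0.0):
    x = np.asarray(x)
    return np.where(x >= T, 1, -1)                        # binary {-1, +1}

def quantize_2bit(x, T=0.5):
    x = np.asarray(x)
    return np.where(x > T, 1, np.where(x < -T, -1, 0))    # ternary {-1, 0, +1}
```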
In a possible implementation manner of the first aspect, if a data format of input data of a current network level in the target network model does not match quantization precision of the network level, configuring a preprocessing function for the current network level includes:
if the data precision of the data format is smaller than the quantization precision, configuring an exclusive-or module between the input data and the learning weight of the target network model;
and taking the output end of the XOR module as the input end of a counting Popcount module, serially connecting the XOR module with the Popcount module, and taking the serially connected XOR module and the Popcount module as the preprocessing function.
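For illustration, a toy emulation of the XOR(XNOR)-plus-Popcount pairing on operands packed into Python integers; the bit convention (1 for +1, 0 for −1) is an assumption:

```python
# Toy sketch: XNOR + Popcount emulates a dot product over {-1, +1} vectors
# packed into ints, with bit value 1 standing for +1 and 0 for -1.
def xnor_popcount_dot(a_bits: int, w_bits: int, n: int) -> int:
    matches = ~(a_bits ^ w_bits) & ((1 << n) - 1)  # XNOR: 1 where bits agree
    ones = bin(matches).count("1")                 # Popcount: count the 1s
    return 2 * ones - n                            # agreements minus disagreements

# e.g. xnor_popcount_dot(0b1011, 0b1001, 4) == 2, matching (+1 +1 -1 +1)
```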
In a possible implementation manner of the first aspect, the generating the target network model according to the preprocessing function corresponding to the network hierarchy and the network information includes:
configuring associated initial weights for each network hierarchy according to the quantization precision;
and generating the target network model according to each initial weight and the preprocessing function.
In a second aspect, an embodiment of the present application provides a data processing apparatus of a terminal network model, including:
the data acquisition unit is used for acquiring data to be processed;
and the data processing unit is used for inputting the data to be processed into a target network model for processing to obtain a processing result, wherein at least one network level in the target network model is configured with a preprocessing function, and the preprocessing function is used for preprocessing the input data when the data format of the input data of the corresponding network level is not matched with the quantization precision of the network level.
In a third aspect, an embodiment of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the data processing method of the terminal network model according to any one of the above first aspects when executing the computer program.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium in which a computer program is stored, where the computer program, when executed by a processor, implements the data processing method of the terminal network model according to any one of the above first aspects.
In a fifth aspect, the present application provides a computer program product, which, when running on a terminal device, causes the terminal device to execute the data processing method of the terminal network model in any one of the above first aspects.
It is understood that the beneficial effects of the second aspect to the fifth aspect can be referred to the related description of the first aspect, and are not described herein again.
According to the embodiments of the application, before the target network model is generated, the quantization precision corresponding to each network level is determined by obtaining the network information of the target network model. A preprocessing function for converting data formats between different precisions is configured based on the quantization precision of the current level and the quantization precision of the previous level, and the target network model is generated according to the preprocessing function. In the actual data processing process, conversion between different quantization precisions can be realized through the preprocessing function, so that data with different precisions can be processed in the same target network model, which solves the compatibility problem of the mixed-precision neural network and improves the operation efficiency.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise.
Fig. 1 is a block diagram of a partial structure of a mobile phone provided in an embodiment of the present application;
fig. 2 is a schematic diagram of a software structure of a mobile phone according to an embodiment of the present application;
fig. 3 is a flowchart illustrating an implementation of a data processing method of a terminal network model according to a first embodiment of the present application;
FIG. 4 is a schematic diagram of a network hierarchy provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of a network hierarchy provided by another embodiment of the present application;
FIG. 6 is a schematic diagram of a neural network for quantifying weight values according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a neural network with fixed point quantization provided by an embodiment of the present application;
fig. 8 is a flowchart illustrating a specific implementation of a data processing method S302 of a terminal network model according to a second embodiment of the present application;
FIG. 9 is a schematic diagram illustrating partitioning of input data according to an embodiment of the present application;
FIG. 10 is a schematic diagram illustrating partitioning of input data according to an embodiment of the present application;
fig. 11 is a flowchart illustrating a detailed implementation of a data processing method S802 of a terminal network model according to a fourth embodiment of the present application;
FIG. 12 is a schematic diagram of data conversion for mixed precision provided by an embodiment of the present application;
fig. 13 is a flowchart of a specific implementation of the data processing method for a terminal network model according to the fifth embodiment of the present application before the data to be processed is input into a target network model for processing to obtain a processing result;
fig. 14 is a block diagram of a data processing device of a terminal network model according to an embodiment of the present application;
fig. 15 is a schematic diagram of a terminal device according to another embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather mean "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless otherwise specifically stated.
The data processing method of the terminal network model provided in the embodiment of the present application may be applied to a mobile phone, a tablet computer, a wearable device, a vehicle-mounted device, an Augmented Reality (AR)/Virtual Reality (VR) device, a notebook computer, a super-mobile personal computer (UMPC), a netbook, a Personal Digital Assistant (PDA), and other terminal devices, and may also be applied to a database, a server, and a service response system based on terminal artificial intelligence, and the embodiment of the present application does not set any limit to the specific type of the terminal device.
For example, the terminal device may be a station (ST) in a WLAN, a cellular phone, a cordless phone, a Session Initiation Protocol (SIP) phone, a Wireless Local Loop (WLL) station, a Personal Digital Assistant (PDA) device, a handheld device with wireless communication capability, a computing device or other processing device connected to a wireless modem, a computer, a laptop, a handheld communication device, a handheld computing device, and/or other devices for communicating on a wireless system, as well as a next-generation communication system, such as a mobile terminal in a 5G network or a mobile terminal in a future evolved Public Land Mobile Network (PLMN), and so on.
By way of example and not limitation, when the terminal device is a wearable device, the wearable device may also be a general term for devices that apply wearable technology to the intelligent design of everyday wear, such as glasses, gloves, watches, clothes and shoes configured with an adaptive learning algorithm. A wearable device is a portable device that is worn directly on the body or integrated into the user's clothes or accessories; attached to the user, it records the user's behavior data and outputs a corresponding processing result according to the behavior data and a preset mixed-precision neural network. A wearable device is not only a hardware device, but also realizes powerful functions through software support, data interaction and cloud interaction. Generalized wearable smart devices include full-featured, larger-sized devices that can realize complete or partial functions without relying on a smartphone, such as smart watches or smart glasses, as well as devices that focus on only one type of application function and need to be used in cooperation with other devices such as a smartphone, for example various smart watches and smart bracelets with display screens.
Take the terminal device as a mobile phone as an example. Fig. 1 is a block diagram illustrating a partial structure of a mobile phone according to an embodiment of the present disclosure. Referring to fig. 1, the cellular phone includes: Radio Frequency (RF) circuit 110, memory 120, input unit 130, display unit 140, sensor 150, camera 160, near field communication module 170, processor 180, and power supply 190. Those skilled in the art will appreciate that the handset configuration shown in fig. 1 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
The following specifically describes each constituent component of the mobile phone with reference to fig. 1:
the RF circuit 110 may be used for receiving and transmitting signals during information transmission and reception or during a call, and in particular, receives downlink information of a base station and then processes the received downlink information to the processor 180; in addition, data for designing uplink is transmitted to the base station. Typically, the RF circuitry includes, but is not limited to, an antenna, at least one Amplifier, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like. In addition, the RF circuitry 110 may also communicate with networks and other devices via wireless communications. The wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile communication (GSM), general Packet Radio Service (GPRS), code Division Multiple Access (CDMA), wideband Code Division Multiple Access (WCDMA), long Term Evolution (LTE)), e-mail, short Messaging Service (SMS), and the like.
The memory 120 may be configured to store software programs and modules, and the processor 180 executes various functional applications and data processing of the mobile phone by running the software programs and modules stored in the memory 120, for example, storing a target network model in a cache area of the memory 120, outputting a processing result through the target network model according to data generated during a use process of the mobile phone, and identifying an accuracy of the processing result according to a response operation of a user on the processing result, and adjusting a weight in the target network model based on the accuracy. The memory 120 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the cellular phone, and the like. Further, the memory 120 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.
The input unit 130 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the cellular phone 100. Specifically, the input unit 130 may include a touch panel 131 and other input devices 132. The touch panel 131, also referred to as a touch screen, may collect touch operations of a user on or near the touch panel 131 (e.g., operations of the user on or near the touch panel 131 using any suitable object or accessory such as a finger or a stylus pen), and drive the corresponding connection device according to a preset program.
The display unit 140 may be used to display information input by the user or information provided to the user and various menus of the mobile phone, such as outputting an adjusted correction image. The Display unit 140 may include a Display panel 141, and optionally, the Display panel 141 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like. Further, the touch panel 131 can cover the display panel 141, and when the touch panel 131 detects a touch operation on or near the touch panel 131, the touch operation is transmitted to the processor 180 to determine the type of the touch event, and then the processor 180 provides a corresponding visual output on the display panel 141 according to the type of the touch event. Although the touch panel 131 and the display panel 141 are shown as two separate components in fig. 1 to implement the input and output functions of the mobile phone, in some embodiments, the touch panel 131 and the display panel 141 may be integrated to implement the input and output functions of the mobile phone.
The handset 100 may also include at least one sensor 150, such as a light sensor, motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor that adjusts the brightness of the display panel 141 according to the brightness of ambient light, and a proximity sensor that turns off the display panel 141 and/or the backlight when the mobile phone is moved to the ear. As one of the motion sensors, the accelerometer sensor can detect the magnitude of acceleration in each direction (generally three axes), detect the magnitude and direction of gravity when stationary, and can be used for applications of recognizing gestures of a mobile phone (such as horizontal and vertical screen switching, related games, magnetometer gesture calibration), vibration recognition related functions (such as pedometers and taps), and the like; as for other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which can be configured on the mobile phone, further description is omitted here.
The handset 100 may also include a camera 160. Optionally, the position of the camera on the mobile phone 100 may be front-located or rear-located, which is not limited in this embodiment of the application.
Optionally, the mobile phone 100 may include a single camera, a dual camera, or a triple camera, which is not limited in this embodiment.
For example, the cell phone 100 may include three cameras, one being a main camera, one being a wide camera, and one being a tele camera.
Optionally, when the mobile phone 100 includes a plurality of cameras, the plurality of cameras may be all front-mounted, all rear-mounted, or a part of the cameras front-mounted and another part of the cameras rear-mounted, which is not limited in this embodiment of the present application.
The terminal device may receive communication data sent by other devices through the near field communication module 170, for example, the near field communication module 170 is integrated with a bluetooth communication module, establishes communication connection with other mobile phones through the bluetooth communication module, receives device information fed back by other mobile phones, and generates a device information page corresponding to the other mobile phones. Although fig. 1 shows the near field communication module 170, it is understood that it does not belong to the essential constitution of the cellular phone 100, and may be omitted entirely as needed within the scope not changing the essence of the application.
The processor 180 is a control center of the mobile phone, connects various parts of the entire mobile phone using various interfaces and lines, and performs various functions of the mobile phone and processes data by operating or executing software programs and/or modules stored in the memory 120 and calling data stored in the memory 120, thereby integrally monitoring the mobile phone. Alternatively, processor 180 may include one or more processing units; preferably, the processor 180 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 180.
The handset 100 also includes a power supply 190 (e.g., a battery) for powering the various components, which may preferably be logically connected to the processor 180 via a power management system, such that the power management system may be used to manage charging, discharging, and power consumption.
The handset 100 also includes audio circuitry, a speaker, and a microphone, which provide an audio interface between the user and the handset. The audio circuit can transmit the electrical signal converted from the received audio data to the loudspeaker, where it is converted into a sound signal and output; on the other hand, the microphone converts the collected sound signal into an electrical signal, which is received by the audio circuit and converted into audio data. The audio data is then processed by the processor 180 and either transmitted via the RF circuit 110 to, for example, another cellular phone, or output to the memory 120 for further processing.
Fig. 2 is a schematic diagram of a software structure of the mobile phone 100 according to the embodiment of the present application. Taking the operating system of the mobile phone 100 as an Android system as an example, in some embodiments, the Android system is divided into four layers, which are an application layer, an application Framework (FWK) layer, a system layer, and a hardware abstraction layer, and the layers communicate with each other through a software interface.
As shown in fig. 2, the application layer may be a series of application packages, and the application packages may include short message, calendar, camera, video, navigation, gallery, call, and other applications.
The application framework layer provides an Application Programming Interface (API) and a programming framework for the application programs of the application layer. The application framework layer may include some predefined functions, such as functions for receiving events sent by the application framework layer. Specifically, the neural network generated by the embodiment may be deployed in an application framework layer, and generate a programming framework corresponding to the neural network through a corresponding programming language.
As shown in FIG. 2, the application framework layer may include a window manager, a resource manager, and a notification manager, among others.
The window manager is used for managing window programs. The window manager can obtain the size of the display screen, judge whether a status bar exists, lock the screen, intercept the screen and the like. The content provider is used to store and retrieve data and make it accessible to applications. The data may include video, images, audio, calls made and received, browsing history and bookmarks, phone books, etc.
The resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and the like.
The notification manager enables the application to display notification information in the status bar, can be used to convey notification-type messages, can disappear automatically after a brief dwell, and does not require user interaction. Such as a notification manager used to notify download completion, message alerts, etc. The notification manager may also be a notification that appears in the form of a chart or scrollbar text in a status bar at the top of the system, such as a notification of a running application in the background, or a notification that appears on the screen in the form of a dialog window. For example, prompting text information in the status bar, sounding a prompt tone, vibrating the electronic device, flashing an indicator light, etc.
The application framework layer may further include:
a view system that includes visual controls, such as controls for displaying text, controls for displaying pictures, and the like. The view system may be used to build applications. A display interface may be composed of one or more views; for example, a display interface including a short message notification icon may include a view for displaying text and a view for displaying pictures. In this embodiment, in the framework layer of the running application, the text control can be used to adjust the text objects in the target page, the picture-display control can be used to adjust the background image in the target page, and all the adjusted page data are packaged to generate the target page.
The phone manager is used to provide the communication functions of the handset 100. Such as management of call status (including on, off, etc.).
The system layer may include a plurality of functional modules. For example: a sensor service module, a physical state identification module, a three-dimensional graphics processing library (such as OpenGL ES), and the like.
The sensor service module is used for monitoring sensor data uploaded by various sensors in a hardware layer and determining the physical state of the mobile phone 100;
the physical state recognition module is used for analyzing and recognizing user gestures, human faces and the like;
the three-dimensional graphic processing library is used for realizing three-dimensional graphic drawing, image rendering, synthesis, layer processing and the like.
The system layer may further include:
the surface manager is used to manage the display subsystem and provide fusion of 2D and 3D layers for multiple applications.
The media library supports a variety of commonly used still image files, video format playback and recording, and audio, among others. The media library may support a variety of audio-video encoding formats, such as MPEG4, h.264, MP3, AAC, AMR, JPG, PNG, and the like.
The hardware abstraction layer is a layer between hardware and software. The hardware abstraction layer may include a display driver, a camera driver, a sensor driver, a microphone driver, and the like, for driving the relevant hardware of the hardware layer, such as the display screen, the camera, the sensor, the microphone, and the like.
In the embodiment of the present application, the execution subject of the flow is a device configured with a neural network. By way of example and not limitation, the device configured with the neural network may specifically be a terminal device, and the terminal device may be a smartphone, a tablet computer, or a notebook computer used by a user. Fig. 3 shows an implementation flowchart of a data processing method of a terminal network model according to a first embodiment of the present application, which is detailed as follows:
in S301, data to be processed is acquired.
In this embodiment, the process of processing the data may include a training process and an actual application process after training is completed. Therefore, the data to be processed may be training data configured in advance, or user data collected during the use of the terminal device.
In this embodiment, the terminal device may obtain the data to be processed from the database, or may obtain the data to be processed in various manners, such as by using the operation parameters of the sensor or the acquisition module during the operation of the terminal.
In S302, the data to be processed is input into a target network model for processing, and a processing result is obtained, where at least one network level in the target network model is configured with a preprocessing function, and the preprocessing function is used to preprocess the input data when the data format of the input data of the corresponding network level does not match the quantization precision of the network level.
The terminal device may generate the target network model in advance before processing the data to be processed. In this case, the terminal device may acquire network information of a target network model to be generated; the network information includes a quantization accuracy for each network level within the target network model.
In this embodiment, the terminal device that needs to configure the target network model may be a server. In this case, the server has strong computing capability and strong data read-write (I/O) capability, so the quantization precision of each network level of the target network model may be relatively high, that is, the down-sampling ratio is relatively low; for example, 8-bit or 16-bit sampling is used for equivalent precision, and of course some levels may also use full precision. The terminal device may also be an edge device, for example a mobile terminal used by a user, such as a mobile phone or a wearable device. Since cost performance and portability are taken into account, the device size of an edge device is small, and the computing capability of its data processing modules is much lower than that of a server; in this case, the quantization precision adopted by the target network model in the edge device is low, and some levels may adopt ultra-low-bit quantization, for example level models corresponding to 1-bit quantization and 2-bit quantization.
Compared with a low-bit quantization mode, a high-bit quantization mode has the advantages of strong general capability, a simple algorithm, and simple training: because each input data and weight value in the current level expresses a larger amount of information, more information can be extracted during the training process and in actual use, and the training algorithm is simpler and more universal, making it suitable for many different application scenarios. At the same time, because the quantization precision is higher, a larger storage space must be allocated for each data, so the occupancy of the memory (ROM) is larger and the ROM benefit is smaller. On the other hand, when performing calculations at high quantization precision, the computer takes longer to process operations between high-precision data; for example, computing the product of two 64-bit numbers or the convolution of two 16-bit matrices is time-consuming, so the calculation rate is lower and more computing resources are occupied. However, since an edge device has limited storage space and computing capability, it is difficult for it to support a high-precision quantized target network model.
Compared with the high-bit quantization mode, the low-bit quantization mode has the advantages of high operation speed and low memory (ROM) occupancy. In the low-bit quantization mode, the terminal device converts each data into bit form; for example, if the value of one data is "8", it is converted into [1000], and in subsequent calculations logical operations such as AND, OR, or XOR can be performed between [1000] and the weight. The time a computer spends on logical operations is far lower than on mathematical operations, which improves the operation efficiency. Moreover, the data amount of low-bit quantized data is smaller than that of high-bit quantized data, so the occupancy of ROM space is greatly reduced and the ROM benefit is large. However, precisely because the data amount of each data is small, the amount of information contained is relatively small; therefore, to overcome the lack of information, the algorithm adopted is complex and the training process is long.
Different quantization precisions have different advantages and disadvantages. Therefore, in order to improve the applicability of the neural network to an actual scene and reduce the computing pressure and data storage pressure of the terminal device as much as possible, corresponding quantization precisions can be configured for different network levels according to the data processing target of each network level, and the finally generated target network model is specifically a mixed-precision neural network. For example, suppose the target network model is a neural network for abnormal-account identification: it may learn the behavior characteristics of abnormal users from the historical behaviors of a number of labeled historical users, configure a corresponding neural network based on those behavior characteristics, and determine whether a user is abnormal by importing the behavior data of the user to be identified. In this case, one network level needs to extract the behavior features of the user behavior; since the number of features and the number of possible values of each feature are large, that is, the amount of information the behavior features need to express is large, this level can adopt high-bit quantization so as to match the data processing task it needs to execute. Another network level is used to compare the similarity between the user's type sequence and the abnormal type sequence, and its output is simply an abnormal user or a legitimate user; since only the similarity between two sequences needs to be compared and the information content of the output is relatively small, this level can adopt low-bit quantization.
In this embodiment, before a user needs to create a target network model, a data processing target of each network layer may be determined according to an application scenario of the target network model, and corresponding quantization precision is configured for each network layer based on the data processing target, so as to generate network information about the target network model according to the quantization precision of each network layer.
In a possible implementation, the different quantization precisions may be configured with corresponding hierarchy models, which are specifically used to define the operations performed by the network level, such as pooling dimensionality reduction, matrix convolution, vector multiplication, or logical operations. After the quantization precision of the network level is determined, the terminal device may select one hierarchy model used in the network level from the hierarchy models associated with the quantization precision, and generate the network information according to the hierarchy model and the quantization precision of each network level.
It should be noted that the network information may be used to construct a general framework of the target network model, such as a connection order between each network level, an input-output relationship, and the like, but the weight values and the operation factors in each network level may be set as default values, that is, initial values preconfigured before training, or the trained weight values and operation factors may be downloaded from the cloud server.
In a possible implementation manner, the cloud server may train a full-precision network corresponding to the target network model through a large amount of historical data, and adjust each weight value and operation factor in the full-precision network to obtain the trained full-precision network. In this case, since quantization accuracies of respective network levels are different between the full-precision network and the neural network of the hybrid accuracy to be created, the weight value and the operation factor of the full-precision network stored in the cloud server cannot be directly imported into the target network model to be generated at present, and the terminal device can convert the weight value and the operation factor into the quantization weight and the quantization factor corresponding to the quantization accuracy through the quantization accuracies of the respective network levels, import the quantization weight and the quantization factor into the level model corresponding to the network level, and generate network information according to the level model and the quantization accuracy of the respective network levels.
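As a hedged sketch of this conversion step, assuming a simple symmetric per-level scheme (the patent does not fix the exact mapping):

```python
import numpy as np

# Sketch: convert full-precision weights downloaded from the server into the
# quantized weights of each level; the per-level bit widths and the symmetric
# scaling are illustrative assumptions.
def convert_weights(full_weights, level_bits):
    quantized = []
    for w, k in zip(full_weights, level_bits):
        w = np.asarray(w, dtype=np.float64)
        m = np.abs(w).max()
        scale = m / (2 ** (k - 1) - 1) if m > 0 else 1.0  # symmetric scale
        quantized.append(np.round(w / scale).astype(np.int32))
    return quantized
```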
In this embodiment, the target network model may be a neural network constructed based on a convolutional neural network, or may be a neural network constructed based on different network architectures such as a cyclic neural network and a long-short term neural network, where a network framework of the target network model is not limited, and when the neural network includes two or more layers of neural networks, the method provided in this embodiment is used to construct a neural network with mixed precision.
In this embodiment, each network level may use the output data of the previous level as the input data of the current network level, that is, there is a cascade relationship between different network levels. In this case, the terminal device may take the quantization precision of the previous network level as the quantization precision of the output data, and determine the data format of the input data of the current network level based on that quantization precision. In particular, if the current network level is the first network level of the target network model, that is, there is no previous cascaded network level, the terminal device may use the data format of the target data as the data format of the input data of the network level. For example, if the target data is image data, the corresponding data format is an image format; if the target data stores the behavior information of the user as Chinese text, the data format of the target data may be a text format; if the target data is a numerical value calculated by a function, the format of the target data may be an integer number, which, according to the number of bytes occupied and the value range, may specifically be int8, uint8, int16, uint16, int32, uint32 or another data format; and if the calculated numerical value is a floating-point number, the target data may be in float16, float32 or another data format.
In a possible implementation manner, if the data format of the input data of the network level matches the quantization precision of the network level, that is, the input data of the network level is consistent with the data format used by the operation factor and the weight value of the current network level, the operation can be directly performed, in this case, a preprocessing function does not need to be configured for the network level.
For example, if the quantization precision of the network layer is 8 bits, and the data format of the input data is also 8 bits, the corresponding 8-bit operation can be performed on the input data directly through the operation factor and the weight value in the network layer. Similarly, if the data format of the input data is Nbit, the quantization accuracy of the network level is also Nbit, and the calculation may be performed by referring to the above procedure.
Illustratively, if the quantization precision of the network level is 8 bit and the data format of the input data is 4 bit, the data format of the input data is identified as matching the quantization precision of the network level. This is because, when 4-bit integers and 8-bit integers are operated on in the neural network, the input data is converted into the corresponding numerical value and mathematical operations such as multiplication, division, addition and subtraction are performed on the numerical values rather than logical operations; the two quantization precisions differ only in their value ranges, so no format conversion through a preprocessing function is performed. Similarly, if the quantization precision of the network level is 4 bit and the data format of the input data is 4 bit, the data format of the input data can be identified as matching the quantization precision of the network level.
By way of example, and not limitation, fig. 4 illustrates a network-level schematic provided by an embodiment of the present application. Referring to fig. 4, the input data X of the network level may be 4-bit or 8-bit quantized input data, and both kinds of quantized data may be stored in the uint8 data format (of course, as described above, int8, int16 or other high-bit quantization formats may also be used); the weight values W and B in the network level may likewise adopt a quantization precision of 4 bits or 8 bits, and the quantized weight values may also be stored in uint8 format. The convolution layer may include two operation modules, namely a product operation and a superposition operation. Taking two uint8 operands as an example, the product of two uint8 values may fall anywhere in [0, 2^16 − 1], so storing the multiplication result in uint8 would inevitably cause the data to go out of range; moreover, the numerical superposition performed on top of the multiplication may produce results exceeding 2^16. Therefore, throughout the operation, the intermediate data can be stored in int32, and finally an activation function is connected in series to convert the intermediate operation result stored in int32 format into an output result stored in uint8, so that the output result of the network level stays consistent with the quantization precision of the network level. In the whole process, multiplication and addition can be performed between the input data and the weight values without format conversion, which avoids the extra computation that format conversion would introduce; the entire calculation stays within the high-precision operation path, and no inverse quantization operation is required, further reducing the I/O pressure and the amount of data computation.
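As a minimal sketch of this path, assuming NumPy and a hypothetical re-quantization scale standing in for the serial activation function:

```python
import numpy as np

# Sketch: uint8 inputs and weights, int32 accumulation, uint8 re-quantized
# output; the scale parameter is an illustrative stand-in for the activation.
def quantized_dense(x_u8, w_u8, scale=1.0 / 256.0):
    acc = x_u8.astype(np.int32) @ w_u8.astype(np.int32)  # products up to 2^16-1,
                                                         # sums can exceed 2^16
    out = np.clip(acc * scale, 0, 255)                   # activation / rescale
    return out.astype(np.uint8)                          # back to the level format

x = np.random.randint(0, 256, (1, 8), dtype=np.uint8)
w = np.random.randint(0, 256, (8, 4), dtype=np.uint8)
y = quantized_dense(x, w)
```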
Illustratively, if the quantization precision of the network level is 1 bit and the data format of the input data is 2 bit, the data format of the input data is identified as matching the quantization precision of the network level, because when 1-bit data and 2-bit data are operated on in the neural network, what is performed is not numerical calculation but logical operation, i.e. AND, OR, XOR and NOT operations between the 1-bit quantized data and the 2-bit data; that is, bit operations rather than numerical operations. In this case, before the operation between them, the 1-bit data and the 2-bit data are uniformly converted into 2-bit data, and the bit operations are performed on the 2-bit data. Specifically, the conversion may be: −1 can be represented by 00, 1 by 01, and 0 by 11. In the 2-bit representation, the two bits can be distinguished as follows: one bit stores the value (A), and the other is an identifier (B) marking zero; the lower bit behaves the same as in the 1-bit case, and the upper bit is used for identification, so that if the upper bit is 1, the whole number represents 0 regardless of whether the lower bit is 1 or 0. The calculation on A adopts the XNOR operation, and the operation process is the same as that of the 1-bit data format. In B, where a 1 marks the value 0, an AND operation module is adopted; if instead a 0 in B is used to mark the value 0, an OR operation module may be required, and correspondingly the number of 0s needs to be counted in Popcount. Similarly, if the quantization precision of the network level is 2 bit and the data format of the input data is 1 bit, the data format of the input data can also be identified as matching the quantization precision of the network level.
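A toy sketch of the two-plane reading above; the bit layout, names and the element-wise emulation are assumptions for illustration only:

```python
# Toy sketch of the 2-bit encoding described above: the lower bit A carries
# the value plane (XNOR, as in the 1-bit case), the upper bit B flags a zero.
ENCODE = {-1: 0b00, 1: 0b01, 0: 0b11}

def ternary_dot(xs, ws):
    total = 0
    for x, w in zip(xs, ws):
        bx, ax = ENCODE[x] >> 1, ENCODE[x] & 1
        bw, aw = ENCODE[w] >> 1, ENCODE[w] & 1
        if bx | bw:                      # either operand is zero: term is 0
            continue
        total += 1 if ax == aw else -1   # XNOR on the value plane
    return total

# e.g. ternary_dot([1, -1, 0], [1, 1, -1]) == 0  (1 - 1 + 0)
```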
By way of example, and not limitation, fig. 5 illustrates a network-level schematic provided by another embodiment of the present application. Referring to fig. 5, the input data X' of the network level is input data quantized to 1 bit or 2 bit, the weight value W in the network level may also adopt a quantization precision of 1 bit or 2 bit, and the network level includes operation modules such as an exclusive-NOR (XNOR) gate module and a Popcount counting module; it should be emphasized that in other embodiments the network level may further include an AND operation module or an OR operation module. The Popcount module may be configured to count the number of "1"s contained in the input data. The function of the XNOR module in bit operations is equivalent to the product operation in numerical operations, and the function of the Popcount module in bit operations is equivalent to the sum operation in numerical operations, so a high-bit operation effect can be achieved within a low-bit network level; only the functions adopted differ, and the algorithms used in training, learning and operation need to be adjusted according to the quantization precision. Since the data obtained by the Popcount module is necessarily a numerical value, it cannot be expressed in bit format; for example, if the output data of the XNOR module is [01101101], the output after the Popcount module is the numerical value 5, not a bit string such as [00000101]. Therefore, in order to keep the output data in bit format, a low-bit activation function may be connected in series behind the Popcount module, so that the output result is output directly in the representation format corresponding to the low bit.
In a possible implementation manner, in order to reduce the calculation amount of the target network model, the low-bit activation function may be a threshold function f(x_i), where the threshold function may be:

f(x_i) = α · Sign(z_i)

where f(x_i) is the conversion result of the threshold function for the i-th input data; z_i is the i-th output value after the operation of the XNOR module and the Popcount module; and α is a preset adjusting parameter. If there is a normalization (BN) operation within the network level, the z_i imported into the above threshold function may be the normalized data, i.e.:

f(x_i) = α · Sign(BN(z_i))

where BN(z_i) is the batch normalization (Batch Normalization) function.
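A minimal sketch of this threshold activation, assuming the reconstructed form f(x_i) = α · Sign(BN(z_i)) and a simplified per-batch normalization:

```python
import numpy as np

# Sketch of the low-bit threshold activation: normalize (optional BN), take
# the sign, scale by alpha so the output stays in a binary representation.
def threshold_activation(z, alpha=1.0, use_bn=True, eps=1e-5):
    z = np.asarray(z, dtype=np.float64)
    if use_bn:                                    # batch normalization, simplified
        z = (z - z.mean()) / np.sqrt(z.var() + eps)
    return alpha * np.where(z >= 0, 1.0, -1.0)   # Sign(z) scaled by alpha
```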
In this embodiment, if it is detected that the input data of the network level does not match the quantization precision of the network level, the input data needs to be converted before it can be directly operated on with the operation factors and weight values in the network level, so as to bridge the operation difference between two levels with different quantization precisions. For example, if the output data of the previous network level is high-bit quantized data, numerical operations are performed and the output is the result of a numerical operation. If the quantization precision of the current network level is a low-bit quantization precision, it performs logical operations, and the numerical result obtained by a numerical operation cannot directly undergo logical operations; the calculated value needs to be converted into a data format on which logical operations can be performed. Therefore, a preprocessing function for converting high-bit quantization precision into low-bit quantization precision needs to be configured.
Conversely, if the output data of the previous network hierarchy is low-bit quantized data, a logical operation was performed there, for example an AND operation, an OR operation, or an XNOR operation, with the number of corresponding characters counted by the Popcount function and integer data output in a preset format. Although such output data is an integer, it was produced by logical operations involving no numerical multiplication, so the operation efficiency is high. If the quantization precision of the current network hierarchy is a high-bit quantization precision, a numerical operation needs to be performed, and a numerical operation cannot be carried out directly on an operation result expressed in bits; the bit-encoded data must be converted into corresponding numerical values. Therefore, a preprocessing function for converting low-bit quantized data into high-bit quantized data needs to be configured.
In one possible implementation manner, if the quantization precision of the data format of the input data is higher than the quantization precision of the network layer, the quantization function corresponding to the quantization precision of the network layer may be used as the preprocessing function, where the quantization function may be:
Q_k(x) = round( (x − min(X)) · (2^k − 1) / (max(X) − min(X)) ) + ξ

where Q_k(x) is the quantization value output after the data x is processed by the quantization function; X is the value range of the input data; k is the quantization precision of the network hierarchy; and ξ is a preset offset. It should be noted that the min(X) term above specifically acts as offset compensation (a zero-point correction); in some scenarios, if there is no zero-point offset, no correction is needed, that is, min(X) = 0, in which case the above equation simplifies to:

Q_k(x) = round( x · (2^k − 1) / max(X) ) + ξ
for example, in the process of quantizing the weight, each weight value is already symmetrical to a point 0, and in this case, when the weight is stored by using an int8 data structure, min (X) may be set to zero, and the original point 0 is directly retained.
Optionally, if the quantization precision of the network layer is specifically 2-bit quantization, the preprocessing function may be converted into:
f(x) = α, if x > T
f(x) = 0, if |x| ≤ T
f(x) = −α, if x < −T

where α is the scaling factor and T the decision threshold. It should be noted that, since the data loss of the quantization operation is large, the data before quantization cannot be recovered from the bits processed by the preprocessing function through an inverse quantization formula.
Optionally, if the quantization precision of the network layer is 1bit quantization, the preprocessing function may be converted into:
f(x) = α · Sign(x)

where α is the scaling factor. It should be noted that, since the data loss of the quantization operation is large, the data before quantization cannot be recovered from the bits processed by the preprocessing function through an inverse quantization formula.
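Read together, the two low-bit preprocessing functions above can be sketched as follows. The piecewise forms are assumptions reconstructed from the surrounding text (1-bit as a scaled sign, 2-bit as a scaled ternary value over {−α, 0, α}), not the patent's literal formulas.

```python
def preprocess_1bit(x: float, alpha: float) -> float:
    """Assumed 1-bit form alpha * Sign(x); Sign(0) taken as +1."""
    return alpha if x >= 0 else -alpha

def preprocess_2bit(x: float, alpha: float, t: float) -> float:
    """Assumed ternary 2-bit form over {-alpha, 0, +alpha} with a dead
    zone |x| <= t, matching the three admissible values 1, 0, -1 noted
    later in the text."""
    if x > t:
        return alpha
    if x < -t:
        return -alpha
    return 0.0

print(preprocess_1bit(-0.7, alpha=0.5))        # -0.5
print(preprocess_2bit(0.1, alpha=0.5, t=0.3))  # 0.0
```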
In a possible implementation manner, if the quantization precision of the data format of the input data is lower than the quantization precision of the network level, a corresponding inverse quantization function may be configured according to the quantization precision of the network level and the quantization precision of the data format, where the inverse quantization function may be:
x̂ = Q · (max(X) − min(X)) / (2^k − 1) + min(X)

It should be noted that the inverse quantization function corresponds to the quantization function: if the zero-point offset is taken into account, dequantization may restore the zero point, that is, the correction amount min(X) is added back during the dequantization process; conversely, if the original parameters were already aligned to the 0 point during quantization, the function may omit the min(X) correction, i.e., min(X) = 0.
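A matching dequantization sketch (again an assumed reading, paired with the quantizer sketched earlier) shows the optional restoration of the min(X) correction:

```python
def dequantize_k(q: int, x_min: float, x_max: float, k: int,
                 restore_zero_point: bool = True) -> float:
    """Assumed inverse of the k-bit min-max quantizer: rescales the grid
    value q and, when restore_zero_point is True, adds the min(X)
    correction back; restore_zero_point=False corresponds to min(X) = 0."""
    step = (x_max - x_min) / (2 ** k - 1)
    return q * step + (x_min if restore_zero_point else 0.0)

print(dequantize_k(128, 0.0, 1.0, k=8))  # ~0.502: quantization loss remains
```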
In this embodiment, after determining the preprocessing function for each network hierarchy in which the data format of the input data does not match the quantization precision, the terminal device may construct the target network model from the preprocessing functions together with the hierarchy models of the other network hierarchies, in which the quantization precision in the network information matches the data format of the input data. Specifically, the preprocessing function may be deployed in the data transmission path between the input end of the network hierarchy and the hierarchy model; that is, after the data arriving at the input end is format-converted by the preprocessing function, the corresponding operation is performed by the hierarchy model within the network hierarchy, thereby obtaining the operation result of that network hierarchy.
It should be noted that corresponding operation factors may be configured in the preprocessing function, and the terminal device may adjust these operation factors during the training and learning process so that the preprocessing function matches the operation task of the network hierarchy. For example, for the 2-bit quantization function, the scaling α and the decision threshold in the Sign function may be adjusted; the preprocessing function is matched with the network hierarchy by adjusting these operation factors.
In the existing artificial intelligence technology, the weight values in a neural network can be quantized while the input data is left unquantized. In such an implementation, a quantization adjustment operation has to be executed for each network hierarchy, yet during actual operation all computation is still performed at the same quantization precision, for example the output is calculated entirely through numerical operations or entirely through logical operations. As a result, only the storage space occupied by the neural network on the terminal device is changed; the operation efficiency is not improved, mixed-precision operation scenarios are not supported, and the flexibility in constructing the neural network is limited.
As an example, fig. 6 shows a schematic diagram of a neural network that quantizes only the weight values, according to an embodiment of the present application. As can be seen from fig. 6, the weight values in the neural network can be quantized and stored at the corresponding quantization precision, but during actual calculation the quantized weight values must be converted back into full-precision numerical values by an inverse quantization algorithm before being operated on with the input data. The whole calculation process is thus still based on full-precision calculation: the calculation amount remains large, the operation performance of the whole neural network is not improved, and only the storage space occupied by the target network model is reduced.
As an example, fig. 7 shows a schematic diagram of a neural network performing fixed-point quantization according to an embodiment of the present application. Referring to fig. 7, both the weight values in the neural network and the input data are quantized at the corresponding quantization precision. Taking a general-purpose CPU as an example, under the x86 and ARM frameworks the minimum data structure may be 8 bits, for example the uint8 or int8 data format, so all data structures below 8 bits need to be converted into 8-bit data structures before calculation. The main operator outputs maintain the int32 data structure throughout the inference process. The value range of the uint8 data structure is [0, 2^8 − 1]; convolution is a multiply-add operation, the product of two uint8 values lies in [0, 2^16 − 1], and after further addition the result may exceed 2^16. Meanwhile, the typical weight mean is 0: for the uint8 data format, the weights need to be shifted relative to the 0 point before calculation, whereas for the int8 data format, since the weights are symmetric about the 0 point in most cases, the shift is not necessarily needed. Therefore, the int32 data structure (value range [−2^31, 2^31 − 1]) is maintained throughout the computation. It should be noted that the weight values and the input data may adopt the same data structure, or two data structures with the same number of bits, for example the weight values in the int8 data format and the input data in the uint8 data format.
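The accumulator-width argument can be reproduced numerically. The sketch below is illustrative only (the operand values are made up); it shows why a handful of uint8 products already exceeds 16 bits, so the int32 structure must be kept.

```python
import numpy as np

x = np.array([200, 180, 255, 90], dtype=np.uint8)  # uint8 inputs
w = np.array([210, 255, 255, 40], dtype=np.uint8)  # uint8 weights

acc = np.int32(0)
for xi, wi in zip(x, w):
    # cast before multiplying: each product lies in [0, 2**16 - 1],
    # which would overflow the uint8 operands themselves
    acc += np.int32(xi) * np.int32(wi)

print(acc)            # 156525
print(acc > 2 ** 16)  # True: a 16-bit accumulator would already overflow
```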
In this embodiment, after the terminal device generates the target network model, the operation factors and the weight values in the target network model may be adjusted through preset training data or collected actual data so that the operation result converges. Throughout the training and operation process, owing to the configured preprocessing functions, whenever data is converted between different quantization precisions during training of the target network model, the input data can be matched to the operation precision of the current hierarchy by the preprocessing function. This solves the compatibility problem of a mixed-precision neural network, retains the high operation efficiency of low-precision quantization levels while also benefiting from the fast convergence of training at high quantization precision, and allows edge devices to adapt the corresponding target network model, thereby reducing the operation load of the terminal device and the occupancy of storage space.
As can be seen from the above, in the data processing method of the terminal network model provided in the embodiment of the present application, before the target network model is generated, the network information of the target network model is obtained, the quantization precisions corresponding to the different network hierarchies are determined, a preprocessing function for converting data formats between different precisions is configured based on the quantization precision of the current hierarchy and that of the previous hierarchy, and the target network model is generated according to the preprocessing function. Data of different precisions are thereby processed within the same target network model, the compatibility problem of a mixed-precision neural network is solved, and the operation efficiency is improved.
Fig. 8 shows a flowchart of a specific implementation of a data processing method S302 of a terminal network model according to a second embodiment of the present application. Referring to fig. 8, with respect to the embodiment described in fig. 3, in the data processing method of a terminal network model provided in this embodiment, S302 includes: s801 to S804 are specifically described as follows:
further, the inputting the data to be processed into a target network model for processing, and obtaining a processing result includes: inputting the data to be processed into the target network model, and performing the following processing for each network level in the target network model:
In S801, it is determined whether the data format of the input data of the current network hierarchy in the target network model matches the quantization precision of that hierarchy, where the input data includes the data to be processed or intermediate data derived from the data to be processed at the corresponding network hierarchy.
In this embodiment, the terminal device may import the target data into the target network model and perform the corresponding operations on the target data through each network hierarchy in the target network model, thereby outputting the processing result corresponding to the target data. It should be noted that a cascade relationship exists between the different network hierarchies: the input data of a given network hierarchy is the output data of the network hierarchy cascaded above it, the output data of that hierarchy is the input data of the network hierarchy cascaded below it, and so on, with the input data of the first network layer of the target network model being the data to be processed.
In S802, if the network levels are not matched, a preprocessing function is configured for the current network level, and the input data of the current network level is preprocessed by the preprocessing function to obtain preprocessed data.
In this embodiment, if it is detected that the data format of the input data of the current network hierarchy in the target network model does not match the quantization precision of the network hierarchy, a preprocessing function needs to be configured for the current network hierarchy, and the preprocessing data corresponding to the data to be processed is output through the preprocessing function.
For example, if the preprocessing function is a low-bit quantization function, input data expressed with high-bit quantization, i.e., data stored in numerical form, can be converted into preprocessed data expressed in low bits, i.e., data in bit form, on which logical operations can be performed.

Conversely, if the preprocessing function is a high-bit quantization function, input data expressed with low-bit quantization, i.e., data stored in bit form, can be converted into preprocessed data expressed with high bits, i.e., data in numerical form, on which numerical operations can be performed.
In S803, an operation result of the current network hierarchy is output based on the network weight corresponding to the quantization precision in the current network hierarchy and the preprocessed data.
In this embodiment, after the preprocessing operation is performed on the input data to obtain the preprocessed data, corresponding operation operations, such as convolution, pooling, multiplication, superposition, and the like, may be performed according to the preprocessed data and the network weight of the current network hierarchy to obtain a corresponding operation result, and the operation result is used as the input data of the next cascaded network hierarchy, and so on until the operation of the last network hierarchy is completed.
In S804, after all network levels in the target network model are processed, the operation result of the last network level is used as the processing result of the data to be processed.
In this embodiment, the terminal device may use an operation result of the last network layer as a processing result corresponding to the target data. If the operation is a training learning process, the processing result can be compared with the standard result, whether the prediction of the operation is accurate or not is identified, the loss value of the target network model is calculated, and the network weight is continuously adjusted; if the operation is an actual application process, the processing result may be output, for example, the result is displayed or corresponding information is generated through a display of the terminal device.
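The per-level flow S801–S804 can be summarized in a short dispatch sketch. Everything here (class and field names, the toy level) is hypothetical and only mirrors the control flow described above.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Level:
    precision: int                          # quantization precision (bits)
    preprocess: Callable[[object], object]  # format-conversion function
    op: Callable[[object], object]          # hierarchy model, weights baked in

def forward(levels: List[Level], data, data_bits: int):
    for level in levels:
        if data_bits != level.precision:   # S801: format/precision mismatch
            data = level.preprocess(data)  # S802: preprocess the input
        data = level.op(data)              # S803: operate with level weights
        data_bits = level.precision        # output format follows the level
    return data                            # S804: last level's result

toy = Level(precision=8, preprocess=lambda v: int(v), op=lambda v: v * 2)
print(forward([toy], 3.5, data_bits=32))   # 6: converted to int, then doubled
```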
In the embodiment of the application, after the target network model is generated, the target data can be imported into the target network model to calculate its processing result. Since the neural network has mixed precision, the differing quantization precisions among the network hierarchies allow it to cope with different usage scenarios, so the resource occupancy of the neural network can be reduced to the greatest extent while the operation speed is maintained.
Fig. 9 is a flowchart illustrating a specific implementation of a data processing method S802 of a terminal network model according to a third embodiment of the present application. Referring to fig. 9, with respect to the embodiment described in fig. 8, in the data processing method of a terminal network model provided in this embodiment, S802 includes: s901 to S903, are specifically detailed as follows:
further, if the data format of the input data of the current network level in the target network model does not match the quantization precision of the network level, a preprocessing function is configured for the current network level, including:
in S901, if the data precision of the data format is greater than the quantization precision, the number of data blocks obtained by dividing the input data is determined according to the ratio between the data precision and the quantization precision.
In this embodiment, the case where the data format of the input data in the network hierarchy does not match the quantization precision in the network hierarchy is specifically of two types, namely, the data precision of the data format is greater than the quantization precision in the network hierarchy, and the data precision of the data format is less than the quantization precision in the network hierarchy.
The case in which the data precision of the data format is greater than the quantization precision of the network hierarchy may specifically be: the data format of the input data is 8 bits while the quantization precision of the network hierarchy is 2 bits or 1 bit, or the data format of the input data is 4 bits while the quantization precision is 2 bits or 1 bit. When the data format of the input data is 8 bits or 4 bits, the operation is based on numerical operations; when the quantization precision of the network hierarchy is 2 bits or 1 bit, the operation is based on bit operations. The operation logics of the two quantization precisions are inconsistent, so the two precisions are identified as unmatched, with the data precision of the data format greater than the quantization precision of the network hierarchy; at this point, the operation of S901 is executed.
In this embodiment, when the terminal device detects that the data precision of the output data of the previous network hierarchy is greater than the quantization precision of the current network hierarchy, the high-precision output data of the previous hierarchy, i.e., the input data of the current hierarchy, needs to be quantized. For example, if the data format of the input data is int8 and the quantization precision of the network hierarchy is 1 bit, the input data can be divided into 8 1-bit numbers; if the quantization precision is 2 bits or 4 bits, the input data can be divided into 4 2-bit numbers or 2 4-bit numbers, respectively. Similarly, if the data format of the input data is int16, it can be stored as 16 1-bit numbers, 8 2-bit numbers, 4 4-bit numbers, or 2 8-bit numbers, and so on.
In S902, a low bit quantization function corresponding to the quantization precision is acquired, and the number of times of execution of the low bit quantization function is determined according to the number of data blocks.
In this embodiment, the terminal device may obtain the corresponding low-bit quantization function according to the quantization precision, and determine the execution times of the low-bit quantization function according to the number of data blocks. For example, if the number of divided data blocks is 4, one input datum requires four conversions: each data block is imported into the low-bit quantization function and converted into the data format matching the current network hierarchy, i.e., the low-bit quantization function needs to be executed 4 times.
Illustratively, fig. 10 shows a schematic diagram of dividing input data according to an embodiment of the present application. Referring to fig. 10, the data format of the input data is int8, i.e., the data precision is 8 bits, and the quantization precision of the network hierarchy may be 1 bit, 2 bits, or 4 bits; according to the ratio between the data precision and the quantization precision, the numbers of data blocks are 8, 4, and 2, respectively. The int8 datum is [01110001]: after division into 8 1-bit data blocks, the blocks are 0, 1, 1, 1, 0, 0, 0, 1, and the output obtained through the 1-bit low-bit quantization function is [−1, 1, 1, 1, −1, −1, −1, 1]. Similarly, for 2 bits the output result is [1, −1, 0, 1], while for 4 bits the output result is [7, 1].
In S903, the preprocessing function is generated according to the execution times and the low bit quantization function.
In this embodiment, the terminal device may generate the preprocessing function according to the execution times and the corresponding low-bit quantization function, so that after receiving the data output by the previous network hierarchy it can first divide the target data, convert each data block through the low-bit quantization function, and then encapsulate the converted data blocks to generate input data matching the quantization precision of the current network hierarchy. For example, an int8 data value of 113 has the bit representation [01110001]; the data is divided into 2 4-bit data blocks, [0111] and [0001], and the values generated by the 4-bit quantization function are 7 and 1, so [01110001] can be converted into [7, 1] for storage, and [7, 1] serves as the input data for subsequent operations.
Specifically, before the low-bit quantization function operates on the data blocks of the input data, the terminal device may convert each data block into the input form expected by the low-bit quantization function: if the low-bit quantization function performs its conversion on bits, the data block is represented in bits; if it performs its conversion on numerical values, the data block is represented as a numerical value.
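The block-division rule can be sketched as follows; the function name and the most-significant-block-first ordering are assumptions. It reproduces the [7, 1] example above and the 8/4/2 block counts derived from the precision ratio.

```python
def split_blocks(value: int, total_bits: int, block_bits: int) -> list:
    """Split an unsigned total_bits value into total_bits // block_bits
    blocks of block_bits each, most significant block first."""
    count = total_bits // block_bits
    mask = (1 << block_bits) - 1
    return [(value >> (block_bits * (count - 1 - i))) & mask
            for i in range(count)]

print(split_blocks(0b01110001, 8, 4))  # [7, 1]
print(split_blocks(0b01110001, 8, 2))  # [1, 3, 0, 1]
print(split_blocks(0b01110001, 8, 1))  # [0, 1, 1, 1, 0, 0, 0, 1]
```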
In a possible implementation manner, if the data precision of the data format of the network hierarchy's input data is greater than the quantization precision of the hierarchy, for example when the input data is 4-bit or 8-bit quantized data, or even 16-bit or full-precision data, and the quantization precision is 1-bit quantization, the low-bit quantization function is specifically:
B_i = 1, if x_i ≥ T
B_i = −1, if x_i < T

where B_i is the output value of the i-th data block after passing through the low-bit quantization function; x_i is the original value of the input data corresponding to the i-th data block; and T is a preset threshold.
In a possible implementation manner, if the data precision of the data format of the network hierarchy's input data is greater than the quantization precision of the hierarchy, for example when the input data is 4-bit or 8-bit quantized data, or even 16-bit or full-precision data, and the quantization precision is 2-bit quantization, the low-bit quantization function is specifically:
B_i = 1, if x_i > T
B_i = 0, if |x_i| ≤ T
B_i = −1, if x_i < −T

where B_i is the output value of the i-th data block after passing through the low-bit quantization function; x_i is the original value of the input data corresponding to the i-th data block; and T is a preset threshold.
In this embodiment, the threshold T may be dynamically adjusted during the training process of the neural network, so that the low-bit quantization function matches the data processing target of the network hierarchy. It should be noted that, when the target data is divided into data blocks, the number of data blocks may equal the ratio between the data precision and the quantization precision: for example, if the data format of the target data is 8-bit quantization and the network-hierarchy quantization precision is 2 bits, the ratio of the two parameters is 4, and the terminal device may divide the target data into 4 data blocks. In this case, the preset threshold may be a default value; for example, T may be 0 for 2-bit quantization, since 2-bit quantization has only three admissible values (1, 0, and −1), leaving 0 as the only option for the threshold T. Alternatively, the number of data blocks may be smaller than the ratio: for example, with 8-bit target data and 2-bit network-hierarchy quantization precision (ratio 4), the number of data blocks may be configured as 2, the terminal device divides the target data into 2 data blocks, each data block is represented by 4-bit data with a value range of 0 to 15, and in this case the threshold T may be selected according to the operation conditions of the actual neural network.
In the embodiment of the application, when it is detected that the data precision of the input data of a network hierarchy is greater than the quantization precision of the hierarchy, the input data is divided into a plurality of data blocks, and each data block is imported into a preset low-bit quantization function for multiple quantization operations, thereby obtaining a preprocessing function that converts high-bit data into low-bit data and improving the compatibility of the target network model.
Fig. 11 shows a flowchart of a specific implementation of a data processing method S802 of a terminal network model according to a fourth embodiment of the present application. Referring to fig. 11, with respect to the embodiment described in fig. 8, in the data processing method of a terminal network model provided in this embodiment, S802 includes: s1101 to S1102 are specifically described below:
further, if the data format of the input data of the current network level in the target network model does not match the quantization precision of the network level, a preprocessing function is configured for the current network level, including:
In S1101, if the data precision of the data format is smaller than the quantization precision, an exclusive-OR module is arranged between the input data and the learning weights of the target network model.
In this embodiment, if it is detected that the data precision of the input data of the network hierarchy belongs to low-bit quantization while the quantization precision of the network hierarchy is high-bit quantization, the quantization precision of the network hierarchy is identified as not matching the data format of the input data. The case in which the data precision of the data format is smaller than the quantization precision of the network hierarchy may specifically be: the data format of the input data is 1 bit while the quantization precision of the network hierarchy is 4 bits or 8 bits, or the data format of the input data is 2 bits while the quantization precision is 4 bits or 8 bits. When the quantization precision of the network hierarchy is 8 bits or 4 bits, the operation is based on numerical operations; when the data format of the input data is 2 bits or 1 bit, the operation is based on bit operations. The operation logics of the two quantization precisions are inconsistent, so the two precisions are identified as unmatched, with the data precision of the data format smaller than the quantization precision of the network hierarchy; at this point, the operation of S1101 is executed.
In this embodiment, inverse quantization calculation is not suitable for the terminal device when converting low-bit input data to match the high-bit weight factors of the network hierarchy: inverse quantization involves numerical multiplication, and multiplication is time-consuming for a computer during data processing. That is, inverse quantization would lengthen the training time and the operation time of the neural network and greatly increase the operation pressure on the terminal device; especially where the operation capability of an edge device is limited, performing inverse quantization operations would substantially increase the occupation of operation resources and could even affect the normal operation of the terminal device. To solve the above problem, the terminal device still uses logical operations between bits in the process of converting low-bit data into high-bit data.
In this embodiment, the network hierarchy may be configured with corresponding learning weights, and the terminal device may configure an exclusive-OR module between the learning weights and the input data; that is, the learning weights and the input data serve as the inputs of the exclusive-OR module, which performs the exclusive-OR operation on the two parameters. The neural network may be configured with learning weights based on a low-bit representation.
In S1102, the output end of the exclusive-OR module is used as the input end of a counting Popcount module, the exclusive-OR module and the Popcount module are connected in series, and the serially connected modules are used as the preprocessing function.
In this embodiment, since the exclusive-OR type bit operation plays the role of the product operation in the neural network and the data obtained after the operation is still represented in bits, the terminal device may connect a counting Popcount module in series at the output port of the exclusive-OR module to count the number of "1" bits contained in the operation result. For example, if the operation result is [10001101], importing it into the Popcount module yields the output result 4, which is data expressed as a numerical value, i.e., high-bit data.
In a possible implementation manner, the terminal device may divide the operation result of the exclusive-OR operation into a plurality of data blocks according to the quantization precision of the network hierarchy, perform the Popcount operation on each data block, and package the per-block results, in their order within the operation result, as the result of the Popcount operation. For example, if the quantization precision of the network hierarchy is 4 bits and the output of the exclusive-OR operation is [10001101], i.e., int8 data represented by 8 1-bit values, the operation result may be divided into two data blocks, [1000] and [1101]; the numbers of "1" characters counted in the two data blocks are 1 and 3, respectively, and the results of the two Popcount operations may be packaged according to their order in the exclusive-OR operation result, namely [1, 3], which is used as the operation result of the Popcount operation.
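The block-wise Popcount packing just described can be sketched as follows (illustrative only; the function name is hypothetical), reproducing the [1000]/[1101] example:

```python
def popcount_blocks(value: int, total_bits: int, block_bits: int) -> list:
    """Split the bit-operation result into block_bits-wide blocks and
    count the '1' bits in each, preserving block order."""
    mask = (1 << block_bits) - 1
    return [bin((value >> shift) & mask).count("1")
            for shift in range(total_bits - block_bits, -1, -block_bits)]

print(popcount_blocks(0b10001101, 8, 4))  # [1, 3]
print(popcount_blocks(0b10001101, 8, 8))  # [4]
```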
In the embodiment of the application, when the data precision of the input data is lower than the quantization precision required by the network hierarchy, the XNOR module and the Popcount module are established, and the low-precision data is converted into high-bit data through bit operations. No inverse quantization operation is needed, which reduces the data processing pressure on the terminal device during training and operation and improves the operation rate.
Illustratively, fig. 12 shows a schematic diagram of data conversion with mixed precision provided by an embodiment of the present application. Referring to fig. 12, if the data format of the input data of a network hierarchy is consistent with the quantization precision of that hierarchy, operations can be performed directly without conversion through a preprocessing function. The operation process has the following four cases:
1. The data format of the input data X of the network hierarchy is 8-bit and the quantization precision of the network hierarchy is also 8 bits, i.e., the data format of the adopted weight value is also 8-bit. In this case, the product of the two can be calculated directly and superposed with another weight value to obtain the corresponding output result Y.

2. The data format of the input data X' of the network hierarchy is 2-bit and the quantization precision of the network hierarchy is 2 bits, i.e., the data format of the adopted weight value is 2-bit. In this case, a bit operation can be performed directly between the two parameters, a Popcount module is connected in series after the bit operation, and a low-bit activation follows the Popcount module to obtain the corresponding output result Y'.

3. When the data format of the input data X is 4-bit or 8-bit and the quantization precision of the network hierarchy is 1 bit or 2 bits, the quantization precision of the input data's format is greater than that of the network hierarchy. In this case, the input data is divided into a plurality of data blocks, each data block is imported into the low-bit activation function, the input data X represented in high bits is converted into input data X' represented in low bits, and a bit operation is performed between X' and the low-bit weight values in the network hierarchy to obtain an output result Y' represented in low bits.

4. When the data format of the input data X' is 1-bit or 2-bit and the quantization precision of the network hierarchy is 4 bits or 8 bits, the quantization precision of the input data's format is smaller than that of the network hierarchy. In this case, the input data X' represented in low bits needs to be converted into a high-bit representation X'' through the XNOR module and the Popcount module, and X'' then performs the corresponding numerical operations with the weight values in the network hierarchy to obtain an output result Y represented in high bits.
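The four cases of fig. 12 amount to a small decision table. The sketch below makes the dispatch explicit; the labels and the 4-bit boundary between the bit domain and the numerical domain are assumptions drawn from the cases above, not the patent's literal rule.

```python
def conversion_path(input_bits: int, level_bits: int) -> str:
    """Pick the fig. 12 conversion path from the two precisions."""
    numerical = lambda bits: bits >= 4    # 4/8-bit: numerical domain
    if numerical(input_bits) == numerical(level_bits):
        return "same domain: operate directly, no preprocessing"
    if numerical(input_bits):             # case 3: high-bit data, low-bit level
        return "split into data blocks + low-bit activation function"
    return "XNOR + Popcount to rebuild a numerical representation"  # case 4

for pair in [(8, 8), (2, 2), (8, 1), (2, 8)]:
    print(pair, "->", conversion_path(*pair))
```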
Fig. 13 is a flowchart illustrating a specific implementation of a data processing method of a terminal network model according to a fifth embodiment of the present application. Referring to fig. 13, with respect to any one of the embodiments in fig. 3, fig. 8, fig. 9, and fig. 11, in the data processing method of a terminal network model according to this embodiment, before inputting the data to be processed into a target network model for processing and obtaining a processing result, the method further includes: s1301 to S1302 are specifically detailed as follows:
further, before the data to be processed is input into the target network model for processing and a processing result is obtained, the method further includes:
in S1301, associated initial weights are configured for each network hierarchy according to the quantization precision.
In this embodiment, the terminal device may configure an initial value for the network weight of the network hierarchy according to the quantization precision corresponding to the network hierarchy, that is, set an initial weight, where a data format of the initial weight matches the quantization precision of the network hierarchy. For example, if the quantization precision of the network level is 8 bits, an 8-bit network weight can be configured for the network level, and an initial value can be configured for the network weight, so as to obtain the initial weight.
In S1302, the target network model is generated according to each of the initial weights and the preprocessing function.
In this embodiment, if the input of the preprocessing function includes the initial weight, the input relationship between the preprocessing function and the initial weight may be established; if it does not, the output data of the preprocessing function and the initial weight may together serve as the inputs of the operation module preset in the network hierarchy. A hierarchy model corresponding to each network hierarchy is thus constructed, and the input-output relationships among the hierarchy models of the different network hierarchies are established according to the hierarchy models and the hierarchy order, thereby generating the target network model.
In the embodiment of the application, the initial weights are configured for different network levels, and the target network model is generated according to the initial weights and the preprocessing function, so that the aim of automatically creating the target network model with mixed quantization precision is fulfilled, and the automation degree is improved.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Fig. 14 shows a block diagram of a data processing apparatus of a terminal network model provided in the embodiment of the present application, which corresponds to the data processing method of the terminal network model described in the above embodiment, and only shows the relevant parts in the embodiment of the present application for convenience of description.
Referring to fig. 14, the data processing apparatus of the terminal network model includes:
a data acquisition unit 141 for acquiring data to be processed;
a data processing unit 142, configured to input the data to be processed into a target network model for processing, so as to obtain a processing result, where at least one network level in the target network model is configured with a preprocessing function, and the preprocessing function is configured to preprocess the input data of a corresponding network level when a data format of the input data does not match quantization precision of the network level.
Optionally, the data processing unit 142 is specifically configured to: inputting the data to be processed into the target network model, and performing the following processing for each network layer in the target network model:
judging whether the data format of the input data of the current network level in the target network model matches the quantization precision of the network level, wherein the input data comprises the data to be processed or data obtained after the data to be processed is processed at the corresponding network level;
if not, configuring a preprocessing function for the current network level, and preprocessing input data of the current network level through the preprocessing function to obtain preprocessed data;
outputting an operation result of the current network level based on the network weight corresponding to the quantization precision in the current network level and the preprocessed data;
and after all network levels in the target network model are processed, taking the operation result of the last network level as the processing result of the data to be processed.
Optionally, the data processing unit 142 further includes:
a data block dividing unit, configured to determine, according to a ratio between the data precision and the quantization precision, the number of data blocks obtained by dividing the input data if the data precision of the data format is greater than the quantization precision;
an execution time determining unit, configured to obtain a low bit quantization function corresponding to the quantization precision, and determine the execution time of the low bit quantization function according to the number of data blocks;
a first pre-processing function generating unit, configured to generate the pre-processing function according to the execution times and the low bit quantization function.
Optionally, if the quantization precision is 1-bit quantization, the low-bit quantization function is specifically:
B_i = 1, if x_i ≥ T
B_i = −1, if x_i < T

wherein B_i is the output value of the i-th data block after passing through the low-bit quantization function; x_i is the original value of the input data corresponding to the i-th data block; and T is a preset threshold.
Optionally, if the quantization precision is 2-bit quantization, the low-bit quantization function is specifically:
B_i = 1, if x_i > T
B_i = 0, if |x_i| ≤ T
B_i = −1, if x_i < −T

wherein B_i is the output value of the i-th data block after passing through the low-bit quantization function; x_i is the original value of the input data corresponding to the i-th data block; and T is a preset threshold.
Optionally, the data processing unit 142 further includes:
an exclusive or module configuration unit configured to configure an exclusive or module between the input data and the learning weight of the target network model if the data accuracy of the data format is smaller than the quantization accuracy;
and the second preprocessing function generating unit is used for taking the output end of the XOR module as the input end of the counting Popcount module, serially connecting the XOR module and the Popcount module, and taking the serially connected XOR module and the Popcount module as the preprocessing function.
Optionally, the data processing apparatus of the terminal network model further includes:
an initial weight configuration unit, configured to configure associated initial weights for each of the network hierarchies according to the quantization precision;
and the initial weight packaging unit is used for generating the target network model according to each initial weight and the preprocessing function.
Therefore, the data processing apparatus of the terminal network model provided in this embodiment of the present application likewise obtains the network information of the target network model before generating it, determines the quantization precisions corresponding to the different network hierarchies, configures a preprocessing function for converting data formats between different precisions based on the quantization precision of the current hierarchy and that of the previous hierarchy, and generates the target network model according to the preprocessing function. Data of different precisions are thereby processed within the same target network model, the compatibility problem of a mixed-precision neural network is solved, and the operation efficiency is improved.
Fig. 15 is a schematic structural diagram of a terminal device according to an embodiment of the present application. As shown in fig. 15, the terminal device 15 of this embodiment includes: at least one processor 150 (only one shown in fig. 15), a memory 151, and a computer program 152 stored in the memory 151 and executable on the at least one processor 150, the processor 150 implementing the steps in the data processing method embodiments of any of the various terminal network models described above when executing the computer program 152.
The terminal device 15 may be a computing device such as a desktop computer, a notebook, a palm computer, and a cloud server. The terminal device may include, but is not limited to, a processor 150, a memory 151. Those skilled in the art will appreciate that fig. 15 is merely an example of the terminal device 15, and does not constitute a limitation to the terminal device 15, and may include more or less components than those shown, or combine some components, or different components, such as an input/output device, a network access device, and the like.
The processor 150 may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The storage 151 may in some embodiments be an internal storage unit of the terminal device 15, such as a hard disk or a memory of the terminal device 15. In other embodiments, the memory 151 may also be an external storage device of the terminal device 15, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), or the like, provided on the terminal device 15. Further, the memory 151 may also include both an internal storage unit and an external storage device of the terminal device 15. The memory 151 is used for storing an operating system, an application program, a BootLoader (BootLoader), data, and other programs, such as program codes of the computer programs. The memory 151 may also be used to temporarily store data that has been output or is to be output.
It should be noted that, for the information interaction, execution process, and other contents between the above-mentioned devices/units, the specific functions and technical effects thereof are based on the same concept as those of the embodiment of the method of the present application, and specific reference may be made to the part of the embodiment of the method, which is not described herein again.
It should be clear to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional units and modules is only used for illustration, and in practical applications, the above function distribution may be performed by different functional units and modules as needed, that is, the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the above described functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only used for distinguishing one functional unit from another, and are not used for limiting the protection scope of the present application. For the specific working processes of the units and modules in the system, reference may be made to the corresponding processes in the foregoing method embodiments, which are not described herein again.
An embodiment of the present application further provides a network device, where the network device includes: at least one processor, a memory, and a computer program stored in the memory and executable on the at least one processor, the processor implementing the steps of any of the various method embodiments described above when executing the computer program.
The embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps in the above-mentioned method embodiments.
The embodiments of the present application provide a computer program product, which when running on a mobile terminal, enables the mobile terminal to implement the steps in the above method embodiments when executed.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the processes in the methods of the embodiments described above may be implemented by instructing relevant hardware through a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the method embodiments described above may be implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code to a photographing apparatus/terminal device, a recording medium, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunication signal, and a software distribution medium, for example a USB flash disk, a removable hard disk, a magnetic disk, or an optical disk. In certain jurisdictions, in accordance with legislation and patent practice, computer-readable media may not be electrical carrier signals or telecommunication signals.
In the above embodiments, the description of each embodiment has its own emphasis, and reference may be made to the related description of other embodiments for parts that are not described or recited in any embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other ways. For example, the above-described apparatus/network device embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implementing, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above-mentioned embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (8)

1. A data processing method of a terminal network model is characterized by comprising the following steps:
acquiring data to be processed;
inputting the data to be processed into a target network model for processing to obtain a processing result, wherein at least one network level in the target network model is configured with a preprocessing function, and the preprocessing function is used for preprocessing the input data when the data format of the input data of the corresponding network level is not matched with the quantization precision of the network level;
the inputting the data to be processed into a target network model for processing, and the obtaining of the processing result comprises:
inputting the data to be processed into the target network model, and performing the following processing for each network layer in the target network model:
judging whether the data format of the input data of the current network level in the target network model matches the quantization precision of the network level, wherein the input data comprises the data to be processed or data obtained after the data to be processed is processed at the corresponding network level;
if not, configuring a preprocessing function for the current network level, and preprocessing the input data of the current network level through the preprocessing function to obtain preprocessed data; the case in which the data format of the input data of the current network level in the target network model does not match the quantization precision of the network level comprises: the data precision of the data format being greater than the quantization precision, and the data precision of the data format being less than the quantization precision;
outputting an operation result of the current network level based on the network weight corresponding to the quantization precision in the current network level and the preprocessed data;
after all network levels in the target network model are processed, taking the operation result of the last network level as the processing result of the data to be processed;
if the data format of the input data of the current network level in the target network model is not matched with the quantization precision of the network level, configuring a preprocessing function for the current network level, including:
if the data precision of the data format is larger than the quantization precision, determining the number of data blocks obtained by dividing the input data according to the ratio of the data precision to the quantization precision;
acquiring a low-bit quantization function corresponding to the quantization precision, and determining the execution times of the low-bit quantization function according to the number of the data blocks;
and generating the preprocessing function according to the execution times and the low-bit quantization function.
2. The data processing method of claim 1, wherein if the quantization precision is 1-bit quantization, the low-bit quantization function is specifically:
B_i = 1, if x_i ≥ T
B_i = −1, if x_i < T

wherein B_i is the output value of the i-th data block after passing through the low-bit quantization function; x_i is the original value of the input data corresponding to the i-th data block; and T is a preset threshold.
3. The data processing method of claim 1, wherein if the quantization precision is 2-bit quantization, the low-bit quantization function is specifically:
B_i = 1, if x_i > T
B_i = 0, if |x_i| ≤ T
B_i = −1, if x_i < −T

wherein B_i is the output value of the i-th data block after passing through the low-bit quantization function; x_i is the original value of the input data corresponding to the i-th data block; and T is a preset threshold.
4. The data processing method of claim 1, wherein if the data format of the input data of the current network hierarchy in the target network model does not match the quantization precision of the network hierarchy, configuring a preprocessing function for the current network hierarchy, further comprising:
if the data precision of the data format is smaller than the quantization precision, configuring an exclusive-or module between the input data and the learning weight of the target network model;
and taking the output end of the XOR module as the input end of a counting Popcount module, serially connecting the XOR module with the Popcount module, and taking the serially connected XOR module and the Popcount module as the preprocessing function.
5. The data processing method according to any one of claims 1 to 4, wherein before the inputting the data to be processed into the target network model for processing and obtaining the processing result, the method further comprises:
configuring associated initial weights for each of the network levels according to the quantization precision;
and generating the target network model according to each initial weight and the preprocessing function.
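A brief sketch of this setup step, associating each level's initial weights with its quantization precision before the model is assembled; snapping uniform weights to 2**quant_bits levels in [-1, 1] is an illustrative choice, not prescribed by the claim.

```python
import numpy as np

def init_weights(shape, quant_bits, seed=0):
    """Configure initial weights restricted to the 2**quant_bits levels
    that the network level's quantization precision allows."""
    rng = np.random.default_rng(seed)
    levels = 2 ** quant_bits
    w = rng.uniform(-1.0, 1.0, size=shape)
    q = np.round((w + 1.0) / 2.0 * (levels - 1))   # snap to the nearest level
    return q / (levels - 1) * 2.0 - 1.0            # map back into [-1, 1]
```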
6. A data processing apparatus of a terminal network model, comprising:
the data acquisition unit is used for acquiring data to be processed;
the data processing unit is used for inputting the data to be processed into a target network model for processing to obtain a processing result, wherein at least one network level in the target network model is configured with a preprocessing function, and the preprocessing function is used for preprocessing the input data when the data format of the input data of the corresponding network level is not matched with the quantization precision of the network level;
the data processing unit is specifically configured to: inputting the data to be processed into the target network model, and performing the following processing for each network level in the target network model:
judging whether the data format of the input data of the current network level in the target network model matches the quantization precision of the network level, wherein the input data comprises the data to be processed or intermediate data of the data to be processed at the corresponding network level;
if not, configuring a preprocessing function for the current network level, and preprocessing the input data of the current network level through the preprocessing function to obtain preprocessed data, wherein the mismatch between the data format of the input data of the current network level in the target network model and the quantization precision of the network level comprises the following cases: the data precision of the data format is greater than the quantization precision, and the data precision of the data format is less than the quantization precision;
outputting an operation result of the current network level based on the network weight corresponding to the quantization precision in the current network level and the preprocessed data;
after all network levels in the target network model are processed, taking the operation result of the last network level as the processing result of the data to be processed;
wherein, if the data format of the input data of the current network level in the target network model does not match the quantization precision of the network level, configuring a preprocessing function for the current network level comprises:
if the data precision of the data format is greater than the quantization precision, determining the number of data blocks obtained by dividing the input data according to the ratio of the data precision to the quantization precision;
acquiring a low-bit quantization function corresponding to the quantization precision, and determining the number of executions of the low-bit quantization function according to the number of data blocks;
and generating the preprocessing function according to the number of executions and the low-bit quantization function.
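Tying the device's units together, a hypothetical per-level dispatch loop matching the processing described above: a level preprocesses its input only when the input's data format does not match the level's quantization precision, and the operation result of the last level is the processing result. All types and names are illustrative.

```python
from dataclasses import dataclass
from typing import Any, Callable, List

@dataclass
class NetworkLevel:
    quant_bits: int                    # the level's quantization precision
    preprocess: Callable[[Any], Any]   # preprocessing function for mismatches
    forward: Callable[[Any], Any]      # uses weights at quant_bits precision

def process(levels: List[NetworkLevel], x: Any, x_bits: int) -> Any:
    for level in levels:
        if x_bits != level.quant_bits:   # data format vs. quantization precision
            x = level.preprocess(x)      # preprocess only on mismatch
        x = level.forward(x)             # operation result of this level
        x_bits = level.quant_bits        # output now carries this precision
    return x                             # result of the last network level
```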
7. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 5 when executing the computer program.
8. A computer-readable storage medium, in which a computer program is stored which, when executed by a processor, carries out the method according to any one of claims 1 to 5.
CN202010175997.XA 2020-03-13 2020-03-13 Data processing method and device of terminal network model, terminal and storage medium Active CN113392954B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010175997.XA CN113392954B (en) 2020-03-13 2020-03-13 Data processing method and device of terminal network model, terminal and storage medium
PCT/CN2021/080453 WO2021180201A1 (en) 2020-03-13 2021-03-12 Data processing method and apparatus for terminal network model, terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010175997.XA CN113392954B (en) 2020-03-13 2020-03-13 Data processing method and device of terminal network model, terminal and storage medium

Publications (2)

Publication Number Publication Date
CN113392954A (en) 2021-09-14
CN113392954B (en) 2023-01-24

Family

ID=77616071

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010175997.XA Active CN113392954B (en) 2020-03-13 2020-03-13 Data processing method and device of terminal network model, terminal and storage medium

Country Status (2)

Country Link
CN (1) CN113392954B (en)
WO (1) WO2021180201A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114896950B * 2022-07-11 2022-10-28 Zhejiang Dahua Technology Co., Ltd. Model conversion method, model conversion device, and storage medium
CN115017377B * 2022-08-05 2022-11-08 Shenzhen MicroBT Electronics Technology Co., Ltd. Method, device and computing equipment for searching target model
CN115577701B * 2022-09-23 2023-09-19 Liu Jiaoping Risk behavior identification method, device, equipment and medium for big data security

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200097799A1 (en) * 2017-06-30 2020-03-26 Intel Corporation Heterogeneous multiplier
US11270187B2 (en) * 2017-11-07 2022-03-08 Samsung Electronics Co., Ltd Method and apparatus for learning low-precision neural network that combines weight quantization and activation quantization
CN108345938A (en) * 2018-03-01 2018-07-31 中国科学院计算技术研究所 A kind of neural network processor and its method including bits switch device
US11875251B2 (en) * 2018-05-03 2024-01-16 Samsung Electronics Co., Ltd. Neural network method and apparatus
CN110728350A (en) * 2018-06-29 2020-01-24 微软技术许可有限责任公司 Quantification for machine learning models
CN109190754A (en) * 2018-08-30 2019-01-11 北京地平线机器人技术研发有限公司 Quantitative model generation method, device and electronic equipment
CN109325590B (en) * 2018-09-14 2020-11-03 中国科学院计算技术研究所 Device for realizing neural network processor with variable calculation precision
CN110046705B (en) * 2019-04-15 2022-03-22 广州异构智能科技有限公司 Apparatus for convolutional neural network
CN110751278A (en) * 2019-08-28 2020-02-04 云知声智能科技股份有限公司 Neural network bit quantization method and system
CN110738313B (en) * 2019-10-15 2022-05-31 阿波罗智能技术(北京)有限公司 Method, apparatus, device and medium for evaluating quantization operation

Also Published As

Publication number Publication date
CN113392954A (en) 2021-09-14
WO2021180201A1 (en) 2021-09-16


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant