CN116261729A - Neural network model construction method and equipment thereof

Neural network model construction method and equipment thereof

Info

Publication number
CN116261729A
CN116261729A
Authority
CN
China
Prior art keywords
neural network
network model
performance index
model
theoretical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202080104556.9A
Other languages
Chinese (zh)
Inventor
袁宏辉
伍玮翔
钟钊
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of CN116261729A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks

Abstract

A neural network model construction method and equipment thereof, for use in constructing a neural network. The method comprises the following steps: a first neural network model is built through a first model generator (601); a first performance index of the first neural network model when running on a target chip is obtained according to the first neural network model (602); the first model generator is adjusted according to the first performance index to obtain a second model generator (603); and a second neural network model is built through the second model generator, the second performance index of the second neural network model being better than the first performance index. In this scheme, the first theoretical performance index of the first neural network model when running on the target chip is obtained, and the first model generator is adjusted according to it, so that a neural network model with a better hardware performance index when running on the target chip is constructed.

Description

Neural network model construction method and equipment thereof
Technical Field
The application relates to the field of artificial intelligence, in particular to a neural network model construction method and equipment thereof.
Background
Deep neural networks have achieved great success over the years in the processing and analysis of a variety of media signals, such as images, video and speech. A neural network with good performance often has a sophisticated network structure whose design requires a great deal of effort from highly skilled and experienced human experts.
Neural network structure search, that is, automatic construction of a neural network model, replaces this manual design: it automatically searches for neural network structures to obtain structures with excellent performance, and has achieved excellent results on tasks such as image recognition, image semantic segmentation and natural language processing.
In conventional structure search, the search is trained in a particular chip environment according to the target indices of a task (for example, the model test accuracy of applications such as image classification and image segmentation). When a structure search trained in one chip environment is applied in other chip environments, differences in chip parameters cause compatibility problems at run time, such as excessive time consumption and low chip utilization.
Disclosure of Invention
The embodiments of the application provide a neural network model construction method and equipment thereof, which construct a neural network model with a better hardware performance index when running on a target chip: a first theoretical performance index of a first neural network model when running on the target chip is obtained, and the corresponding weights in the first model generator used to generate the neural network model are adjusted according to that index.
A first aspect of an embodiment of the present application provides a neural network model building method.
The neural network model building device builds a first neural network model by means of a first model generator preset in the device; the first model generator builds the first neural network model from respective building units.
After the first model generator builds the first neural network model, the neural network model building device obtains, according to the first neural network model, a first performance index of the first neural network model when running on the target chip.
The neural network model building device adjusts the first model generator according to the first performance index to obtain a second model generator, and then builds a second neural network model through the second model generator. The second performance index of the second neural network model is better than the first performance index; that is, the second neural network model performs better when running on the target chip than the first neural network model does.
In this embodiment of the application, the first performance index of the first neural network model when running on the target chip is obtained, and the first model generator is adjusted according to it, so that a second neural network model with a better hardware performance index when running on the target chip is constructed.
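As a concrete illustration of this build, evaluate, adjust loop, the following sketch samples a model from a weighted generator, scores it with a stand-in chip performance index (total operator latency), and up-weights cheaper operators. All names, the latency table, and the reward rule are our own illustrative assumptions, not the patent's implementation.

```python
import random

def build_model(generator_weights, candidate_ops):
    """Sample one building unit per layer position, weighted by the generator."""
    return [random.choices(candidate_ops, weights=generator_weights[i])[0]
            for i in range(len(generator_weights))]

def chip_performance(model, op_latency):
    """Stand-in first performance index: total latency (lower is better)."""
    return sum(op_latency[op] for op in model)

def adjust_generator(generator_weights, model, candidate_ops, op_latency, lr=0.5):
    """Up-weight operators that are cheap on the target chip."""
    new_weights = [row[:] for row in generator_weights]
    worst = max(op_latency.values())
    for i, op in enumerate(model):
        j = candidate_ops.index(op)
        reward = (worst - op_latency[op]) / worst  # cheaper op -> larger reward
        new_weights[i][j] += lr * reward
    return new_weights

random.seed(0)
ops = ["conv3x3", "conv5x5", "pool"]
latency = {"conv3x3": 2.0, "conv5x5": 5.0, "pool": 1.0}  # hypothetical per-op cost
gen1 = [[1.0, 1.0, 1.0] for _ in range(4)]           # first model generator

model1 = build_model(gen1, ops)                      # first neural network model
perf1 = chip_performance(model1, latency)            # first performance index
gen2 = adjust_generator(gen1, model1, ops, latency)  # second model generator
model2 = build_model(gen2, ops)                      # second neural network model
```

Repeating the loop biases the generator toward structures whose performance index on the target chip improves from one generation to the next.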
Based on the neural network model building method of the first aspect of the embodiments of the present application, in one possible implementation manner,
after the first model generator builds the first neural network model, the neural network model building device obtains a first theoretical performance index of the first neural network model; the first theoretical performance index represents the theoretical value of the performance index of the first neural network model when running on the target chip.
The neural network model building device adjusts the first model generator according to the first theoretical performance index to obtain a second model generator, and then builds a second neural network model through the second model generator; the second theoretical performance index of the second neural network model, that is, the theoretical value of its performance index when running on the target chip, is better than the first theoretical performance index.
In this embodiment of the application, the first theoretical performance index of the first neural network model when running on the target chip is obtained, and the first model generator is adjusted according to it, so that a second neural network model with a better hardware performance index when running on the target chip is constructed.
In a possible implementation manner of the neural network model building method according to the first aspect, after building the second neural network model through the second model generator, the neural network model building device further obtains a first actually measured performance index, which represents the measured value of a performance index of the second neural network model when running on the target chip, that is, the performance index measured on the target chip after the second neural network model has actually been run on it.
The neural network model building device adjusts the corresponding weight factors in the second model generator according to the first actually measured performance index to obtain a third model generator, and then builds a third neural network model through the third model generator; the second actually measured performance index of the third neural network model, that is, the measured value of its performance index when running on the target chip, is better than the first actually measured performance index.
In this embodiment of the application, the second neural network model is run on the actual target chip, the corresponding actually measured performance index is obtained, and the second model generator is adjusted according to that index, so that a neural network model better suited to the target chip can be obtained.
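An actually measured performance index, by contrast with a theoretical one, comes from executing the model on real hardware. A minimal host-side timing sketch follows; the function name and repeat count are illustrative assumptions, and on a real target chip one would instead read the chip's profiling counters:

```python
import time

def measured_latency_ms(run_once, repeats=100):
    """Measured performance index: average wall-clock latency of one
    inference call, in milliseconds (lower is better)."""
    start = time.perf_counter()
    for _ in range(repeats):
        run_once()
    return (time.perf_counter() - start) * 1000.0 / repeats

# Stand-in for running the second neural network model on the chip.
latency2 = measured_latency_ms(lambda: sum(i * i for i in range(1000)))
```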
Based on the neural network model building method of the first aspect, in a possible implementation manner, before obtaining the first actually measured performance index, the neural network model building device further trains the second neural network model to obtain a fourth neural network model. It then obtains the model performance of the fourth neural network model, and adjusts the second model generator according to this model performance and the first actually measured performance index, to obtain the third model generator.
In this embodiment of the application, the second model generator is adjusted according to both the model performance of the fourth neural network model (obtained by training the second neural network model) and the first actually measured performance index, so that the performance index on the target chip can be improved while keeping the model performance of the neural network models generated by the adjusted third model generator comparable.
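One simple way to fold the fourth model's task performance and the first actually measured performance index into a single adjustment signal is a blended reward; the blending formula and the latency-target term below are our own illustration, not the patent's:

```python
def combined_reward(accuracy, latency_ms, target_ms, alpha=0.5):
    """Blend model performance (accuracy in [0, 1]) with a hardware term:
    models at or under the latency target keep the full hardware score,
    slower models are penalized proportionally."""
    hw_term = min(1.0, target_ms / latency_ms)
    return (1.0 - alpha) * accuracy + alpha * hw_term
```

The generator adjustment then maximizes this scalar, so equal-accuracy models are ranked by how well they run on the target chip.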
In a possible implementation manner of the neural network model building method according to the first aspect, the neural network model building device obtains the first theoretical performance index through a performance evaluation tool; the performance evaluation tool includes a calculation function used to compute the first theoretical performance index from the first neural network model.
In the embodiment of the application, the neural network model building device acquires the first theoretical performance index through the performance evaluation tool, so that the feasibility of acquiring the first theoretical performance index is improved.
Based on the neural network model building method of the first aspect, in a possible implementation manner, the neural network model building device determines, through the performance evaluation tool, a first building unit of the first neural network model, where the first building unit includes at least one of: a convolution layer, a pooling layer, an activation function, and a normalization layer of the first neural network model; each of these may be one or more. The neural network model building device calculates the first theoretical performance index according to the first building unit.
In this embodiment of the application, the neural network model building device calculates the first theoretical performance index from one or more first building units of the first neural network model through the performance evaluation tool. Since the first building unit includes at least one of a convolution layer, a pooling layer, an activation function and a normalization layer, the theoretical performance index can be computed for each layer that makes up the first neural network model, which improves flexibility.
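A theoretical performance index of this kind can be computed layer by layer from the building units alone, without running the model. The sketch below estimates multiply-accumulate (MAC) counts for convolution layers and a theoretical cycle count for a chip with a given peak MAC throughput; the formulas assume stride-1, same-padding convolutions and are illustrative only:

```python
def conv_macs(h, w, c_in, c_out, k):
    """MAC count of a k x k convolution over an h x w x c_in input
    producing c_out channels (stride 1, same padding)."""
    return h * w * c_in * c_out * k * k

def theoretical_cycles(conv_layers, macs_per_cycle):
    """Theoretical cycle count on a chip retiring `macs_per_cycle`
    MACs per clock at peak, with no stalls."""
    total_macs = sum(conv_macs(**layer) for layer in conv_layers)
    return total_macs / macs_per_cycle

layers = [dict(h=32, w=32, c_in=3, c_out=16, k=3),
          dict(h=16, w=16, c_in=16, c_out=32, k=3)]
cycles = theoretical_cycles(layers, macs_per_cycle=4096)
```

Comparing this theoretical cycle count with an actually achieved cycle count gives a theoretical MAC utilization figure for the model on that chip.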
According to the neural network model building method of the first aspect, in a possible implementation manner, after building the first neural network model, the neural network model building device further trains the first neural network model to obtain a fifth neural network model. It then obtains the model performance of the fifth neural network model, and adjusts the first model generator according to this model performance and the first theoretical performance index, to obtain the second model generator.
In this embodiment of the application, the first model generator is adjusted according to both the model performance of the fifth neural network model (obtained by training the first neural network model) and the first theoretical performance index, so that the theoretical performance index on the target chip can be improved while keeping the model performance of the neural network models generated by the adjusted second model generator comparable.
Based on the neural network model construction method of the first aspect, in a possible implementation manner, the first theoretical performance index and the second theoretical performance index each include at least one of the following: a theoretical vector-module bound (vector bound), a theoretical memory bound (memory bound), a theoretical cube-module (cube) utilization, a theoretical high-speed parallel multiply-accumulator (MAC) utilization, a theoretical cube-module operation count, and a theoretical vector-module operation count.
In the embodiment of the application, specific references of the first theoretical performance index and the second theoretical performance index are exemplarily described, so that the feasibility of the scheme is improved.
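For concreteness, the six theoretical indices listed above could be carried in a single record and compared when deciding whether one candidate model is better than another; the class below and its comparison rule (higher theoretical MAC utilization wins) are hypothetical illustrations, not the patent's data structures:

```python
from dataclasses import dataclass

@dataclass
class TheoreticalPerfIndex:
    vector_bound: float      # theoretical vector-module bound
    memory_bound: float      # theoretical memory bound
    cube_utilization: float  # theoretical cube-module utilization
    mac_utilization: float   # theoretical MAC utilization
    cube_ops: int            # theoretical cube-module operation count
    vector_ops: int          # theoretical vector-module operation count

    def better_than(self, other):
        """Illustrative ordering: higher theoretical MAC utilization wins."""
        return self.mac_utilization > other.mac_utilization
```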
Based on the neural network model construction method of the first aspect, in a possible implementation manner, the first actually measured performance index and the second actually measured performance index each include at least one of the following: an actually measured vector-module bound, an actually measured memory bound, an actually measured cube-module utilization, an actually measured high-speed parallel multiply-accumulator (MAC) utilization, an actually measured cube-module operation count, and an actually measured vector-module operation count.
In the embodiment of the application, specific references of the first actually measured performance index and the second actually measured performance index are exemplarily described, so that the feasibility of the scheme is improved.
The second aspect of the embodiment of the application provides a neural network model building device.
A neural network model building apparatus, comprising:
a building unit, configured to build a first neural network model through a first model generator;
an acquisition unit, configured to acquire, according to the first neural network model, a first performance index of the first neural network model when running on a target chip;
a processing unit, configured to adjust the first model generator according to the first performance index to obtain a second model generator;
the building unit being further configured to build a second neural network model through the second model generator, the second performance index of the second neural network model being better than the first performance index.
Optionally, the first performance index is a first theoretical performance index, the first theoretical performance index represents a theoretical value of a performance index of the first neural network model when running on the target chip, the second performance index is a second theoretical performance index, and the second theoretical performance index is better than the first theoretical performance index.
Optionally, the acquisition unit is specifically configured to acquire a first actually measured performance index, where the first actually measured performance index represents the measured value of a performance index of the second neural network model when running on the target chip;
the processing unit is further configured to adjust the second model generator according to the first actually measured performance index to obtain a third model generator;
the building unit is further configured to build a third neural network model through the third model generator, the second actually measured performance index of the third neural network model being better than the first actually measured performance index.
Optionally, the neural network model building apparatus further includes:
a training unit, configured to train the second neural network model to obtain a fourth neural network model;
the processing unit being further configured to adjust the second model generator according to the first actually measured performance index and the model performance of the fourth neural network model, to obtain the third model generator.
Optionally, the first performance index is a first theoretical performance index, and the acquisition unit is specifically configured to acquire the first theoretical performance index through a performance evaluation tool, where the performance evaluation tool includes a calculation function used to compute the first theoretical performance index from the first neural network model.
Optionally, the neural network model building device further includes:
a determining unit, configured to determine, through the performance evaluation tool, a first building unit of the first neural network model, the first building unit including at least one of: a convolution layer of the first neural network model, a pooling layer of the first neural network model, an activation function of the first neural network model, and a normalization layer of the first neural network model;
the processing unit being further configured to calculate the first theoretical performance index according to the first building unit.
Optionally, the training unit is further configured to train the first neural network model to obtain a fifth neural network model;
the processing unit is further configured to adjust the first model generator according to the first theoretical performance index and the model performance of the fifth neural network model, to obtain a second model generator.
Optionally, the first theoretical performance index and the second theoretical performance index each include at least one of: a theoretical vector-module bound, a theoretical memory bound, a theoretical cube-module utilization, a theoretical high-speed parallel multiply-accumulator (MAC) utilization, a theoretical cube-module operation count, and a theoretical vector-module operation count.
Optionally, the first actually measured performance index and the second actually measured performance index each include at least one of: an actually measured vector-module bound, an actually measured memory bound, an actually measured cube-module utilization, an actually measured high-speed parallel multiply-accumulator (MAC) utilization, an actually measured cube-module operation count, and an actually measured vector-module operation count.
A third aspect of the embodiments of the present application provides a neural network model building apparatus, including:
the device comprises a processor, a memory and an input/output interface, wherein the processor and the memory are connected with the input/output interface; the memory is used for storing program codes; the processor, when calling the program code in the memory, performs the method provided by the embodiment of the first aspect of the present application.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium. The part of the technical solution that in essence contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and stores computer software instructions for the above device, including a program designed to execute the method of the first aspect.
The storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or other media capable of storing program code.
A fifth aspect of the embodiments of the present application provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform a method as in the embodiments of the first aspect of the present application.
The processor mentioned in any of the above may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling execution of the program of the method in the first aspect.
According to the technical scheme provided by the embodiments of the application, the first theoretical performance index of the first neural network model when running on the target chip is obtained, and the first model generator is adjusted according to it; the first model generator can thus be adjusted according to the performance index on the target chip, which improves the compatibility of the second neural network model generated by the adjusted second model generator.
Drawings
FIG. 1 is a schematic diagram of a framework of an embodiment of a method for constructing a neural network model according to an embodiment of the present application;
FIG. 2 is another schematic diagram of a neural network model building method according to an embodiment of the present application;
FIG. 3 is another schematic diagram of a neural network model building method according to an embodiment of the present application;
FIG. 4 is another schematic diagram of a neural network model construction method according to an embodiment of the present application;
FIG. 5 is another schematic diagram of a neural network model building method according to an embodiment of the present application;
FIG. 6 is a schematic flow chart of an embodiment of a neural network model building method according to the present application;
FIG. 7 is another schematic flow chart of an embodiment of a neural network model building method according to the present application;
FIG. 8 is another schematic flow chart of an embodiment of a neural network model building method according to the present application;
FIG. 9 is another schematic flow chart of an embodiment of a neural network model building method according to the present application;
FIG. 10 is a schematic structural diagram of an embodiment of a neural network model building device according to the present application;
FIG. 11 is another schematic structural diagram of an embodiment of a neural network model building device according to the present application;
fig. 12 is another schematic structural diagram of an embodiment of a neural network model building apparatus according to the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
FIG. 1 illustrates a schematic diagram of an artificial intelligence framework that describes the overall workflow of an artificial intelligence system, applicable to general artificial intelligence field requirements.
The above-described artificial intelligence topic framework is described below in terms of two dimensions, the "Intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis).
The "intelligent information chain" reflects a series of processes from data acquisition to data processing. For example, it may comprise the general processes of intelligent information sensing, intelligent information representation and formation, intelligent reasoning, intelligent decision making, and intelligent execution and output. In this process, the data undergoes a "data-information-knowledge-wisdom" condensation process.
The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (technologies for providing and processing information) up to the industrial ecology of the system.
(1) Infrastructure:
The infrastructure provides computing-capability support for the artificial intelligence system, realizes communication with the outside world, and provides support through a base platform. It communicates with the outside world through sensors; computing power is provided by smart chips (hardware acceleration chips such as CPUs, NPUs, GPUs, ASICs and FPGAs); the base platform includes a distributed computing framework, networks, and other related platform guarantees and support, and may include cloud storage and computing, interconnection networks, and the like. For example, a sensor communicates with the outside to obtain data, which is provided to smart chips in the distributed computing system provided by the base platform for computation.
(2) Data
The data at the layer above the infrastructure represents the data sources in the field of artificial intelligence. The data involve graphics, images, voice, video and text, as well as Internet-of-Things data from traditional devices, including service data of existing systems and sensing data such as force, displacement, liquid level, temperature and humidity.
(3) Data processing
Data processing typically includes data training, machine learning, deep learning, searching, reasoning, decision making, and the like.
Wherein machine learning and deep learning can perform symbolized and formalized intelligent information modeling, extraction, preprocessing, training and the like on data.
Reasoning refers to the process of simulating the human intelligent reasoning mode in a computer or intelligent system, using formalized information to carry out machine thinking and problem solving according to a reasoning control strategy; its typical functions are searching and matching.
Decision making refers to the process of making decisions after the intelligent information has been reasoned about, and generally provides functions such as classification, ranking and prediction.
(4) General capability
After the data processing mentioned above, some general capabilities can be formed based on the results, such as algorithms or general systems, for example translation, text analysis, computer-vision processing (e.g., image recognition and object detection), speech recognition, and the like.
(5) Intelligent product and industry application
Intelligent products and industry applications are the products and applications of the artificial intelligence system in various fields; they are the encapsulation of the overall artificial intelligence solution, productizing intelligent information decision making and achieving practical deployment. The main application fields include: intelligent manufacturing, intelligent transportation, smart home, intelligent healthcare, intelligent security, autonomous driving, safe cities, intelligent terminals, and the like.
Referring to fig. 2, an embodiment of the present application provides a system architecture 200. The system architecture includes a database 230 and a client device 240. The data collection device 260 is configured to collect data and store the data in the database 230, and the training module 220 generates the target model/rule 201 based on the data maintained in the database 230.
The operation of each layer in a deep neural network can be described by the mathematical expression y = a(Wx + b). Physically, the work of each layer can be understood as completing a transformation from input space to output space (that is, from the row space to the column space of the matrix) through five operations on the input space (the set of input vectors): 1. raising/lowering the dimension; 2. scaling up/down; 3. rotation; 4. translation; 5. "bending". Operations 1, 2 and 3 are completed by Wx, operation 4 by +b, and operation 5 by a(). The word "space" is used here because the object being classified is not a single thing but a class of things; the space is the set of all individuals of that class. W is a weight vector, each value of which represents the weight of one neuron in that layer of the network. This vector determines the spatial transformation from input space to output space described above, i.e., the weights of each layer control how the space is transformed. The purpose of training a deep neural network is to finally obtain the weight matrices of all layers of the trained network. The training process is therefore essentially learning how to control the spatial transformation, and more specifically learning the weight matrices, which in the following embodiments of the present application may be refined into a structural parameter set and a network parameter set; see the related description of FIG. 2 below.
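The per-layer expression y = a(Wx + b) can be made concrete in a few lines; tanh stands in for the activation a, and the sketch uses plain Python with no framework assumed:

```python
import math

def dense_layer(W, x, b):
    """One layer: y = a(Wx + b).  The matrix-vector product Wx performs the
    dimension change, scaling and rotation; + b is the translation; the
    activation a() (tanh here) is the 'bending'."""
    Wx = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]
    return [math.tanh(v + b_i) for v, b_i in zip(Wx, b)]

# A 2-input, 3-output layer: the input space is lifted from 2 to 3 dimensions.
y = dense_layer(W=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                x=[0.5, -0.5],
                b=[0.0, 0.0, 0.0])
```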
Because it is desired that the output of the deep neural network be as close as possible to the target value, the weight vector of each layer can be updated by comparing the current network's predicted value with the target value and adjusting according to the difference between them (an initialization process usually precedes the first update, i.e., parameters are pre-configured for each layer of the deep neural network). For example, if the network's predicted value is too high, the weight values in the weight matrix are adjusted to lower it, and adjustment continues until the value output by the neural network approaches or equals the target value. It is therefore necessary to define in advance "how to compare the difference between the predicted value and the target value"; this is the loss function, or objective function, an important equation for measuring the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) means a larger difference, so training the neural network can be understood as the process of reducing this loss as much as possible.
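In its simplest form, the update rule described above is gradient descent on the loss. A one-weight sketch follows; the learning rate and step count are arbitrary illustrative choices:

```python
def squared_loss(pred, target):
    return (pred - target) ** 2

def update_weight(w, x, target, lr=0.1):
    """One gradient-descent step for the one-weight model pred = w * x,
    minimizing the squared loss."""
    pred = w * x
    grad = 2.0 * (pred - target) * x  # d(loss)/dw
    return w - lr * grad

w = 0.0
for _ in range(50):  # repeat until the output approaches the target
    w = update_weight(w, x=1.0, target=2.0)
```

Each step moves w toward the value that makes the predicted output equal the target, exactly the "adjust until the output approaches the target value" process described above.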
The calculation module may include a training module 220, and the target model/rule obtained by the training module 220 may be applied to different systems or devices. In fig. 2, the execution device 210 is configured with a transceiver 212, where the transceiver 212 may be a wireless transceiver, an optical transceiver, a wired interface (such as an I/O interface), or the like, for interacting with external devices. A "user" may input data to the transceiver 212 through the client device 240; for example, in the following embodiments of the present application, the client device 240 may send a target task to the execution device 210, request the execution device to build a neural network, and send a database for training to the execution device 210.
The execution device 210 may call data, code, etc. in the data storage system 250, or may store data, instructions, etc. in the data storage system 250.
The calculation module 211 processes the input data using the target model/rule 201. Specifically, the calculation module 211 is configured to: construct a first neural network model through a first model generator; acquire, according to the first neural network model, a first performance index of the first neural network model when running on a target chip; adjust the first model generator according to the first performance index to obtain a second model generator; and construct a second neural network model through the second model generator, where the second performance index of the second neural network model is superior to the first performance index.
The association function module 21 may specifically be a module for training a model generator.
The association function 214 may be configured to perform search construction according to basic operations included in the search space, resulting in a first model generator.
Finally, transceiver 212 returns the built neural network model to client device 240 to deploy the neural network model in client device 240 or other devices.
Further, the training module 220 may obtain corresponding target models/rules 201 based on different data for different target tasks to provide better results to the user.
In the case shown in fig. 2, the user may manually specify the input data to the execution device 210, for example, by operating in an interface provided by the transceiver 212. In another case, the client device 240 may automatically input data to the transceiver 212 and obtain the result; if such automatic input requires the user's authorization, the user may set the corresponding permission in the client device 240. The user may view the results output by the execution device 210 at the client device 240, presented specifically in the form of a display, a sound, an action, or the like. The client device 240 may also act as a data collection terminal, storing the collected data associated with the target task in the database 230.
It should be noted that fig. 2 is only a schematic diagram of a system architecture provided in the embodiment of the present application, and the positional relationship between the devices, the modules, and the like shown in the drawings does not constitute any limitation. For example, in FIG. 2, data storage system 250 is external memory to execution device 210, and in other scenarios, data storage system 250 may be located within execution device 210.
Referring to fig. 3, an embodiment of the present application provides a system architecture 300. The execution device 210 is implemented by one or more servers, optionally in cooperation with other computing devices such as data storage devices, routers, and load balancers; the execution device 210 may be disposed at one physical site or distributed across multiple physical sites. The execution device 210 may use the data in the data storage system 250, or invoke the program code in the data storage system 250, to implement the steps of the neural network model building method corresponding to fig. 6-8 below in this application.
The user may operate respective user devices (e.g., local device 301 and local device 302) to interact with the execution device 210. Each local device may represent any computing device, such as a personal computer, computer workstation, smart phone, tablet, smart camera, smart car or other type of cellular phone, media consumption device, wearable device, set top box, game console, etc.
The local device of each user may interact with the execution device 210 through a communication network of any communication mechanism/communication standard, which may be a wide area network, a local area network, a point-to-point connection, etc., or any combination thereof. In particular, the communication network may comprise a wireless network, a wired network, a combination of a wireless network and a wired network, or the like. The wireless network includes, but is not limited to: a fifth-generation mobile communication technology (5th-Generation, 5G) system, a long term evolution (long term evolution, LTE) system, a global system for mobile communication (global system for mobile communication, GSM) or code division multiple access (code division multiple access, CDMA) network, a wideband code division multiple access (wideband code division multiple access, WCDMA) network, wireless fidelity (wireless fidelity, WiFi), Bluetooth, ZigBee, radio frequency identification (radio frequency identification, RFID), long range (Lora) wireless communication, and near field communication (near field communication, NFC). The wired network may include a fiber optic communication network or a network of coaxial cables, etc.
In another implementation, one or more aspects of the execution device 210 may be implemented by each local device, e.g., the local device 301 may provide local data or feedback calculations to the execution device 210.
It should be noted that all functions of the execution device 210 may also be implemented by the local device. For example, the local device 301 implements the functions of the execution device 210 and provides services to its own users, or to the users of the local devices 302.
Referring to fig. 4, a schematic diagram of a neural network model building framework according to an embodiment of the present application is provided.
The neural network model building framework includes at least one controller model and a neural network model generated by the controller model. The controller model obtains the architecture of a neural network model through searching, trains that architecture on a training set, and evaluates it on a validation set to obtain its accuracy. The feedback result (e.g., accuracy) is then returned to the controller model, which is updated with reinforcement learning so that the controller generates a better network structure in the next cycle. This process is repeated multiple times: a new architecture is generated, tested again, and the feedback result is passed to the controller model for reinforcement learning again. Finally, the controller model learns to design architectures that tend to achieve higher accuracy on the validation set.
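The controller loop described above can be sketched in a toy form; the candidate operations, the preference-based sampling, and the stand-in scoring function are all invented for illustration and are not the patent's actual reinforcement-learning update:

```python
import random

random.seed(0)
OPS = ["conv3x3", "conv5x5", "pool"]
prefs = {op: 1.0 for op in OPS}          # controller's sampling preferences

def score(arch):                          # stand-in for validation-set accuracy
    return arch.count("conv3x3") / len(arch)

for _ in range(200):                      # repeated search/evaluate/feedback cycles
    weights = [prefs[op] for op in OPS]
    arch = random.choices(OPS, weights=weights, k=4)   # controller samples an architecture
    reward = score(arch)                                # "validation accuracy" feedback
    for op in arch:
        prefs[op] += reward                             # reinforce the ops that appeared
best = max(prefs, key=prefs.get)
print(best, round(prefs[best], 2))
```

Over many cycles the preferences drift toward the operations that yield higher scores, which is the "generate a better network structure in the next cycle" behavior described above.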
Further, referring to fig. 5, a schematic diagram of a building framework of a neural network model is provided according to an embodiment of the present application.
The schematic diagram of the neural network model building framework includes at least a network structure population and a performance evaluation tool. The network structure population includes a plurality of neural network model building units; the model generator searches the network structure population for appropriate building units and assembles the found building units into a neural network model. It may be understood that a neural network model includes one or more neural network model building units, which is not limited herein.
After the neural network model is built, it is input into the performance evaluation tool to obtain the chip performance index when the neural network model runs on the target chip. Optionally, the neural network model may be trained on a chip deployed in the cloud to obtain the network structure performance of the neural network structure; it is understood that the neural network model may also be trained in other chip environments, which is not limited herein.
In the continuous iteration process, the network structure population is adjusted according to the chip performance index and the network structure performance, so as to obtain a model generator with higher chip performance and better network structure performance. In one possible implementation, the network structure population is updated in a Pareto optimization manner; that is, the constructed neural network model achieves higher chip performance on the premise of not degrading the network structure performance.
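A minimal sketch of a Pareto-style population update of the kind described above; the two objectives (accuracy and latency, lower latency being better) and all the values are illustrative assumptions:

```python
# A candidate joins the population only if no existing member dominates it,
# and it evicts any members it dominates.
def dominates(a, b):
    """a dominates b: no worse on both objectives and strictly better on one."""
    return (a["acc"] >= b["acc"] and a["latency"] <= b["latency"]
            and (a["acc"] > b["acc"] or a["latency"] < b["latency"]))

def update(population, candidate):
    if any(dominates(m, candidate) for m in population):
        return population                       # candidate is dominated: rejected
    return [m for m in population if not dominates(candidate, m)] + [candidate]

pop = [{"acc": 0.90, "latency": 12.0}, {"acc": 0.85, "latency": 8.0}]
pop = update(pop, {"acc": 0.91, "latency": 11.0})   # better on both than the first member
print(len(pop))
```

The new model replaces the structure it dominates while the fast-but-less-accurate structure survives, so chip performance improves without sacrificing network structure performance.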
It should be noted that the neural network model building device in the embodiment of the present application may be a computer device with a chip, such as a server, a desktop computer, a notebook computer, a computer cluster, and the like, which is not limited herein.
The neural network model construction method provided by the application is described below based on the application scenario.
Referring to fig. 6, a flowchart of a neural network model building method according to an embodiment of the present application is shown.
In step 601, the neural network model building device builds a first neural network model through a first model generator.
The neural network model building means builds a first neural network model by a first model generator preset for the neural network model building means.
Specifically, in one possible implementation manner, the neural network model building device builds the first neural network through the first model generator and a target task. The neural network model building device may acquire the target task before the first model generator builds the first neural network. The target task may be determined according to the device's own requirements, or according to an operation of a user. For example, the target task may include: the type of the neural network, the accuracy of the neural network, and so on. The type of the neural network includes the output type of the neural network requested to be constructed; for example, the target task may be to construct a face recognition neural network for recognizing a face and outputting corresponding person information. For another example, the target task may be initiated by a terminal, constructing a neural network for vehicle identification that identifies information of the vehicle contained in a picture obtained by the serializer.
It should be noted that the neural network in the present application may be a convolutional neural network, a recurrent neural network, a perceptron neural network, or the like, and may be adjusted according to the actual application scenario; this is not limited in the present application.
Corresponding training data may also be obtained at the same time as, or after, the target task is obtained. The training data is data associated with the target task, and may include input data for the target task and ground-truth data. For example, if the target task is to construct a face recognition neural network, the training data includes a large number of face pictures and the label information corresponding to each picture. In one possible implementation, the training data may be divided into a training set and a validation set, where the training set contains the pictures used to train the neural network model and the validation set contains the pictures used to verify the accuracy of the network model.
In one possible implementation, before the first model generator builds the first neural network, hardware constraints including parameters of the target chip are also input into the first model generator. Specifically, the hardware constraints may include at least one of: the frequency of the chip, the size of the chip memory, the size of the chip's arithmetic module, the bandwidth between memories in the chip, and the like. It will be appreciated that in practical applications further parameters may be included, which are not limited here. After the hardware constraints are input into the first model generator, the first model generator builds the first neural network according to these hardware constraints when building the first neural network model.
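For illustration, the hardware constraints listed above could be represented as a simple structure passed to the model generator; the field names and example values are assumptions, not the patent's actual interface:

```python
from dataclasses import dataclass

@dataclass
class HardwareConstraints:
    frequency_mhz: int         # frequency of the chip
    memory_bytes: int          # size of the chip memory
    compute_units: int         # size of the chip's arithmetic module
    mem_bandwidth_gbps: float  # bandwidth between memories in the chip

constraints = HardwareConstraints(
    frequency_mhz=1000, memory_bytes=2 * 1024 * 1024,
    compute_units=32, mem_bandwidth_gbps=64.0)
print(constraints.compute_units)
```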
In step 602, the neural network model building device obtains a first theoretical performance index.
The neural network model building device obtains a first theoretical performance index, where the first theoretical performance index represents a theoretical value of the performance index of the first neural network model when running on the target chip; that is, this performance index is a theoretical, rather than measured, value.
The target chip may be a central processing unit (central processing unit, CPU), a graphics processing unit (graphics processing unit, GPU), a digital signal processor (digital signal processor, DSP), a field programmable gate array (field programmable gate array, FPGA), an application specific integrated circuit (application specific integrated circuit, ASIC), a general purpose processor, or the like. For example, if the target chip is a CPU, the first theoretical performance index is the theoretical performance index when the CPU runs the first neural network.
In one possible implementation, the first theoretical performance index includes at least one of: the theoretical vector module bound (vector bound), the theoretical memory bound (memory bound), the theoretical cube module (cube) utilization, the theoretical multiply-accumulate (MAC) utilization, the theoretical cube module cycle count (cube cycle), the theoretical vector module cycle count (vector cycle), the L1 and L2 memory fusion effect, the batch computation (compute batch) effect, the tiling strategy and its performance, the performance effect under mixed precision, the performance effect under different data-flow modes, the cycle count or delay of each operator or network layer in the neural network model, and the total cycle count or delay of the entire neural network model.
In one possible implementation manner, the neural network model building device obtains the first theoretical performance index through a performance evaluation tool, where the performance evaluation tool includes a calculation function used to calculate the first theoretical performance index from the first neural network model. Specifically, the performance evaluation tool may be a software tool for acquiring the performance index of the target chip when it runs the first neural network. For example, in a preferred manner, the performance evaluation tool is PTM, and the neural network model building device obtains the theoretical performance index via PTM. In practice, the performance evaluation tool may also exist in other forms, such as a hardware module, which is not limited herein.
Specifically, in one possible implementation, the neural network model building device divides the first neural network into one or more first building units, taking at least one constituent unit of the first neural network (which may include at least one operator, or take one layer as a unit) as one first building unit. The plurality of first building units are input into the performance evaluation tool, and the performance evaluation tool calculates theoretical performance indexes of the first building units according to the first building units and the parameters of the target chip. The theoretical performance indexes of the first building units are then superimposed to obtain the first theoretical performance index of the target chip running the first neural network. For example, if the theoretical performance index is the time consumption of the target chip running the first neural network, and one building unit is one operator in the first neural network, the performance evaluation tool calculates the time consumption corresponding to each operator separately and then sums them, finally obtaining the time consumption of all operators in the entire neural network, namely the time consumption of the target chip running the first neural network. The time consumption of the whole network can be calculated by the following formula:
whole-network time consumption = Σ_n f(time consumption of the operators of layer n, as estimated by the performance evaluation tool)
It should be noted that other performance indexes may be obtained by similar formulas; for example, the cube cycle count may be calculated by the following formula:
whole-network cube cycles = Σ_n f(cube cycles of the operators of layer n, as estimated by the performance evaluation tool)
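The two summation formulas above share one shape: a per-operator estimate f(·) summed over the layers of the network. A toy sketch of that superposition, with an invented stand-in cost model in place of the real performance evaluation tool:

```python
# Stand-in for the performance evaluation tool's per-operator estimate:
# cycles proportional to multiply-accumulate count divided by parallelism.
def estimate_op_cost(op):
    return op["macs"] / op.get("parallelism", 1)

network = [                                  # one entry per layer's operator
    {"name": "conv1", "macs": 1_000_000, "parallelism": 256},
    {"name": "conv2", "macs": 2_000_000, "parallelism": 256},
    {"name": "fc",    "macs":   500_000, "parallelism": 64},
]
# Whole-network cost = sum of the per-layer estimates.
total_cycles = sum(estimate_op_cost(op) for op in network)
print(total_cycles)
```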
Specifically, in one possible implementation manner, the neural network model building device inputs the entire first neural network model into the performance evaluation tool, and the performance evaluation tool calculates according to the parameters of the first neural network model and the target chip to obtain a theoretical performance index of the first neural network model, that is, a first theoretical performance index of the target chip running the first neural network. For example, the first theoretical performance index is the time consumption of the target chip when running the first neural network, and the performance evaluation tool obtains the time consumption of the target chip when running the first neural network through calculation of the time consumption of the first neural network.
In one possible implementation, the neural network model building device obtains the first theoretical performance index by inputting the data flow into the performance evaluation tool. Specifically, the neural network model building device determines one or more first building units of the first neural network model, where a first building unit includes at least one of: a convolution layer of the first neural network model, a pooling layer of the first neural network model, an activation function of the first neural network model, and a normalization layer of the first neural network model. The different layers in the neural network are then divided into task dimensions according to these first building units; for example, one or more convolution layers, pooling layers, activation functions and normalization layers form one dimension, and in a preferred manner one convolution layer, one pooling layer, one activation function and one normalization layer form one dimension. The data flow is analyzed per task; for example, when the chip runs the neural network, different types of chip memory process different data. For instance, in the process from L2 to L1, where L2 needs to transmit data to L1, the efficiency of this process is analyzed in the dimension of the neural network. The process may further include L1 -> L0A/L0B, UB -> L1, and the like. Each dimension of the neural network is then calculated and analyzed according to the data flow, determining through which pipelines (pipe) calculation and data transmission are carried out. Theoretical performance indexes such as the cycle count, cube utilization, MAC utilization, vector bound, and memory bound (DDR, L2) are then calculated according to the processing conditions of each pipeline (pipe) and the data tiling.
Memory bound: the degree of reuse of the smallest memory unit in the chip. When the output feature map is larger than the smallest memory unit of the chip, the smallest memory unit needs to be reused multiple times. In the embodiment of the present application, a corresponding weight can be set when constructing the neural network so that the output feature map is reduced to fit the size of the smallest memory unit of the chip, improving the usage efficiency of that unit.
vector bound: multiplexing of vector modules in a chip.
Cube utilization: the cube module is mainly used for matrix calculation, and cube utilization refers to the number of times the cube module is used per unit time. When training the neural network, if the subsequent cube utilization needs to be improved, cube utilization can be set as a target task; the closer the model is to convergence during training, the higher the cube utilization becomes.
MAC (multiply-accumulate, MAC) utilization: counts the number of multiply-add operations of the neural network in the chip; MAC utilization refers to the number of times the MAC module is used per unit time. The method for improving MAC utilization in the embodiment of the present application is similar to that for improving cube utilization and is not described again here.
cube cycle number: and (5) utilizing the number of times of computation of a cube module in the chip.
vector cycle number: the number of operations using the vector module.
L1 and L2 memory fusion: l1 and L2 are two different types in the chip memory, and the compatibility of L1 and L2 in processing data corresponding to the neural network can be improved by setting the corresponding weight for constructing the neural network.
Computer batch: the number of feature maps is processed in batch in the same time period. By setting the corresponding weight for constructing the neural network, the computer batch can be improved as much as possible on the premise of ensuring the running performance of the chip.
In addition to the chip performance indexes described above, the performance indexes in the embodiments of the present application may further include more parameters; any index that affects the operating efficiency of the neural network on the chip may be used, and this is not limited here.
In step 603, the neural network model building device adjusts the first model generator according to the first theoretical performance index to obtain the second model generator.
After the first theoretical performance index is obtained, the neural network model building device adjusts the first model generator according to the first theoretical performance index to obtain a second model generator; relative to the first model generator, the second model generator can build neural networks with better theoretical performance indexes when running on the target chip.
In one possible implementation manner, the corresponding weight factors in the first model generator may be adjusted according to the first theoretical performance index, so that the adjusted model generator constructs a second neural network model with a better performance index when running on the target chip, the second theoretical performance index of the second neural network model being superior to the first theoretical performance index. For example, when the theoretical performance index is the theoretical time consumption, the corresponding time-consumption weight factor is adjusted when adjusting the model generator, so that the model generator builds a neural network that takes less time when running on the target chip. It can be appreciated that when there are multiple theoretical performance indexes, multiple weight factors are adjusted correspondingly, so that the first model generator constructs a neural network with better performance indexes when running on the target chip.
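The effect of such a time-consumption weight factor can be illustrated with a toy reward; the reward form, the two candidate models, and the weight values are invented for illustration and are not the patent's actual update rule:

```python
# Reward that guides the generator: accuracy minus a weighted latency penalty.
def generator_reward(accuracy, latency_ms, w_latency):
    return accuracy - w_latency * latency_ms

model_a = {"acc": 0.92, "latency_ms": 20.0}   # accurate but slow
model_b = {"acc": 0.90, "latency_ms": 5.0}    # slightly less accurate, much faster

low = {m: generator_reward(d["acc"], d["latency_ms"], 0.001)
       for m, d in (("a", model_a), ("b", model_b))}
high = {m: generator_reward(d["acc"], d["latency_ms"], 0.01)
        for m, d in (("a", model_a), ("b", model_b))}
# Raising the time-consumption weight factor steers the generator toward
# the faster model.
print(max(low, key=low.get), max(high, key=high.get))
```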
In one possible implementation, the first neural network is trained, and when the first neural network tends to converge, a fifth neural network model is obtained together with its performance parameters. The performance parameters of the fifth neural network may include accuracy, the peak signal-to-noise ratio describing a picture, and the like, which are not limited herein. The weight factors in the first model generator are then adjusted according to the performance parameters of the fifth neural network and the first theoretical performance index. This ensures both that the neural networks the model generator can produce have excellent performance parameters and that their performance indexes when running on the target chip are better.
It should be noted that, as shown in fig. 7, steps 601 to 603 are a process of iterating the first model generator once according to the first theoretical performance index, and in the actual application process, one or more iterations may be performed to finally generate a model generator, that is, a second model generator, which may construct a neural network with an optimal theoretical performance index.
In step 604, the neural network model building device obtains measured performance indicators.
After the neural network model building means obtains the second model generator, the neural network model building means obtains a first measured performance index representing an actual performance index when the target chip operates the second neural network.
After the neural network model building device obtains the second model generator, the second neural network generated by the second model generator may be used directly, or the second model generator may be further improved in a second round. The second round of improvement adjusts the generator based on the performance index obtained when the second neural network runs on the actual target chip.
In one possible implementation, a second neural network model is constructed from the second model generator, and the second neural network model is placed into the target chip to operate, and the corresponding actual performance index when the target chip operates the second network model is obtained.
The target chip may be a central processing unit (central processing unit, CPU), a graphics processing unit (graphics processing unit, GPU), a digital signal processor (digital signal processor, DSP), a field programmable gate array (field programmable gate array, FPGA), an application specific integrated circuit (application specific integrated circuit, ASIC), a general purpose processor, or the like.
In one possible implementation, the first measured performance index may include at least one of: the actual vector module bound (vector bound), the actual memory bound (memory bound), the actual cube module (cube) utilization, the actual multiply-accumulate (MAC) utilization, the actual cube module cycle count (cube cycle), the actual L1 and L2 memory fusion effect, the actual batch computation (compute batch) effect, the actual tiling (Tiling) strategy and its performance, the performance effect under actual mixed precision, the actual performance effect under different data-flow modes, the cycle count or delay of each operator or network layer in the neural network model, and the total cycle count or delay of the entire neural network model. It can be understood that in the actual application process more chip performance indexes may be included, for example the actual vector module cycle count (vector cycle); any measured index that reflects the performance of the target chip running the second network may be used, and this is not limited herein.
In one possible implementation manner, the system in which the target chip is located may acquire, in a software or hardware form, a performance index corresponding to the target chip when the target chip operates the second neural network, and a manner of specifically acquiring the performance index is not limited herein.
In step 605, the neural network model building device adjusts the second model generator according to the first measured performance index to obtain a third model generator.
After the neural network model building device obtains the first measured performance index, the neural network model building device adjusts the second model generator according to the first measured performance index to obtain a third model generator, and the third model generator can build a third neural network model, wherein the second measured performance index of the third neural network model is better than the first measured performance index.
In one possible implementation, the corresponding weight factors in the second model generator may be adjusted according to the measured performance index, so that the second model generator constructs a neural network with a better performance index when running on the target chip. For example, when the first measured performance index is the measured time consumption of the second neural network running on the target chip, the corresponding time-consumption weight factor is adjusted when adjusting the second model generator, so that the model generator builds a neural network that takes less time when running on the target chip. It can be appreciated that when the first measured performance index comprises multiple indexes, multiple weight factors are adjusted correspondingly, so that the second model generator constructs a neural network with better performance indexes when running on the target chip.
In one possible implementation, the second neural network is trained, and when the second neural network tends to converge, a fourth neural network model is obtained together with its performance parameters. The performance parameters of the fourth neural network may include accuracy, the peak signal-to-noise ratio describing a picture, and the like, which are not limited herein. The weight factors in the second model generator are then adjusted according to the performance parameters of the fourth neural network and the first measured performance index. This ensures both that the neural networks generated by the model generator have good performance parameters and that their performance indexes when running on the target chip are better.
It should be noted that, as shown in fig. 8, steps 604 to 605 are a process of iterating the second model generator according to the measured performance index, and in the actual application process, one or more iterations may be performed to finally generate a model generator, that is, a third model generator, which may construct a neural network with an optimal measured performance index.
In the embodiment of the present application, steps 604 to 605 are optional steps, and when steps 604 to 605 are not performed, the second neural network model constructed by the second model generator is used as the neural network model used on the target chip.
In the embodiment of the present application, when the theoretical performance index and the actually measured performance index are obtained, the theoretical performance index and the actually measured performance index may be obtained in the neural network model building device, or may be obtained by another computer device and then sent to the neural network model building device, which is not limited herein.
In the embodiment of the application, the theoretical performance index of the target chip running the first neural network is obtained, and the corresponding weight in the first model generator is adjusted according to the theoretical performance index, so that a neural network model with better hardware performance index during running in the target chip is constructed.
The embodiment shown in fig. 6 is one application scenario of the embodiments of the present application; another application scenario of the neural network model building method is described below.
Referring to fig. 9, another flow chart of the neural network model construction method provided in the embodiment of the application is shown.
In this embodiment, the model generator is described by taking a structure search space as an example.
According to the task requirements of the application scenario, a search space must be built according to the indexes before the neural network model is constructed. Each constituent element in the search space is a code that describes how a neural network structure is built. During structure search, each sampling draws on the relationship between the not-yet-evaluated constituent units in the search space and the network models constructed from the constituent units that have already been evaluated. Specifically, as shown in fig. 9, the portion in the dashed box is the step of constructing the initial search space. The search space consists of multiple network structures, each composed of multiple constituent units. After the initial search space of network structures is constructed, unreasonable network structures are screened out based on preset rules or the application-scenario task, yielding a second search space. A new, third search space is then formed by clustering the network structures. Based on the third search space, the cluster-center structures are trained, the unevaluated structures are modeled according to the training loss values of the evaluated network structures, several network structures are selected for training based on Bayesian optimization, and the unevaluated structures are modeled again according to the resulting training loss values; this loop continues until construction of the search space is complete.
After the initial search space is built, the search space is further adjusted using the neural network model building method of the embodiment shown in fig. 6, which is not repeated here.
In the embodiment of the application, the performance of the search space is improved by initially constructing and then training the search space.
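The search-space construction flow of fig. 9 can be sketched as follows. The structure encoding, the screening rule, the clustering, and the surrogate are all simplified stand-ins chosen for brevity; in particular, a nearest-evaluated-cost lookup stands in for the Bayesian-optimization surrogate the embodiment describes.

```python
import random

random.seed(1)

# 1. Initial search space: each element encodes a network structure.
initial_space = [(random.choice([1, 3, 5, 7]), random.choice([8, 16, 32, 64]))
                 for _ in range(40)]  # (kernel size, channel count)

# 2. Screen out unreasonable structures by a preset rule -> second search space.
second_space = [s for s in initial_space if s[0] * s[1] <= 256]

# 3. Cluster the remaining structures (toy grouping by cost) -> third search space.
def cost(s):
    return s[0] ** 2 * s[1]

clusters = {}
for s in second_space:
    clusters.setdefault(cost(s) // 200, []).append(s)
third_space = list(clusters.values())

# 4. "Train" one center per cluster, then model the unevaluated structures
#    from the evaluated ones (nearest-evaluated-cost surrogate).
evaluated = {c[0]: cost(c[0]) * 0.01 for c in third_space}  # fake training losses

def predicted_loss(s):
    nearest = min(evaluated, key=lambda e: abs(cost(e) - cost(s)))
    return evaluated[nearest]

# Select the most promising unevaluated structures for the next training round.
candidates = sorted((s for c in third_space for s in c[1:]), key=predicted_loss)
```

In the embodiment this selection-and-retraining step loops until the search space is complete; here a single pass is shown.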
The neural network model construction method in the embodiment of the present application is described above, and the neural network model construction device in the embodiment of the present application is described below.
Referring to fig. 10, a schematic structural diagram of a neural network model building apparatus according to an embodiment of the present application is provided.
A neural network model building apparatus, comprising:
a building unit 1001 for building a first neural network model by means of a first model generator;
an obtaining unit 1002, configured to obtain, according to the first neural network model, a first performance index when the first neural network model runs on the target chip;
a processing unit 1003, configured to adjust the first model generator according to the first performance index, to obtain a second model generator;
the building unit 1001 is further configured to build, by means of the second model generator, a second neural network model, the second performance index of which is better than the first performance index.
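A minimal sketch of how the three units of fig. 10 might map onto code, under the assumption that the model generator is a callable and the performance index is a number returned by an evaluation function; all names and the update policy here are illustrative, not from the patent.

```python
class NeuralNetworkModelBuilder:
    """Toy apparatus: one method per unit of fig. 10."""

    def __init__(self, generator, evaluate):
        self.generator = generator   # current model generator (a callable)
        self.evaluate = evaluate     # returns a performance index for a model

    def construct(self):             # building unit 1001
        return self.generator()

    def obtain_index(self, model):   # obtaining unit 1002
        return self.evaluate(model)

    def adjust(self, index):         # processing unit 1003
        # Placeholder policy: shrink the model when the index is too high.
        self.generator = lambda: {"layers": max(1, 8 - index // 100)}

builder = NeuralNetworkModelBuilder(
    generator=lambda: {"layers": 8},
    evaluate=lambda m: m["layers"] * 100,  # toy index: lower is better
)
m1 = builder.construct()           # first neural network model
idx1 = builder.obtain_index(m1)    # first performance index
builder.adjust(idx1)               # yields the "second model generator"
m2 = builder.construct()           # second neural network model
idx2 = builder.obtain_index(m2)    # better (lower) than idx1
```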
In this embodiment, the operations performed by each unit of the neural network model building apparatus are similar to those described in the embodiments shown in fig. 6 and fig. 7, and detailed descriptions thereof are omitted here.
Referring to fig. 11, another schematic structural diagram of the neural network model building apparatus according to the embodiment of the present application is provided.
A neural network model building apparatus, comprising:
a building unit 1101 for building a first neural network model by means of a first model generator;
an obtaining unit 1102, configured to obtain a first performance index of the first neural network model when the first neural network model runs on the target chip according to the first neural network model;
a processing unit 1103, configured to adjust the first model generator according to the first performance index to obtain a second model generator;
the building unit 1101 is further configured to build a second neural network model by means of the second model generator, the second performance index of the second neural network model being better than the first performance index.
Optionally, the first performance index is a first theoretical performance index, the first theoretical performance index represents a theoretical value of a performance index of the first neural network model when running on the target chip, the second performance index is a second theoretical performance index, and the second theoretical performance index is better than the first theoretical performance index.
Optionally, the obtaining unit 1102 is specifically configured to obtain a first measured performance index, where the first measured performance index represents an actual measurement value of a performance index of the second neural network model when the second neural network model runs on the target chip;
the processing unit 1103 is further configured to adjust the second model generator according to the first measured performance index to obtain a third model generator;
the construction unit 1101 is further configured to construct a third neural network model by means of a third model generator, the second measured performance index of the third neural network model being better than the first measured performance index.
Optionally, the neural network model building device further includes:
a training unit 1104 for training the second neural network model to obtain a fourth neural network model;
where the processing unit 1103 adjusting the second model generator according to the first measured performance index to obtain the third model generator includes:
the processing unit 1103 is further configured to adjust the second model generator according to the first measured performance index and the model performance of the fourth neural network model, so as to obtain a third model generator.
Optionally, the first performance index is a first theoretical performance index, and the obtaining unit 1102 is specifically configured to obtain the first theoretical performance index through a performance evaluation tool, where the performance evaluation tool includes a calculation function, and the calculation function is used to calculate the first neural network model to obtain the first theoretical performance index.
Optionally, the neural network model building device further includes:
a determining unit 1105 for determining, by the performance evaluation tool, a first building unit of the first neural network model, the first building unit including at least one of: a convolution layer of the first neural network model, a pooling layer of the first neural network model, an activation function of the first neural network model, and a normalization layer of the first neural network model;
the processing unit 1103 is further configured to perform calculation according to the first construction unit to obtain a first theoretical performance index.
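The per-building-unit calculation performed by the determining unit 1105 and processing unit 1103 might look like the following sketch, where the cost formulas (MAC and comparison counts) are illustrative assumptions rather than the patent's actual performance evaluation tool.

```python
def conv_macs(c_in, c_out, k, h, w):
    """Theoretical multiply-accumulate count of one convolution layer."""
    return c_in * c_out * k * k * h * w

def pool_ops(c, h, w, k):
    """Theoretical comparison count of one pooling layer."""
    return c * h * w * k * k

def theoretical_index(layers):
    """Sum per-unit costs over a list of (type, params) building units."""
    total = 0
    for kind, params in layers:
        if kind == "conv":
            total += conv_macs(*params)
        elif kind == "pool":
            total += pool_ops(*params)
        # activation / normalization units could be added the same way
    return total

model = [("conv", (3, 16, 3, 32, 32)),   # 3->16 channels, 3x3 kernel, 32x32 map
         ("pool", (16, 16, 16, 2)),      # 2x2 pooling on a 16x16 map
         ("conv", (16, 32, 3, 16, 16))]  # 16->32 channels, 3x3 kernel, 16x16 map
index = theoretical_index(model)
```

Indices such as utilization rates or memory limits would need per-unit formulas tied to the target chip rather than pure operation counts.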
Optionally, the training unit 1104 is further configured to train the first neural network model to obtain a fifth neural network model;
the processing unit 1103 is further configured to adjust the first model generator according to the first theoretical performance index and the model performance of the fifth neural network model, to obtain a second model generator.
Optionally, the first theoretical performance index and the second theoretical performance index each include at least one of: theoretical vector module limit, theoretical memory limit, theoretical cube module utilization, theoretical high-speed parallel multiply accumulator MAC utilization, theoretical cube module operation times, theoretical vector module operation times.
Optionally, the first measured performance index and the second measured performance index each include at least one of: the method comprises the steps of actually measuring a vector module limit, actually measuring a memory limit, actually measuring a cube module utilization rate, actually measuring a MAC utilization rate of a high-speed parallel multiplication accumulator, actually measuring the cube module operation times and actually measuring the vector module operation times.
In this embodiment, the operations performed by each unit of the neural network model building apparatus are similar to those described in the embodiments shown in fig. 6 and fig. 7, and detailed descriptions thereof are omitted here.
Referring to fig. 12, another schematic structural diagram of the neural network model building apparatus according to the embodiment of the present application is provided.
The apparatus includes a processor 1201, a memory 1202, a bus 1205, and an interface 1204. The processor 1201 is connected to the memory 1202 and the interface 1204, and the bus 1205 is connected to the processor 1201, the memory 1202, and the interface 1204, respectively. The interface 1204 is used to receive or send data. The processor 1201 is a single-core or multi-core central processing unit, an application-specific integrated circuit, or one or more integrated circuits configured to implement the embodiments of the invention. The memory 1202 may be a random access memory (RAM) or a non-volatile memory, such as at least one hard disk memory. The memory 1202 is used to store computer-executable instructions; specifically, the computer-executable instructions may include a program 1203.
In this embodiment, when the processor 1201 invokes the program 1203, the neural network model building apparatus in fig. 12 may be caused to perform the operations performed by the neural network model building apparatus in the embodiment shown in fig. 6 or fig. 9, which are not described herein.
It should be understood that the processor mentioned in the neural network model building apparatus in the above embodiments of the present application may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
It should also be understood that the number of processors in the neural network model building apparatus in the above embodiments in the present application may be one or more, and may be adjusted according to the actual application scenario, which is merely illustrative and not limiting. The number of the memories in the embodiment of the present application may be one or more, and may be adjusted according to the actual application scenario, which is only illustrative and not limiting.
It should be further noted that, when the neural network model building apparatus includes a processor (or a processing unit) and a memory, the processor in the present application may be integrated with the memory, or the processor and the memory may be connected through an interface, which may be adjusted according to an actual application scenario, and is not limited.
The present application further provides a computer program or a computer program product comprising a computer program, where the computer program when executed on a computer causes the computer to implement the method flow related to the neural network model building apparatus in any of the above method embodiments.
The present application further provides a computer readable storage medium, on which a computer program is stored, which when executed by a computer, implements the method flow related to the neural network model building apparatus in any of the above method embodiments.
The various embodiments described above in fig. 6 to fig. 9 may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium accessible by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), a semiconductor medium (e.g., solid-state drive (SSD)), or the like.
The terms "first", "second", and the like in the description, the claims, and the above figures are used to distinguish between similar objects and do not necessarily describe a particular sequence or chronological order. It should be understood that terms so used are interchangeable under appropriate circumstances and merely distinguish objects of the same nature when the embodiments of the application are described. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of elements is not necessarily limited to those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
The names of the messages/frames/information, modules or units, etc. provided in the embodiments of the present application are only examples, and other names may be used as long as the roles of the messages/frames/information, modules or units, etc. are the same.
The terminology used in the embodiments of the application is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used in the embodiments of the present application, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that in the description of the present application, unless otherwise indicated, "/" indicates an "or" relationship between the associated objects; for example, A/B may represent A or B. The term "and/or" merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A alone, both A and B, or B alone, where A and B may be singular or plural.
The word "if" as used herein may be interpreted as "when," "upon," "in response to determining," or "in response to detecting," depending on the context. Similarly, the phrase "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined," "in response to determining," "when (the stated condition or event) is detected," or "in response to detecting (the stated condition or event)," depending on the context.
The above embodiments are merely intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or replace some technical features thereof with equivalents; such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.

Claims (19)

  1. A neural network model construction method, characterized by comprising the following steps:
    constructing a first neural network model by a first model generator;
    acquiring a first performance index of the first neural network model when the first neural network model runs on a target chip according to the first neural network model;
    adjusting the first model generator according to the first performance index to obtain a second model generator;
    a second neural network model is constructed by the second model generator, the second performance index of the second neural network model being better than the first performance index.
  2. The method of claim 1, wherein the first performance index is a first theoretical performance index that represents a theoretical value of a performance index of the first neural network model when running on a target chip, and the second performance index is a second theoretical performance index that is better than the first theoretical performance index.
  3. The method of claim 2, wherein after constructing a second neural network model by the second model generator, the method further comprises:
    acquiring a first measured performance index, wherein the first measured performance index represents a measured value of a performance index of the second neural network model when the second neural network model operates on the target chip;
    adjusting the second model generator according to the first measured performance index to obtain a third model generator;
    and constructing a third neural network model through the third model generator, wherein a second measured performance index of the third neural network model is better than the first measured performance index.
  4. The method according to claim 3, wherein before obtaining the first measured performance index, the method further comprises:
    training the second neural network model to obtain a fourth neural network model;
    wherein adjusting the second model generator according to the first measured performance index to obtain the third model generator comprises:
    and adjusting the second model generator according to the first measured performance index and the model performance of the fourth neural network model to obtain the third model generator.
  5. The method of any one of claims 2 to 4, wherein the first performance index is a first theoretical performance index, and wherein obtaining, from the first neural network model, the first performance index of the first neural network model when running on a target chip comprises:
    the first theoretical performance index is obtained through a performance evaluation tool, wherein the performance evaluation tool comprises a calculation function, and the calculation function is used for calculating the first neural network model so as to obtain the first theoretical performance index.
  6. The method of claim 5, wherein obtaining the first theoretical performance index through the performance evaluation tool comprises:
    determining, by the performance evaluation tool, a first building unit of the first neural network model, the first building unit comprising at least one of: a convolution layer of the first neural network model, a pooling layer of the first neural network model, an activation function of the first neural network model, and a normalization layer of the first neural network model;
    and calculating according to the first building unit to obtain the first theoretical performance index.
  7. The method according to any one of claims 2 to 5, wherein after constructing the first neural network model by the first model generator, the method further comprises:
    training the first neural network model to obtain a fifth neural network model;
    wherein adjusting the first model generator according to the first performance index to obtain the second model generator comprises:
    and adjusting the first model generator according to the first theoretical performance index and the model performance of the fifth neural network model to obtain the second model generator.
  8. The method according to any one of claims 2 to 6, wherein the first theoretical performance index and the second theoretical performance index each comprise at least one of: theoretical vector module limit, theoretical memory limit, theoretical cube module utilization, theoretical high-speed parallel multiply accumulator MAC utilization, theoretical cube module operation times, theoretical vector module operation times.
  9. The method according to any one of claims 3 to 6, wherein the first measured performance index and the second measured performance index each comprise at least one of: the method comprises the steps of actually measuring a vector module limit, actually measuring a memory limit, actually measuring a cube module utilization rate, actually measuring a MAC utilization rate of a high-speed parallel multiplication accumulator, actually measuring the cube module operation times and actually measuring the vector module operation times.
  10. A neural network model building apparatus, comprising:
    a building unit for building a first neural network model through a first model generator;
    the acquisition unit is used for acquiring a first performance index of the first neural network model when the first neural network model runs on a target chip according to the first neural network model;
    the processing unit is used for adjusting the first model generator according to the first performance index to obtain a second model generator;
    the building unit is further configured to build, by the second model generator, a second neural network model, a second performance index of which is better than the first performance index.
  11. The neural network model building apparatus of claim 10, wherein the first performance index is a first theoretical performance index that represents a theoretical value of a performance index of the first neural network model when running on a target chip, and the second performance index is a second theoretical performance index that is better than the first theoretical performance index.
  12. The neural network model building device according to claim 11, wherein the obtaining unit is specifically configured to obtain a first measured performance index, where the first measured performance index represents a measured value of a performance index of the second neural network model when the second neural network model is running on the target chip;
    the processing unit is further used for adjusting the second model generator according to the first measured performance index to obtain a third model generator;
    the construction unit is further configured to construct a third neural network model through the third model generator, where a second measured performance index of the third neural network model is better than the first measured performance index.
  13. The neural network model building apparatus according to claim 12, characterized in that the neural network model building apparatus further comprises:
    the training unit is used for training the second neural network model to obtain a fourth neural network model;
    wherein the processing unit adjusting the second model generator according to the first measured performance index to obtain the third model generator includes:
    the processing unit is further configured to adjust the second model generator according to the first measured performance index and the model performance of the fourth neural network model, to obtain the third model generator.
  14. The neural network model building device according to any one of claims 11 to 13, wherein the first performance index is a first theoretical performance index, and the obtaining unit is specifically configured to obtain the first theoretical performance index through a performance evaluation tool, and the performance evaluation tool includes a calculation function for calculating the first neural network model to obtain the first theoretical performance index.
  15. The neural network model building apparatus of claim 14, further comprising:
    a determining unit for determining, by the performance evaluation tool, a first building unit of the first neural network model, the first building unit comprising at least one of: a convolution layer of the first neural network model, a pooling layer of the first neural network model, an activation function of the first neural network model, and a normalization layer of the first neural network model;
    the processing unit is further configured to perform calculation according to the first construction unit, so as to obtain the first theoretical performance index.
  16. The neural network model building apparatus according to any one of claims 11 to 15, wherein the training unit is further configured to train the first neural network model to obtain a fifth neural network model;
    the processing unit is further configured to adjust the first model generator according to the first theoretical performance index and the model performance of the fifth neural network model, to obtain the second model generator.
  17. The neural network model building apparatus according to any one of claims 11 to 16, wherein the first theoretical performance index and the second theoretical performance index each include at least one of: theoretical vector module limit, theoretical memory limit, theoretical cube module utilization, theoretical high-speed parallel multiply accumulator MAC utilization, theoretical cube module operation times, theoretical vector module operation times.
  18. The neural network model building apparatus according to any one of claims 12 to 16, wherein the first measured performance index and the second measured performance index each include at least one of: the method comprises the steps of actually measuring a vector module limit, actually measuring a memory limit, actually measuring a cube module utilization rate, actually measuring a MAC utilization rate of a high-speed parallel multiplication accumulator, actually measuring the cube module operation times and actually measuring the vector module operation times.
  19. A computer storage medium having instructions stored therein, which when executed on a computer, cause the computer to perform the method of any of claims 1 to 9.
CN202080104556.9A 2020-07-30 2020-07-30 Neural network model construction method and equipment thereof Pending CN116261729A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/105773 WO2022021199A1 (en) 2020-07-30 2020-07-30 Neural network model construction method and device therefor

Publications (1)

Publication Number Publication Date
CN116261729A true CN116261729A (en) 2023-06-13

Family

ID=80036925

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080104556.9A Pending CN116261729A (en) 2020-07-30 2020-07-30 Neural network model construction method and equipment thereof

Country Status (2)

Country Link
CN (1) CN116261729A (en)
WO (1) WO2022021199A1 (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109993300B (en) * 2017-12-29 2021-01-29 华为技术有限公司 Training method and device of neural network model
CN111357014B (en) * 2018-09-19 2023-07-11 华为技术有限公司 AI model development method and device
US20200151558A1 (en) * 2018-11-13 2020-05-14 Gyrfalcon Technology Inc. Systems and methods for updating an artificial intelligence model by a subset of parameters in a communication system
US10884485B2 (en) * 2018-12-11 2021-01-05 Groq, Inc. Power optimization in an artificial intelligence processor
CN110909877B (en) * 2019-11-29 2023-10-27 百度在线网络技术(北京)有限公司 Neural network model structure searching method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
WO2022021199A1 (en) 2022-02-03


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination