WO2022021199A1 - Neural network model construction method and device therefor - Google Patents


Info

Publication number
WO2022021199A1
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
network model
performance index
model
theoretical
Prior art date
Application number
PCT/CN2020/105773
Other languages
French (fr)
Chinese (zh)
Inventor
Yuan Honghui (袁宏辉)
Wu Weixiang (伍玮翔)
Zhong Zhao (钟钊)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to PCT/CN2020/105773
Priority to CN202080104556.9A
Publication of WO2022021199A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks

Definitions

  • the present application relates to the field of artificial intelligence, and in particular, to a method and device for constructing a neural network model.
  • Deep neural networks have in recent years achieved remarkable results in processing and analysis tasks for various media signals such as images, video, and speech.
  • A well-performing neural network often has an elaborate network structure, which highly skilled and experienced human experts must expend great effort to design.
  • Neural network structure search, that is, the construction of neural network models, changes this manual design mode: it searches for neural network structures automatically and obtains structures with excellent performance, achieving excellent results in tasks such as image recognition, image semantic segmentation, and natural language processing.
  • In the prior art, neural network structure search is trained in a particular chip environment according to the target indicators of the task (such as model test accuracy for applications like image classification and image segmentation).
  • When the trained neural network structure search is then used in other chip environments, the differing chip parameters cause compatibility problems at run time, such as high time consumption and low chip utilization.
  • The embodiments of the present application provide a method and device for constructing a neural network model. When a model generator is used to generate a neural network model, a first theoretical performance index of the first neural network model running on a target chip is obtained, and the corresponding weights in the first model generator are adjusted according to the first theoretical performance index, so as to construct a neural network model with a better hardware performance index when running on the target chip.
  • a first aspect of the embodiments of the present application provides a method for constructing a neural network model.
  • The neural network model constructing apparatus constructs a first neural network model through a first model generator preset in the apparatus, the first neural network model being assembled by the first model generator from individual construction units.
  • the neural network model building apparatus obtains, according to the first neural network model, the first performance index when the first neural network model runs on the target chip.
  • the apparatus for constructing a neural network model adjusts the first model generator according to the first performance index to obtain a second model generator.
  • The neural network model building device builds a second neural network model according to the second model generator, and the second performance index of the second neural network model is better than the first performance index; that is, the performance index of the second neural network model when running on the target chip is better than that of the first neural network model when running on the target chip.
  • By acquiring the first performance index of the first neural network model when running on the target chip, and adjusting the first model generator according to the first performance index, a second neural network model with a better hardware performance index when running on the target chip is constructed.
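The generate-evaluate-adjust loop described above can be sketched as follows. This is only an illustrative toy, not the application's actual algorithm: the operation names, the per-operation chip costs, and the weight-update rule are all hypothetical, and the performance index is simply a summed cost (lower is better).

```python
import random

# Hypothetical search space: each construction unit selects one basic operation.
SEARCH_SPACE = ["conv3x3", "conv5x5", "maxpool", "identity"]

# Assumed per-operation cost when run on the target chip (illustrative values,
# standing in for a real performance index such as latency or cycle count).
CHIP_COST = {"conv3x3": 4.0, "conv5x5": 9.0, "maxpool": 1.0, "identity": 0.0}


def generate_model(weights, num_units=4):
    """Model generator: sample one operation per construction unit according
    to the generator's weights (a higher weight makes an op more likely)."""
    ops, w = zip(*weights.items())
    return [random.choices(ops, weights=w, k=1)[0] for _ in range(num_units)]


def performance_index(model):
    """Performance index of the model on the target chip: here simply the
    summed operation cost, where lower is better."""
    return sum(CHIP_COST[op] for op in model)


def adjust_generator(weights, model, index, baseline, lr=0.1):
    """Adjust the generator: operations of models that beat the baseline
    index are reinforced, operations of worse models are penalized."""
    reward = baseline - index  # positive if cheaper than the baseline
    adjusted = dict(weights)
    for op in model:
        adjusted[op] = max(1e-3, adjusted[op] + lr * reward)
    return adjusted


# First model generator: uniform weights over the search space.
generator = {op: 1.0 for op in SEARCH_SPACE}
baseline = performance_index(generate_model(generator))

for _ in range(200):
    model = generate_model(generator)        # first neural network model
    index = performance_index(model)         # its performance index on the chip
    generator = adjust_generator(generator, model, index, baseline)
    baseline = 0.9 * baseline + 0.1 * index  # moving-average baseline
```

After enough iterations the adjusted generator assigns larger weights to operations that are cheap on the assumed chip, which is the sense in which the adjusted (second) generator produces models with a better performance index.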
  • The neural network model building device obtains a first theoretical performance index of the first neural network model, where the first theoretical performance index represents the theoretical value of the performance index when the first neural network model runs on the target chip.
  • the apparatus for constructing a neural network model adjusts the first model generator according to the first theoretical performance index to obtain a second model generator.
  • The neural network model construction device constructs a second neural network model according to the second model generator, and the second theoretical performance index of the second neural network model, that is, the theoretical value of the performance index when the second neural network model runs on the target chip, is better than the first theoretical performance index.
  • In this way, a second neural network model with a better hardware performance index when running on the target chip is constructed.
  • The neural network model construction apparatus further obtains a first measured performance index, where the first measured performance index represents the measured value of the performance index when the second neural network model runs on the target chip, that is, the performance index measured on the target chip after the second neural network model has been run on it.
  • The neural network model construction device adjusts the corresponding weighting factor in the second model generator according to the first measured performance index to obtain a third model generator. After obtaining the third model generator, the device uses it to construct a third neural network model, and the second measured performance index of the third neural network model, that is, the measured value of the performance index when the third neural network model runs on the target chip, is better than the first measured performance index.
  • The neural network model construction apparatus further trains the second neural network model to obtain a fourth neural network model. After obtaining the fourth neural network model, the device obtains the model performance of the fourth neural network model, and adjusts the second model generator according to the model performance of the fourth neural network model and the first measured performance index, so as to obtain the third model generator.
  • Because the second model generator is adjusted according to both the model performance of the fourth neural network model (trained from the second neural network model) and the first measured performance index, the adjusted third model generator can be guaranteed to generate neural network models that improve the performance index when running on the target chip while preserving model performance.
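The joint adjustment described above, which uses both the trained model's performance and the measured hardware index, can be expressed as a single scalarized score. The function and trade-off weight below are hypothetical illustrations, not taken from the application.

```python
def combined_score(model_performance, measured_index, trade_off=0.5):
    """Scalarize the two feedback signals used to adjust the model generator:
    reward high model performance (e.g. validation accuracy) and penalize a
    poor measured hardware index (e.g. normalized measured latency on the
    target chip). The trade-off weight is a hypothetical tuning knob."""
    return model_performance - trade_off * measured_index


# A slightly less accurate model that is much cheaper on the target chip can
# receive the higher score, steering the generator toward such models.
fast = combined_score(model_performance=0.92, measured_index=0.10)  # ~0.87
slow = combined_score(model_performance=0.95, measured_index=0.40)  # ~0.75
```

Raising the trade-off weight biases the generator toward hardware efficiency; lowering it biases it toward model performance.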
  • The neural network model construction apparatus obtains the first theoretical performance index through a performance evaluation tool. The performance evaluation tool includes a calculation function, and the calculation function is used to perform calculations on the first neural network model to obtain the first theoretical performance index.
  • the apparatus for constructing the neural network model obtains the first theoretical performance index through a performance evaluation tool, which improves the achievability of obtaining the first theoretical performance index.
  • The neural network model construction apparatus determines a first construction unit of the first neural network model by using the performance evaluation tool, and the first construction unit includes at least one of the following: a convolution layer of the first neural network model, a pooling layer of the first neural network model, an activation function of the first neural network model, and a normalization layer of the first neural network model.
  • The first construction unit may include one or more of each of these convolution layers, pooling layers, activation functions, and normalization layers.
  • the neural network model construction device performs calculation according to the first construction unit to obtain the first theoretical performance index.
  • The neural network model construction apparatus calculates one or more first construction units of the first neural network model through the performance evaluation tool to obtain the first theoretical performance index. Since each first construction unit includes at least one convolution layer, pooling layer, activation function, or normalization layer, the theoretical performance index of the first neural network model can be derived from each layer that constitutes the model, which improves flexibility.
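A per-layer calculation function of this kind can be sketched as below. Everything here is an assumption for illustration: the peak-MAC figure, the layer parameters, and the derived utilization formula stand in for whatever the real performance evaluation tool computes; only a convolution layer's cost model is shown.

```python
from dataclasses import dataclass

# Assumed chip parameter (illustrative, not from the application): peak
# multiply-accumulate operations the cube (matrix) unit finishes per cycle.
PEAK_MACS_PER_CYCLE = 4096


@dataclass
class ConvLayer:
    """One construction unit: a convolution layer of the model. Pooling,
    activation, and normalization layers would get analogous cost models."""
    in_ch: int
    out_ch: int
    kernel: int
    out_h: int
    out_w: int

    def theoretical_macs(self):
        # One MAC per (output pixel x output channel x kernel tap x input channel).
        return self.out_h * self.out_w * self.out_ch * self.in_ch * self.kernel ** 2


def theoretical_index(layers, cycles_on_chip):
    """Sketch of a 'calculation function': sum the MACs demanded by each
    construction unit and derive a theoretical MAC utilization for the chip."""
    total_macs = sum(layer.theoretical_macs() for layer in layers)
    ideal_cycles = total_macs / PEAK_MACS_PER_CYCLE
    return {
        "total_macs": total_macs,
        "ideal_cycles": ideal_cycles,
        "mac_utilization": ideal_cycles / cycles_on_chip,  # 1.0 = fully utilized
    }
```

For example, a 3x3 convolution mapping 64 channels to 64 channels over a 56x56 output requires 56 * 56 * 64 * 64 * 9 = 115,605,504 MACs, i.e. 28,224 ideal cycles on the assumed 4096-MAC/cycle unit.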
  • The neural network model construction apparatus further trains the first neural network model to obtain a fifth neural network model.
  • The neural network model building device then obtains the model performance of the fifth neural network model, and adjusts the first model generator according to the model performance of the fifth neural network model and the first theoretical performance index to obtain the second model generator.
  • Because the first model generator is adjusted according to both the model performance of the fifth neural network model (trained from the first neural network model) and the first theoretical performance index, the adjusted second model generator can be guaranteed to generate neural network models that improve the theoretical performance index when running on the target chip while preserving model performance.
  • The first theoretical performance index and the second theoretical performance index each include at least one of the following: a theoretical vector-module bound, a theoretical memory bound, a theoretical cube-module utilization, a theoretical high-speed parallel multiply-accumulator (MAC) utilization, a theoretical cube-module operation count, and a theoretical vector-module operation count.
  • the specific designations of the first theoretical performance index and the second theoretical performance index are exemplarily described, which improves the achievability of the solution.
  • The first measured performance index and the second measured performance index each include at least one of the following: a measured vector-module bound, a measured memory bound, a measured cube-module utilization, a measured high-speed parallel multiply-accumulator (MAC) utilization, a measured cube-module operation count, and a measured vector-module operation count.
  • the specific designations of the first measured performance index and the second measured performance index are exemplarily described, which improves the achievability of the solution.
  • a second aspect of the embodiments of the present application provides an apparatus for constructing a neural network model.
  • a device for constructing a neural network model comprising:
  • a construction unit for constructing a first neural network model by a first model generator
  • an obtaining unit configured to obtain, according to the first neural network model, the first performance index when the first neural network model runs on the target chip;
  • a processing unit configured to adjust the first model generator according to the first performance index to obtain the second model generator
  • the construction unit is further configured to construct a second neural network model through the second model generator, and the second performance index of the second neural network model is better than the first performance index.
  • the first performance index is a first theoretical performance index
  • the first theoretical performance index represents a theoretical value of the performance index when the first neural network model runs on the target chip
  • the second performance index is a second theoretical performance index
  • the second theoretical performance index is better than the first theoretical performance index
  • the obtaining unit is specifically configured to obtain a first measured performance index, where the first measured performance index represents an actual measured value of the performance index when the second neural network model runs on the target chip;
  • the processing unit is further configured to adjust the second model generator according to the first measured performance index to obtain the third model generator;
  • the construction unit is further configured to construct a third neural network model through the third model generator, and the second measured performance index of the third neural network model is better than the first measured performance index.
  • the apparatus for constructing the neural network model further includes:
  • a training unit for training the second neural network model to obtain the fourth neural network model
  • the processing unit is further configured to adjust the second model generator according to the first measured performance index, and obtaining the third model generator includes:
  • the processing unit is further configured to adjust the second model generator according to the first measured performance index and the model performance of the fourth neural network model to obtain a third model generator.
  • the first performance index is a first theoretical performance index
  • the obtaining unit is specifically configured to obtain the first theoretical performance index through a performance evaluation tool
  • the performance evaluation tool includes a calculation function
  • the calculation function is used to perform calculations on the first neural network model to obtain the first theoretical performance index.
  • the apparatus for constructing the neural network model further includes:
  • a determination unit used for determining a first construction unit of the first neural network model by a performance evaluation tool, the first construction unit includes at least one of the following: a convolutional layer of the first neural network model, a pooling layer of the first neural network model, The activation function of the first neural network model and the normalization layer of the first neural network model;
  • the processing unit is further configured to perform calculation according to the first construction unit to obtain the first theoretical performance index.
  • the training unit is also used to train the first neural network model to obtain the fifth neural network model;
  • the processing unit is further configured to adjust the first model generator according to the first theoretical performance index and the model performance of the fifth neural network model to obtain the second model generator.
  • the first theoretical performance index and the second theoretical performance index each include at least one of the following: a theoretical vector-module bound, a theoretical memory bound, a theoretical cube-module utilization rate, a theoretical high-speed parallel multiply-accumulator (MAC) utilization rate, a theoretical cube-module operation count, and a theoretical vector-module operation count.
  • the first measured performance index and the second measured performance index each include at least one of the following: a measured vector-module bound, a measured memory bound, a measured cube-module utilization rate, a measured high-speed parallel multiply-accumulator (MAC) utilization rate, a measured cube-module operation count, and a measured vector-module operation count.
  • a third aspect of the embodiments of the present application provides an apparatus for constructing a neural network model, including:
  • a fourth aspect of the embodiments of the present application provides a storage medium.
  • The technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product.
  • The computer software product is stored in a storage medium that stores computer software instructions for the above-mentioned device, and includes a program for executing the method of the above first aspect.
  • The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
  • a fifth aspect of the embodiments of the present application provides a computer program product including instructions, which, when executed on a computer, cause the computer to execute the method according to the embodiment of the first aspect of the present application.
  • The processor mentioned in any of the above may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling execution of the program of the method in the above first aspect.
  • In the embodiments of the present application, the first theoretical performance index of the first neural network model running on the target chip is obtained, and the first model generator is adjusted according to the first theoretical performance index, so that the first model generator can be adjusted according to the performance index of the target chip, which improves the compatibility of the second neural network model generated by the adjusted second model generator.
  • FIG. 1 is a schematic diagram of a framework of an embodiment of a method for constructing a neural network model in an embodiment of the present application
  • FIG. 2 is a schematic diagram of another framework of an embodiment of a method for constructing a neural network model in an embodiment of the present application
  • FIG. 3 is a schematic diagram of another framework of an embodiment of a method for constructing a neural network model in an embodiment of the present application
  • FIG. 4 is a schematic diagram of another framework of an embodiment of a method for constructing a neural network model in an embodiment of the present application
  • FIG. 5 is a schematic diagram of another framework of an embodiment of a method for constructing a neural network model in an embodiment of the present application
  • FIG. 6 is a schematic flowchart of an embodiment of a method for constructing a neural network model in an embodiment of the present application
  • FIG. 7 is another schematic flowchart of an embodiment of a method for constructing a neural network model in an embodiment of the present application.
  • FIG. 8 is another schematic flowchart of an embodiment of a method for constructing a neural network model in an embodiment of the present application.
  • FIG. 9 is another schematic flowchart of an embodiment of a method for constructing a neural network model in an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of an embodiment of an apparatus for constructing a neural network model in an embodiment of the application
  • FIG. 11 is another schematic structural diagram of the embodiment of the apparatus for constructing the neural network model in the embodiment of the application.
  • FIG. 12 is another schematic structural diagram of an embodiment of an apparatus for constructing a neural network model according to an embodiment of the present application.
  • Figure 1 shows a schematic diagram of an artificial intelligence main frame, which describes the overall workflow of an artificial intelligence system and is suitable for general artificial intelligence field requirements.
  • The "intelligent information chain" reflects a series of processes from data acquisition to processing. For example, it can be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, the data goes through the process of "data - information - knowledge - wisdom".
  • The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure and information (the technology for providing and processing it) up to the industrial ecology of the system.
  • The infrastructure provides computing power support for the artificial intelligence system, realizes communication with the outside world, and provides support through the basic platform. Communication with the outside world is performed through sensors; computing power is provided by smart chips (hardware acceleration chips such as CPUs, NPUs, GPUs, ASICs, and FPGAs); the basic platform includes distributed computing frameworks, networks, and related platform guarantees and support, and may include cloud storage and computing, interconnection networks, and so on. For example, sensors communicate with the outside world to obtain data, and the data is provided to the smart chips in the distributed computing system provided by the basic platform for computation.
  • The data at the upper layer of the infrastructure represents the data sources in the field of artificial intelligence.
  • the data involves graphics, images, voice, video, and text, as well as IoT data of traditional devices, including business data of existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.
  • Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making, etc.
  • machine learning and deep learning can perform symbolic and formalized intelligent information modeling, extraction, preprocessing, training, etc. on data.
  • Reasoning refers to the process of simulating human intelligent reasoning in a computer or intelligent system, using formalized information to carry out machine thinking and solve problems according to a reasoning control strategy; its typical functions are search and matching.
  • Decision-making refers to the process of making decisions after intelligent information is reasoned, usually providing functions such as classification, sorting, and prediction.
  • some general capabilities can be formed based on the results of data processing, such as algorithms or a general system, such as translation, text analysis, computer vision processing (such as image recognition, object detection, etc.), speech recognition, etc.
  • Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields; they encapsulate the overall artificial intelligence solution and productize intelligent information decision-making into deployed applications. Application areas mainly include intelligent manufacturing, intelligent transportation, smart home, smart medical care, smart security, autonomous driving, safe city, and smart terminals.
  • an embodiment of the present application provides a system architecture 200 .
  • the system architecture includes a database 230 and a client device 240 .
  • the data collection device 260 is used to collect data and store it in the database 230 , and the training module 220 generates the target model/rule 201 based on the data maintained in the database 230 .
  • W is the weight vector, and each value in the vector represents the weight value of a neuron in the neural network of this layer.
  • This vector determines the spatial transformation from the input space to the output space described above; that is, the weights of each layer control how the space is transformed.
  • The purpose of training a deep neural network is ultimately to obtain the weight matrices of all layers of the trained network. Therefore, the training process of a neural network is essentially learning how to control the spatial transformation, and more specifically, learning the weight matrices.
  • the weight matrix can be refined into a set of structural parameters and a set of network parameters. For details, refer to The relevant introduction in Figure 2 below.
  • The weight vector of each layer of the neural network can be updated according to the difference between the predicted value and the target value of the current network (of course, there is usually an initialization process before the first update, that is, parameters are preconfigured for each layer of the deep neural network). For example, if the predicted value of the network is too high, the values in the weight matrix are adjusted to reduce the predicted value; after continuous adjustment, the value output by the neural network becomes close or equal to the target value. It is therefore necessary to predefine "how to compare the difference between the predicted value and the target value", that is, the loss function or the objective function.
  • The loss function and the objective function are important equations for measuring the difference between the predicted value and the target value. Taking the loss function as an example, the higher its output value (the loss), the greater the difference, so the training of the neural network can be understood as the process of reducing this loss as much as possible.
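The update process described above (compare prediction with target via a loss function, then adjust the weight matrix to reduce the loss) can be sketched for a toy one-layer linear network. The network size, learning rate, and target weights are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-layer network y = W x: the weight matrix W is the spatial
# transformation from the 3-dim input space to the 2-dim output space.
W_target = np.array([[1.0, 0.0, -1.0],
                     [0.5, 2.0, 0.0]])   # the "ideal" weights to be learned
W = rng.normal(size=(2, 3))              # initialization before the first update

X = rng.normal(size=(100, 3))            # training inputs
Y = X @ W_target.T                       # target values

lr = 0.1
for _ in range(500):
    pred = X @ W.T                       # predicted value of the current network
    diff = pred - Y                      # difference between prediction and target
    loss = float((diff ** 2).mean())     # loss function: mean squared error
    grad = 2.0 * diff.T @ X / len(X)     # gradient of the loss w.r.t. W
    W -= lr * grad                       # adjust weights to reduce the loss
```

After continuous adjustment the loss shrinks toward zero and the learned weight matrix approaches the target transformation, which is exactly the "learning the weight matrix" described above.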
  • the computing module may include a training module 220, and the target model/rule obtained by the training module 220 may be applied to different systems or devices.
  • The execution device 210 is configured with a transceiver 212, which can be a wireless transceiver, an optical transceiver, or a wired interface (such as an I/O interface), to perform data interaction with external devices, and a "user" can input data to the transceiver 212 through the client device 240.
  • The client device 240 can send target tasks to the execution device 210, request the execution device to build a neural network, and provide the execution device 210 with a database for training.
  • the execution device 210 can call data, codes, etc. in the data storage system 250 , and can also store data, instructions, etc. in the data storage system 250 .
  • the calculation module 211 uses the target model/rule 201 to process the input data.
  • The computing module 211 is configured to: construct a first neural network model by using a first model generator; obtain, according to the first neural network model, a first performance index of the first neural network model when running on the target chip; adjust the first model generator according to the first performance index to obtain a second model generator; and construct a second neural network model through the second model generator, where the second performance index of the second neural network model is better than the first performance index.
  • The association function module 214 may specifically be a module for training a model generator.
  • the association function module 214 may be configured to perform search and construction according to the basic operations included in the search space to obtain the first model generator.
  • the transceiver 212 returns the constructed neural network model to the client device 240 to deploy the neural network model in the client device 240 or other devices.
  • the user may manually specify data entered into the execution device 210, for example, operating in an interface provided by the transceiver 212.
  • the client device 240 can automatically input data to the transceiver 212 and obtain the result. If the client device 240 automatically enters data and needs to obtain the user's authorization, the user can set the corresponding permission in the client device 240 .
  • the user can view the result output by the execution device 210 on the client device 240, and the specific presentation form can be a specific manner such as display, sound, and action.
  • the client device 240 can also act as a data collection end to store the collected data associated with the target task into the database 230 .
  • FIG. 2 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship among the devices, components, and modules shown in the figure does not constitute any limitation.
  • the data storage system 250 is an external memory relative to the execution device 210 . In other scenarios, the data storage system 250 may also be placed in the execution device 210 .
  • an embodiment of the present application provides a system architecture 300 .
  • The execution device 210 is implemented by one or more servers and, optionally, cooperates with other computing devices, such as data storage, routers, and load balancers; the execution device 210 may be arranged on one physical site or distributed across multiple physical sites.
  • the execution device 210 can use the data in the data storage system 250 or call the program code in the data storage system 250 to implement the steps of the neural network model construction method corresponding to FIGS. 6-8 below in this application.
  • a user may operate respective user devices (eg, local device 301 and local device 302 ) to interact with execution device 210 .
  • Each local device may represent any computing device, such as a personal computer, computer workstation, smartphone, tablet, smart camera, smart car, media consumption device, wearable device, set-top box, or game console.
  • Each user's local device can interact with the execution device 210 through a communication network of any mechanism or standard; the communication network can be a wide area network, a local area network, a point-to-point connection, or any combination thereof.
  • the communication network may include a wireless network, a wired network, or a combination of a wireless network and a wired network, and the like.
  • the wireless network includes but is not limited to: a fifth generation mobile communication technology (5th-Generation, 5G) system, a long term evolution (LTE) system, a global system for mobile communication (GSM) network, a code division multiple access (CDMA) network, a wideband code division multiple access (WCDMA) network, wireless fidelity (WiFi), Bluetooth, the Zigbee protocol, radio frequency identification (RFID) technology, long range (LoRa) wireless communication, and near field communication (NFC), or any one or combination thereof.
  • the wired network may include an optical fiber communication network or a network composed of coaxial cables, and the like.
  • one or more aspects of the execution device 210 may be implemented by each local device, for example, the local device 301 may provide the execution device 210 with local data or feedback calculation results.
  • the local device 301 implements the functions of the execution device 210 and provides services for its own users, or provides services for the users of the local device 302 .
  • FIG. 4 is a schematic diagram of a framework for constructing a neural network model according to an embodiment of the present application.
  • the neural network model construction framework includes at least one controller model and a neural network model generated via the controller model.
  • the controller model obtains the architecture of a neural network model through searching, trains that architecture on the training set, and then evaluates it on the validation set to obtain its accuracy. After that, the feedback result (such as the accuracy) is returned to the controller model, and the controller model is updated using reinforcement learning, so that the controller can generate a better network structure in the next cycle. This process is repeated many times: a new architecture is generated, tested again, and the feedback result is sent to the controller model for reinforcement learning again.
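The controller loop described above can be sketched as a short illustrative Python toy. This is not the disclosed implementation: the search space, the `evaluate` stand-in (which replaces real training/validation), and the crude preference update (which replaces a real reinforcement-learning policy update) are all assumptions for illustration.

```python
import random

# Illustrative search space: each key is one architecture decision.
SEARCH_SPACE = {"depth": [2, 4, 8], "width": [16, 32, 64]}

def sample_architecture(preferences):
    """Controller step: sample one architecture, biased by learned preferences."""
    return {key: random.choices(options, weights=preferences[key])[0]
            for key, options in SEARCH_SPACE.items()}

def evaluate(arch):
    """Stand-in for train-on-training-set + score-on-validation-set."""
    return 0.5 + 0.004 * arch["depth"] + 0.003 * arch["width"] / 16

def update_controller(preferences, arch, reward, lr=0.5):
    """Crude 'reinforcement' update: raise the weight of each chosen option."""
    for key, options in SEARCH_SPACE.items():
        idx = options.index(arch[key])
        preferences[key][idx] += lr * reward

preferences = {key: [1.0] * len(opts) for key, opts in SEARCH_SPACE.items()}
best = 0.0
for _ in range(20):                       # "repeated many times"
    arch = sample_architecture(preferences)
    accuracy = evaluate(arch)             # feedback result
    update_controller(preferences, arch, accuracy)
    best = max(best, accuracy)
print(round(best, 3))
```

Over many cycles the preference weights of high-accuracy options grow, so the controller tends toward architectures that achieve higher validation accuracy, as the text describes.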
  • the above controller models will tend to design architectures that achieve higher accuracy in the validation set.
  • FIG. 5 is a schematic diagram of another neural network model construction framework provided in this embodiment of the present application.
  • the schematic diagram of the neural network model construction framework includes at least network structure population and performance evaluation tools.
  • the network structure population contains a variety of neural network model building units.
  • the model generator searches for suitable neural network building units in the network structure population, and constructs the searched neural network building units into a neural network model.
  • the neural network model may include one or more neural network model construction units, which are not specifically limited here.
  • the neural network model is input into the performance evaluation tool to obtain the chip performance index when the neural network model runs on the target chip.
  • the neural network model can also be trained on a chip on the cloud to obtain the network structure performance of the neural network. It is understandable that the neural network model can also be trained in other chip environments; there is no specific limitation here.
  • the network structure population is adjusted through the chip performance indicators and the network structure performance, so as to obtain a model generator that can build network structures with higher chip performance and better network structure performance.
  • the network structure population is updated through Pareto optimization, that is, on the premise of not affecting the performance of the network structure, the constructed neural network model has higher chip performance.
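The Pareto update described above can be sketched as follows; this is a minimal illustrative Python example (the objective tuples, accuracy/latency numbers, and function names are assumptions), showing that a structure survives only if no other structure is at least as good on both network-structure performance and chip performance.

```python
def dominates(a, b):
    """a dominates b if a is no worse on both objectives and strictly better
    on at least one. Each entry is (accuracy, chip_latency_ms); higher
    accuracy and lower latency are better."""
    return (a[0] >= b[0] and a[1] <= b[1]) and (a[0] > b[0] or a[1] < b[1])

def pareto_front(candidates):
    """Keep only candidates not dominated by any other candidate."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o != c)]

population = [
    (0.92, 12.0),   # accurate but slow on the target chip
    (0.90, 8.0),    # balanced
    (0.88, 8.5),    # dominated by (0.90, 8.0): less accurate AND slower
    (0.85, 5.0),    # fast but less accurate
]
front = pareto_front(population)
print(front)
```

The dominated structure is dropped, while structures that trade accuracy against chip performance in different ways all remain, matching the premise of improving chip performance without harming network structure performance.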
  • the apparatus for constructing the neural network model in the embodiment of the present application may be a computer device with a chip such as a server, a desktop computer, a notebook computer, and a computer cluster, which is not specifically limited here.
  • FIG. 6 is a schematic flowchart of a method for constructing a neural network model according to an embodiment of the present application.
  • step 601 the apparatus for constructing a neural network model constructs a first neural network model through a first model generator.
  • the apparatus for constructing a neural network model constructs a first neural network model through a first model generator, and the first model generator is preset by the apparatus for constructing a neural network model.
  • the apparatus for constructing a neural network model constructs a first neural network by using a first model generator and a target task.
  • the neural network model building apparatus acquires the target task.
  • the target task may be determined by the apparatus according to its own needs, or may be determined according to the user's operation.
  • the target task may include: the type of neural network, the accuracy of the neural network, and the like.
  • the type of neural network includes the output type of the neural network requested to be constructed.
  • the target task may be to construct a neural network for face recognition, which is used to recognize faces and output corresponding character information.
  • the target task may be initiated by a terminal, for example constructing a neural network for vehicle identification, which is used to identify the vehicle information included in a captured picture.
  • the neural network in this application may be a convolutional neural network, a recurrent neural network, a perceptron neural network, etc., which may be adjusted according to actual application scenarios, which is not limited in this application.
  • Corresponding training data can also be acquired while acquiring the target task or after acquiring the target task.
  • the training data is data associated with the target task.
  • the training data may include input data for the target task and real measurement data. For example, if the target task is to construct a face recognition neural network, the training data includes a large number of face pictures and task information corresponding to each picture.
  • the training data may be divided into a training set and a validation set, where the training set represents a picture for training the neural network model, and the validation set represents a picture for verifying the accuracy of the network model.
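The division of training data into a training set and a validation set can be sketched as follows; the 80/20 ratio, the fixed seed, and the file names are illustrative assumptions, not values from the embodiment.

```python
import random

def split_data(samples, val_fraction=0.2, seed=0):
    """Shuffle the labeled samples, then reserve a fraction for validation."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]   # (training set, validation set)

# e.g. face pictures for a face-recognition target task (names assumed)
pictures = [f"face_{i:03d}.jpg" for i in range(100)]
train_set, val_set = split_data(pictures)
print(len(train_set), len(val_set))  # → 80 20
```

The training set is used to train the neural network model, and the disjoint validation set is used to verify its accuracy, as described above.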
  • hardware constraints are also input to the first model generator, where the hardware constraints include various parameters of the target chip.
  • the hardware constraints may include at least one of the following: the frequency of the chip, the size of the chip memory, the size of the chip computing module, the bandwidth between the on-chip memory and the external memory, and the like. It can be understood that in the actual application process, more parameters may also be included, which are not specifically limited here.
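One hedged way the hardware constraints listed above might be represented when passed to the first model generator is a simple parameter record; all field names and values below are illustrative assumptions, not values from the embodiment.

```python
# Assumed representation of the target chip's hardware constraints.
target_chip_constraints = {
    "clock_frequency_mhz": 1000,     # frequency of the chip
    "on_chip_memory_kib": 512,       # size of the chip memory
    "compute_module_macs": 4096,     # size of the chip computing module
    "memory_bandwidth_gbs": 25.6,    # on-chip <-> external memory bandwidth
}

def peak_ops_per_second(constraints):
    """Theoretical peak: MACs per cycle x cycles per second x 2 ops per MAC."""
    return (constraints["compute_module_macs"]
            * constraints["clock_frequency_mhz"] * 1e6 * 2)

print(f"{peak_ops_per_second(target_chip_constraints):.3e}")
```

Such derived quantities (here, a theoretical peak throughput) are the kind of chip parameters a performance evaluation tool could use when estimating theoretical performance indices.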
  • step 602 the apparatus for constructing the neural network model obtains the first theoretical performance index.
  • the target chip can be a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), a field programmable gate array (FPGA), an application specific integrated circuit (ASIC), a general-purpose processor, etc.
  • the first theoretical performance index is a theoretical performance index when the CPU runs the first neural network.
  • the first theoretical performance index includes at least one of the following: theoretical vector bound, theoretical memory bound, theoretical cube utilization, theoretical high-speed parallel multiplier-accumulator (multiplier-accumulator/multiply-accumulate operation, MAC) utilization, theoretical number of cube module operations (cube cycles), theoretical number of vector module operations (vector cycles), L1 and L2 memory fusion effect, compute batch effect, tiling strategy and its performance, performance under mixed precision, performance under different data flow modes, the cycles or delay of each operator or network layer in the neural network model, and the total number of cycles or delay of the entire neural network model.
  • the neural network model building apparatus obtains the first theoretical performance index through a performance evaluation tool, and the performance evaluation tool includes a calculation function, and the calculation function is used to calculate the first neural network model to obtain the first theoretical performance index.
  • the performance evaluation tool may be a tool in the form of software, and is used to obtain the corresponding performance index when the target chip runs the first neural network.
  • the performance evaluation tool is PTM, and the apparatus for constructing a neural network model obtains theoretical performance indicators through PTM.
  • the performance evaluation tool may also exist in other forms, for example, a hardware module, which is not specifically limited here.
  • the device for constructing a neural network model takes at least one component unit composing the first neural network as a first building unit (a component unit may include at least one operator, or take one layer as a unit), and divides the neural network into one or more first building units.
  • the multiple first building units are input into a performance evaluation tool, and the performance evaluation tool performs calculations according to the multiple first building units and the parameters of the target chip to obtain theoretical performance indicators of each first building unit. Further, the theoretical performance indicators of the first building units are superimposed, that is, the first theoretical performance indicators of the target chip running the first neural network.
  • if the theoretical performance index is the time consumption when the target chip runs the first neural network, and one building unit is one operator in the first neural network, then the performance evaluation tool calculates the time consumption corresponding to each operator and superimposes them, finally obtaining the time consumption corresponding to all operators in the entire neural network, that is, the time consumption when the target chip runs the first neural network.
  • the time consumption of the whole network can be calculated by the following formula:
  • Whole-network time consumption ≈ Σᵢ₌₁ⁿ f(operator at layer i), where n is the number of layers and f is the performance evaluation tool's estimate of the time consumption of one operator
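The summation above can be sketched numerically; here the per-operator estimate f is modeled as FLOPs divided by an assumed effective throughput, which is an illustrative simplification (real evaluation tools also model memory traffic, pipelines, and tiling), and all FLOP counts and chip numbers are assumptions.

```python
PEAK_GFLOPS = 8192.0   # assumed peak throughput of the target chip

def estimate_operator_ms(op_flops, efficiency=0.6):
    """f(): estimated time consumption of one operator, in milliseconds."""
    return op_flops / (PEAK_GFLOPS * 1e9 * efficiency) * 1e3

# one entry per layer/operator of the first neural network (FLOP counts assumed)
operator_flops = [230e6, 460e6, 920e6, 120e6]

# superimpose the per-operator estimates -> whole-network time consumption
total_ms = sum(estimate_operator_ms(f) for f in operator_flops)
print(round(total_ms, 3))  # → 0.352
```

The per-operator estimates are simply superimposed, matching the description that the time consumption of all operators is accumulated to obtain the time consumption of the target chip running the first neural network.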
  • alternatively, the neural network model construction device inputs the entire first neural network model into the performance evaluation tool, and the performance evaluation tool performs the calculation according to the first neural network model and the parameters of the target chip; the resulting theoretical performance index of the first neural network model is the first theoretical performance index of the target chip running the first neural network. For example, if the first theoretical performance index is the time consumed when the target chip runs the first neural network, the performance evaluation tool obtains this time by calculating the time consumption of the first neural network.
  • the apparatus for constructing the neural network model obtains the first theoretical performance index by inputting the data stream into the performance evaluation tool.
  • the neural network model construction apparatus determines one or more first construction units of the first neural network model, where the first construction units include at least one of the following: a convolution layer of the first neural network model, a The pooling layer, the activation function of the first neural network model, and the normalization layer of the first neural network model.
  • different layers in the neural network are divided into a dimension of a task, for example, one or more convolution layers, pooling layers, activation functions, normalization layers, etc.
  • for example, a convolution layer may be divided into one dimension; in a preferred manner, a convolution layer, a pooling layer, an activation function, and a normalization layer together are divided into one dimension.
  • the process may also include data movement paths such as L1->L0A/L0B, UB->L1, and so on.
  • for paths such as L1->L0A/L0B, analyze the pipelines through which each dimension in the neural network performs calculation and data transmission.
  • theoretical performance indicators such as the number of cycles, cube utilization, mac utilization, vector bound, memory bound (DDR, L2), etc. are calculated according to the processing of each pipeline (pipe) and data.
  • the output feature map can be reduced by setting the corresponding weight when constructing the neural network so that it matches the size of the chip's minimum memory unit, thereby improving the use efficiency of the minimum memory unit of the chip.
  • Vector bound: the degree of multiplexing of the vector module in the chip.
  • Cube utilization: the cube module is mainly used for matrix calculation; cube utilization refers to the number of times the cube module is used per unit time.
  • MAC (multiplier-accumulator/multiply-accumulate operation) utilization: the MAC is a module in the chip that performs the neural network's multiply and add operations; MAC utilization refers to the number of times the MAC module is used per unit time.
  • the method for improving the mac utilization rate in the embodiment of the present application is similar to the method for improving the cube utilization rate, and details are not described herein again.
  • Number of cube cycles: the number of operations performed by the cube module in the chip.
  • Number of vector cycles: the number of operations performed using the vector module in the chip.
  • L1 and L2 memory fusion: L1 and L2 are two different types of chip memory; by setting the corresponding weights when building the neural network, the compatibility of L1 and L2 in processing the data corresponding to the neural network can be improved.
  • Compute batch: the number of feature maps processed as a batch within the same time period; by setting the corresponding weights when constructing the neural network, the compute batch can be increased as much as possible on the premise of ensuring the running performance of the chip.
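Several of the indicators above (memory bound, peak throughput, memory bandwidth) can be related through a simple roofline-style check: a layer counts as memory bound when its arithmetic intensity (operations per byte moved) falls below the chip's ridge point. This is a hedged, generic sketch; the chip numbers are assumptions and the patent does not prescribe this particular formula.

```python
PEAK_GFLOPS = 8192.0      # assumed compute peak of the target chip
BANDWIDTH_GBS = 25.6      # assumed memory bandwidth (e.g. DDR)
RIDGE_POINT = PEAK_GFLOPS / BANDWIDTH_GBS   # FLOPs per byte = 320.0

def is_memory_bound(flops, bytes_moved):
    """True when the layer cannot keep the compute modules busy because
    data movement, not arithmetic, limits its speed."""
    return flops / bytes_moved < RIDGE_POINT

# a layer that moves many bytes per FLOP (intensity 20 < 320) is memory bound
print(is_memory_bound(flops=2e6, bytes_moved=1e5))
```

A construction apparatus could use such a check to steer the generator away from structures whose layers are dominated by data movement on the target chip.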
  • the performance indicators in the embodiments of the present application may also include more parameters, as long as they can affect the operating efficiency of the neural network on the chip, which is not specifically limited here.
  • step 603 the apparatus for constructing a neural network model adjusts the first model generator according to the first theoretical performance index to obtain a second model generator.
  • after obtaining the first theoretical performance index, the neural network model building apparatus adjusts the first model generator according to the first theoretical performance index to obtain a second model generator, and the second model generator can construct a neural network with better theoretical performance indices when running on the target chip.
  • specifically, the weighting factor in the first model generator can be adjusted according to the first theoretical performance index, so that the adjusted model generator can construct a second neural network model with a better performance index when running on the target chip, where the second theoretical performance index of the second neural network model is better than the first theoretical performance index.
  • for example, if the theoretical performance index is the theoretical time consumption, the corresponding time-consumption weight factor is adjusted when adjusting the model generator, so that the model generator can build a neural network that takes less time to run on the target chip.
  • if there are multiple theoretical performance indices, multiple weighting factors are adjusted correspondingly, so that the first model generator constructs a neural network with better performance indices when running on the target chip.
  • the first neural network is trained, and when the first neural network tends to converge, a fifth neural network model is obtained, and performance parameters of the fifth neural network are obtained.
  • the performance parameters of the fifth neural network may include an accuracy rate, a peak signal-to-noise ratio describing a picture, etc., which are not specifically limited here.
  • the weight factor in the first model generator is adjusted according to the performance parameters of the fifth neural network and the first theoretical performance index. In this way, the performance parameters of the neural network that can be generated by the model generator can be guaranteed, and the performance indicators running on the target chip are better.
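Combining the network's performance parameter (e.g. accuracy) with the theoretical performance index (e.g. theoretical time consumption) into a single weighted score, as described above, can be sketched as follows; the linear form and the weighting factor `lam` are illustrative assumptions.

```python
def reward(accuracy, theoretical_latency_ms, lam=0.01):
    """Weighted score: higher accuracy raises it, higher time consumption
    lowers it. lam is the adjustable time-consumption weight factor."""
    return accuracy - lam * theoretical_latency_ms

# a faster but slightly less accurate network vs. a slower, more accurate one
fast_net = reward(accuracy=0.90, theoretical_latency_ms=5.0)    # 0.85
slow_net = reward(accuracy=0.92, theoretical_latency_ms=12.0)   # 0.80
print(fast_net > slow_net)
```

Raising `lam` pushes the generator toward structures with lower theoretical time consumption; lowering it prioritizes the network's performance parameter instead, which is the trade-off the weight-factor adjustment controls.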
  • steps 601 to 603 are a process in which the first model generator iterates once according to the first theoretical performance index.
  • a model generator capable of constructing a neural network with the optimal theoretical performance index is generated, that is, the second model generator.
  • step 604 the apparatus for constructing the neural network model obtains the measured performance index.
  • after the neural network model building device obtains the second model generator, it obtains the first measured performance index, where the first measured performance index represents the actual performance index when the target chip runs the second neural network.
  • after the apparatus for constructing the neural network model obtains the second model generator, the second neural network generated by the second model generator can be used directly, or the second model generator can be further improved in a second round.
  • the second round of improvement is based on the performance indicators obtained by the second neural network running on the actual target chip.
  • the second neural network model is constructed by the second model generator and is run on the target chip, so as to obtain the corresponding actual performance indicators when the target chip runs the second neural network model.
  • the target chip can be a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), a field programmable gate array (FPGA), an application specific integrated circuit (ASIC), a general-purpose processor, etc.
  • the first measured performance index may include at least one of the following: measured vector bound, measured memory bound, measured cube utilization, measured high-speed parallel multiplier-accumulator (multiplier-accumulator/multiply-accumulate operation, MAC) utilization, measured number of cube module operations (cube cycles), measured L1 and L2 memory fusion effect, measured compute batch effect, measured tiling strategy and its performance, measured performance under mixed precision, measured performance under different data flow modes, and the measured cycles or delay of each operator or network layer in the neural network model. It can be understood that in the actual application process, more chip performance indicators can be included, such as the measured number of vector module operations (vector cycles), as long as they reflect the performance of the target chip running the second neural network; this is not specifically limited here.
  • the system where the target chip is located can obtain the performance index corresponding to the target chip running the second neural network in the form of software or hardware, and the specific method of obtaining the performance index is not limited here.
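One common software-side way to obtain a measured performance index such as time consumption is to time repeated runs of the model on the target device, discarding warm-up runs and taking the median. This is a generic sketch, not the embodiment's method; `run_model` is a stand-in for the second neural network executing on the target chip.

```python
import statistics
import time

def run_model(x):
    """Stand-in for the second neural network running on the target chip."""
    return sum(v * v for v in x)

def measure_latency_ms(fn, inputs, warmup=3, repeats=10):
    for _ in range(warmup):          # discard cold-start (cache/JIT) runs
        fn(inputs)
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(inputs)
        samples.append((time.perf_counter() - t0) * 1e3)
    return statistics.median(samples)  # median is robust to outlier runs

latency = measure_latency_ms(run_model, list(range(10_000)))
print(latency > 0.0)
```

The resulting measured value plays the role of the first measured performance index fed back to adjust the second model generator.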
  • step 605 the neural network model building apparatus adjusts the second model generator according to the first measured performance index to obtain a third model generator.
  • after obtaining the first measured performance index, the neural network model construction device adjusts the second model generator according to the first measured performance index to obtain a third model generator, which can construct a third neural network model whose second measured performance index is better than the first measured performance index.
  • the weight factor in the corresponding second model generator can be adjusted according to the measured performance index, so that the second model generator can construct a neural network with better performance index when running in the target chip .
  • for example, if the first measured performance index is the measured time consumption when the second neural network runs on the target chip, the corresponding time-consumption weight factor is adjusted so that the model generator builds a neural network that takes less time to run on the target chip. It can be understood that when the first measured performance index comprises multiple indices, multiple weighting factors are adjusted correspondingly, so that the second model generator can construct a neural network with better performance indices when running on the target chip.
  • the second neural network is trained, and when the second neural network tends to converge, a fourth neural network model is obtained, and performance parameters of the fourth neural network are obtained.
  • the performance parameters of the fourth neural network may include an accuracy rate, a peak signal-to-noise ratio describing a picture, etc., which are not specifically limited here.
  • the weight factor in the first model generator is adjusted according to the performance parameter of the fourth neural network and the first measured performance index. This can ensure that the performance parameters of the neural network generated by the model generator are excellent, and the performance indicators running on the target chip are better.
  • steps 604 to 605 are a process in which the second model generator iterates once according to the measured performance indicators.
  • steps 604 to 605 are optional steps, and when steps 604 to 605 are not performed, the second neural network model constructed by the second model generator is used as the neural network model used on the target chip .
  • the theoretical performance index and the measured performance index may be obtained within the neural network model construction device, or may be obtained by other computer equipment and then sent to the neural network model construction device; this is not specifically limited here.
  • FIG. 6 is an application scenario of the embodiment of the present application, and another application scenario of the neural network model construction method of the embodiment of the present application is described below.
  • FIG. 9 is another schematic flowchart of a method for constructing a neural network model according to an embodiment of the present application.
  • the structure search space is taken as an example to represent the model generator for description.
  • Each constituent unit in the search space is a code indicating the construction of the neural network structure.
  • each sampling utilizes the subdivision relationship of the unevaluated constituent units in the search space and the network model constructed based on the evaluated constituent units.
  • the part in the dotted box is the step of constructing the initial search space.
  • the search space consists of multiple network structures, which in turn consist of multiple constituent units. After the initial search space of multiple network structures is constructed, unreasonable network structures are screened out based on preset rules or application scenario tasks to obtain a second search space. Then, through network structure clustering, a new third search space is formed.
  • the cluster center structure is trained based on the third search space, the unevaluated structure is modeled according to the training loss value of the evaluated network structure, and then several network structures are selected for training based on Bayesian optimization, and then according to the training loss The value models the unevaluated structure, and this loops until the search space is constructed.
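The loop above (train a few structures, model the unevaluated ones from the evaluated ones, select the most promising candidates) can be sketched with a deliberately crude surrogate: predict each unevaluated structure's loss from its nearest already-trained neighbor. This nearest-neighbor surrogate is an illustrative stand-in for real Bayesian optimization, and the structure encoding and loss values are assumptions.

```python
def distance(a, b):
    """Squared distance between two structure encodings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict_loss(structure, evaluated):
    """Surrogate model: loss of the nearest structure that was actually trained."""
    nearest = min(evaluated, key=lambda s: distance(s, structure))
    return evaluated[nearest]

# structures encoded as (depth, width); losses of trained cluster centers
evaluated = {(2, 16): 0.9, (8, 64): 0.3}
candidates = [(3, 16), (7, 64), (4, 32)]   # not yet trained

# select the candidate with the lowest predicted loss for real training next
ranked = sorted(candidates, key=lambda s: predict_loss(s, evaluated))
print(ranked[0])
```

After the selected candidate is actually trained, its true loss is added to `evaluated` and the surrogate is re-fit, looping exactly as the paragraph describes until the search space is constructed.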
  • the method for constructing the neural network model in the embodiment shown in FIG. 6 is used to further adjust the search space, and details are not repeated here.
  • the performance of the search space is improved by initially constructing and training the search space.
  • FIG. 10 is a schematic structural diagram of the apparatus for constructing a neural network model according to an embodiment of the present application.
  • a device for constructing a neural network model comprising:
  • a construction unit 1001 configured to construct a first neural network model by a first model generator
  • an obtaining unit 1002 configured to obtain, according to the first neural network model, a first performance index when the first neural network model runs on the target chip;
  • a processing unit 1003 configured to adjust the first model generator according to the first performance index to obtain the second model generator
  • the constructing unit 1001 is further configured to construct a second neural network model through the second model generator, and the second performance index of the second neural network model is better than the first performance index.
  • each unit of the apparatus for constructing a neural network model is similar to those described in the foregoing embodiments shown in FIG. 6 and FIG. 7 , and details are not repeated here.
  • FIG. 11 is another schematic structural diagram of the apparatus for constructing a neural network model according to an embodiment of the present application.
  • a device for constructing a neural network model comprising:
  • a construction unit 1101 configured to construct a first neural network model by a first model generator
  • an obtaining unit 1102 configured to obtain, according to the first neural network model, the first performance index when the first neural network model runs on the target chip;
  • a processing unit 1103, configured to adjust the first model generator according to the first performance index to obtain the second model generator
  • the constructing unit 1101 is further configured to construct a second neural network model through the second model generator, and the second performance index of the second neural network model is better than the first performance index.
  • the first performance index is a first theoretical performance index
  • the first theoretical performance index represents a theoretical value of the performance index when the first neural network model runs on the target chip
  • the second performance index is a second theoretical performance index
  • the second theoretical performance index is better than the first theoretical performance index
  • the obtaining unit 1102 is specifically configured to obtain the first measured performance index, where the first measured performance index represents the measured value of the performance index when the second neural network model runs on the target chip;
  • the processing unit 1103 is further configured to adjust the second model generator according to the first measured performance index to obtain the third model generator;
  • the constructing unit 1101 is further configured to construct a third neural network model through the third model generator, where the second measured performance index of the third neural network model is better than the first measured performance index.
  • the apparatus for constructing the neural network model further includes:
  • the training unit 1104 is used for training the second neural network model to obtain the fourth neural network model
  • the processing unit 1103 is further configured to adjust the second model generator according to the first measured performance index, and obtaining the third model generator includes:
  • the processing unit 1103 is further configured to adjust the second model generator according to the first measured performance index and the model performance of the fourth neural network model to obtain a third model generator.
  • the first performance index is a first theoretical performance index
  • the obtaining unit is specifically configured to obtain the first theoretical performance index through a performance evaluation tool, where the performance evaluation tool includes a calculation function, and the calculation function is used to perform calculation on the first neural network model to obtain the first theoretical performance index.
  • the apparatus for constructing the neural network model further includes:
  • Determining unit 1105 configured to determine a first construction unit of the first neural network model by using a performance evaluation tool, the first construction unit includes at least one of the following: a convolutional layer of the first neural network model, a pooling layer of the first neural network model , the activation function of the first neural network model, the normalization layer of the first neural network model;
  • the processing unit 1103 is further configured to perform calculation according to the first construction unit to obtain the first theoretical performance index.
  • the training unit 1104 is further configured to train the first neural network model to obtain the fifth neural network model;
  • the processing unit 1103 is further configured to adjust the first model generator according to the first theoretical performance index and the model performance of the fifth neural network model to obtain the second model generator.
  • the first theoretical performance index and the second theoretical performance index respectively include at least one of the following: a theoretical vector module limit, a theoretical memory limit, a theoretical cube module utilization rate, a theoretical high-speed parallel multiply-accumulator (MAC) utilization rate, a theoretical number of cube module operations, and a theoretical number of vector module operations.
  • the first measured performance index and the second measured performance index respectively include at least one of the following: the measured vector module limit, the measured memory limit, the measured cube module utilization rate, and the measured high-speed parallel multiply-accumulator MAC utilization rate. , the measured number of operations of the cube module, and the measured number of operations of the vector module.
  • the functions of each unit of the apparatus for constructing a neural network model are similar to those described in the foregoing embodiments shown in FIG. 6 and FIG. 7, and details are not repeated here.
  • FIG. 12 is another schematic structural diagram of the apparatus for constructing a neural network model according to an embodiment of the present application.
  • the processor 1201 is connected to the memory 1202 and the interface 1204.
  • the bus 1205 is respectively connected to the processor 1201, the memory 1202, and the interface 1204.
  • the interface 1204 is used to receive or send data.
  • the processor 1201 may be a single-core or multi-core central processing unit, an application-specific integrated circuit, or one or more integrated circuits configured to implement the embodiments of the present application.
  • the memory 1202 may be random access memory (RAM), or may be non-volatile memory (non-volatile memory), such as at least one hard disk memory.
  • the memory 1202 is used to store computer-executable instructions; specifically, the computer-executable instructions may include the program 1203.
  • when the processor 1201 calls the program 1203, the apparatus for constructing a neural network model in FIG. 12 can execute the operations performed by the apparatus in the embodiments shown in FIG. 6 or FIG. 9, and details are not repeated here.
  • the processor mentioned in the apparatus for constructing a neural network model in the above embodiments of the present application may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the number of processors in the apparatus for constructing a neural network model in the above embodiments of the present application may be one or more, and may be adjusted according to the actual application scenario; this is merely illustrative and not limiting.
  • the number of memories in this embodiment of the present application may be one or more, and may be adjusted according to the actual application scenario; this is merely illustrative and not limiting.
  • the neural network model building apparatus includes a processor (or a processing unit) and a memory
  • the processor in this application may be integrated with the memory, or the processor and the memory may be connected through an interface; this can be adjusted according to the actual application scenario and is not limited.
  • the embodiments of the present application also provide a computer program, or a computer program product including a computer program, which, when executed on a computer, enables the computer to implement the method flow related to the apparatus for constructing a neural network model in any of the above method embodiments.
  • Embodiments of the present application further provide a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a computer, implements the method process related to the apparatus for constructing a neural network model in any of the above method embodiments.
  • FIGS. 6-9 may be implemented in whole or in part by software, hardware, firmware or any combination thereof.
  • software When implemented in software, it can be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave).
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media.
  • the available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., DVDs), semiconductor media (e.g., solid-state drives (SSDs)), and the like.
  • the word "if" as used herein may be interpreted as "when", "upon", "in response to determining", or "in response to detecting".
  • similarly, the phrases "if it is determined" or "if (a stated condition or event) is detected" may be interpreted, depending on the context, as "when it is determined", "in response to determining", "when (the stated condition or event) is detected", or "in response to detecting (the stated condition or event)".

Abstract

A neural network model construction method and a device therefor, which are used in constructing a neural network. The method comprises: constructing a first neural network model by means of a first model generator (601), obtaining, according to the first neural network model, a first performance indicator when the first neural network model is running on a target chip (602), adjusting the first model generator according to the first performance indicator, and obtaining a second model generator (603), and constructing a second neural network model by means of a second model generator, where a second performance indicator for the second neural network model is superior to the first performance indicator. By means of obtaining a first theoretical performance indicator when a first neural network model is running on a target chip, and adjusting a first model generator according to the first theoretical performance indicator, a neural network model with a superior hardware performance indicator when running on the target chip can thereby be constructed.

Description

A Neural Network Model Construction Method and Device Therefor

Technical Field

The present application relates to the field of artificial intelligence, and in particular, to a method and device for constructing a neural network model.

Background

In recent years, deep neural networks have achieved remarkable results in processing and analysis tasks for various media signals such as image, video, and speech. A well-performing neural network often has a delicate network structure, whose design requires human experts with superb skills and rich experience to expend a great deal of effort.

Neural network architecture search, that is, automated neural network model construction, changes this manual design mode: it searches for neural network structures automatically, obtains structures with excellent performance, and has achieved excellent results in tasks such as image recognition, image semantic segmentation, and natural language processing.

In a traditional architecture search, the search is trained in a particular chip environment against the target index of the task (for example, the model test accuracy of applications such as image classification and image segmentation). When a neural network structure obtained by searching in one chip environment is used in another chip environment, differences in chip parameters cause compatibility problems at run time, such as excessive time consumption and low chip utilization.
SUMMARY OF THE INVENTION

The embodiments of the present application provide a neural network model construction method and device, used, when a model generator generates a neural network model, to obtain a first theoretical performance index of a first neural network model running on a target chip and to adjust the corresponding weights in a first model generator according to the first theoretical performance index, thereby constructing a neural network model with a better hardware performance index when running on the target chip.

A first aspect of the embodiments of the present application provides a method for constructing a neural network model.

A neural network model construction apparatus constructs a first neural network model through a first model generator preset in the apparatus; the first neural network model is constructed by the first model generator from individual building units.

After the first model generator constructs the first neural network model, the apparatus obtains, according to the first neural network model, a first performance index of the first neural network model when running on a target chip.

The apparatus adjusts the first model generator according to the first performance index to obtain a second model generator. After obtaining the second model generator, the apparatus constructs a second neural network model according to it; the second performance index of the second neural network model is better than the first performance index, that is, the performance index of the second neural network model when running on the target chip is better than that of the first neural network model when running on the target chip.

In the embodiments of the present application, by obtaining the first performance index of the first neural network model when running on the target chip and adjusting the first model generator according to that index, a second neural network model with a better hardware performance index when running on the target chip is constructed.
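The generator-feedback loop described above can be sketched as follows. This is a minimal illustration under assumed names (ModelGenerator, get_performance_index, the width-doubling update, and the chip figure of 128 lanes are all hypothetical); the application does not specify a concrete generator or update rule.

```python
class ModelGenerator:
    """Toy first model generator: holds a channel width and widens it
    whenever the performance index on the target chip falls short."""

    def __init__(self, width=16):
        self.width = width

    def build(self):
        # A "model" here is just its configuration.
        return {"width": self.width}

    def adjust(self, perf_index, target=0.8):
        # Stand-in for the real adjustment step: nudge the generator
        # toward configurations that score better on the target chip.
        if perf_index < target:
            self.width *= 2


def get_performance_index(model, chip_lanes=128):
    # Hypothetical index: fraction of the chip's compute lanes used.
    return min(model["width"] / chip_lanes, 1.0)


gen = ModelGenerator()                # first model generator
first = gen.build()                   # first neural network model
idx1 = get_performance_index(first)   # first performance index
gen.adjust(idx1)                      # -> second model generator
second = gen.build()                  # second neural network model
idx2 = get_performance_index(second)  # second performance index
assert idx2 > idx1                    # the second index is better
```

In a real search, the adjustment would update the sampling distribution of an architecture generator rather than a single width, but the control flow — build, measure, adjust, rebuild — is the same.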
Based on the neural network model construction method of the first aspect of the embodiments of the present application, in a possible implementation:

After the first model generator constructs the first neural network model, the apparatus obtains a first theoretical performance index of the first neural network model, which represents the theoretical value of the performance index of the first neural network model when running on the target chip.

The apparatus adjusts the first model generator according to the first theoretical performance index to obtain a second model generator. After obtaining the second model generator, the apparatus constructs a second neural network model according to it; the second theoretical performance index of the second neural network model, that is, the theoretical value of the performance index of the second neural network when running on the target chip, is better than the first theoretical performance index.

In the embodiments of the present application, by obtaining the first theoretical performance index of the first neural network model when running on the target chip and adjusting the first model generator according to that index, a second neural network model with a better hardware performance index when running on the target chip is constructed.
Based on the method of the first aspect of the embodiments of the present application, in a possible implementation, after the apparatus constructs the second neural network model through the second model generator, the apparatus further obtains a first measured performance index, which represents the measured value of the performance index of the second neural network model when running on the target chip, that is, the measured performance index obtained from the target chip after the second neural network model has run on it.

The apparatus adjusts the corresponding weight factors in the second model generator according to the first measured performance index to obtain a third model generator. After obtaining the third model generator, the apparatus constructs a third neural network model through it; the second measured performance index of the third neural network model, that is, the measured value of its performance index when running on the target chip, is better than the first measured performance index.

In the embodiments of the present application, by running the second neural network on the actual target chip, obtaining the corresponding measured performance index, and then adjusting the second model generator according to that index, a neural network model better adapted to the target chip can be obtained.
Based on the method of the first aspect of the embodiments of the present application, in a possible implementation, before obtaining the first measured performance index, the apparatus further trains the second neural network model to obtain a fourth neural network model. After obtaining the fourth neural network model, the apparatus obtains its model performance and adjusts the second model generator according to the model performance of the fourth neural network and the first measured performance index to obtain the third model generator.

In the embodiments of the present application, adjusting the second model generator according to the model performance of the fourth neural network model (the trained second neural network model) and the first measured performance index ensures that the neural network models generated by the adjusted third model generator achieve a better performance index when running on the target chip while maintaining equivalent model performance.
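The joint adjustment just described, combining the trained model's task performance with the measured hardware index, can be sketched as a single scalar reward. The linear weighting and the alpha value of 0.5 are assumptions for illustration; the application does not fix a particular combination rule.

```python
def combined_reward(model_accuracy, measured_utilization, alpha=0.5):
    # Trade off task performance (model performance of the trained
    # model) against the measured on-chip utilization.
    return alpha * model_accuracy + (1 - alpha) * measured_utilization

# Two candidates with equal accuracy but different chip utilization:
r_good_chip = combined_reward(0.92, 0.70)
r_poor_chip = combined_reward(0.92, 0.30)
assert r_good_chip > r_poor_chip  # the generator is steered to the former
```

Because the reward rises with both terms, a generator adjusted against it keeps model performance equivalent while favoring structures that use the target chip better.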
Based on the method of the first aspect of the embodiments of the present application, in a possible implementation, the apparatus obtains the first theoretical performance index through a performance evaluation tool; the performance evaluation tool includes a calculation function used to perform calculation on the first neural network model to obtain the first theoretical performance index.

In the embodiments of the present application, the apparatus obtains the first theoretical performance index through a performance evaluation tool, which improves the achievability of obtaining the first theoretical performance index.

Based on the method of the first aspect of the embodiments of the present application, in a possible implementation, the apparatus determines a first building unit of the first neural network through the performance evaluation tool; the first building unit includes at least one of the following: a convolutional layer, a pooling layer, an activation function, or a normalization layer of the first neural network model. The first building unit may include one or more convolutional layers, pooling layers, activation functions, and normalization layers. The apparatus performs calculation according to the first building unit to obtain the first theoretical performance index.

In the embodiments of the present application, the apparatus obtains the first theoretical performance index by calculating, through the performance evaluation tool, one or more first building units of the first neural network model. Since a first building unit includes at least one convolutional layer, pooling layer, activation function, or normalization layer, the theoretical performance index of the first neural network model can be adjusted according to the individual layers that make up the model, which improves flexibility.
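A performance evaluation tool of the kind described, which walks the model's building units and accumulates a theoretical cost, might look like the following sketch. The per-unit cost formula and the chip figure of 4096 MACs per cycle are illustrative assumptions, not the application's actual calculation function.

```python
def conv_macs(cin, cout, k, h, w):
    # Multiply-accumulate count of one k x k convolutional layer
    # producing a cout x h x w output from cin input channels.
    return cin * cout * k * k * h * w

def theoretical_index(units, chip_macs_per_cycle=4096):
    total_macs = 0
    for u in units:
        if u["type"] == "conv":
            total_macs += conv_macs(u["cin"], u["cout"], u["k"],
                                    u["h"], u["w"])
        # Pooling, activation and normalization units are vector-module
        # work and are left out of this cube-module MAC estimate.
    return {"macs": total_macs,
            "cube_cycles": total_macs / chip_macs_per_cycle}

building_units = [
    {"type": "conv", "cin": 3,  "cout": 16, "k": 3, "h": 32, "w": 32},
    {"type": "pool"},
    {"type": "conv", "cin": 16, "cout": 32, "k": 3, "h": 16, "w": 16},
]
index = theoretical_index(building_units)
```

Because the estimate is a sum over building units, changing any single layer changes the theoretical index, which is what makes per-layer adjustment of the generator possible.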
Based on the method of the first aspect of the embodiments of the present application, in a possible implementation, after the apparatus constructs the first neural network model, it further trains the first neural network model to obtain a fifth neural network model. After obtaining the fifth neural network model, the apparatus obtains its model performance and adjusts the first model generator according to the model performance of the fifth neural network and the first theoretical performance index to obtain the second model generator.

In the embodiments of the present application, adjusting the first model generator according to the model performance of the fifth neural network model (the trained first neural network model) and the first theoretical performance index ensures that the neural network models generated by the adjusted second model generator achieve a better theoretical performance index when running on the target chip while maintaining equivalent model performance.

Based on the method of the first aspect of the embodiments of the present application, in a possible implementation, the first theoretical performance index and the second theoretical performance index each include at least one of the following: a theoretical vector module bound, a theoretical memory bound, a theoretical cube module utilization rate, a theoretical high-speed parallel multiply-accumulator (MAC) utilization rate, a theoretical number of cube module operations (cube cycles), and a theoretical number of vector module operations (vector cycles).

In the embodiments of the present application, the specific referents of the first theoretical performance index and the second theoretical performance index are described by way of example, which improves the achievability of the solution.
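As a concrete illustration of one listed index, MAC utilization can be expressed as the ratio of executed multiply-accumulates to the chip's peak over the same interval. The peak throughput figure below is a made-up assumption.

```python
def mac_utilization(total_macs, runtime_s, peak_macs_per_s):
    # Executed MACs divided by what the chip could have executed
    # in the same wall-clock interval at its peak rate.
    return total_macs / (runtime_s * peak_macs_per_s)

# e.g. 1.6e9 MACs in 2 ms on a chip with an assumed 4 TMAC/s peak:
utilization = mac_utilization(1.6e9, 0.002, 4.0e12)
```

The same ratio serves as a theoretical index when the inputs are computed analytically, or as a measured index when they come from on-chip profiling.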
Based on the method of the first aspect of the embodiments of the present application, in a possible implementation, the first measured performance index and the second measured performance index each include at least one of the following: a measured vector module bound, a measured memory bound, a measured cube module utilization rate, a measured high-speed parallel multiply-accumulator (MAC) utilization rate, a measured number of cube module operations, and a measured number of vector module operations.

In the embodiments of the present application, the specific referents of the first measured performance index and the second measured performance index are described by way of example, which improves the achievability of the solution.
A second aspect of the embodiments of the present application provides an apparatus for constructing a neural network model.

An apparatus for constructing a neural network model, comprising:

a construction unit, configured to construct a first neural network model through a first model generator;

an obtaining unit, configured to obtain, according to the first neural network model, a first performance index of the first neural network model when running on a target chip;

a processing unit, configured to adjust the first model generator according to the first performance index to obtain a second model generator;

the construction unit is further configured to construct a second neural network model through the second model generator, where the second performance index of the second neural network model is better than the first performance index.
Optionally, the first performance index is a first theoretical performance index, representing the theoretical value of the performance index of the first neural network model when running on the target chip; the second performance index is a second theoretical performance index, which is better than the first theoretical performance index.

Optionally, the obtaining unit is specifically configured to obtain a first measured performance index, representing the measured value of the performance index of the second neural network model when running on the target chip;

the processing unit is further configured to adjust the second model generator according to the first measured performance index to obtain a third model generator;

the construction unit is further configured to construct a third neural network model through the third model generator, where the second measured performance index of the third neural network model is better than the first measured performance index.

Optionally, the apparatus for constructing a neural network model further includes:

a training unit, configured to train the second neural network model to obtain a fourth neural network model;

where adjusting the second model generator according to the first measured performance index to obtain the third model generator includes:

the processing unit being further configured to adjust the second model generator according to the first measured performance index and the model performance of the fourth neural network model to obtain the third model generator.
Optionally, the first performance index is a first theoretical performance index, and the obtaining unit is specifically configured to obtain the first theoretical performance index through a performance evaluation tool; the performance evaluation tool includes a calculation function used to perform calculation on the first neural network model to obtain the first theoretical performance index.

Optionally, the apparatus for constructing a neural network model further includes:

a determining unit, configured to determine a first building unit of the first neural network model through the performance evaluation tool, where the first building unit includes at least one of the following: a convolutional layer, a pooling layer, an activation function, or a normalization layer of the first neural network model;

the processing unit is further configured to perform calculation according to the first building unit to obtain the first theoretical performance index.

Optionally, the training unit is further configured to train the first neural network model to obtain a fifth neural network model;

the processing unit is further configured to adjust the first model generator according to the first theoretical performance index and the model performance of the fifth neural network model to obtain the second model generator.

Optionally, the first theoretical performance index and the second theoretical performance index each include at least one of the following: a theoretical vector module bound, a theoretical memory bound, a theoretical cube module utilization rate, a theoretical high-speed parallel multiply-accumulator (MAC) utilization rate, a theoretical number of cube module operations, and a theoretical number of vector module operations.

Optionally, the first measured performance index and the second measured performance index each include at least one of the following: a measured vector module bound, a measured memory bound, a measured cube module utilization rate, a measured high-speed parallel multiply-accumulator (MAC) utilization rate, a measured number of cube module operations, and a measured number of vector module operations.
本申请实施例第三方面提供一种神经网络模型构建装置,包括:A third aspect of the embodiments of the present application provides an apparatus for constructing a neural network model, including:
处理器、存储器以及输入输出接口,该处理器、该存储器与该输入输出接口连接;该存储器,用于存储程序代码;该处理器调用该存储器中的程序代码时执行本申请第一方面实施方式提供的方法。A processor, a memory, and an input-output interface, the processor and the memory are connected to the input-output interface; the memory is used to store program codes; the processor executes the first aspect of the present application when calling the program codes in the memory provided method.
本申请实施例第四方面提供一种存储介质,需要说明的是,本发的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产口的形式体现出来,该计算机软件产品存储在一个存储介质中,用于储存为上述设备所用的计算机软件 指令,其包含用于执行上述第一方面中为元数据存储方法所设计的程序。A fourth aspect of the embodiments of the present application provides a storage medium. It should be noted that the technical solution of the present invention is essentially or a part that contributes to the prior art, or all or part of the technical solution can be produced in software. In the form of embodiment, the computer software product is stored in a storage medium for storing computer software instructions for the above-mentioned device, which includes a program for executing the above-mentioned first aspect for the metadata storage method.
该存储介质包括:U盘、移动硬盘、只读存储器(英文缩写ROM,英文全称:Read-Only Memory)、随机存取存储器(英文缩写:RAM,英文全称:Random Access Memory)、磁碟或者光盘等各种可以存储程序代码的介质。The storage medium includes: U disk, mobile hard disk, read-only memory (English abbreviation ROM, English full name: Read-Only Memory), random access memory (English abbreviation: RAM, English full name: Random Access Memory), magnetic disk or CD-ROM and other media that can store program codes.
本申请实施例第五方面提供一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机执行如本申请第一方面实施方式的方法。A fifth aspect of the embodiments of the present application provides a computer program product including instructions, which, when executed on a computer, cause the computer to execute the method according to the embodiment of the first aspect of the present application.
The processor mentioned in any of the above may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the program of the method of the above first aspect.
In the technical solutions provided by the embodiments of the present application, a first theoretical performance index of a first neural network model running on a target chip is obtained, and a first model generator is adjusted according to the first theoretical performance index, so that the model generator is adapted to the performance characteristics of the target chip, which improves the compatibility of the second neural network model generated by the adjusted second model generator.
Description of Drawings
FIG. 1 is a schematic framework diagram of an embodiment of the neural network model construction method in an embodiment of the present application;
FIG. 2 is another schematic framework diagram of an embodiment of the neural network model construction method in an embodiment of the present application;
FIG. 3 is another schematic framework diagram of an embodiment of the neural network model construction method in an embodiment of the present application;
FIG. 4 is another schematic framework diagram of an embodiment of the neural network model construction method in an embodiment of the present application;
FIG. 5 is another schematic framework diagram of an embodiment of the neural network model construction method in an embodiment of the present application;
FIG. 6 is a schematic flowchart of an embodiment of the neural network model construction method in an embodiment of the present application;
FIG. 7 is another schematic flowchart of an embodiment of the neural network model construction method in an embodiment of the present application;
FIG. 8 is another schematic flowchart of an embodiment of the neural network model construction method in an embodiment of the present application;
FIG. 9 is another schematic flowchart of an embodiment of the neural network model construction method in an embodiment of the present application;
FIG. 10 is a schematic structural diagram of an embodiment of the neural network model construction apparatus in an embodiment of the present application;
FIG. 11 is another schematic structural diagram of an embodiment of the neural network model construction apparatus in an embodiment of the present application;
FIG. 12 is another schematic structural diagram of an embodiment of the neural network model construction apparatus in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present application.
FIG. 1 shows a schematic diagram of an artificial intelligence main framework, which describes the overall workflow of an artificial intelligence system and is applicable to general requirements in the artificial intelligence field.
The artificial intelligence framework above is described below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis).
The "intelligent information chain" reflects a series of processes from data acquisition to data processing, for example, the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, the data undergoes a refinement from "data" to "information" to "knowledge" to "wisdom".
The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and the information (technical realization of provision and processing) up to the industrial ecology of the system.
(1) Infrastructure
The infrastructure provides computing-power support for the artificial intelligence system, enables communication with the outside world, and is supported by a basic platform. Communication with the outside world takes place through sensors; computing power is provided by intelligent chips (hardware acceleration chips such as CPUs, NPUs, GPUs, ASICs, and FPGAs); the basic platform includes platform assurance and support such as a distributed computing framework and networks, and may include cloud storage and computing, interconnection networks, and the like. For example, sensors communicate with the outside world to obtain data, and the data is provided to the intelligent chips in the distributed computing system provided by the basic platform for computation.
(2) Data
The data at the layer above the infrastructure indicates the data sources in the artificial intelligence field. The data involves graphics, images, speech, video, and text, and also involves Internet-of-Things data from conventional devices, including service data of existing systems and sensed data such as force, displacement, liquid level, temperature, and humidity.
(3) Data processing
Data processing usually includes data training, machine learning, deep learning, searching, reasoning, decision-making, and the like.
Machine learning and deep learning can perform symbolic and formalized intelligent information modeling, extraction, preprocessing, training, and so on, on the data.
Reasoning refers to the process of simulating human intelligent reasoning in a computer or intelligent system, using formalized information to perform machine thinking and problem solving according to a reasoning control strategy; typical functions are searching and matching.
Decision-making refers to the process of making decisions on intelligent information after reasoning, and usually provides functions such as classification, ranking, and prediction.
(4) General capabilities
After the data undergoes the data processing mentioned above, some general capabilities can further be formed based on the results of the processing, for example an algorithm or a general system, such as translation, text analysis, computer vision processing (for example, image recognition and object detection), speech recognition, and the like.
(5) Intelligent products and industry applications
Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields; they encapsulate the overall artificial intelligence solution, productize intelligent information decision-making, and put it into practical application. Their application fields mainly include intelligent manufacturing, intelligent transportation, smart home, intelligent healthcare, intelligent security, autonomous driving, safe city, intelligent terminals, and the like.
Referring to FIG. 2, an embodiment of the present application provides a system architecture 200. The system architecture includes a database 230 and a client device 240. A data collection device 260 is configured to collect data and store it in the database 230, and a training module 220 generates a target model/rule 201 based on the data maintained in the database 230.
The work of each layer in a deep neural network can be described by the mathematical expression y = a(W·x + b). At a physical level, the work of each layer can be understood as completing a transformation from the input space (the set of input vectors) to the output space (that is, from the row space to the column space of the matrix) through five operations on the input space: 1. raising/lowering the dimension; 2. enlarging/shrinking; 3. rotation; 4. translation; and 5. "bending". Operations 1, 2, and 3 are performed by W·x, operation 4 is performed by +b, and operation 5 is realized by a(). The word "space" is used here because the object being classified is not a single thing but a class of things, and the space refers to the set of all individuals of that class. W is the weight vector, and each value in the vector represents the weight value of one neuron in that layer of the neural network. This vector determines the spatial transformation from the input space to the output space described above; that is, the weight of each layer controls how the space is transformed. The purpose of training a deep neural network is ultimately to obtain the weight matrices of all layers of the trained neural network. Therefore, the training process of a neural network is essentially learning how to control the spatial transformation, and more specifically, learning the weight matrices. In the following implementations of the present application, the weight matrix can be refined into a structure parameter set and a network parameter set; for details, refer to the related description of FIG. 2 below.
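The layer expression y = a(W·x + b) described above can be illustrated with a minimal sketch (NumPy, with ReLU standing in for the activation a(); the weights, bias, and input below are arbitrary illustrative values, not parameters from any embodiment):

```python
import numpy as np

def layer_forward(W, x, b):
    """One layer of a deep neural network: y = a(W*x + b).

    W*x performs the dimension change, scaling and rotation; +b performs
    the translation; the activation a() (the "bending") is ReLU here.
    """
    z = W @ x + b            # linear transformation of the input space
    return np.maximum(z, 0)  # a(): elementwise ReLU

# A layer mapping a 3-dimensional input space to a 2-dimensional output space.
W = np.array([[1.0, 0.0, -1.0],
              [0.5, 2.0,  0.0]])
x = np.array([1.0, 1.0, -2.0])
b = np.array([0.5, 0.0])
y = layer_forward(W, x, b)
```

The weight matrix W alone fixes how the space is transformed; training would adjust the entries of W and b.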
Because the output of a deep neural network is expected to be as close as possible to the target value, the weight vector of each layer of the neural network can be updated by comparing the predicted value of the current network with the target value and adjusting according to the difference between the two (of course, there is usually an initialization process before the first update, that is, parameters are preconfigured for each layer of the deep neural network). For example, if the predicted value of the network is too high, the weight values in the weight matrix are adjusted to lower the predicted value, and the adjustment continues until the value output by the neural network is close or equal to the target value. Therefore, it is necessary to define in advance "how to compare the difference between the predicted value and the target value", that is, the loss function or objective function; the loss function is an important equation for measuring the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a greater difference, and the training of the neural network can be understood as the process of reducing the loss as much as possible.
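The loss-driven weight update described above can be sketched as follows (a hypothetical one-weight model with a mean-squared-error loss; the learning rate and iteration count are illustrative choices, not values from the embodiment):

```python
import numpy as np

def mse_loss(pred, target):
    """Loss function: measures the difference between prediction and target."""
    return np.mean((pred - target) ** 2)

# A single "layer" y = w * x; the weight starts too high (prediction 5.0
# against a target of 2.0), so gradient steps lower it.
w, x, target, lr = 5.0, 1.0, 2.0, 0.1

losses = []
for _ in range(50):
    pred = w * x
    losses.append(mse_loss(np.array([pred]), np.array([target])))
    grad = 2 * (pred - target) * x   # d(loss)/dw for the MSE loss
    w -= lr * grad                   # adjust the weight to reduce the loss

final_loss = mse_loss(np.array([w * x]), np.array([target]))
```

Each step shrinks the gap between prediction and target, so the recorded losses decrease toward zero, which is exactly the "reduce the loss as much as possible" view of training.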
The computing module may include the training module 220, and the target model/rule obtained by the training module 220 may be applied in different systems or devices. In FIG. 2, an execution device 210 is configured with a transceiver 212, which may be a wireless transceiver, an optical transceiver, a wired interface (such as an I/O interface), or the like, for data interaction with external devices. A "user" may input data to the transceiver 212 through the client device 240. For example, in the following implementations of the present application, the client device 240 may send a target task to the execution device 210 to request that the execution device construct a neural network, and may send the execution device 210 a database used for training.
The execution device 210 may call data, code, and the like in a data storage system 250, and may also store data, instructions, and the like in the data storage system 250.
A computing module 211 processes the input data using the target model/rule 201. Specifically, the computing module 211 is configured to: construct a first neural network model through a first model generator; obtain, according to the first neural network model, a first performance index of the first neural network model when running on a target chip; adjust the first model generator according to the first performance index to obtain a second model generator; and construct a second neural network model through the second model generator, where a second performance index of the second neural network model is better than the first performance index.
The association function module 21 may specifically be a module for training the model generator.
The association function module 214 may be configured to perform searching and construction according to the basic operations included in a search space, to obtain the first model generator.
Finally, the transceiver 212 returns the constructed neural network model to the client device 240, so that the neural network model can be deployed in the client device 240 or in other devices.
Further, the training module 220 may obtain, for different target tasks, corresponding target models/rules 201 based on different data, so as to provide users with better results.
In the case shown in FIG. 2, the user may manually specify the data input to the execution device 210, for example by operating in an interface provided by the transceiver 212. In another case, the client device 240 may automatically input data to the transceiver 212 and obtain the result; if automatic data input by the client device 240 requires the user's authorization, the user may set the corresponding permission in the client device 240. The user may view, on the client device 240, the result output by the execution device 210, and the specific presentation form may be display, sound, action, or another specific manner. The client device 240 may also serve as a data collection end and store the collected data associated with the target task in the database 230.
It should be noted that FIG. 2 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationships among the devices, components, modules, and the like shown in the figure do not constitute any limitation. For example, in FIG. 2 the data storage system 250 is an external memory relative to the execution device 210; in other scenarios, the data storage system 250 may also be placed in the execution device 210.
Referring to FIG. 3, an embodiment of the present application provides a system architecture 300. The execution device 210 is implemented by one or more servers and, optionally, cooperates with other computing devices, such as data storage, routers, and load balancers; the execution device 210 may be arranged on one physical site or distributed over multiple physical sites. The execution device 210 may use the data in the data storage system 250, or call the program code in the data storage system 250, to implement the steps of the neural network model construction method corresponding to FIGS. 6 to 8 below in the present application.
Users may operate their respective user devices (for example, a local device 301 and a local device 302) to interact with the execution device 210. Each local device may represent any computing device, for example, a personal computer, a computer workstation, a smartphone, a tablet computer, a smart camera, a smart car or another type of cellular phone, a media consumption device, a wearable device, a set-top box, or a game console.
The local device of each user can interact with the execution device 210 through a communication network of any communication mechanism/communication standard. The communication network may be a wide area network, a local area network, a point-to-point connection, or any combination thereof. Specifically, the communication network may include a wireless network, a wired network, or a combination of a wireless network and a wired network. The wireless network includes, but is not limited to, any one or a combination of: a fifth-generation mobile communication technology (5th-Generation, 5G) system, a long term evolution (LTE) system, a global system for mobile communication (GSM), a code division multiple access (CDMA) network, a wideband code division multiple access (WCDMA) network, wireless fidelity (WiFi), Bluetooth, Zigbee, radio frequency identification (RFID), long range (Lora) wireless communication, and near field communication (NFC). The wired network may include an optical fiber communication network, a network composed of coaxial cables, or the like.
In another implementation, one or more aspects of the execution device 210 may be implemented by each local device; for example, the local device 301 may provide the execution device 210 with local data or feed back calculation results.
It should be noted that all functions of the execution device 210 may also be implemented by a local device. For example, the local device 301 implements the functions of the execution device 210 and provides services for its own user, or provides services for the user of the local device 302.
Please refer to FIG. 4, which is a schematic diagram of a neural network model construction framework provided by an embodiment of the present application.
The neural network model construction framework includes at least one controller model and a neural network model generated via the controller model. The controller model obtains the architecture of a neural network model through searching, trains that architecture on a training set, and then evaluates the architecture on a validation set to obtain its accuracy. The feedback result (for example, the accuracy) is then returned to the controller model, and the controller model is updated using reinforcement learning, so that in the next cycle the controller can generate a better network structure. After this process is repeated many times, a new architecture is generated and tested again, and the feedback result is again fed to the controller model for further reinforcement learning. Ultimately, the controller model tends to design architectures that achieve higher accuracy on the validation set.
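The sample-evaluate-feedback loop described above can be sketched as follows. This is a toy stand-in only: the candidate operations, the scoring function that replaces actual training and validation, and the simple weight-proportional update that stands in for reinforcement learning are all illustrative assumptions, not the embodiment's actual algorithm:

```python
import random

# Toy search space: each of 3 layer slots chooses one candidate operation.
CANDIDATES = ["conv3x3", "conv5x5", "skip"]

def toy_accuracy(arch):
    """Stand-in for train-then-validate; here conv3x3 layers score best."""
    return sum({"conv3x3": 0.3, "conv5x5": 0.2, "skip": 0.1}[op] for op in arch)

# "Controller": per-slot sampling weights, updated from the feedback reward.
weights = [{op: 1.0 for op in CANDIDATES} for _ in range(3)]

random.seed(0)
best_arch, best_acc = None, -1.0
for _ in range(200):
    # Controller samples an architecture from the current weights.
    arch = [random.choices(CANDIDATES, [w[c] for c in CANDIDATES])[0]
            for w in weights]
    acc = toy_accuracy(arch)          # feedback from the "validation set"
    for slot, op in enumerate(arch):  # reinforce sampled choices by reward
        weights[slot][op] += acc
    if acc > best_acc:
        best_arch, best_acc = arch, acc
```

Over many cycles the sampling weights drift toward operations that produced higher accuracy, mirroring how the controller comes to favor better-scoring architectures.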
Further, based on the schematic diagram of the neural network model construction framework in FIG. 4, please refer to FIG. 5, which is a schematic diagram of another neural network model construction framework provided by an embodiment of the present application.
This neural network model construction framework includes at least a network structure population and a performance evaluation tool. The network structure population contains multiple kinds of neural network model building units; the model generator searches the network structure population for suitable neural network building units and assembles the found building units into a neural network model. It can be understood that the neural network model may include one or more neural network model building units, which is not specifically limited here.
After the neural network model is constructed, the neural network model is input into the performance evaluation tool to obtain the chip performance index of the neural network model when running on the target chip. Optionally, the neural network model may also be trained on a chip in the cloud to obtain the network structure performance of the neural network structure; it can be understood that the neural network model may also be trained in other chip environments, which is not specifically limited here.
In a continuous iterative process, the network structure population is adjusted according to the chip performance index and the network structure performance, so as to obtain a model generator that can construct models with higher chip performance and better network structure performance. In a possible implementation, the network structure population is updated by way of Pareto optimization, that is, the constructed neural network model achieves higher chip performance without degrading the network structure performance.
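Pareto optimization over the two objectives can be sketched as follows (the candidate structures and their objective values are illustrative assumptions; the embodiment does not specify concrete values):

```python
def dominates(a, b):
    """a dominates b if a is no worse on every objective and strictly better
    on at least one. Objectives: (accuracy, chip_speed), higher is better."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_front(population):
    """Keep only non-dominated candidates: higher chip performance without
    sacrificing network-structure performance."""
    return [p for p in population
            if not any(dominates(q, p) for q in population if q is not p)]

# (accuracy, chip_speed) for four candidate network structures.
population = [(0.90, 5.0), (0.90, 7.0), (0.85, 7.0), (0.80, 9.0)]
front = pareto_front(population)
```

Here (0.90, 5.0) is discarded because (0.90, 7.0) is faster at equal accuracy, which is exactly the "improve chip performance without hurting network performance" criterion; candidates trading accuracy for speed survive on the front.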
It should be noted that the neural network model construction apparatus in the embodiments of the present application may be a computer device with a chip, such as a server, a desktop computer, a notebook computer, or a computer cluster, which is not specifically limited here.
Based on the foregoing application scenarios, the neural network model construction method provided by the present application is described below.
Please refer to FIG. 6, which is a schematic flowchart of a neural network model construction method provided by an embodiment of the present application.
In step 601, the neural network model construction apparatus constructs a first neural network model through a first model generator.
The neural network model construction apparatus constructs the first neural network model through the first model generator, where the first model generator is preset in the neural network model construction apparatus.
Specifically, in a possible implementation, the neural network model construction apparatus constructs the first neural network according to the first model generator and a target task. Before the first model generator constructs the first neural network, the neural network model construction apparatus obtains the target task. The target task may be determined according to the apparatus's own requirements, or according to a user's operation. For example, the target task may include: the type of the neural network, the accuracy of the neural network, and the like. The type of the neural network includes the output type of the neural network requested to be constructed; for example, the target task may be to construct a face recognition neural network for recognizing faces and outputting the corresponding person information. For another example, the target task may be initiated by a terminal, to construct a vehicle recognition neural network for recognizing the information of vehicles included in pictures obtained by an image capture device.
It should be noted that the neural network in the present application may be a convolutional neural network, a recurrent neural network, a perceptron neural network, or the like; it may be adjusted according to the actual application scenario, which is not limited in the present application.
The corresponding training data may also be obtained at the same time as, or after, the target task is obtained. The training data is data associated with the target task, and may include input data of the target task and real measurement data. For example, if the target task is to construct a face recognition neural network, the training data includes a large number of face pictures and the task information corresponding to each picture. In a possible implementation, the training data may be divided into a training set and a validation set, where the training set refers to the pictures used for training the neural network model, and the validation set refers to the pictures used for verifying the accuracy of the network model.
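The training/validation split described above can be sketched as follows (the file names, labels, and 80/20 ratio are illustrative assumptions; the embodiment does not fix a ratio):

```python
import random

def split_dataset(samples, train_ratio=0.8, seed=0):
    """Shuffle labelled samples and split them into a training set
    (for training the model) and a validation set (for checking accuracy)."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# (image_id, label) pairs standing in for face pictures and their task info.
data = [(f"face_{i:03d}.jpg", i % 5) for i in range(100)]
train_set, val_set = split_dataset(data)
```

Shuffling before the cut keeps both subsets representative, and the two subsets are disjoint so validation accuracy is measured on pictures the model was not trained on.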
In a possible implementation, before the first model generator constructs the first neural network, a hardware constraint is also input to the first model generator, where the hardware constraint includes parameters of the target chip. Specifically, the hardware constraint may include at least one of the following: the frequency of the chip, the size of the chip memory, the size of the chip's computing module, the bandwidth between memories in the chip, and the like. It can be understood that, in an actual application, more parameters may be included, which is not specifically limited here. After the hardware constraint is input into the first model generator, the first model generator constructs the first neural network according to the hardware constraint when constructing the first neural network model.
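A hardware constraint of this kind might be represented as a simple parameter set. The field names and values below are hypothetical; the embodiment only lists the kinds of parameters involved:

```python
# Hypothetical target-chip constraint set passed to the model generator.
hardware_constraints = {
    "chip_frequency_mhz": 1000,     # frequency of the chip
    "memory_size_kb": 512,          # size of the chip memory
    "compute_unit_size": 16,        # size of the chip's computing module
    "memory_bandwidth_gbps": 25.6,  # bandwidth between memories in the chip
}

def validate_constraints(constraints):
    """A generator would reject a constraint set containing
    non-numeric or non-positive parameter values."""
    return all(isinstance(v, (int, float)) and v > 0
               for v in constraints.values())

ok = validate_constraints(hardware_constraints)
```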
In step 602, the neural network model construction apparatus obtains a first theoretical performance index.
The neural network model construction apparatus obtains the first theoretical performance index, which represents the theoretical value of the performance index of the first neural network model when running on the target chip; that is, it is a theoretical, rather than measured, performance index.
The target chip may be a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), a field programmable gate array (FPGA), an application-specific integrated circuit (ASIC), a general-purpose processor, or the like. For example, the first theoretical performance index is the theoretical performance index of the CPU when running the first neural network.
In a possible implementation, the first theoretical performance index includes at least one of the following: a theoretical vector-module bound (vector bound); a theoretical memory bound; a theoretical cube-module (cube) utilization; a theoretical utilization of the high-speed parallel multiplier-accumulator (multiplier-accumulator/multiply-accumulate operation, MAC); a theoretical number of cube-module operations (cube cycles); a theoretical number of vector-module operations (vector cycles); the L1 and L2 memory fusion effect; the compute batch effect; the tiling strategy and its performance; the performance under mixed precision; the performance under different data-flow modes; the cycles or latency of each operator or network layer in the neural network model; and the total number of cycles or the latency of the entire neural network model.
在一种可能的实现方式中,神经网络模型构建装置通过性能评估工具获取第一理论性能指标,性能评估工具包括计算函数,计算函数用于对第一神经网络模型进行计算,以得到第一理论性能指标。具体的,该性能评估工具可以是一种软件形式的工具,用于获取目标芯片运行第一神经网络时对应的性能指标。例如,在一种优选的方式中,该性能评估工具为PTM,则神经网络模型构建装置通过PTM获取理论性能指标。在实际应用过程中,该性能评估工具还可以以其他形式存在,例如为一个硬件模块,具体此处不做限定。In a possible implementation manner, the neural network model building apparatus obtains the first theoretical performance index through a performance evaluation tool, and the performance evaluation tool includes a calculation function, and the calculation function is used to calculate the first neural network model to obtain the first theoretical performance index. Performance. Specifically, the performance evaluation tool may be a tool in the form of software, and is used to obtain the corresponding performance index when the target chip runs the first neural network. For example, in a preferred manner, the performance evaluation tool is PTM, and the apparatus for constructing a neural network model obtains theoretical performance indicators through PTM. In a practical application process, the performance evaluation tool may also exist in other forms, for example, a hardware module, which is not specifically limited here.
Specifically, in a possible implementation, the neural network model construction apparatus divides the first neural network into one or more first building units, each building unit being a constituent unit of the first neural network (a building unit may include at least one operator, or may take one layer as a unit). The first building units are input into the performance evaluation tool, which performs calculation according to the first building units and the parameters of the target chip to obtain the theoretical performance index of each first building unit. The theoretical performance indexes of the first building units are then accumulated to obtain the first theoretical performance index of the target chip running the first neural network. For example, if the theoretical performance index is the time consumed when the target chip runs the first neural network and each building unit is one operator of the first neural network, the performance evaluation tool calculates the time consumed by each operator and then accumulates the results, finally obtaining the time consumed by all the operators of the entire neural network, that is, the time consumed when the target chip runs the first neural network. The time consumed by the whole network can be calculated by the following formula:
whole-network time consumption = ∑_n f(time consumption of each layer's operator estimated by the performance evaluation tool)
It should be noted that other performance indexes can also be obtained through similar formulas. For example, the number of cube cycles can be calculated by the following formula:
whole-network cube cycles = ∑_n f(cube cycles of each layer's operator estimated by the performance evaluation tool)
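The two accumulation formulas above can be sketched in a few lines of Python. Here `estimate_fn` stands in for the performance evaluation tool's per-operator cost function, and the operator names and costs are illustrative assumptions only:

```python
def whole_network_total(operators, estimate_fn):
    """Accumulate a per-operator estimate (time consumption, cube
    cycles, ...) over every operator/layer to obtain the
    whole-network figure, as in the formulas above."""
    return sum(estimate_fn(op) for op in operators)

# Hypothetical per-operator latency estimates in microseconds.
latency_us = {"conv1": 120.0, "pool1": 15.0, "fc1": 40.0}
total_us = whole_network_total(latency_us, latency_us.get)
print(total_us)  # 175.0
```

The same helper serves for any index that is additive over operators; only `estimate_fn` changes.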
Specifically, in a possible implementation, the neural network model construction apparatus inputs the entire first neural network model into the performance evaluation tool, which performs calculation according to the first neural network model and the parameters of the target chip to obtain the theoretical performance index of the first neural network model, that is, the first theoretical performance index of the target chip running the first neural network. For example, if the first theoretical performance index is the time consumed when the target chip runs the first neural network, the performance evaluation tool obtains that time by calculating the time consumption of the first neural network.
In a possible implementation, the neural network model construction apparatus obtains the first theoretical performance index by inputting the data flow into the performance evaluation tool. Specifically, the apparatus determines one or more first building units of the first neural network model, where each first building unit includes at least one of the following: a convolutional layer of the first neural network model, a pooling layer of the first neural network model, an activation function of the first neural network model, or a normalization layer of the first neural network model. Different layers of the neural network are then divided into task dimensions according to the one or more first building units; for example, one or more convolutional layers, pooling layers, activation functions, and normalization layers form one dimension, and in a preferred implementation one convolutional layer, one pooling layer, one activation function, and one normalization layer form one dimension. The data flow is then analyzed for each task. For example, when the chip runs the neural network, different types of on-chip memory process different data: in the transfer from L2 to L1, L2 needs to transmit data to L1, and the efficiency of this process is analyzed for one dimension of the neural network. The process may also include L1->L0A/L0B, UB->L1, and so on. Based on the data-flow calculation, the apparatus analyzes through which pipes each dimension of the neural network performs computation and data transmission. Theoretical performance indexes such as the number of cycles, cube utilization, MAC utilization, vector bound, and memory bound (DDR, L2) are then calculated according to the processing of each pipe and the data tiling.
Memory bound: the reuse of the smallest memory unit in the chip. When the output feature map is larger than the smallest memory unit of the chip, that memory unit needs to be reused multiple times. In the embodiments of the present application, the output feature map can be made smaller through the corresponding weights used when constructing the neural network, so that it fits the size of the smallest memory unit of the chip, which improves the usage efficiency of that unit.
Vector bound: the reuse of the vector module in the chip.
Cube utilization: the cube module is mainly used to compute matrices, and cube utilization refers to the number of times the cube module is used per unit time. During neural network training, if one wants to improve the subsequent cube utilization, cube utilization can be set as a target task; the closer the model is to convergence during training, the higher the cube utilization.
Multiply-accumulate (MAC) utilization: the MAC module is the module in the chip that counts the multiplication and addition operations of the neural network, and MAC utilization refers to the number of times the MAC module is used per unit time. The method for improving MAC utilization in the embodiments of the present application is similar to the method for improving cube utilization and is not described again here.
Number of cube cycles: the number of operations performed by the cube module in the chip.
Number of vector cycles: the number of operations performed by the vector module.
L1/L2 memory fusion: L1 and L2 are two different types of chip memory. By setting the corresponding weights used to construct the neural network, the compatibility of L1 and L2 when processing the data of the neural network can be improved.
Compute batch: the number of feature maps processed in a batch within the same time period. By setting the corresponding weights used to construct the neural network, the compute batch can be increased as much as possible while the running performance of the chip is guaranteed.
In addition to the chip performance indexes described above, the performance indexes in the embodiments of the present application may include more parameters, as long as they can affect the running efficiency of the neural network on the chip; this is not limited here.
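As a rough illustration of how index definitions such as cube or MAC utilization can be turned into numbers, the sketch below computes a utilization as the fraction of cycles a compute module is busy. The exact definition used by any given evaluation tool may differ, so this formula is an assumption:

```python
def module_utilization(busy_cycles, total_cycles):
    """Fraction of cycles the cube/MAC module is in use (0.0-1.0).

    A simplified reading of 'number of uses per unit time'; a real
    tool may weight different operation types differently."""
    if total_cycles <= 0:
        return 0.0
    return busy_cycles / total_cycles

print(module_utilization(800, 1000))  # 0.8
```

The same ratio form applies to either module; only the cycle counters fed into it change.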
In step 603, the neural network model construction apparatus adjusts the first model generator according to the first theoretical performance index to obtain a second model generator.
After obtaining the first theoretical performance index, the neural network model construction apparatus adjusts the first model generator according to the first theoretical performance index to obtain a second model generator. Compared with the first model generator, the second model generator can construct a neural network whose theoretical performance indexes are better when running on the target chip.
In a possible implementation, the corresponding weight factor in the first model generator may be adjusted according to the first theoretical performance index, so that the first model generator can construct a second neural network model with a better performance index when running on the target chip, where the second theoretical performance index of the second neural network model is better than the first theoretical performance index. For example, when the theoretical performance index is a theoretical time consumption, the corresponding time-consumption weight factor is adjusted when the model generator is adjusted, so that the model generator constructs a neural network that takes less time when the target chip runs it. It can be understood that when there are multiple theoretical performance indexes, multiple weight factors are adjusted correspondingly, so that the first model generator constructs a neural network whose performance indexes are all better when running on the target chip.
In a possible implementation, the first neural network is trained, and when the first neural network tends to converge, a fifth neural network model is obtained and the performance parameters of the fifth neural network are acquired. The performance parameters of the fifth neural network may include accuracy, the peak signal-to-noise ratio used to describe a picture, and the like, which are not limited here. The weight factors in the first model generator are then adjusted according to the performance parameters of the fifth neural network and the first theoretical performance index. In this way, the model generator is guaranteed to generate neural networks with good performance parameters that also achieve better performance indexes when running on the target chip.
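The weight-factor adjustment described above can be pictured as optimizing a score that combines model-quality parameters (e.g. accuracy) with theoretical performance indexes (e.g. latency). The linear combination below is a hypothetical sketch, not the actual mechanism of the embodiment:

```python
def generator_score(accuracy, theoretical_latency_us, w_acc, w_lat):
    """Higher is better: reward accuracy, penalize theoretical latency.

    w_acc and w_lat play the role of the per-index weight factors the
    text describes; raising w_lat steers the generator toward networks
    that run faster on the target chip."""
    return w_acc * accuracy - w_lat * theoretical_latency_us

# With a nonzero latency weight, a faster but slightly less accurate
# network can score higher than a slower, more accurate one.
slow = generator_score(accuracy=0.95, theoretical_latency_us=200.0,
                       w_acc=100.0, w_lat=0.5)
fast = generator_score(accuracy=0.93, theoretical_latency_us=80.0,
                       w_acc=100.0, w_lat=0.5)
print(fast > slow)  # True
```

With multiple performance indexes, one weighted term per index would be added to the score in the same way.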
It should be noted that, as shown in FIG. 7, steps 601 to 603 are one iteration of the first model generator according to the first theoretical performance index. In practice, after one or more iterations, a model generator that can construct a neural network with optimal theoretical performance indexes, that is, the second model generator, is finally generated.
In step 604, the neural network model construction apparatus obtains a measured performance index.
After obtaining the second model generator, the neural network model construction apparatus obtains a first measured performance index, which represents the actual performance index when the target chip runs the second neural network.
After the neural network model construction apparatus obtains the second model generator, the second neural network generated by the second model generator can be used directly, or a second round of improvement can be performed on the second model generator. The second round of improvement is an adjustment based on the performance indexes obtained by running the second neural network on the actual target chip.
In a possible implementation, a second neural network model is constructed by the second model generator, the second neural network model is placed on the target chip to run, and the actual performance indexes corresponding to the target chip running the second neural network model are obtained.
The target chip may be a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), a field programmable gate array (FPGA), an application-specific integrated circuit (ASIC), a general-purpose processor, or the like.
In a possible implementation, the first measured performance index may include at least one of the following: a measured vector bound, a measured memory bound, a measured cube-module utilization, a measured multiply-accumulate (MAC) unit utilization, a measured number of cube-module operations (cube cycles), the measured L1/L2 memory fusion effect, the measured compute-batch effect, the measured tiling strategy and its performance, the measured performance under mixed precision, the measured performance under different data-flow modes, the measured latency of each operator or network layer in the neural network model, and the measured total number of cycles or latency of the entire neural network model. It can be understood that, in practice, more chip performance indexes may be included, for example a measured number of vector-module operations (vector cycles), as long as the index reflects the performance of the target chip running the second network; this is not limited here.
In a possible implementation, the system where the target chip is located may obtain, through software or hardware, the performance index corresponding to the target chip running the second neural network; the specific way of obtaining the performance index is not limited here.
In step 605, the neural network model construction apparatus adjusts the second model generator according to the first measured performance index to obtain a third model generator.
After obtaining the first measured performance index, the neural network model construction apparatus adjusts the second model generator according to the first measured performance index to obtain a third model generator, which can construct a third neural network model whose second measured performance index is better than the first measured performance index.
In a possible implementation, the corresponding weight factor in the second model generator may be adjusted according to the measured performance index, so that the second model generator can construct a neural network with a better performance index when running on the target chip. For example, when the first measured performance index is the measured time consumption of the second neural network running on the target chip, the corresponding time-consumption weight factor is adjusted when the second model generator is adjusted, so that the model generator constructs a neural network that takes less time to run on the target chip. It can be understood that when the first measured performance index consists of multiple indexes, multiple weight factors are adjusted correspondingly, so that the second model generator constructs a neural network whose performance indexes are all better when running on the target chip.
In a possible implementation, the second neural network is trained, and when the second neural network tends to converge, a fourth neural network model is obtained and the performance parameters of the fourth neural network are acquired. The performance parameters of the fourth neural network may include accuracy, the peak signal-to-noise ratio used to describe a picture, and the like, which are not limited here. The weight factors in the second model generator are then adjusted according to the performance parameters of the fourth neural network and the first measured performance index. In this way, the model generator is guaranteed to generate neural networks with good performance parameters that also achieve better performance indexes when running on the target chip.
It should be noted that, as shown in FIG. 8, steps 604 to 605 are one iteration of the second model generator according to the measured performance index. In practice, after one or more iterations, a model generator that can construct a neural network with optimal measured performance indexes, that is, the third model generator, is finally generated.
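The two feedback loops (steps 601 to 603 with theoretical indexes, steps 604 to 605 with measured indexes) share the same shape: build a model, obtain a performance index, adjust the generator, and repeat. A generic sketch follows, with all four callables hypothetical placeholders:

```python
def refine_generator(generator, build, evaluate, adjust, iterations=3):
    """One iteration = construct a model from the generator, obtain its
    performance index (theoretical or measured), and adjust the
    generator with that index."""
    for _ in range(iterations):
        model = build(generator)
        index = evaluate(model)
        generator = adjust(generator, index)
    return generator

# Toy demonstration: the "generator" is a latency budget that halves
# whenever the evaluated latency still exceeds a target of 10.
g = refine_generator(
    generator=100.0,
    build=lambda gen: gen,          # the model "is" its latency here
    evaluate=lambda model: model,
    adjust=lambda gen, idx: gen / 2 if idx > 10 else gen,
    iterations=5,
)
print(g)  # 6.25
```

In the embodiment's terms, running the loop once with a theoretical evaluator yields the second model generator, and running it again with a measured evaluator yields the third.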
In the embodiments of the present application, steps 604 to 605 are optional. When steps 604 to 605 are not performed, the second neural network model constructed by the second model generator is used as the neural network model on the target chip.
It should be noted that, in the embodiments of the present application, the theoretical performance index and the measured performance index may be obtained by the neural network model construction apparatus itself, or may be obtained by other computer equipment and then sent to the neural network model construction apparatus; this is not limited here.
In the embodiments of the present application, the theoretical performance index of the target chip running the first neural network is obtained, and the corresponding weights in the first model generator are adjusted according to the theoretical performance index, thereby constructing a neural network model with better hardware performance indexes when running on the target chip.
The embodiment shown in FIG. 6 above is one application scenario of the embodiments of the present application. Another application scenario of the neural network model construction method of the embodiments of the present application is described below.
Please refer to FIG. 9, which is another schematic flowchart of the neural network model construction method provided by an embodiment of the present application.
In this embodiment, a structure search space is taken as an example of the model generator for description.
According to the task requirements of the application scenario, a search space needs to be constructed according to the indexes before the neural network model is constructed. Each constituent unit in the search space is an encoding that indicates how to construct a neural network structure. During the structure search, each sampling uses the distribution relationship of the unevaluated constituent units in the search space and the network model built from the evaluated constituent units. Specifically, as shown in FIG. 9, the part in the dashed box is the step of constructing the initial search space. The search space consists of multiple network structures, each composed of multiple constituent units. After the initial search space of multiple network structures is constructed, unreasonable network structures are filtered out based on preset rules or the application scenario task to obtain a second search space. A new third search space is then formed through network structure clustering. Based on the third search space, the cluster-center structures are trained, the unevaluated structures are modeled according to the training loss values of the evaluated network structures, several network structures are then selected for training based on Bayesian optimization, and the unevaluated structures are modeled again according to the training loss values; this loop continues until the search space is constructed.
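The first two stages of the pipeline above, rule-based filtering and clustering, can be sketched as follows. The validity predicate and clustering key are placeholders; a real system would use the task's preset rules and a distance between structure encodings:

```python
def prune_and_cluster(structures, is_valid, cluster_key):
    """Drop structures ruled out by preset rules / the application task
    (yielding the second search space), then group the survivors; one
    member per group would serve as a cluster-center candidate for
    training (yielding the third search space)."""
    second_space = [s for s in structures if is_valid(s)]
    clusters = {}
    for s in second_space:
        clusters.setdefault(cluster_key(s), []).append(s)
    return clusters

# Toy structure encodings (depth, width): rule out over-deep
# structures and cluster the rest by depth.
space = [(8, 64), (8, 128), (16, 64), (64, 64)]
clusters = prune_and_cluster(space,
                             is_valid=lambda s: s[0] <= 16,
                             cluster_key=lambda s: s[0])
print(sorted(clusters))  # [8, 16]
```

The Bayesian-optimization stage that follows would then select structures from these clusters for training, which this sketch does not attempt to model.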
After the initial search space is constructed, the search space is further adjusted using the neural network model construction method of the embodiment shown in FIG. 6, which is not described again here.
In the embodiments of the present application, the performance of the search space is improved through the initial construction and training of the search space.
The neural network model construction method in the embodiments of the present application has been described above; the neural network model construction apparatus in the embodiments of the present application is described below.
Please refer to FIG. 10, which is a schematic structural diagram of the neural network model construction apparatus provided by an embodiment of the present application.
A neural network model construction apparatus includes:
a construction unit 1001, configured to construct a first neural network model through a first model generator;
an obtaining unit 1002, configured to obtain, according to the first neural network model, a first performance index of the first neural network model when running on the target chip;
a processing unit 1003, configured to adjust the first model generator according to the first performance index to obtain a second model generator;
the construction unit 1001 is further configured to construct a second neural network model through the second model generator, where the second performance index of the second neural network model is better than the first performance index.
In this embodiment, the operations performed by the units of the neural network model construction apparatus are similar to those described in the embodiments shown in FIG. 6 and FIG. 7 and are not described again here.
Please refer to FIG. 11, which is another schematic structural diagram of the neural network model construction apparatus provided by an embodiment of the present application.
A neural network model construction apparatus includes:
a construction unit 1101, configured to construct a first neural network model through a first model generator;
an obtaining unit 1102, configured to obtain, according to the first neural network model, a first performance index of the first neural network model when running on the target chip;
a processing unit 1103, configured to adjust the first model generator according to the first performance index to obtain a second model generator;
the construction unit 1101 is further configured to construct a second neural network model through the second model generator, where the second performance index of the second neural network model is better than the first performance index.
Optionally, the first performance index is a first theoretical performance index, which represents the theoretical value of the performance index of the first neural network model when running on the target chip, and the second performance index is a second theoretical performance index, where the second theoretical performance index is better than the first theoretical performance index.
Optionally, the obtaining unit 1102 is specifically configured to obtain a first measured performance index, which represents the measured value of the performance index of the second neural network model when running on the target chip;
the processing unit 1103 is further configured to adjust the second model generator according to the first measured performance index to obtain a third model generator;
the construction unit 1101 is further configured to construct a third neural network model through the third model generator, where the second measured performance index of the third neural network model is better than the first measured performance index.
Optionally, the neural network model construction apparatus further includes:
a training unit 1104, configured to train the second neural network model to obtain a fourth neural network model;
that the processing unit 1103 is further configured to adjust the second model generator according to the first measured performance index to obtain the third model generator includes:
the processing unit 1103 is further configured to adjust the second model generator according to the first measured performance index and the model performance of the fourth neural network model to obtain the third model generator.
Optionally, the first performance index is a first theoretical performance index, and the obtaining unit is specifically configured to obtain the first theoretical performance index through a performance evaluation tool, where the performance evaluation tool includes a calculation function, and the calculation function is used to perform calculation on the first neural network model to obtain the first theoretical performance index.
可选地,神经网络模型构建装置还包括:Optionally, the apparatus for constructing the neural network model further includes:
确定单元1105,用于通过性能评估工具确定第一神经网络模型的第一构建单元,第一构建单元包括以下至少一个:第一神经网络模型的卷积层、第一神经网络模型的池化层、第一神经网络模型的激活函数、第一神经网络模型的归一化层;Determining unit 1105, configured to determine a first construction unit of the first neural network model by using a performance evaluation tool, the first construction unit includes at least one of the following: a convolutional layer of the first neural network model, a pooling layer of the first neural network model , the activation function of the first neural network model, the normalization layer of the first neural network model;
处理单元1103还用于根据第一构建单元进行计算,以得到第一理论性能指标。The processing unit 1103 is further configured to perform calculation according to the first construction unit to obtain the first theoretical performance index.
可选地,训练单元1104还用于训练第一神经网络模型得到第五神经网络模型;Optionally, the training unit 1104 is further configured to train the first neural network model to obtain the fifth neural network model;
处理单元1103还用于根据第一理论性能指标和第五神经网络模型的模型性能调整第一模型生成器,得到第二模型生成器。The processing unit 1103 is further configured to adjust the first model generator according to the first theoretical performance index and the model performance of the fifth neural network model to obtain the second model generator.
Optionally, the first theoretical performance index and the second theoretical performance index each include at least one of the following: a theoretical vector-module bound, a theoretical memory bound, a theoretical cube-module utilization rate, a theoretical utilization rate of a high-speed parallel multiply-accumulate (MAC) unit, a theoretical number of cube-module operations, and a theoretical number of vector-module operations.
Optionally, the first measured performance index and the second measured performance index each include at least one of the following: a measured vector-module bound, a measured memory bound, a measured cube-module utilization rate, a measured utilization rate of a high-speed parallel multiply-accumulate (MAC) unit, a measured number of cube-module operations, and a measured number of vector-module operations.
In this embodiment, the operations performed by the units of the apparatus for constructing a neural network model are similar to those described in the foregoing embodiments shown in FIG. 6 and FIG. 7, and details are not repeated here.
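The generate–evaluate–adjust flow performed by these units can be sketched end to end as follows. The stand-in performance index and the generator adjustment rule are illustrative assumptions, not a real chip measurement or the application's specific generator.

```python
def build(generator_param):
    # Smaller parameter -> "lighter" hypothetical model
    return {"width": generator_param}

def perf_index(model):
    # Stand-in for the performance index on the target chip (lower is better)
    return model["width"] * 1.5

def construct(generator_param, steps=5):
    """Iteratively construct models, each with a better index than the last."""
    history = []
    for _ in range(steps):
        model = build(generator_param)           # construct a model
        history.append(perf_index(model))        # obtain its performance index
        generator_param = int(generator_param * 0.8)  # adjust the generator
    return history

history = construct(64)
# Each successive model's index improves on the previous one
assert all(a > b for a, b in zip(history, history[1:]))
```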
Refer to FIG. 12, which is another schematic structural diagram of the apparatus for constructing a neural network model according to an embodiment of this application.
The apparatus includes a processor 1201, a memory 1202, a bus 1205, and an interface 1204. The processor 1201 is connected to the memory 1202 and the interface 1204; the bus 1205 connects the processor 1201, the memory 1202, and the interface 1204; and the interface 1204 is configured to receive or send data. The processor 1201 is a single-core or multi-core central processing unit, an application-specific integrated circuit, or one or more integrated circuits configured to implement the embodiments of the present invention. The memory 1202 may be a random access memory (RAM), or may be a non-volatile memory, for example, at least one hard disk memory. The memory 1202 is configured to store computer-executable instructions. Specifically, the computer-executable instructions may include a program 1203.
In this embodiment, when the processor 1201 invokes the program 1203, the apparatus for constructing a neural network model in FIG. 12 can perform the operations performed by the apparatus for constructing a neural network model in the embodiments shown in FIG. 6 or FIG. 9, and details are not repeated here.
It should be understood that the processor mentioned in the apparatus for constructing a neural network model in the foregoing embodiments of this application, or the processor provided in the foregoing embodiments of this application, may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
It should also be understood that there may be one or more processors in the apparatus for constructing a neural network model in the foregoing embodiments of this application, which may be adjusted according to actual application scenarios; this is merely an example and is not limiting. Likewise, there may be one or more memories in the embodiments of this application, which may be adjusted according to actual application scenarios; this is merely an example and is not limiting.
It should also be noted that, when the apparatus for constructing a neural network model includes a processor (or processing unit) and a memory, the processor in this application may be integrated with the memory, or the processor and the memory may be connected through an interface; this may be adjusted according to actual application scenarios and is not limited.
An embodiment of this application further provides a computer program, or a computer program product including a computer program. When the computer program is executed on a computer, the computer is enabled to implement the method procedures related to the apparatus for constructing a neural network model in any one of the foregoing method embodiments.
An embodiment of this application further provides a computer-readable storage medium storing a computer program. When the computer program is executed by a computer, the method procedures related to the apparatus for constructing a neural network model in any one of the foregoing method embodiments are implemented.
The embodiments shown in FIG. 6 to FIG. 9 may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented by software, they may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of this application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid-state disk (SSD)), or the like.
The terms "first", "second", and the like in the specification, claims, and accompanying drawings of this application are used to distinguish between similar objects, and are not necessarily used to describe a specific order or sequence. It should be understood that terms used in this way are interchangeable in appropriate circumstances; this is merely the manner of distinguishing between objects of the same attribute when describing the embodiments of this application. In addition, the terms "include" and "have" and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, system, product, or device that includes a series of units is not necessarily limited to those units, but may include other units that are not expressly listed or that are inherent to such a process, method, product, or device.
The names of the messages/frames/information, modules, or units provided in the embodiments of this application are merely examples; other names may be used, provided that the functions of the messages/frames/information, modules, or units are the same.
The terms used in the embodiments of this application are merely for the purpose of describing specific embodiments, and are not intended to limit the present invention. The singular forms "a", "the", and "this" used in the embodiments of this application are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that, in the descriptions of this application, unless otherwise specified, "/" indicates an "or" relationship between the associated objects; for example, A/B may indicate A or B. "And/or" in this application merely describes an association relationship between associated objects, and indicates that three relationships may exist; for example, A and/or B may indicate the following three cases: only A exists, both A and B exist, and only B exists, where A and B may be singular or plural.
Depending on the context, the word "if" used herein may be interpreted as "at the time of", "when", "in response to determining", or "in response to detecting". Similarly, depending on the context, the phrase "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined", "in response to determining", "when (the stated condition or event) is detected", or "in response to detecting (the stated condition or event)".
The foregoing embodiments are merely intended to describe the technical solutions of this application, rather than to limit them. Although this application is described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements to some technical features thereof; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of this application.

Claims (19)

  1. A method for constructing a neural network model, comprising:
    constructing a first neural network model by a first model generator;
    obtaining, according to the first neural network model, a first performance index of the first neural network model when running on a target chip;
    adjusting the first model generator according to the first performance index to obtain a second model generator; and
    constructing a second neural network model by the second model generator, wherein a second performance index of the second neural network model is better than the first performance index.
  2. The method according to claim 1, wherein the first performance index is a first theoretical performance index, the first theoretical performance index represents a theoretical value of a performance index of the first neural network model when running on the target chip, the second performance index is a second theoretical performance index, and the second theoretical performance index is better than the first theoretical performance index.
  3. The method according to claim 2, wherein after the second neural network model is constructed by the second model generator, the method further comprises:
    obtaining a first measured performance index, wherein the first measured performance index represents a measured value of a performance index of the second neural network model when running on the target chip;
    adjusting the second model generator according to the first measured performance index to obtain a third model generator; and
    constructing a third neural network model by the third model generator, wherein a second measured performance index of the third neural network model is better than the first measured performance index.
  4. The method according to claim 3, wherein before the first measured performance index is obtained, the method further comprises:
    training the second neural network model to obtain a fourth neural network model; and
    the adjusting the second model generator according to the first measured performance index to obtain a third model generator comprises:
    adjusting the second model generator according to the first measured performance index and a model performance of the fourth neural network model to obtain the third model generator.
  5. The method according to any one of claims 2 to 4, wherein the first performance index is a first theoretical performance index, and the obtaining, according to the first neural network model, a first performance index of the first neural network model when running on a target chip comprises:
    obtaining the first theoretical performance index through a performance evaluation tool, wherein the performance evaluation tool comprises a calculation function, and the calculation function is used to perform calculation on the first neural network model to obtain the first theoretical performance index.
  6. The method according to claim 5, wherein the obtaining the first theoretical performance index through a performance evaluation tool comprises:
    determining a first construction unit of the first neural network model through the performance evaluation tool, wherein the first construction unit comprises at least one of the following: a convolutional layer of the first neural network model, a pooling layer of the first neural network model, an activation function of the first neural network model, and a normalization layer of the first neural network model; and
    performing calculation according to the first construction unit to obtain the first theoretical performance index.
  7. The method according to any one of claims 2 to 5, wherein after the first neural network model is constructed by the first model generator, the method further comprises:
    training the first neural network model to obtain a fifth neural network model; and
    the adjusting the first model generator according to the first performance index to obtain a second model generator comprises:
    adjusting the first model generator according to the first theoretical performance index and a model performance of the fifth neural network model to obtain the second model generator.
  8. The method according to any one of claims 2 to 6, wherein the first theoretical performance index and the second theoretical performance index each comprise at least one of the following: a theoretical vector-module bound, a theoretical memory bound, a theoretical cube-module utilization rate, a theoretical utilization rate of a high-speed parallel multiply-accumulate (MAC) unit, a theoretical number of cube-module operations, and a theoretical number of vector-module operations.
  9. The method according to any one of claims 3 to 6, wherein the first measured performance index and the second measured performance index each comprise at least one of the following: a measured vector-module bound, a measured memory bound, a measured cube-module utilization rate, a measured utilization rate of a high-speed parallel multiply-accumulate (MAC) unit, a measured number of cube-module operations, and a measured number of vector-module operations.
  10. An apparatus for constructing a neural network model, comprising:
    a construction unit, configured to construct a first neural network model by a first model generator;
    an obtaining unit, configured to obtain, according to the first neural network model, a first performance index of the first neural network model when running on a target chip; and
    a processing unit, configured to adjust the first model generator according to the first performance index to obtain a second model generator;
    wherein the construction unit is further configured to construct a second neural network model by the second model generator, and a second performance index of the second neural network model is better than the first performance index.
  11. The apparatus for constructing a neural network model according to claim 10, wherein the first performance index is a first theoretical performance index, the first theoretical performance index represents a theoretical value of a performance index of the first neural network model when running on the target chip, the second performance index is a second theoretical performance index, and the second theoretical performance index is better than the first theoretical performance index.
  12. The apparatus for constructing a neural network model according to claim 11, wherein the obtaining unit is specifically configured to obtain a first measured performance index, and the first measured performance index represents a measured value of a performance index of the second neural network model when running on the target chip;
    the processing unit is further configured to adjust the second model generator according to the first measured performance index to obtain a third model generator; and
    the construction unit is further configured to construct a third neural network model by the third model generator, wherein a second measured performance index of the third neural network model is better than the first measured performance index.
  13. The apparatus for constructing a neural network model according to claim 12, further comprising:
    a training unit, configured to train the second neural network model to obtain a fourth neural network model;
    wherein the processing unit is further configured to adjust the second model generator according to the first measured performance index and a model performance of the fourth neural network model to obtain the third model generator.
  14. The apparatus for constructing a neural network model according to any one of claims 11 to 13, wherein the first performance index is a first theoretical performance index, and the obtaining unit is specifically configured to obtain the first theoretical performance index through a performance evaluation tool, wherein the performance evaluation tool comprises a calculation function, and the calculation function is used to perform calculation on the first neural network model to obtain the first theoretical performance index.
  15. The apparatus for constructing a neural network model according to claim 14, further comprising:
    a determining unit, configured to determine a first construction unit of the first neural network model through the performance evaluation tool, wherein the first construction unit comprises at least one of the following: a convolutional layer of the first neural network model, a pooling layer of the first neural network model, an activation function of the first neural network model, and a normalization layer of the first neural network model;
    wherein the processing unit is further configured to perform calculation according to the first construction unit to obtain the first theoretical performance index.
  16. The apparatus for constructing a neural network model according to any one of claims 11 to 15, wherein the training unit is further configured to train the first neural network model to obtain a fifth neural network model; and
    the processing unit is further configured to adjust the first model generator according to the first theoretical performance index and a model performance of the fifth neural network model to obtain the second model generator.
  17. The apparatus for constructing a neural network model according to any one of claims 11 to 16, wherein the first theoretical performance index and the second theoretical performance index each comprise at least one of the following: a theoretical vector-module bound, a theoretical memory bound, a theoretical cube-module utilization rate, a theoretical utilization rate of a high-speed parallel multiply-accumulate (MAC) unit, a theoretical number of cube-module operations, and a theoretical number of vector-module operations.
  18. The apparatus for constructing a neural network model according to any one of claims 12 to 16, wherein the first measured performance index and the second measured performance index each comprise at least one of the following: a measured vector-module bound, a measured memory bound, a measured cube-module utilization rate, a measured utilization rate of a high-speed parallel multiply-accumulate (MAC) unit, a measured number of cube-module operations, and a measured number of vector-module operations.
  19. A computer storage medium, wherein the computer storage medium stores instructions, and when the instructions are executed on a computer, the computer is enabled to perform the method according to any one of claims 1 to 9.

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2020/105773 WO2022021199A1 (en) 2020-07-30 2020-07-30 Neural network model construction method and device therefor
CN202080104556.9A CN116261729A (en) 2020-07-30 2020-07-30 Neural network model construction method and equipment thereof

Publications (1)

Publication Number Publication Date
WO2022021199A1 (en)


Also Published As

Publication number Publication date
CN116261729A (en) 2023-06-13


Legal Events

- 121: EP — the EPO has been informed by WIPO that EP was designated in this application (ref document number 20946627, country of ref document: EP, kind code of ref document: A1)
- NENP: Non-entry into the national phase (ref country code: DE)
- 122: EP — PCT application non-entry in European phase (ref document number 20946627, country of ref document: EP, kind code of ref document: A1)