WO2022021199A1 - Neural network model construction method and device therefor - Google Patents


Info

Publication number
WO2022021199A1
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
network model
performance index
model
theoretical
Prior art date
Application number
PCT/CN2020/105773
Other languages
French (fr)
Chinese (zh)
Inventor
Yuan Honghui (袁宏辉)
Wu Weixiang (伍玮翔)
Zhong Zhao (钟钊)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to PCT/CN2020/105773
Priority to CN202080104556.9A
Publication of WO2022021199A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks

Definitions

  • the present application relates to the field of artificial intelligence, and in particular, to a method and device for constructing a neural network model.
  • Deep neural networks have in recent years achieved remarkable results in processing and analysis tasks for various media signals such as images, video, and speech.
  • A well-performing neural network often has an elaborate network structure, which highly skilled and experienced human experts must expend great effort to design.
  • Neural network structure search, that is, the construction of neural network models, changes this manual design mode: it searches for neural network structures automatically and obtains structures with excellent performance, achieving excellent results in tasks such as image recognition, image semantic segmentation, and natural language processing.
  • In the prior art, neural network structure search is trained in a particular chip environment according to the target indicators of the task (such as model test accuracy for applications like image classification and image segmentation).
  • When the trained neural network structure search is then used in other chip environments, the differing chip parameters cause compatibility problems at run time, such as high time consumption and low chip utilization.
  • The embodiments of the present application provide a method and device for constructing a neural network model. When a model generator is used to generate a neural network model, a first theoretical performance index of the first neural network model running on a target chip is obtained, and the corresponding weights in the first model generator are adjusted according to the first theoretical performance index, so as to construct a neural network model with a better hardware performance index when running on the target chip.
  • a first aspect of the embodiments of the present application provides a method for constructing a neural network model.
  • The neural network model constructing apparatus constructs a first neural network model through a first model generator preset in the apparatus, the first neural network model being assembled by the first model generator from individual construction units.
  • the neural network model building apparatus obtains, according to the first neural network model, the first performance index when the first neural network model runs on the target chip.
  • the apparatus for constructing a neural network model adjusts the first model generator according to the first performance index to obtain a second model generator.
  • The neural network model building device builds a second neural network model according to the second model generator, and the second performance index of the second neural network model is better than the first performance index; that is, the performance index of the second neural network model when running on the target chip is better than that of the first neural network model when running on the target chip.
  • By acquiring the first performance index of the first neural network model when running on the target chip, and adjusting the first model generator according to the first performance index, a second neural network model with a better hardware performance index when running on the target chip is constructed.
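The generate-evaluate-adjust loop described above can be sketched as follows. This is only an illustrative toy, not the application's actual algorithm: the operation names, the per-operation chip costs, and the weight-update rule are all hypothetical, and the performance index is simply a summed cost (lower is better).

```python
import random

# Hypothetical search space: each construction unit selects one basic operation.
SEARCH_SPACE = ["conv3x3", "conv5x5", "maxpool", "identity"]

# Assumed per-operation cost when run on the target chip (illustrative values,
# standing in for a real performance index such as latency or cycle count).
CHIP_COST = {"conv3x3": 4.0, "conv5x5": 9.0, "maxpool": 1.0, "identity": 0.0}


def generate_model(weights, num_units=4):
    """Model generator: sample one operation per construction unit according
    to the generator's weights (a higher weight makes an op more likely)."""
    ops, w = zip(*weights.items())
    return [random.choices(ops, weights=w, k=1)[0] for _ in range(num_units)]


def performance_index(model):
    """Performance index of the model on the target chip: here simply the
    summed operation cost, where lower is better."""
    return sum(CHIP_COST[op] for op in model)


def adjust_generator(weights, model, index, baseline, lr=0.1):
    """Adjust the generator: operations of models that beat the baseline
    index are reinforced, operations of worse models are penalized."""
    reward = baseline - index  # positive if cheaper than the baseline
    adjusted = dict(weights)
    for op in model:
        adjusted[op] = max(1e-3, adjusted[op] + lr * reward)
    return adjusted


# First model generator: uniform weights over the search space.
generator = {op: 1.0 for op in SEARCH_SPACE}
baseline = performance_index(generate_model(generator))

for _ in range(200):
    model = generate_model(generator)        # first neural network model
    index = performance_index(model)         # its performance index on the chip
    generator = adjust_generator(generator, model, index, baseline)
    baseline = 0.9 * baseline + 0.1 * index  # moving-average baseline
```

After enough iterations the adjusted generator assigns larger weights to operations that are cheap on the assumed chip, which is the sense in which the adjusted (second) generator produces models with a better performance index.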
  • The neural network model building device obtains a first theoretical performance index of the first neural network model, where the first theoretical performance index represents the theoretical value of the performance index when the first neural network model runs on the target chip.
  • the apparatus for constructing a neural network model adjusts the first model generator according to the first theoretical performance index to obtain a second model generator.
  • The neural network model construction device constructs a second neural network model according to the second model generator, and the second theoretical performance index of the second neural network model, that is, the theoretical value of the performance index when the second neural network model runs on the target chip, is better than the first theoretical performance index.
  • In this way, a second neural network model with a better hardware performance index when running on the target chip is constructed.
  • The neural network model construction apparatus further obtains a first measured performance index, where the first measured performance index represents the measured value of the performance index when the second neural network model runs on the target chip, that is, the performance index measured on the target chip after the second neural network model has been run on it.
  • The neural network model construction device adjusts the corresponding weighting factor in the second model generator according to the first measured performance index to obtain a third model generator. After obtaining the third model generator, the device uses it to construct a third neural network model, and the second measured performance index of the third neural network model, that is, the measured value of the performance index when the third neural network model runs on the target chip, is better than the first measured performance index.
  • The neural network model construction apparatus further trains the second neural network model to obtain a fourth neural network model. After obtaining the fourth neural network model, the device obtains the model performance of the fourth neural network model, and adjusts the second model generator according to the model performance of the fourth neural network model and the first measured performance index, so as to obtain the third model generator.
  • Because the second model generator is adjusted according to both the model performance of the fourth neural network model (trained from the second neural network model) and the first measured performance index, the adjusted third model generator can be guaranteed to generate neural network models that improve the performance index when running on the target chip while preserving model performance.
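The joint adjustment described above, which uses both the trained model's performance and the measured hardware index, can be expressed as a single scalarized score. The function and trade-off weight below are hypothetical illustrations, not taken from the application.

```python
def combined_score(model_performance, measured_index, trade_off=0.5):
    """Scalarize the two feedback signals used to adjust the model generator:
    reward high model performance (e.g. validation accuracy) and penalize a
    poor measured hardware index (e.g. normalized measured latency on the
    target chip). The trade-off weight is a hypothetical tuning knob."""
    return model_performance - trade_off * measured_index


# A slightly less accurate model that is much cheaper on the target chip can
# receive the higher score, steering the generator toward such models.
fast = combined_score(model_performance=0.92, measured_index=0.10)  # ~0.87
slow = combined_score(model_performance=0.95, measured_index=0.40)  # ~0.75
```

Raising the trade-off weight biases the generator toward hardware efficiency; lowering it biases it toward model performance.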
  • The neural network model construction apparatus obtains the first theoretical performance index through a performance evaluation tool. The performance evaluation tool includes a calculation function, and the calculation function is used to perform calculations on the first neural network model to obtain the first theoretical performance index.
  • the apparatus for constructing the neural network model obtains the first theoretical performance index through a performance evaluation tool, which improves the achievability of obtaining the first theoretical performance index.
  • The neural network model construction apparatus determines a first construction unit of the first neural network model by using the performance evaluation tool, and the first construction unit includes at least one of the following: a convolution layer of the first neural network model, a pooling layer of the first neural network model, an activation function of the first neural network model, and a normalization layer of the first neural network model.
  • The first construction unit may include one or more of each of these convolution layers, pooling layers, activation functions, and normalization layers.
  • the neural network model construction device performs calculation according to the first construction unit to obtain the first theoretical performance index.
  • The neural network model construction apparatus calculates one or more first construction units of the first neural network model through the performance evaluation tool to obtain the first theoretical performance index. Since each first construction unit includes at least one convolution layer, pooling layer, activation function, or normalization layer, the theoretical performance index of the first neural network model can be derived from each layer that constitutes the model, which improves flexibility.
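A per-layer calculation function of this kind can be sketched as below. Everything here is an assumption for illustration: the peak-MAC figure, the layer parameters, and the derived utilization formula stand in for whatever the real performance evaluation tool computes; only a convolution layer's cost model is shown.

```python
from dataclasses import dataclass

# Assumed chip parameter (illustrative, not from the application): peak
# multiply-accumulate operations the cube (matrix) unit finishes per cycle.
PEAK_MACS_PER_CYCLE = 4096


@dataclass
class ConvLayer:
    """One construction unit: a convolution layer of the model. Pooling,
    activation, and normalization layers would get analogous cost models."""
    in_ch: int
    out_ch: int
    kernel: int
    out_h: int
    out_w: int

    def theoretical_macs(self):
        # One MAC per (output pixel x output channel x kernel tap x input channel).
        return self.out_h * self.out_w * self.out_ch * self.in_ch * self.kernel ** 2


def theoretical_index(layers, cycles_on_chip):
    """Sketch of a 'calculation function': sum the MACs demanded by each
    construction unit and derive a theoretical MAC utilization for the chip."""
    total_macs = sum(layer.theoretical_macs() for layer in layers)
    ideal_cycles = total_macs / PEAK_MACS_PER_CYCLE
    return {
        "total_macs": total_macs,
        "ideal_cycles": ideal_cycles,
        "mac_utilization": ideal_cycles / cycles_on_chip,  # 1.0 = fully utilized
    }
```

For example, a 3x3 convolution mapping 64 channels to 64 channels over a 56x56 output requires 56 * 56 * 64 * 64 * 9 = 115,605,504 MACs, i.e. 28,224 ideal cycles on the assumed 4096-MAC/cycle unit.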
  • The neural network model construction apparatus further trains the first neural network model to obtain a fifth neural network model.
  • The neural network model building device then obtains the model performance of the fifth neural network model, and adjusts the first model generator according to the model performance of the fifth neural network model and the first theoretical performance index to obtain the second model generator.
  • Because the first model generator is adjusted according to both the model performance of the fifth neural network model (trained from the first neural network model) and the first theoretical performance index, the adjusted second model generator can be guaranteed to generate neural network models that improve the theoretical performance index when running on the target chip while preserving model performance.
  • The first theoretical performance index and the second theoretical performance index each include at least one of the following: a theoretical vector-module bound, a theoretical memory bound, a theoretical cube-module utilization, a theoretical high-speed parallel multiply-accumulator (MAC) utilization, a theoretical cube-module operation count, and a theoretical vector-module operation count.
  • the specific designations of the first theoretical performance index and the second theoretical performance index are exemplarily described, which improves the achievability of the solution.
  • The first measured performance index and the second measured performance index each include at least one of the following: a measured vector-module bound, a measured memory bound, a measured cube-module utilization, a measured high-speed parallel multiply-accumulator (MAC) utilization, a measured cube-module operation count, and a measured vector-module operation count.
  • the specific designations of the first measured performance index and the second measured performance index are exemplarily described, which improves the achievability of the solution.
  • a second aspect of the embodiments of the present application provides an apparatus for constructing a neural network model.
  • a device for constructing a neural network model comprising:
  • a construction unit for constructing a first neural network model by a first model generator
  • an obtaining unit configured to obtain, according to the first neural network model, the first performance index when the first neural network model runs on the target chip;
  • a processing unit configured to adjust the first model generator according to the first performance index to obtain the second model generator
  • the construction unit is further configured to construct a second neural network model through the second model generator, and the second performance index of the second neural network model is better than the first performance index.
  • the first performance index is a first theoretical performance index
  • the first theoretical performance index represents a theoretical value of the performance index when the first neural network model runs on the target chip
  • the second performance index is a second theoretical performance index
  • the second theoretical performance index is better than the first theoretical performance index
  • the obtaining unit is specifically configured to obtain a first measured performance index, where the first measured performance index represents an actual measured value of the performance index when the second neural network model runs on the target chip;
  • the processing unit is further configured to adjust the second model generator according to the first measured performance index to obtain the third model generator;
  • the construction unit is further configured to construct a third neural network model through the third model generator, and the second measured performance index of the third neural network model is better than the first measured performance index.
  • the apparatus for constructing the neural network model further includes:
  • a training unit for training the second neural network model to obtain the fourth neural network model
  • the processing unit is further configured to adjust the second model generator according to the first measured performance index, and obtaining the third model generator includes:
  • the processing unit is further configured to adjust the second model generator according to the first measured performance index and the model performance of the fourth neural network model to obtain a third model generator.
  • the first performance index is a first theoretical performance index
  • the obtaining unit is specifically configured to obtain the first theoretical performance index through a performance evaluation tool
  • the performance evaluation tool includes a calculation function
  • the calculation function is used to perform calculations on the first neural network model to obtain the first theoretical performance index.
  • the apparatus for constructing the neural network model further includes:
  • a determination unit used for determining a first construction unit of the first neural network model by a performance evaluation tool, the first construction unit includes at least one of the following: a convolutional layer of the first neural network model, a pooling layer of the first neural network model, The activation function of the first neural network model and the normalization layer of the first neural network model;
  • the processing unit is further configured to perform calculation according to the first construction unit to obtain the first theoretical performance index.
  • the training unit is also used to train the first neural network model to obtain the fifth neural network model;
  • the processing unit is further configured to adjust the first model generator according to the first theoretical performance index and the model performance of the fifth neural network model to obtain the second model generator.
  • the first theoretical performance index and the second theoretical performance index each include at least one of the following: a theoretical vector-module bound, a theoretical memory bound, a theoretical cube-module utilization rate, a theoretical high-speed parallel multiply-accumulator (MAC) utilization rate, a theoretical cube-module operation count, and a theoretical vector-module operation count.
  • the first measured performance index and the second measured performance index each include at least one of the following: a measured vector-module bound, a measured memory bound, a measured cube-module utilization rate, a measured high-speed parallel multiply-accumulator (MAC) utilization rate, a measured cube-module operation count, and a measured vector-module operation count.
  • a third aspect of the embodiments of the present application provides an apparatus for constructing a neural network model, including:
  • a fourth aspect of the embodiments of the present application provides a storage medium.
  • The technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product.
  • The computer software product is stored in a storage medium that stores computer software instructions for the above-mentioned device, and includes a program for executing the method of the above first aspect.
  • The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
  • a fifth aspect of the embodiments of the present application provides a computer program product including instructions, which, when executed on a computer, cause the computer to execute the method according to the embodiment of the first aspect of the present application.
  • The processor mentioned in any of the above may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling execution of the program of the method in the above first aspect.
  • In the embodiments of the present application, the first theoretical performance index of the first neural network model running on the target chip is obtained, and the first model generator is adjusted according to the first theoretical performance index, so that the first model generator can be adjusted according to the performance index of the target chip, which improves the compatibility of the second neural network model generated by the adjusted second model generator.
  • FIG. 1 is a schematic diagram of a framework of an embodiment of a method for constructing a neural network model in an embodiment of the present application
  • FIG. 2 is a schematic diagram of another framework of an embodiment of a method for constructing a neural network model in an embodiment of the present application
  • FIG. 3 is a schematic diagram of another framework of an embodiment of a method for constructing a neural network model in an embodiment of the present application
  • FIG. 4 is a schematic diagram of another framework of an embodiment of a method for constructing a neural network model in an embodiment of the present application
  • FIG. 5 is a schematic diagram of another framework of an embodiment of a method for constructing a neural network model in an embodiment of the present application
  • FIG. 6 is a schematic flowchart of an embodiment of a method for constructing a neural network model in an embodiment of the present application
  • FIG. 7 is another schematic flowchart of an embodiment of a method for constructing a neural network model in an embodiment of the present application.
  • FIG. 8 is another schematic flowchart of an embodiment of a method for constructing a neural network model in an embodiment of the present application.
  • FIG. 9 is another schematic flowchart of an embodiment of a method for constructing a neural network model in an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of an embodiment of an apparatus for constructing a neural network model in an embodiment of the application
  • FIG. 11 is another schematic structural diagram of the embodiment of the apparatus for constructing the neural network model in the embodiment of the application.
  • FIG. 12 is another schematic structural diagram of an embodiment of an apparatus for constructing a neural network model according to an embodiment of the present application.
  • Figure 1 shows a schematic diagram of an artificial intelligence main frame, which describes the overall workflow of an artificial intelligence system and is suitable for general artificial intelligence field requirements.
  • The "intelligent information chain" reflects a series of processes from data acquisition to processing. For example, it can be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, the data goes through the process of "data - information - knowledge - wisdom".
  • The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure and information (the technology for providing and processing it) up to the industrial ecology of the system.
  • The infrastructure provides computing power support for the artificial intelligence system, realizes communication with the outside world, and provides support through the basic platform. Communication with the outside world is performed through sensors; computing power is provided by smart chips (hardware acceleration chips such as CPUs, NPUs, GPUs, ASICs, and FPGAs); the basic platform includes distributed computing frameworks, networks, and related platform guarantees and support, and may include cloud storage and computing, interconnection networks, and so on. For example, sensors communicate with the outside world to obtain data, and the data is provided to the smart chips in the distributed computing system provided by the basic platform for computation.
  • The data at the upper layer of the infrastructure represents the data sources in the field of artificial intelligence.
  • the data involves graphics, images, voice, video, and text, as well as IoT data of traditional devices, including business data of existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.
  • Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making, etc.
  • machine learning and deep learning can perform symbolic and formalized intelligent information modeling, extraction, preprocessing, training, etc. on data.
  • Reasoning refers to the process of simulating human intelligent reasoning in a computer or intelligent system, using formalized information to carry out machine thinking and solve problems according to a reasoning control strategy; its typical functions are search and matching.
  • Decision-making refers to the process of making decisions after intelligent information is reasoned, usually providing functions such as classification, sorting, and prediction.
  • some general capabilities can be formed based on the results of data processing, such as algorithms or a general system, such as translation, text analysis, computer vision processing (such as image recognition, object detection, etc.), speech recognition, etc.
  • Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields; they encapsulate the overall artificial intelligence solution and productize intelligent information decision-making into deployed applications. Application areas mainly include intelligent manufacturing, intelligent transportation, smart home, smart medical care, smart security, autonomous driving, safe city, and smart terminals.
  • an embodiment of the present application provides a system architecture 200 .
  • the system architecture includes a database 230 and a client device 240 .
  • the data collection device 260 is used to collect data and store it in the database 230 , and the training module 220 generates the target model/rule 201 based on the data maintained in the database 230 .
  • W is the weight vector, and each value in the vector represents the weight value of a neuron in the neural network of this layer.
  • This vector determines the spatial transformation from the input space to the output space described above; that is, the weights of each layer control how the space is transformed.
  • The purpose of training a deep neural network is ultimately to obtain the weight matrices of all layers of the trained network. Therefore, the training process of a neural network is essentially learning how to control the spatial transformation, and more specifically, learning the weight matrices.
  • the weight matrix can be refined into a set of structural parameters and a set of network parameters. For details, refer to The relevant introduction in Figure 2 below.
  • The weight vector of each layer of the neural network can be updated according to the difference between the predicted value and the target value of the current network (of course, there is usually an initialization process before the first update, that is, parameters are preconfigured for each layer of the deep neural network). For example, if the predicted value of the network is too high, the values in the weight matrix are adjusted to reduce the predicted value; after continuous adjustment, the value output by the neural network becomes close or equal to the target value. It is therefore necessary to predefine "how to compare the difference between the predicted value and the target value", that is, the loss function or the objective function.
  • The loss function and the objective function are important equations for measuring the difference between the predicted value and the target value. Taking the loss function as an example, the higher its output value (the loss), the greater the difference, so the training of the neural network can be understood as the process of reducing this loss as much as possible.
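The update process described above (compare prediction with target via a loss function, then adjust the weight matrix to reduce the loss) can be sketched for a toy one-layer linear network. The network size, learning rate, and target weights are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-layer network y = W x: the weight matrix W is the spatial
# transformation from the 3-dim input space to the 2-dim output space.
W_target = np.array([[1.0, 0.0, -1.0],
                     [0.5, 2.0, 0.0]])   # the "ideal" weights to be learned
W = rng.normal(size=(2, 3))              # initialization before the first update

X = rng.normal(size=(100, 3))            # training inputs
Y = X @ W_target.T                       # target values

lr = 0.1
for _ in range(500):
    pred = X @ W.T                       # predicted value of the current network
    diff = pred - Y                      # difference between prediction and target
    loss = float((diff ** 2).mean())     # loss function: mean squared error
    grad = 2.0 * diff.T @ X / len(X)     # gradient of the loss w.r.t. W
    W -= lr * grad                       # adjust weights to reduce the loss
```

After continuous adjustment the loss shrinks toward zero and the learned weight matrix approaches the target transformation, which is exactly the "learning the weight matrix" described above.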
  • the computing module may include a training module 220, and the target model/rule obtained by the training module 220 may be applied to different systems or devices.
  • The execution device 210 is configured with a transceiver 212, which can be a wireless transceiver, an optical transceiver, or a wired interface (such as an I/O interface), to perform data interaction with external devices, and a "user" can input data to the transceiver 212 through the client device 240.
  • The client device 240 can send target tasks to the execution device 210, request the execution device to build a neural network, and provide the execution device 210 with a database for training.
  • the execution device 210 can call data, codes, etc. in the data storage system 250 , and can also store data, instructions, etc. in the data storage system 250 .
  • the calculation module 211 uses the target model/rule 201 to process the input data.
  • The computing module 211 is configured to: construct a first neural network model by using a first model generator; obtain, according to the first neural network model, a first performance index of the first neural network model when running on the target chip; adjust the first model generator according to the first performance index to obtain a second model generator; and construct a second neural network model through the second model generator, where the second performance index of the second neural network model is better than the first performance index.
  • The association function module 214 may specifically be a module for training a model generator.
  • the association function module 214 may be configured to perform search and construction according to the basic operations included in the search space to obtain the first model generator.
  • the transceiver 212 returns the constructed neural network model to the client device 240 to deploy the neural network model in the client device 240 or other devices.
  • the user may manually specify data entered into the execution device 210, for example, operating in an interface provided by the transceiver 212.
  • the client device 240 can automatically input data to the transceiver 212 and obtain the result. If the client device 240 automatically enters data and needs to obtain the user's authorization, the user can set the corresponding permission in the client device 240 .
  • the user can view the result output by the execution device 210 on the client device 240, and the specific presentation form can be a specific manner such as display, sound, and action.
  • the client device 240 can also act as a data collection end to store the collected data associated with the target task into the database 230 .
  • FIG. 2 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationship among the devices, components, and modules shown in the figure does not constitute any limitation.
  • the data storage system 250 is an external memory relative to the execution device 210 . In other scenarios, the data storage system 250 may also be placed in the execution device 210 .
  • an embodiment of the present application provides a system architecture 300 .
  • The execution device 210 is implemented by one or more servers and, optionally, cooperates with other computing devices, such as data storage, routers, and load balancers; the execution device 210 may be arranged on one physical site or distributed across multiple physical sites.
  • the execution device 210 can use the data in the data storage system 250 or call the program code in the data storage system 250 to implement the steps of the neural network model construction method corresponding to FIGS. 6-8 below in this application.
  • a user may operate respective user devices (eg, local device 301 and local device 302 ) to interact with execution device 210 .
  • Each local device may represent any computing device, such as a personal computer, computer workstation, smartphone, tablet, smart camera, smart car, media consumption device, wearable device, set-top box, or game console.
  • Each user's local device can interact with the execution device 210 through a communication network of any mechanism or standard; the communication network can be a wide area network, a local area network, a point-to-point connection, or any combination thereof.
  • the communication network may include a wireless network, a wired network, or a combination of a wireless network and a wired network, and the like.
  • the wireless network includes but is not limited to: a fifth generation mobile communication technology (5th-Generation, 5G) system, a long term evolution (LTE) system, a global system for mobile communication (GSM) network, a code division multiple access (CDMA) network, a wideband code division multiple access (WCDMA) network, wireless fidelity (WiFi), Bluetooth, the Zigbee protocol, radio frequency identification (RFID) technology, long range (LoRa) wireless communication, and near field communication (NFC), or any one or combination thereof.
  • the wired network may include an optical fiber communication network or a network composed of coaxial cables, and the like.
  • one or more aspects of the execution device 210 may be implemented by each local device, for example, the local device 301 may provide the execution device 210 with local data or feedback calculation results.
  • the local device 301 implements the functions of the execution device 210 and provides services for its own users, or provides services for the users of the local device 302 .
  • FIG. 4 is a schematic diagram of a framework for constructing a neural network model according to an embodiment of the present application.
  • the neural network model construction framework includes at least one controller model and a neural network model generated via the controller model.
  • the controller model obtains the architecture of a neural network model through searching, trains that architecture on the training set, and then evaluates it on the validation set to obtain its accuracy. After that, the feedback result (such as the accuracy) is returned to the controller model, and the controller model is updated using reinforcement learning, so that the controller can generate a better network structure in the next cycle. This process is repeated many times: a new architecture is generated, tested again, and the feedback result is sent to the controller model for reinforcement learning again.
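The controller loop described above can be sketched as a short illustrative Python toy. This is not the disclosed implementation: the search space, the `evaluate` stand-in (which replaces real training/validation), and the crude preference update (which replaces a real reinforcement-learning policy update) are all assumptions for illustration.

```python
import random

# Illustrative search space: each key is one architecture decision.
SEARCH_SPACE = {"depth": [2, 4, 8], "width": [16, 32, 64]}

def sample_architecture(preferences):
    """Controller step: sample one architecture, biased by learned preferences."""
    return {key: random.choices(options, weights=preferences[key])[0]
            for key, options in SEARCH_SPACE.items()}

def evaluate(arch):
    """Stand-in for train-on-training-set + score-on-validation-set."""
    return 0.5 + 0.004 * arch["depth"] + 0.003 * arch["width"] / 16

def update_controller(preferences, arch, reward, lr=0.5):
    """Crude 'reinforcement' update: raise the weight of each chosen option."""
    for key, options in SEARCH_SPACE.items():
        idx = options.index(arch[key])
        preferences[key][idx] += lr * reward

preferences = {key: [1.0] * len(opts) for key, opts in SEARCH_SPACE.items()}
best = 0.0
for _ in range(20):                       # "repeated many times"
    arch = sample_architecture(preferences)
    accuracy = evaluate(arch)             # feedback result
    update_controller(preferences, arch, accuracy)
    best = max(best, accuracy)
print(round(best, 3))
```

Over many cycles the preference weights of high-accuracy options grow, so the controller tends toward architectures that achieve higher validation accuracy, as the text describes.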
  • the above controller models will tend to design architectures that achieve higher accuracy in the validation set.
  • FIG. 5 is a schematic diagram of another neural network model construction framework provided in this embodiment of the present application.
  • the schematic diagram of the neural network model construction framework includes at least network structure population and performance evaluation tools.
  • the network structure population contains a variety of neural network model building units.
  • the model generator searches for suitable neural network building units in the network structure population, and constructs the searched neural network building units into a neural network model.
  • the neural network model may include one or more neural network model construction units, which are not specifically limited here.
  • the neural network model is input into the performance evaluation tool to obtain the chip performance index when the neural network model runs on the target chip.
  • the neural network model can also be trained on a chip on the cloud to obtain the network structure performance of the neural network. It is understandable that the neural network model can also be trained in other chip environments; there is no specific limitation here.
  • the network structure population is adjusted through the chip performance indicators and the network structure performance, so as to obtain a model generator that can build network structures with higher chip performance and better network structure performance.
  • the network structure population is updated through Pareto optimization, that is, on the premise of not affecting the performance of the network structure, the constructed neural network model has higher chip performance.
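The Pareto update described above can be sketched as follows; this is a minimal illustrative Python example (the objective tuples, accuracy/latency numbers, and function names are assumptions), showing that a structure survives only if no other structure is at least as good on both network-structure performance and chip performance.

```python
def dominates(a, b):
    """a dominates b if a is no worse on both objectives and strictly better
    on at least one. Each entry is (accuracy, chip_latency_ms); higher
    accuracy and lower latency are better."""
    return (a[0] >= b[0] and a[1] <= b[1]) and (a[0] > b[0] or a[1] < b[1])

def pareto_front(candidates):
    """Keep only candidates not dominated by any other candidate."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o != c)]

population = [
    (0.92, 12.0),   # accurate but slow on the target chip
    (0.90, 8.0),    # balanced
    (0.88, 8.5),    # dominated by (0.90, 8.0): less accurate AND slower
    (0.85, 5.0),    # fast but less accurate
]
front = pareto_front(population)
print(front)
```

The dominated structure is dropped, while structures that trade accuracy against chip performance in different ways all remain, matching the premise of improving chip performance without harming network structure performance.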
  • the apparatus for constructing the neural network model in the embodiment of the present application may be a computer device with a chip such as a server, a desktop computer, a notebook computer, and a computer cluster, which is not specifically limited here.
  • FIG. 6 is a schematic flowchart of a method for constructing a neural network model according to an embodiment of the present application.
  • step 601 the apparatus for constructing a neural network model constructs a first neural network model through a first model generator.
  • the apparatus for constructing a neural network model constructs a first neural network model through a first model generator, and the first model generator is preset by the apparatus for constructing a neural network model.
  • the apparatus for constructing a neural network model constructs a first neural network by using a first model generator and a target task.
  • the neural network model building apparatus acquires the target task.
  • the target task may be determined by the apparatus according to its own needs, or may be determined according to the user's operation.
  • the target task may include: the type of neural network, the accuracy of the neural network, and the like.
  • the type of neural network includes the output type of the neural network requested to be constructed.
  • the target task may be to construct a neural network for face recognition, which is used to recognize faces and output corresponding character information.
  • the target task may be initiated by a terminal, for example constructing a neural network for vehicle identification, which is used to identify the vehicle information included in a captured picture.
  • the neural network in this application may be a convolutional neural network, a recurrent neural network, a perceptron neural network, etc., which may be adjusted according to actual application scenarios, which is not limited in this application.
  • Corresponding training data can also be acquired while acquiring the target task or after acquiring the target task.
  • the training data is data associated with the target task.
  • the training data may include input data for the target task and real measurement data. For example, if the target task is to construct a face recognition neural network, the training data includes a large number of face pictures and task information corresponding to each picture.
  • the training data may be divided into a training set and a validation set, where the training set represents a picture for training the neural network model, and the validation set represents a picture for verifying the accuracy of the network model.
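The division of training data into a training set and a validation set can be sketched as follows; the 80/20 ratio, the fixed seed, and the file names are illustrative assumptions, not values from the embodiment.

```python
import random

def split_data(samples, val_fraction=0.2, seed=0):
    """Shuffle the labeled samples, then reserve a fraction for validation."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]   # (training set, validation set)

# e.g. face pictures for a face-recognition target task (names assumed)
pictures = [f"face_{i:03d}.jpg" for i in range(100)]
train_set, val_set = split_data(pictures)
print(len(train_set), len(val_set))  # → 80 20
```

The training set is used to train the neural network model, and the disjoint validation set is used to verify its accuracy, as described above.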
  • hardware constraints are also input to the first model generator, where the hardware constraints include various parameters of the target chip.
  • the hardware constraints may include at least one of the following: the frequency of the chip, the size of the chip memory, the size of the chip computing module, the bandwidth between the on-chip memory and the external memory, and the like. It can be understood that in the actual application process, more parameters may also be included, which are not specifically limited here.
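One hedged way the hardware constraints listed above might be represented when passed to the first model generator is a simple parameter record; all field names and values below are illustrative assumptions, not values from the embodiment.

```python
# Assumed representation of the target chip's hardware constraints.
target_chip_constraints = {
    "clock_frequency_mhz": 1000,     # frequency of the chip
    "on_chip_memory_kib": 512,       # size of the chip memory
    "compute_module_macs": 4096,     # size of the chip computing module
    "memory_bandwidth_gbs": 25.6,    # on-chip <-> external memory bandwidth
}

def peak_ops_per_second(constraints):
    """Theoretical peak: MACs per cycle x cycles per second x 2 ops per MAC."""
    return (constraints["compute_module_macs"]
            * constraints["clock_frequency_mhz"] * 1e6 * 2)

print(f"{peak_ops_per_second(target_chip_constraints):.3e}")
```

Such derived quantities (here, a theoretical peak throughput) are the kind of chip parameters a performance evaluation tool could use when estimating theoretical performance indices.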
  • step 602 the apparatus for constructing the neural network model obtains the first theoretical performance index.
  • the target chip can be a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), a field programmable gate array (FPGA), an application specific integrated circuit (ASIC), a general-purpose processor, etc.
  • the first theoretical performance index is a theoretical performance index when the CPU runs the first neural network.
  • the first theoretical performance index includes at least one of the following: theoretical vector bound, theoretical memory bound, theoretical cube utilization, theoretical high-speed parallel multiplier-accumulator (multiplier-accumulator/multiply-accumulate operation, MAC) utilization, theoretical number of cube module operations (cube cycles), theoretical number of vector module operations (vector cycles), L1 and L2 memory fusion effect, compute batch effect, tiling strategy and its performance, performance under mixed precision, performance under different data flow modes, the cycles or delay of each operator or network layer in the neural network model, and the total number of cycles or delay of the entire neural network model.
  • the neural network model building apparatus obtains the first theoretical performance index through a performance evaluation tool, and the performance evaluation tool includes a calculation function, and the calculation function is used to calculate the first neural network model to obtain the first theoretical performance index.
  • the performance evaluation tool may be a tool in the form of software, and is used to obtain the corresponding performance index when the target chip runs the first neural network.
  • the performance evaluation tool is PTM, and the apparatus for constructing a neural network model obtains theoretical performance indicators through PTM.
  • the performance evaluation tool may also exist in other forms, for example, a hardware module, which is not specifically limited here.
  • the device for constructing a neural network model takes at least one component unit composing the first neural network as a first building unit (a component unit may include at least one operator, or take one layer as a unit), and divides the neural network into one or more first building units.
  • the multiple first building units are input into a performance evaluation tool, and the performance evaluation tool performs calculations according to the multiple first building units and the parameters of the target chip to obtain theoretical performance indicators of each first building unit. Further, the theoretical performance indicators of the first building units are superimposed, that is, the first theoretical performance indicators of the target chip running the first neural network.
  • if the theoretical performance index is the time consumption when the target chip runs the first neural network, and one building unit is one operator in the first neural network, then the performance evaluation tool calculates the time consumption corresponding to each operator and superimposes them, finally obtaining the time consumption corresponding to all operators in the entire neural network, that is, the time consumption when the target chip runs the first neural network.
  • the time consumption of the whole network can be calculated by the following formula:
  • Whole-network time consumption ≈ Σᵢ₌₁ⁿ f(operator at layer i), where n is the number of layers and f is the performance evaluation tool's estimate of the time consumption of one operator
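The summation above can be sketched numerically; here the per-operator estimate f is modeled as FLOPs divided by an assumed effective throughput, which is an illustrative simplification (real evaluation tools also model memory traffic, pipelines, and tiling), and all FLOP counts and chip numbers are assumptions.

```python
PEAK_GFLOPS = 8192.0   # assumed peak throughput of the target chip

def estimate_operator_ms(op_flops, efficiency=0.6):
    """f(): estimated time consumption of one operator, in milliseconds."""
    return op_flops / (PEAK_GFLOPS * 1e9 * efficiency) * 1e3

# one entry per layer/operator of the first neural network (FLOP counts assumed)
operator_flops = [230e6, 460e6, 920e6, 120e6]

# superimpose the per-operator estimates -> whole-network time consumption
total_ms = sum(estimate_operator_ms(f) for f in operator_flops)
print(round(total_ms, 3))  # → 0.352
```

The per-operator estimates are simply superimposed, matching the description that the time consumption of all operators is accumulated to obtain the time consumption of the target chip running the first neural network.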
  • alternatively, the neural network model construction device inputs the entire first neural network model into the performance evaluation tool, and the performance evaluation tool performs the calculation according to the first neural network model and the parameters of the target chip; the resulting theoretical performance index of the first neural network model is the first theoretical performance index of the target chip running the first neural network. For example, if the first theoretical performance index is the time consumed when the target chip runs the first neural network, the performance evaluation tool obtains this time by calculating the time consumption of the first neural network.
  • the apparatus for constructing the neural network model obtains the first theoretical performance index by inputting the data stream into the performance evaluation tool.
  • the neural network model construction apparatus determines one or more first construction units of the first neural network model, where the first construction units include at least one of the following: a convolution layer of the first neural network model, a The pooling layer, the activation function of the first neural network model, and the normalization layer of the first neural network model.
  • different layers in the neural network are divided into a dimension of a task, for example, one or more convolution layers, pooling layers, activation functions, normalization layers, etc.
  • for example, a convolution layer may be divided into one dimension; in a preferred manner, a convolution layer, a pooling layer, an activation function, and a normalization layer together are divided into one dimension.
  • the process may also include data movement paths such as L1->L0A/L0B, UB->L1, and so on.
  • for paths such as L1->L0A/L0B, analyze the pipelines through which each dimension in the neural network performs calculation and data transmission.
  • theoretical performance indicators such as the number of cycles, cube utilization, mac utilization, vector bound, memory bound (DDR, L2), etc. are calculated according to the processing of each pipeline (pipe) and data.
  • the output feature map can be reduced by setting the corresponding weight when constructing the neural network so that it matches the size of the chip's minimum memory unit, thereby improving the use efficiency of the minimum memory unit of the chip.
  • Vector bound: the degree of multiplexing of the vector module in the chip.
  • Cube utilization: the cube module is mainly used for matrix calculation; cube utilization refers to the number of times the cube module is used per unit time.
  • MAC (multiplier-accumulator/multiply-accumulate operation) utilization: the MAC is a module in the chip that performs the neural network's multiply and add operations; MAC utilization refers to the number of times the MAC module is used per unit time.
  • the method for improving the mac utilization rate in the embodiment of the present application is similar to the method for improving the cube utilization rate, and details are not described herein again.
  • Number of cube cycles: the number of operations performed by the cube module in the chip.
  • Number of vector cycles: the number of operations performed using the vector module in the chip.
  • L1 and L2 memory fusion: L1 and L2 are two different types of chip memory; by setting the corresponding weights when building the neural network, the compatibility of L1 and L2 in processing the data corresponding to the neural network can be improved.
  • Compute batch: the number of feature maps processed as a batch within the same time period; by setting the corresponding weights when constructing the neural network, the compute batch can be increased as much as possible on the premise of ensuring the running performance of the chip.
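Several of the indicators above (memory bound, peak throughput, memory bandwidth) can be related through a simple roofline-style check: a layer counts as memory bound when its arithmetic intensity (operations per byte moved) falls below the chip's ridge point. This is a hedged, generic sketch; the chip numbers are assumptions and the patent does not prescribe this particular formula.

```python
PEAK_GFLOPS = 8192.0      # assumed compute peak of the target chip
BANDWIDTH_GBS = 25.6      # assumed memory bandwidth (e.g. DDR)
RIDGE_POINT = PEAK_GFLOPS / BANDWIDTH_GBS   # FLOPs per byte = 320.0

def is_memory_bound(flops, bytes_moved):
    """True when the layer cannot keep the compute modules busy because
    data movement, not arithmetic, limits its speed."""
    return flops / bytes_moved < RIDGE_POINT

# a layer that moves many bytes per FLOP (intensity 20 < 320) is memory bound
print(is_memory_bound(flops=2e6, bytes_moved=1e5))
```

A construction apparatus could use such a check to steer the generator away from structures whose layers are dominated by data movement on the target chip.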
  • the performance indicators in the embodiments of the present application may also include more parameters, as long as they can affect the operating efficiency of the neural network on the chip, which is not specifically limited here.
  • step 603 the apparatus for constructing a neural network model adjusts the first model generator according to the first theoretical performance index to obtain a second model generator.
  • after obtaining the first theoretical performance index, the neural network model building apparatus adjusts the first model generator according to the first theoretical performance index to obtain a second model generator, and the second model generator can construct a neural network with better theoretical performance indices when running on the target chip.
  • specifically, the weighting factor in the first model generator can be adjusted according to the first theoretical performance index, so that the adjusted model generator can construct a second neural network model with a better performance index when running on the target chip, where the second theoretical performance index of the second neural network model is better than the first theoretical performance index.
  • for example, if the theoretical performance index is the theoretical time consumption, the corresponding time-consumption weight factor is adjusted when adjusting the model generator, so that the model generator can build a neural network that takes less time to run on the target chip.
  • if there are multiple theoretical performance indices, multiple weighting factors are adjusted correspondingly, so that the first model generator constructs a neural network with better performance indices when running on the target chip.
  • the first neural network is trained, and when the first neural network tends to converge, a fifth neural network model is obtained, and performance parameters of the fifth neural network are obtained.
  • the performance parameters of the fifth neural network may include an accuracy rate, a peak signal-to-noise ratio describing a picture, etc., which are not specifically limited here.
  • the weight factor in the first model generator is adjusted according to the performance parameters of the fifth neural network and the first theoretical performance index. In this way, the performance parameters of the neural network that can be generated by the model generator can be guaranteed, and the performance indicators running on the target chip are better.
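Combining the network's performance parameter (e.g. accuracy) with the theoretical performance index (e.g. theoretical time consumption) into a single weighted score, as described above, can be sketched as follows; the linear form and the weighting factor `lam` are illustrative assumptions.

```python
def reward(accuracy, theoretical_latency_ms, lam=0.01):
    """Weighted score: higher accuracy raises it, higher time consumption
    lowers it. lam is the adjustable time-consumption weight factor."""
    return accuracy - lam * theoretical_latency_ms

# a faster but slightly less accurate network vs. a slower, more accurate one
fast_net = reward(accuracy=0.90, theoretical_latency_ms=5.0)    # 0.85
slow_net = reward(accuracy=0.92, theoretical_latency_ms=12.0)   # 0.80
print(fast_net > slow_net)
```

Raising `lam` pushes the generator toward structures with lower theoretical time consumption; lowering it prioritizes the network's performance parameter instead, which is the trade-off the weight-factor adjustment controls.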
  • steps 601 to 603 are a process in which the first model generator iterates once according to the first theoretical performance index.
  • a model generator capable of constructing a neural network with the optimal theoretical performance index is generated, that is, the second model generator.
  • step 604 the apparatus for constructing the neural network model obtains the measured performance index.
  • after the neural network model building device obtains the second model generator, it obtains the first measured performance index, where the first measured performance index represents the actual performance index when the target chip runs the second neural network.
  • after the apparatus for constructing the neural network model obtains the second model generator, the second neural network generated by the second model generator can be used directly, or the second model generator can be further improved in a second round.
  • the second round of improvement is based on the performance indicators obtained by the second neural network running on the actual target chip.
  • the second neural network model is constructed by the second model generator and is run on the target chip, so as to obtain the corresponding actual performance indicators when the target chip runs the second neural network model.
  • the target chip can be a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), a field programmable gate array (FPGA), an application specific integrated circuit (ASIC), a general-purpose processor, etc.
  • the first measured performance index may include at least one of the following: measured vector bound, measured memory bound, measured cube utilization, measured high-speed parallel multiplier-accumulator (multiplier-accumulator/multiply-accumulate operation, MAC) utilization, measured number of cube module operations (cube cycles), measured L1 and L2 memory fusion effect, measured compute batch effect, measured tiling strategy and its performance, measured performance under mixed precision, measured performance under different data flow modes, and the measured cycles or delay of each operator or network layer in the neural network model. It can be understood that in the actual application process, more chip performance indicators can be included, such as the measured number of vector module operations (vector cycles), as long as they reflect the performance of the target chip running the second neural network; this is not specifically limited here.
  • the system where the target chip is located can obtain the performance index corresponding to the target chip running the second neural network in the form of software or hardware, and the specific method of obtaining the performance index is not limited here.
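One common software-side way to obtain a measured performance index such as time consumption is to time repeated runs of the model on the target device, discarding warm-up runs and taking the median. This is a generic sketch, not the embodiment's method; `run_model` is a stand-in for the second neural network executing on the target chip.

```python
import statistics
import time

def run_model(x):
    """Stand-in for the second neural network running on the target chip."""
    return sum(v * v for v in x)

def measure_latency_ms(fn, inputs, warmup=3, repeats=10):
    for _ in range(warmup):          # discard cold-start (cache/JIT) runs
        fn(inputs)
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(inputs)
        samples.append((time.perf_counter() - t0) * 1e3)
    return statistics.median(samples)  # median is robust to outlier runs

latency = measure_latency_ms(run_model, list(range(10_000)))
print(latency > 0.0)
```

The resulting measured value plays the role of the first measured performance index fed back to adjust the second model generator.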
  • step 605 the neural network model building apparatus adjusts the second model generator according to the first measured performance index to obtain a third model generator.
  • after obtaining the first measured performance index, the neural network model construction device adjusts the second model generator according to the first measured performance index to obtain a third model generator, which can construct a third neural network model whose second measured performance index is better than the first measured performance index.
  • the weight factor in the corresponding second model generator can be adjusted according to the measured performance index, so that the second model generator can construct a neural network with better performance index when running in the target chip .
  • for example, if the first measured performance index is the measured time consumption when the second neural network runs on the target chip, the corresponding time-consumption weight factor is adjusted so that the model generator builds a neural network that takes less time to run on the target chip. It can be understood that when the first measured performance index comprises multiple indices, multiple weighting factors are adjusted correspondingly, so that the second model generator can construct a neural network with better performance indices when running on the target chip.
  • the second neural network is trained, and when the second neural network tends to converge, a fourth neural network model is obtained, and performance parameters of the fourth neural network are obtained.
  • the performance parameters of the fourth neural network may include an accuracy rate, a peak signal-to-noise ratio describing a picture, etc., which are not specifically limited here.
  • the weight factor in the first model generator is adjusted according to the performance parameter of the fourth neural network and the first measured performance index. This can ensure that the performance parameters of the neural network generated by the model generator are excellent, and the performance indicators running on the target chip are better.
  • steps 604 to 605 are a process in which the second model generator iterates once according to the measured performance indicators.
  • steps 604 to 605 are optional steps, and when steps 604 to 605 are not performed, the second neural network model constructed by the second model generator is used as the neural network model used on the target chip .
  • the theoretical performance index and the measured performance index may be obtained within the neural network model construction device, or may be obtained by other computer equipment and then sent to the neural network model construction device; this is not specifically limited here.
  • FIG. 6 is an application scenario of the embodiment of the present application, and another application scenario of the neural network model construction method of the embodiment of the present application is described below.
  • FIG. 9 is another schematic flowchart of a method for constructing a neural network model according to an embodiment of the present application.
  • the structure search space is taken as an example to represent the model generator for description.
  • Each constituent unit in the search space is a code indicating the construction of the neural network structure.
  • each sampling utilizes the subdivision relationship of the unevaluated constituent units in the search space and the network model constructed based on the evaluated constituent units.
  • the part in the dotted box is the step of constructing the initial search space.
  • the search space consists of multiple network structures, which in turn consist of multiple constituent units. After the initial search space of multiple network structures is constructed, unreasonable network structures are screened out based on preset rules or application scenario tasks to obtain a second search space. Then, through network structure clustering, a new third search space is formed.
  • the cluster center structure is trained based on the third search space, the unevaluated structure is modeled according to the training loss value of the evaluated network structure, and then several network structures are selected for training based on Bayesian optimization, and then according to the training loss The value models the unevaluated structure, and this loops until the search space is constructed.
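The loop above (train a few structures, model the unevaluated ones from the evaluated ones, select the most promising candidates) can be sketched with a deliberately crude surrogate: predict each unevaluated structure's loss from its nearest already-trained neighbor. This nearest-neighbor surrogate is an illustrative stand-in for real Bayesian optimization, and the structure encoding and loss values are assumptions.

```python
def distance(a, b):
    """Squared distance between two structure encodings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict_loss(structure, evaluated):
    """Surrogate model: loss of the nearest structure that was actually trained."""
    nearest = min(evaluated, key=lambda s: distance(s, structure))
    return evaluated[nearest]

# structures encoded as (depth, width); losses of trained cluster centers
evaluated = {(2, 16): 0.9, (8, 64): 0.3}
candidates = [(3, 16), (7, 64), (4, 32)]   # not yet trained

# select the candidate with the lowest predicted loss for real training next
ranked = sorted(candidates, key=lambda s: predict_loss(s, evaluated))
print(ranked[0])
```

After the selected candidate is actually trained, its true loss is added to `evaluated` and the surrogate is re-fit, looping exactly as the paragraph describes until the search space is constructed.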
  • the method for constructing the neural network model in the embodiment shown in FIG. 6 is used to further adjust the search space, and details are not repeated here.
  • the performance of the search space is improved by initially constructing and training the search space.
  • FIG. 10 is a schematic structural diagram of the apparatus for constructing a neural network model according to an embodiment of the present application.
  • a device for constructing a neural network model comprising:
  • a construction unit 1001 configured to construct a first neural network model by a first model generator
  • an obtaining unit 1002 configured to obtain, according to the first neural network model, a first performance index when the first neural network model runs on the target chip;
  • a processing unit 1003 configured to adjust the first model generator according to the first performance index to obtain the second model generator
  • the constructing unit 1001 is further configured to construct a second neural network model through the second model generator, and the second performance index of the second neural network model is better than the first performance index.
  • each unit of the apparatus for constructing a neural network model is similar to those described in the foregoing embodiments shown in FIG. 6 and FIG. 7 , and details are not repeated here.
  • FIG. 11 is another schematic structural diagram of the apparatus for constructing a neural network model according to an embodiment of the present application.
  • a device for constructing a neural network model comprising:
  • a construction unit 1101 configured to construct a first neural network model by a first model generator
  • an obtaining unit 1102 configured to obtain, according to the first neural network model, the first performance index when the first neural network model runs on the target chip;
  • a processing unit 1103, configured to adjust the first model generator according to the first performance index to obtain the second model generator
  • the constructing unit 1101 is further configured to construct a second neural network model through the second model generator, and the second performance index of the second neural network model is better than the first performance index.
  • the first performance index is a first theoretical performance index
  • the first theoretical performance index represents a theoretical value of the performance index when the first neural network model runs on the target chip
  • the second performance index is a second theoretical performance index
  • the second theoretical performance index is better than the first theoretical performance index
  • the obtaining unit 1102 is specifically configured to obtain the first measured performance index, where the first measured performance index represents the measured value of the performance index when the second neural network model runs on the target chip;
  • the processing unit 1103 is further configured to adjust the second model generator according to the first measured performance index to obtain the third model generator;
  • the constructing unit 1101 is further configured to construct a third neural network model through the third model generator, where the second measured performance index of the third neural network model is better than the first measured performance index.
  • the apparatus for constructing the neural network model further includes:
  • the training unit 1104 is used for training the second neural network model to obtain the fourth neural network model
  • the processing unit 1103 is further configured to adjust the second model generator according to the first measured performance index, and obtaining the third model generator includes:
  • the processing unit 1103 is further configured to adjust the second model generator according to the first measured performance index and the model performance of the fourth neural network model to obtain a third model generator.
  • the first performance index is a first theoretical performance index
  • the obtaining unit is specifically configured to obtain the first theoretical performance index through a performance evaluation tool, where the performance evaluation tool includes a calculation function, and the calculation function is used to perform calculation on the first neural network model to obtain the first theoretical performance index.
  • the apparatus for constructing the neural network model further includes:
  • Determining unit 1105 configured to determine a first construction unit of the first neural network model by using a performance evaluation tool, the first construction unit includes at least one of the following: a convolutional layer of the first neural network model, a pooling layer of the first neural network model , the activation function of the first neural network model, the normalization layer of the first neural network model;
  • the processing unit 1103 is further configured to perform calculation according to the first construction unit to obtain the first theoretical performance index.
  • the training unit 1104 is further configured to train the first neural network model to obtain the fifth neural network model;
  • the processing unit 1103 is further configured to adjust the first model generator according to the first theoretical performance index and the model performance of the fifth neural network model to obtain the second model generator.
  • the first theoretical performance index and the second theoretical performance index respectively include at least one of the following: a theoretical vector module limit, a theoretical memory limit, a theoretical cube module utilization rate, a theoretical high-speed parallel multiply-accumulator (MAC) utilization rate, a theoretical number of cube module operations, and a theoretical number of vector module operations.
  • the first measured performance index and the second measured performance index respectively include at least one of the following: the measured vector module limit, the measured memory limit, the measured cube module utilization rate, and the measured high-speed parallel multiply-accumulator MAC utilization rate. , the measured number of operations of the cube module, and the measured number of operations of the vector module.
  • the functions of each unit of the apparatus for constructing a neural network model are similar to those described in the foregoing embodiments shown in FIG. 6 and FIG. 7, and details are not repeated here.
  • FIG. 12 is another schematic structural diagram of the apparatus for constructing a neural network model according to an embodiment of the present application.
  • the processor 1201 is connected to the memory 1202 and the interface 1204.
  • the bus 1205 is respectively connected to the processor 1201, the memory 1202, and the interface 1204.
  • the interface 1204 is used to receive or send data.
  • the processor 1201 may be a single-core or multi-core central processing unit, an application-specific integrated circuit, or one or more integrated circuits configured to implement the embodiments of the present application.
  • the memory 1202 may be random access memory (RAM), or may be non-volatile memory (non-volatile memory), such as at least one hard disk memory.
  • the memory 1202 is used to store computer-executable instructions; specifically, the computer-executable instructions may include the program 1203.
  • when the processor 1201 calls the program 1203, the apparatus for constructing a neural network model in FIG. 12 can execute the operations performed by the apparatus in the embodiments shown in FIG. 6 or FIG. 9, and details are not repeated here.
  • the processor mentioned in the apparatus for constructing a neural network model in the above embodiments of the present application may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the number of processors in the apparatus for constructing a neural network model in the above embodiments of the present application may be one or more, and may be adjusted according to the actual application scenario; this is merely illustrative and not limiting.
  • the number of memories in this embodiment of the present application may be one or more, and may be adjusted according to the actual application scenario; this is merely illustrative and not limiting.
  • the neural network model building apparatus includes a processor (or a processing unit) and a memory
  • the processor in this application may be integrated with the memory, or the processor and the memory may be connected through an interface; this can be adjusted according to the actual application scenario and is not limited.
  • the embodiments of the present application also provide a computer program, or a computer program product including a computer program, which, when executed on a computer, enables the computer to implement the method flow related to the apparatus for constructing a neural network model in any of the above method embodiments.
  • Embodiments of the present application further provide a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a computer, implements the method process related to the apparatus for constructing a neural network model in any of the above method embodiments.
  • FIGS. 6-9 may be implemented in whole or in part by software, hardware, firmware or any combination thereof.
  • software When implemented in software, it can be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave).
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media.
  • the available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., DVDs), semiconductor media (e.g., solid-state drives (SSDs)), and the like.
  • the word "if" as used herein may be interpreted as "when", "upon", "in response to determining", or "in response to detecting".
  • similarly, the phrases "if it is determined" or "if (a stated condition or event) is detected" may be interpreted, depending on the context, as "when it is determined", "in response to determining", "when (the stated condition or event) is detected", or "in response to detecting (the stated condition or event)".

Abstract

A neural network model construction method and a device therefor, which are used in constructing a neural network. The method comprises: constructing a first neural network model by means of a first model generator (601), obtaining, according to the first neural network model, a first performance indicator when the first neural network model is running on a target chip (602), adjusting the first model generator according to the first performance indicator, and obtaining a second model generator (603), and constructing a second neural network model by means of a second model generator, where a second performance indicator for the second neural network model is superior to the first performance indicator. By means of obtaining a first theoretical performance indicator when a first neural network model is running on a target chip, and adjusting a first model generator according to the first theoretical performance indicator, a neural network model with a superior hardware performance indicator when running on the target chip can thereby be constructed.

Description

A Neural Network Model Construction Method and Device Therefor

Technical Field

The present application relates to the field of artificial intelligence, and in particular, to a method and device for constructing a neural network model.

Background

In recent years, deep neural networks have achieved remarkable results in processing and analysis tasks for various media signals such as image, video, and speech. A well-performing neural network often has a delicate network structure, whose design requires human experts with superb skills and rich experience to expend a great deal of effort.

Neural network architecture search, that is, automated neural network model construction, changes this manual design mode: it searches for neural network structures automatically, obtains structures with excellent performance, and has achieved excellent results in tasks such as image recognition, image semantic segmentation, and natural language processing.

In a traditional architecture search, the search is trained in a particular chip environment against the target index of the task (for example, the model test accuracy of applications such as image classification and image segmentation). When a neural network structure obtained by searching in one chip environment is used in another chip environment, differences in chip parameters cause compatibility problems at run time, such as excessive time consumption and low chip utilization.
SUMMARY OF THE INVENTION

The embodiments of the present application provide a neural network model construction method and device, used, when a model generator generates a neural network model, to obtain a first theoretical performance index of a first neural network model running on a target chip and to adjust the corresponding weights in a first model generator according to the first theoretical performance index, thereby constructing a neural network model with a better hardware performance index when running on the target chip.

A first aspect of the embodiments of the present application provides a method for constructing a neural network model.

A neural network model construction apparatus constructs a first neural network model through a first model generator preset in the apparatus; the first neural network model is constructed by the first model generator from individual building units.

After the first model generator constructs the first neural network model, the apparatus obtains, according to the first neural network model, a first performance index of the first neural network model when running on a target chip.

The apparatus adjusts the first model generator according to the first performance index to obtain a second model generator. After obtaining the second model generator, the apparatus constructs a second neural network model according to it; the second performance index of the second neural network model is better than the first performance index, that is, the performance index of the second neural network model when running on the target chip is better than that of the first neural network model when running on the target chip.

In the embodiments of the present application, by obtaining the first performance index of the first neural network model when running on the target chip and adjusting the first model generator according to that index, a second neural network model with a better hardware performance index when running on the target chip is constructed.
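The generator-feedback loop described above can be sketched as follows. This is a minimal illustration under assumed names (ModelGenerator, get_performance_index, the width-doubling update, and the chip figure of 128 lanes are all hypothetical); the application does not specify a concrete generator or update rule.

```python
class ModelGenerator:
    """Toy first model generator: holds a channel width and widens it
    whenever the performance index on the target chip falls short."""

    def __init__(self, width=16):
        self.width = width

    def build(self):
        # A "model" here is just its configuration.
        return {"width": self.width}

    def adjust(self, perf_index, target=0.8):
        # Stand-in for the real adjustment step: nudge the generator
        # toward configurations that score better on the target chip.
        if perf_index < target:
            self.width *= 2


def get_performance_index(model, chip_lanes=128):
    # Hypothetical index: fraction of the chip's compute lanes used.
    return min(model["width"] / chip_lanes, 1.0)


gen = ModelGenerator()                # first model generator
first = gen.build()                   # first neural network model
idx1 = get_performance_index(first)   # first performance index
gen.adjust(idx1)                      # -> second model generator
second = gen.build()                  # second neural network model
idx2 = get_performance_index(second)  # second performance index
assert idx2 > idx1                    # the second index is better
```

In a real search, the adjustment would update the sampling distribution of an architecture generator rather than a single width, but the control flow — build, measure, adjust, rebuild — is the same.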
Based on the neural network model construction method of the first aspect of the embodiments of the present application, in a possible implementation:

After the first model generator constructs the first neural network model, the apparatus obtains a first theoretical performance index of the first neural network model, which represents the theoretical value of the performance index of the first neural network model when running on the target chip.

The apparatus adjusts the first model generator according to the first theoretical performance index to obtain a second model generator. After obtaining the second model generator, the apparatus constructs a second neural network model according to it; the second theoretical performance index of the second neural network model, that is, the theoretical value of the performance index of the second neural network when running on the target chip, is better than the first theoretical performance index.

In the embodiments of the present application, by obtaining the first theoretical performance index of the first neural network model when running on the target chip and adjusting the first model generator according to that index, a second neural network model with a better hardware performance index when running on the target chip is constructed.
Based on the method of the first aspect of the embodiments of the present application, in a possible implementation, after the apparatus constructs the second neural network model through the second model generator, the apparatus further obtains a first measured performance index, which represents the measured value of the performance index of the second neural network model when running on the target chip, that is, the measured performance index obtained from the target chip after the second neural network model has run on it.

The apparatus adjusts the corresponding weight factors in the second model generator according to the first measured performance index to obtain a third model generator. After obtaining the third model generator, the apparatus constructs a third neural network model through it; the second measured performance index of the third neural network model, that is, the measured value of its performance index when running on the target chip, is better than the first measured performance index.

In the embodiments of the present application, by running the second neural network on the actual target chip, obtaining the corresponding measured performance index, and then adjusting the second model generator according to that index, a neural network model better adapted to the target chip can be obtained.
Based on the method of the first aspect of the embodiments of the present application, in a possible implementation, before obtaining the first measured performance index, the apparatus further trains the second neural network model to obtain a fourth neural network model. After obtaining the fourth neural network model, the apparatus obtains its model performance and adjusts the second model generator according to the model performance of the fourth neural network and the first measured performance index to obtain the third model generator.

In the embodiments of the present application, adjusting the second model generator according to the model performance of the fourth neural network model (the trained second neural network model) and the first measured performance index ensures that the neural network models generated by the adjusted third model generator achieve a better performance index when running on the target chip while maintaining equivalent model performance.
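The joint adjustment just described, combining the trained model's task performance with the measured hardware index, can be sketched as a single scalar reward. The linear weighting and the alpha value of 0.5 are assumptions for illustration; the application does not fix a particular combination rule.

```python
def combined_reward(model_accuracy, measured_utilization, alpha=0.5):
    # Trade off task performance (model performance of the trained
    # model) against the measured on-chip utilization.
    return alpha * model_accuracy + (1 - alpha) * measured_utilization

# Two candidates with equal accuracy but different chip utilization:
r_good_chip = combined_reward(0.92, 0.70)
r_poor_chip = combined_reward(0.92, 0.30)
assert r_good_chip > r_poor_chip  # the generator is steered to the former
```

Because the reward rises with both terms, a generator adjusted against it keeps model performance equivalent while favoring structures that use the target chip better.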
Based on the method of the first aspect of the embodiments of the present application, in a possible implementation, the apparatus obtains the first theoretical performance index through a performance evaluation tool; the performance evaluation tool includes a calculation function used to perform calculation on the first neural network model to obtain the first theoretical performance index.

In the embodiments of the present application, the apparatus obtains the first theoretical performance index through a performance evaluation tool, which improves the achievability of obtaining the first theoretical performance index.

Based on the method of the first aspect of the embodiments of the present application, in a possible implementation, the apparatus determines a first building unit of the first neural network through the performance evaluation tool; the first building unit includes at least one of the following: a convolutional layer, a pooling layer, an activation function, or a normalization layer of the first neural network model. The first building unit may include one or more convolutional layers, pooling layers, activation functions, and normalization layers. The apparatus performs calculation according to the first building unit to obtain the first theoretical performance index.

In the embodiments of the present application, the apparatus obtains the first theoretical performance index by calculating, through the performance evaluation tool, one or more first building units of the first neural network model. Since a first building unit includes at least one convolutional layer, pooling layer, activation function, or normalization layer, the theoretical performance index of the first neural network model can be adjusted according to the individual layers that make up the model, which improves flexibility.
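A performance evaluation tool of the kind described, which walks the model's building units and accumulates a theoretical cost, might look like the following sketch. The per-unit cost formula and the chip figure of 4096 MACs per cycle are illustrative assumptions, not the application's actual calculation function.

```python
def conv_macs(cin, cout, k, h, w):
    # Multiply-accumulate count of one k x k convolutional layer
    # producing a cout x h x w output from cin input channels.
    return cin * cout * k * k * h * w

def theoretical_index(units, chip_macs_per_cycle=4096):
    total_macs = 0
    for u in units:
        if u["type"] == "conv":
            total_macs += conv_macs(u["cin"], u["cout"], u["k"],
                                    u["h"], u["w"])
        # Pooling, activation and normalization units are vector-module
        # work and are left out of this cube-module MAC estimate.
    return {"macs": total_macs,
            "cube_cycles": total_macs / chip_macs_per_cycle}

building_units = [
    {"type": "conv", "cin": 3,  "cout": 16, "k": 3, "h": 32, "w": 32},
    {"type": "pool"},
    {"type": "conv", "cin": 16, "cout": 32, "k": 3, "h": 16, "w": 16},
]
index = theoretical_index(building_units)
```

Because the estimate is a sum over building units, changing any single layer changes the theoretical index, which is what makes per-layer adjustment of the generator possible.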
Based on the method of the first aspect of the embodiments of the present application, in a possible implementation, after the apparatus constructs the first neural network model, it further trains the first neural network model to obtain a fifth neural network model. After obtaining the fifth neural network model, the apparatus obtains its model performance and adjusts the first model generator according to the model performance of the fifth neural network and the first theoretical performance index to obtain the second model generator.

In the embodiments of the present application, adjusting the first model generator according to the model performance of the fifth neural network model (the trained first neural network model) and the first theoretical performance index ensures that the neural network models generated by the adjusted second model generator achieve a better theoretical performance index when running on the target chip while maintaining equivalent model performance.

Based on the method of the first aspect of the embodiments of the present application, in a possible implementation, the first theoretical performance index and the second theoretical performance index each include at least one of the following: a theoretical vector module bound, a theoretical memory bound, a theoretical cube module utilization rate, a theoretical high-speed parallel multiply-accumulator (MAC) utilization rate, a theoretical number of cube module operations (cube cycles), and a theoretical number of vector module operations (vector cycles).

In the embodiments of the present application, the specific referents of the first theoretical performance index and the second theoretical performance index are described by way of example, which improves the achievability of the solution.
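As a concrete illustration of one listed index, MAC utilization can be expressed as the ratio of executed multiply-accumulates to the chip's peak over the same interval. The peak throughput figure below is a made-up assumption.

```python
def mac_utilization(total_macs, runtime_s, peak_macs_per_s):
    # Executed MACs divided by what the chip could have executed
    # in the same wall-clock interval at its peak rate.
    return total_macs / (runtime_s * peak_macs_per_s)

# e.g. 1.6e9 MACs in 2 ms on a chip with an assumed 4 TMAC/s peak:
utilization = mac_utilization(1.6e9, 0.002, 4.0e12)
```

The same ratio serves as a theoretical index when the inputs are computed analytically, or as a measured index when they come from on-chip profiling.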
Based on the method of the first aspect of the embodiments of the present application, in a possible implementation, the first measured performance index and the second measured performance index each include at least one of the following: a measured vector module bound, a measured memory bound, a measured cube module utilization rate, a measured high-speed parallel multiply-accumulator (MAC) utilization rate, a measured number of cube module operations, and a measured number of vector module operations.

In the embodiments of the present application, the specific referents of the first measured performance index and the second measured performance index are described by way of example, which improves the achievability of the solution.
A second aspect of the embodiments of the present application provides an apparatus for constructing a neural network model.

An apparatus for constructing a neural network model, comprising:

a construction unit, configured to construct a first neural network model through a first model generator;

an obtaining unit, configured to obtain, according to the first neural network model, a first performance index of the first neural network model when running on a target chip;

a processing unit, configured to adjust the first model generator according to the first performance index to obtain a second model generator;

the construction unit is further configured to construct a second neural network model through the second model generator, where the second performance index of the second neural network model is better than the first performance index.
Optionally, the first performance index is a first theoretical performance index, representing the theoretical value of the performance index of the first neural network model when running on the target chip; the second performance index is a second theoretical performance index, which is better than the first theoretical performance index.

Optionally, the obtaining unit is specifically configured to obtain a first measured performance index, representing the measured value of the performance index of the second neural network model when running on the target chip;

the processing unit is further configured to adjust the second model generator according to the first measured performance index to obtain a third model generator;

the construction unit is further configured to construct a third neural network model through the third model generator, where the second measured performance index of the third neural network model is better than the first measured performance index.

Optionally, the apparatus for constructing a neural network model further includes:

a training unit, configured to train the second neural network model to obtain a fourth neural network model;

where adjusting the second model generator according to the first measured performance index to obtain the third model generator includes:

the processing unit being further configured to adjust the second model generator according to the first measured performance index and the model performance of the fourth neural network model to obtain the third model generator.
Optionally, the first performance index is a first theoretical performance index, and the obtaining unit is specifically configured to obtain the first theoretical performance index through a performance evaluation tool; the performance evaluation tool includes a calculation function used to perform calculation on the first neural network model to obtain the first theoretical performance index.

Optionally, the apparatus for constructing a neural network model further includes:

a determining unit, configured to determine a first building unit of the first neural network model through the performance evaluation tool, where the first building unit includes at least one of the following: a convolutional layer, a pooling layer, an activation function, or a normalization layer of the first neural network model;

the processing unit is further configured to perform calculation according to the first building unit to obtain the first theoretical performance index.

Optionally, the training unit is further configured to train the first neural network model to obtain a fifth neural network model;

the processing unit is further configured to adjust the first model generator according to the first theoretical performance index and the model performance of the fifth neural network model to obtain the second model generator.

Optionally, the first theoretical performance index and the second theoretical performance index each include at least one of the following: a theoretical vector module bound, a theoretical memory bound, a theoretical cube module utilization rate, a theoretical high-speed parallel multiply-accumulator (MAC) utilization rate, a theoretical number of cube module operations, and a theoretical number of vector module operations.

Optionally, the first measured performance index and the second measured performance index each include at least one of the following: a measured vector module bound, a measured memory bound, a measured cube module utilization rate, a measured high-speed parallel multiply-accumulator (MAC) utilization rate, a measured number of cube module operations, and a measured number of vector module operations.
本申请实施例第三方面提供一种神经网络模型构建装置,包括:A third aspect of the embodiments of the present application provides an apparatus for constructing a neural network model, including:
处理器、存储器以及输入输出接口,该处理器、该存储器与该输入输出接口连接;该存储器,用于存储程序代码;该处理器调用该存储器中的程序代码时执行本申请第一方面实施方式提供的方法。A processor, a memory, and an input-output interface, the processor and the memory are connected to the input-output interface; the memory is used to store program codes; the processor executes the first aspect of the present application when calling the program codes in the memory provided method.
本申请实施例第四方面提供一种存储介质,需要说明的是,本发的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产口的形式体现出来,该计算机软件产品存储在一个存储介质中,用于储存为上述设备所用的计算机软件 指令,其包含用于执行上述第一方面中为元数据存储方法所设计的程序。A fourth aspect of the embodiments of the present application provides a storage medium. It should be noted that the technical solution of the present invention is essentially or a part that contributes to the prior art, or all or part of the technical solution can be produced in software. In the form of embodiment, the computer software product is stored in a storage medium for storing computer software instructions for the above-mentioned device, which includes a program for executing the above-mentioned first aspect for the metadata storage method.
该存储介质包括:U盘、移动硬盘、只读存储器(英文缩写ROM,英文全称:Read-Only Memory)、随机存取存储器(英文缩写:RAM,英文全称:Random Access Memory)、磁碟或者光盘等各种可以存储程序代码的介质。The storage medium includes: U disk, mobile hard disk, read-only memory (English abbreviation ROM, English full name: Read-Only Memory), random access memory (English abbreviation: RAM, English full name: Random Access Memory), magnetic disk or CD-ROM and other media that can store program codes.
本申请实施例第五方面提供一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机执行如本申请第一方面实施方式的方法。A fifth aspect of the embodiments of the present application provides a computer program product including instructions, which, when executed on a computer, cause the computer to execute the method according to the embodiment of the first aspect of the present application.
The processor mentioned in any of the above may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the program of the method of the above first aspect.
In the technical solutions provided by the embodiments of the present application, a first theoretical performance index of a first neural network model running on a target chip is obtained, and a first model generator is adjusted according to the first theoretical performance index, so that the model generator is adapted to the performance characteristics of the target chip, which improves the compatibility of the second neural network model generated by the adjusted second model generator.
Description of Drawings
FIG. 1 is a schematic framework diagram of an embodiment of the neural network model construction method in an embodiment of the present application;
FIG. 2 is another schematic framework diagram of an embodiment of the neural network model construction method in an embodiment of the present application;
FIG. 3 is another schematic framework diagram of an embodiment of the neural network model construction method in an embodiment of the present application;
FIG. 4 is another schematic framework diagram of an embodiment of the neural network model construction method in an embodiment of the present application;
FIG. 5 is another schematic framework diagram of an embodiment of the neural network model construction method in an embodiment of the present application;
FIG. 6 is a schematic flowchart of an embodiment of the neural network model construction method in an embodiment of the present application;
FIG. 7 is another schematic flowchart of an embodiment of the neural network model construction method in an embodiment of the present application;
FIG. 8 is another schematic flowchart of an embodiment of the neural network model construction method in an embodiment of the present application;
FIG. 9 is another schematic flowchart of an embodiment of the neural network model construction method in an embodiment of the present application;
FIG. 10 is a schematic structural diagram of an embodiment of the neural network model construction apparatus in an embodiment of the present application;
FIG. 11 is another schematic structural diagram of an embodiment of the neural network model construction apparatus in an embodiment of the present application;
FIG. 12 is another schematic structural diagram of an embodiment of the neural network model construction apparatus in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present application.
FIG. 1 shows a schematic diagram of an artificial intelligence main framework, which describes the overall workflow of an artificial intelligence system and is applicable to general requirements in the artificial intelligence field.
The artificial intelligence framework above is described below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis).
The "intelligent information chain" reflects a series of processes from data acquisition to data processing, for example, the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, the data undergoes a refinement from "data" to "information" to "knowledge" to "wisdom".
The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and the information (technical realization of provision and processing) up to the industrial ecology of the system.
(1) Infrastructure
The infrastructure provides computing-power support for the artificial intelligence system, enables communication with the outside world, and is supported by a basic platform. Communication with the outside world takes place through sensors; computing power is provided by intelligent chips (hardware acceleration chips such as CPUs, NPUs, GPUs, ASICs, and FPGAs); the basic platform includes platform assurance and support such as a distributed computing framework and networks, and may include cloud storage and computing, interconnection networks, and the like. For example, sensors communicate with the outside world to obtain data, and the data is provided to the intelligent chips in the distributed computing system provided by the basic platform for computation.
(2) Data
The data at the layer above the infrastructure indicates the data sources in the artificial intelligence field. The data involves graphics, images, speech, video, and text, and also involves Internet-of-Things data from conventional devices, including service data of existing systems and sensed data such as force, displacement, liquid level, temperature, and humidity.
(3) Data processing
Data processing usually includes data training, machine learning, deep learning, searching, reasoning, decision-making, and the like.
Machine learning and deep learning can perform symbolic and formalized intelligent information modeling, extraction, preprocessing, training, and so on, on the data.
Reasoning refers to the process of simulating human intelligent reasoning in a computer or intelligent system, using formalized information to perform machine thinking and problem solving according to a reasoning control strategy; typical functions are searching and matching.
Decision-making refers to the process of making decisions on intelligent information after reasoning, and usually provides functions such as classification, ranking, and prediction.
(4) General capabilities
After the data undergoes the data processing mentioned above, some general capabilities can further be formed based on the results of the processing, for example an algorithm or a general system, such as translation, text analysis, computer vision processing (for example, image recognition and object detection), speech recognition, and the like.
(5) Intelligent products and industry applications
Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields; they encapsulate the overall artificial intelligence solution, productize intelligent information decision-making, and put it into practical application. Their application fields mainly include intelligent manufacturing, intelligent transportation, smart home, intelligent healthcare, intelligent security, autonomous driving, safe city, intelligent terminals, and the like.
Referring to FIG. 2, an embodiment of the present application provides a system architecture 200. The system architecture includes a database 230 and a client device 240. A data collection device 260 is configured to collect data and store it in the database 230, and a training module 220 generates a target model/rule 201 based on the data maintained in the database 230.
The work of each layer in a deep neural network can be described by the mathematical expression y = a(W·x + b). At a physical level, the work of each layer can be understood as completing a transformation from the input space (the set of input vectors) to the output space (that is, from the row space to the column space of the matrix) through five operations on the input space: 1. raising/lowering the dimension; 2. enlarging/shrinking; 3. rotation; 4. translation; and 5. "bending". Operations 1, 2, and 3 are performed by W·x, operation 4 is performed by +b, and operation 5 is realized by a(). The word "space" is used here because the object being classified is not a single thing but a class of things, and the space refers to the set of all individuals of that class. W is the weight vector, and each value in the vector represents the weight value of one neuron in that layer of the neural network. This vector determines the spatial transformation from the input space to the output space described above; that is, the weight of each layer controls how the space is transformed. The purpose of training a deep neural network is ultimately to obtain the weight matrices of all layers of the trained neural network. Therefore, the training process of a neural network is essentially learning how to control the spatial transformation, and more specifically, learning the weight matrices. In the following implementations of the present application, the weight matrix can be refined into a structure parameter set and a network parameter set; for details, refer to the related description of FIG. 2 below.
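The layer expression y = a(W·x + b) described above can be illustrated with a minimal sketch (NumPy, with ReLU standing in for the activation a(); the weights, bias, and input below are arbitrary illustrative values, not parameters from any embodiment):

```python
import numpy as np

def layer_forward(W, x, b):
    """One layer of a deep neural network: y = a(W*x + b).

    W*x performs the dimension change, scaling and rotation; +b performs
    the translation; the activation a() (the "bending") is ReLU here.
    """
    z = W @ x + b            # linear transformation of the input space
    return np.maximum(z, 0)  # a(): elementwise ReLU

# A layer mapping a 3-dimensional input space to a 2-dimensional output space.
W = np.array([[1.0, 0.0, -1.0],
              [0.5, 2.0,  0.0]])
x = np.array([1.0, 1.0, -2.0])
b = np.array([0.5, 0.0])
y = layer_forward(W, x, b)
```

The weight matrix W alone fixes how the space is transformed; training would adjust the entries of W and b.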
Because the output of a deep neural network is expected to be as close as possible to the target value, the weight vector of each layer of the neural network can be updated by comparing the predicted value of the current network with the target value and adjusting according to the difference between the two (of course, there is usually an initialization process before the first update, that is, parameters are preconfigured for each layer of the deep neural network). For example, if the predicted value of the network is too high, the weight values in the weight matrix are adjusted to lower the predicted value, and the adjustment continues until the value output by the neural network is close or equal to the target value. Therefore, it is necessary to define in advance "how to compare the difference between the predicted value and the target value", that is, the loss function or objective function; the loss function is an important equation for measuring the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a greater difference, and the training of the neural network can be understood as the process of reducing the loss as much as possible.
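The loss-driven weight update described above can be sketched as follows (a hypothetical one-weight model with a mean-squared-error loss; the learning rate and iteration count are illustrative choices, not values from the embodiment):

```python
import numpy as np

def mse_loss(pred, target):
    """Loss function: measures the difference between prediction and target."""
    return np.mean((pred - target) ** 2)

# A single "layer" y = w * x; the weight starts too high (prediction 5.0
# against a target of 2.0), so gradient steps lower it.
w, x, target, lr = 5.0, 1.0, 2.0, 0.1

losses = []
for _ in range(50):
    pred = w * x
    losses.append(mse_loss(np.array([pred]), np.array([target])))
    grad = 2 * (pred - target) * x   # d(loss)/dw for the MSE loss
    w -= lr * grad                   # adjust the weight to reduce the loss

final_loss = mse_loss(np.array([w * x]), np.array([target]))
```

Each step shrinks the gap between prediction and target, so the recorded losses decrease toward zero, which is exactly the "reduce the loss as much as possible" view of training.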
The computing module may include the training module 220, and the target model/rule obtained by the training module 220 may be applied in different systems or devices. In FIG. 2, an execution device 210 is configured with a transceiver 212, which may be a wireless transceiver, an optical transceiver, a wired interface (such as an I/O interface), or the like, for data interaction with external devices. A "user" may input data to the transceiver 212 through the client device 240. For example, in the following implementations of the present application, the client device 240 may send a target task to the execution device 210 to request that the execution device construct a neural network, and may send the execution device 210 a database used for training.
The execution device 210 may call data, code, and the like in a data storage system 250, and may also store data, instructions, and the like in the data storage system 250.
A computing module 211 processes the input data using the target model/rule 201. Specifically, the computing module 211 is configured to: construct a first neural network model through a first model generator; obtain, according to the first neural network model, a first performance index of the first neural network model when running on a target chip; adjust the first model generator according to the first performance index to obtain a second model generator; and construct a second neural network model through the second model generator, where a second performance index of the second neural network model is better than the first performance index.
The association function module 21 may specifically be a module for training the model generator.
The association function module 214 may be configured to perform searching and construction according to the basic operations included in a search space, to obtain the first model generator.
Finally, the transceiver 212 returns the constructed neural network model to the client device 240, so that the neural network model can be deployed in the client device 240 or in other devices.
Further, the training module 220 may obtain, for different target tasks, corresponding target models/rules 201 based on different data, so as to provide users with better results.
In the case shown in FIG. 2, the user may manually specify the data input to the execution device 210, for example by operating in an interface provided by the transceiver 212. In another case, the client device 240 may automatically input data to the transceiver 212 and obtain the result; if automatic data input by the client device 240 requires the user's authorization, the user may set the corresponding permission in the client device 240. The user may view, on the client device 240, the result output by the execution device 210, and the specific presentation form may be display, sound, action, or another specific manner. The client device 240 may also serve as a data collection end and store the collected data associated with the target task in the database 230.
It should be noted that FIG. 2 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationships among the devices, components, modules, and the like shown in the figure do not constitute any limitation. For example, in FIG. 2 the data storage system 250 is an external memory relative to the execution device 210; in other scenarios, the data storage system 250 may also be placed in the execution device 210.
Referring to FIG. 3, an embodiment of the present application provides a system architecture 300. The execution device 210 is implemented by one or more servers and, optionally, cooperates with other computing devices, such as data storage, routers, and load balancers; the execution device 210 may be arranged on one physical site or distributed over multiple physical sites. The execution device 210 may use the data in the data storage system 250, or call the program code in the data storage system 250, to implement the steps of the neural network model construction method corresponding to FIGS. 6 to 8 below in the present application.
Users may operate their respective user devices (for example, a local device 301 and a local device 302) to interact with the execution device 210. Each local device may represent any computing device, for example, a personal computer, a computer workstation, a smartphone, a tablet computer, a smart camera, a smart car or another type of cellular phone, a media consumption device, a wearable device, a set-top box, or a game console.
The local device of each user can interact with the execution device 210 through a communication network of any communication mechanism/communication standard. The communication network may be a wide area network, a local area network, a point-to-point connection, or any combination thereof. Specifically, the communication network may include a wireless network, a wired network, or a combination of a wireless network and a wired network. The wireless network includes, but is not limited to, any one or a combination of: a fifth-generation mobile communication technology (5th-Generation, 5G) system, a long term evolution (LTE) system, a global system for mobile communication (GSM), a code division multiple access (CDMA) network, a wideband code division multiple access (WCDMA) network, wireless fidelity (WiFi), Bluetooth, Zigbee, radio frequency identification (RFID), long range (Lora) wireless communication, and near field communication (NFC). The wired network may include an optical fiber communication network, a network composed of coaxial cables, or the like.
In another implementation, one or more aspects of the execution device 210 may be implemented by each local device; for example, the local device 301 may provide the execution device 210 with local data or feed back calculation results.
It should be noted that all functions of the execution device 210 may also be implemented by a local device. For example, the local device 301 implements the functions of the execution device 210 and provides services for its own user, or provides services for the user of the local device 302.
Please refer to FIG. 4, which is a schematic diagram of a neural network model construction framework provided by an embodiment of the present application.
The neural network model construction framework includes at least one controller model and a neural network model generated via the controller model. The controller model obtains the architecture of a neural network model through searching, trains that architecture on a training set, and then evaluates the architecture on a validation set to obtain its accuracy. The feedback result (for example, the accuracy) is then returned to the controller model, and the controller model is updated using reinforcement learning, so that in the next cycle the controller can generate a better network structure. After this process is repeated many times, a new architecture is generated and tested again, and the feedback result is again fed to the controller model for further reinforcement learning. Ultimately, the controller model tends to design architectures that achieve higher accuracy on the validation set.
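The sample-evaluate-feedback loop described above can be sketched as follows. This is a toy stand-in only: the candidate operations, the scoring function that replaces actual training and validation, and the simple weight-proportional update that stands in for reinforcement learning are all illustrative assumptions, not the embodiment's actual algorithm:

```python
import random

# Toy search space: each of 3 layer slots chooses one candidate operation.
CANDIDATES = ["conv3x3", "conv5x5", "skip"]

def toy_accuracy(arch):
    """Stand-in for train-then-validate; here conv3x3 layers score best."""
    return sum({"conv3x3": 0.3, "conv5x5": 0.2, "skip": 0.1}[op] for op in arch)

# "Controller": per-slot sampling weights, updated from the feedback reward.
weights = [{op: 1.0 for op in CANDIDATES} for _ in range(3)]

random.seed(0)
best_arch, best_acc = None, -1.0
for _ in range(200):
    # Controller samples an architecture from the current weights.
    arch = [random.choices(CANDIDATES, [w[c] for c in CANDIDATES])[0]
            for w in weights]
    acc = toy_accuracy(arch)          # feedback from the "validation set"
    for slot, op in enumerate(arch):  # reinforce sampled choices by reward
        weights[slot][op] += acc
    if acc > best_acc:
        best_arch, best_acc = arch, acc
```

Over many cycles the sampling weights drift toward operations that produced higher accuracy, mirroring how the controller comes to favor better-scoring architectures.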
Further, based on the schematic diagram of the neural network model construction framework in FIG. 4, please refer to FIG. 5, which is a schematic diagram of another neural network model construction framework provided by an embodiment of the present application.
This neural network model construction framework includes at least a network structure population and a performance evaluation tool. The network structure population contains multiple kinds of neural network model building units; the model generator searches the network structure population for suitable neural network building units and assembles the found building units into a neural network model. It can be understood that the neural network model may include one or more neural network model building units, which is not specifically limited here.
After the neural network model is constructed, the neural network model is input into the performance evaluation tool to obtain the chip performance index of the neural network model when running on the target chip. Optionally, the neural network model may also be trained on a chip in the cloud to obtain the network structure performance of the neural network structure; it can be understood that the neural network model may also be trained in other chip environments, which is not specifically limited here.
In a continuous iterative process, the network structure population is adjusted according to the chip performance index and the network structure performance, so as to obtain a model generator that can construct models with higher chip performance and better network structure performance. In a possible implementation, the network structure population is updated by way of Pareto optimization, that is, the constructed neural network model achieves higher chip performance without degrading the network structure performance.
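Pareto optimization over the two objectives can be sketched as follows (the candidate structures and their objective values are illustrative assumptions; the embodiment does not specify concrete values):

```python
def dominates(a, b):
    """a dominates b if a is no worse on every objective and strictly better
    on at least one. Objectives: (accuracy, chip_speed), higher is better."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_front(population):
    """Keep only non-dominated candidates: higher chip performance without
    sacrificing network-structure performance."""
    return [p for p in population
            if not any(dominates(q, p) for q in population if q is not p)]

# (accuracy, chip_speed) for four candidate network structures.
population = [(0.90, 5.0), (0.90, 7.0), (0.85, 7.0), (0.80, 9.0)]
front = pareto_front(population)
```

Here (0.90, 5.0) is discarded because (0.90, 7.0) is faster at equal accuracy, which is exactly the "improve chip performance without hurting network performance" criterion; candidates trading accuracy for speed survive on the front.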
It should be noted that the neural network model construction apparatus in the embodiments of the present application may be a computer device with a chip, such as a server, a desktop computer, a notebook computer, or a computer cluster, which is not specifically limited here.
Based on the foregoing application scenarios, the neural network model construction method provided by the present application is described below.
Please refer to FIG. 6, which is a schematic flowchart of a neural network model construction method provided by an embodiment of the present application.
In step 601, the neural network model construction apparatus constructs a first neural network model through a first model generator.
The neural network model construction apparatus constructs the first neural network model through the first model generator, where the first model generator is preset in the neural network model construction apparatus.
Specifically, in a possible implementation, the neural network model construction apparatus constructs the first neural network according to the first model generator and a target task. Before the first model generator constructs the first neural network, the neural network model construction apparatus obtains the target task. The target task may be determined according to the apparatus's own requirements, or according to a user's operation. For example, the target task may include: the type of the neural network, the accuracy of the neural network, and the like. The type of the neural network includes the output type of the neural network requested to be constructed; for example, the target task may be to construct a face recognition neural network for recognizing faces and outputting the corresponding person information. For another example, the target task may be initiated by a terminal, to construct a vehicle recognition neural network for recognizing the information of vehicles included in pictures obtained by an image capture device.
It should be noted that the neural network in the present application may be a convolutional neural network, a recurrent neural network, a perceptron neural network, or the like; it may be adjusted according to the actual application scenario, which is not limited in the present application.
The corresponding training data may also be obtained at the same time as, or after, the target task is obtained. The training data is data associated with the target task, and may include input data of the target task and real measurement data. For example, if the target task is to construct a face recognition neural network, the training data includes a large number of face pictures and the task information corresponding to each picture. In a possible implementation, the training data may be divided into a training set and a validation set, where the training set refers to the pictures used for training the neural network model, and the validation set refers to the pictures used for verifying the accuracy of the network model.
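The training/validation split described above can be sketched as follows (the file names, labels, and 80/20 ratio are illustrative assumptions; the embodiment does not fix a ratio):

```python
import random

def split_dataset(samples, train_ratio=0.8, seed=0):
    """Shuffle labelled samples and split them into a training set
    (for training the model) and a validation set (for checking accuracy)."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# (image_id, label) pairs standing in for face pictures and their task info.
data = [(f"face_{i:03d}.jpg", i % 5) for i in range(100)]
train_set, val_set = split_dataset(data)
```

Shuffling before the cut keeps both subsets representative, and the two subsets are disjoint so validation accuracy is measured on pictures the model was not trained on.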
In a possible implementation, before the first model generator constructs the first neural network, a hardware constraint is also input to the first model generator, where the hardware constraint includes parameters of the target chip. Specifically, the hardware constraint may include at least one of the following: the frequency of the chip, the size of the chip memory, the size of the chip's computing module, the bandwidth between memories in the chip, and the like. It can be understood that, in an actual application, more parameters may be included, which is not specifically limited here. After the hardware constraint is input into the first model generator, the first model generator constructs the first neural network according to the hardware constraint when constructing the first neural network model.
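A hardware constraint of this kind might be represented as a simple parameter set. The field names and values below are hypothetical; the embodiment only lists the kinds of parameters involved:

```python
# Hypothetical target-chip constraint set passed to the model generator.
hardware_constraints = {
    "chip_frequency_mhz": 1000,     # frequency of the chip
    "memory_size_kb": 512,          # size of the chip memory
    "compute_unit_size": 16,        # size of the chip's computing module
    "memory_bandwidth_gbps": 25.6,  # bandwidth between memories in the chip
}

def validate_constraints(constraints):
    """A generator would reject a constraint set containing
    non-numeric or non-positive parameter values."""
    return all(isinstance(v, (int, float)) and v > 0
               for v in constraints.values())

ok = validate_constraints(hardware_constraints)
```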
In step 602, the neural network model construction apparatus obtains a first theoretical performance index.
The neural network model construction apparatus obtains the first theoretical performance index, which represents the theoretical value of the performance index of the first neural network model when running on the target chip; that is, it is a theoretical, rather than measured, performance index.
The target chip may be a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), a field programmable gate array (FPGA), an application-specific integrated circuit (ASIC), a general-purpose processor, or the like. For example, the first theoretical performance index is the theoretical performance index of the CPU when running the first neural network.
In a possible implementation, the first theoretical performance index includes at least one of the following: a theoretical vector-module bound (vector bound); a theoretical memory bound; a theoretical cube-module (cube) utilization; a theoretical utilization of the high-speed parallel multiplier-accumulator (multiplier-accumulator/multiply-accumulate operation, MAC); a theoretical number of cube-module operations (cube cycles); a theoretical number of vector-module operations (vector cycles); the L1 and L2 memory fusion effect; the compute batch effect; the tiling strategy and its performance; the performance under mixed precision; the performance under different data-flow modes; the cycles or latency of each operator or network layer in the neural network model; and the total number of cycles or the latency of the entire neural network model.
在一种可能的实现方式中,神经网络模型构建装置通过性能评估工具获取第一理论性能指标,性能评估工具包括计算函数,计算函数用于对第一神经网络模型进行计算,以得到第一理论性能指标。具体的,该性能评估工具可以是一种软件形式的工具,用于获取目标芯片运行第一神经网络时对应的性能指标。例如,在一种优选的方式中,该性能评估工具为PTM,则神经网络模型构建装置通过PTM获取理论性能指标。在实际应用过程中,该性能评估工具还可以以其他形式存在,例如为一个硬件模块,具体此处不做限定。In a possible implementation manner, the neural network model building apparatus obtains the first theoretical performance index through a performance evaluation tool, and the performance evaluation tool includes a calculation function, and the calculation function is used to calculate the first neural network model to obtain the first theoretical performance index. Performance. Specifically, the performance evaluation tool may be a tool in the form of software, and is used to obtain the corresponding performance index when the target chip runs the first neural network. For example, in a preferred manner, the performance evaluation tool is PTM, and the apparatus for constructing a neural network model obtains theoretical performance indicators through PTM. In a practical application process, the performance evaluation tool may also exist in other forms, for example, a hardware module, which is not specifically limited here.
Specifically, in a possible implementation, the neural network model construction apparatus divides the first neural network into one or more first building units, each building unit being a constituent unit of the first neural network (a building unit may include at least one operator, or may take one layer as a unit). The first building units are input into the performance evaluation tool, which performs calculation according to the first building units and the parameters of the target chip to obtain the theoretical performance index of each first building unit. The theoretical performance indexes of the first building units are then accumulated to obtain the first theoretical performance index of the target chip running the first neural network. For example, if the theoretical performance index is the time consumed when the target chip runs the first neural network and each building unit is one operator of the first neural network, the performance evaluation tool calculates the time consumed by each operator and then accumulates the results, finally obtaining the time consumed by all the operators of the entire neural network, that is, the time consumed when the target chip runs the first neural network. The time consumed by the whole network can be calculated by the following formula:
whole-network time consumption = ∑_n f(time consumption of each layer's operator estimated by the performance evaluation tool)
It should be noted that other performance indexes can also be obtained through similar formulas. For example, the number of cube cycles can be calculated by the following formula:
whole-network cube cycles = ∑_n f(cube cycles of each layer's operator estimated by the performance evaluation tool)
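The two accumulation formulas above can be sketched in a few lines of Python. Here `estimate_fn` stands in for the performance evaluation tool's per-operator cost function, and the operator names and costs are illustrative assumptions only:

```python
def whole_network_total(operators, estimate_fn):
    """Accumulate a per-operator estimate (time consumption, cube
    cycles, ...) over every operator/layer to obtain the
    whole-network figure, as in the formulas above."""
    return sum(estimate_fn(op) for op in operators)

# Hypothetical per-operator latency estimates in microseconds.
latency_us = {"conv1": 120.0, "pool1": 15.0, "fc1": 40.0}
total_us = whole_network_total(latency_us, latency_us.get)
print(total_us)  # 175.0
```

The same helper serves for any index that is additive over operators; only `estimate_fn` changes.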
Specifically, in a possible implementation, the neural network model construction apparatus inputs the entire first neural network model into the performance evaluation tool, which performs calculation according to the first neural network model and the parameters of the target chip to obtain the theoretical performance index of the first neural network model, that is, the first theoretical performance index of the target chip running the first neural network. For example, if the first theoretical performance index is the time consumed when the target chip runs the first neural network, the performance evaluation tool obtains that time by calculating the time consumption of the first neural network.
In a possible implementation, the neural network model construction apparatus obtains the first theoretical performance index by inputting the data flow into the performance evaluation tool. Specifically, the apparatus determines one or more first building units of the first neural network model, where each first building unit includes at least one of the following: a convolutional layer of the first neural network model, a pooling layer of the first neural network model, an activation function of the first neural network model, or a normalization layer of the first neural network model. Different layers of the neural network are then divided into task dimensions according to the one or more first building units; for example, one or more convolutional layers, pooling layers, activation functions, and normalization layers form one dimension, and in a preferred implementation one convolutional layer, one pooling layer, one activation function, and one normalization layer form one dimension. The data flow is then analyzed for each task. For example, when the chip runs the neural network, different types of on-chip memory process different data: in the transfer from L2 to L1, L2 needs to transmit data to L1, and the efficiency of this process is analyzed for one dimension of the neural network. The process may also include L1->L0A/L0B, UB->L1, and so on. Based on the data-flow calculation, the apparatus analyzes through which pipes each dimension of the neural network performs computation and data transmission. Theoretical performance indexes such as the number of cycles, cube utilization, MAC utilization, vector bound, and memory bound (DDR, L2) are then calculated according to the processing of each pipe and the data tiling.
Memory bound: the reuse of the smallest memory unit in the chip. When the output feature map is larger than the smallest memory unit of the chip, that memory unit needs to be reused multiple times. In the embodiments of the present application, the output feature map can be made smaller through the corresponding weights used when constructing the neural network, so that it fits the size of the smallest memory unit of the chip, which improves the usage efficiency of that unit.
Vector bound: the reuse of the vector module in the chip.
Cube utilization: the cube module is mainly used to compute matrices, and cube utilization refers to the number of times the cube module is used per unit time. During neural network training, if one wants to improve the subsequent cube utilization, cube utilization can be set as a target task; the closer the model is to convergence during training, the higher the cube utilization.
Multiply-accumulate (MAC) utilization: the MAC module is the module in the chip that counts the multiplication and addition operations of the neural network, and MAC utilization refers to the number of times the MAC module is used per unit time. The method for improving MAC utilization in the embodiments of the present application is similar to the method for improving cube utilization and is not described again here.
Number of cube cycles: the number of operations performed by the cube module in the chip.
Number of vector cycles: the number of operations performed by the vector module.
L1/L2 memory fusion: L1 and L2 are two different types of chip memory. By setting the corresponding weights used to construct the neural network, the compatibility of L1 and L2 when processing the data of the neural network can be improved.
Compute batch: the number of feature maps processed in a batch within the same time period. By setting the corresponding weights used to construct the neural network, the compute batch can be increased as much as possible while the running performance of the chip is guaranteed.
In addition to the chip performance indexes described above, the performance indexes in the embodiments of the present application may include more parameters, as long as they can affect the running efficiency of the neural network on the chip; this is not limited here.
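As a rough illustration of how index definitions such as cube or MAC utilization can be turned into numbers, the sketch below computes a utilization as the fraction of cycles a compute module is busy. The exact definition used by any given evaluation tool may differ, so this formula is an assumption:

```python
def module_utilization(busy_cycles, total_cycles):
    """Fraction of cycles the cube/MAC module is in use (0.0-1.0).

    A simplified reading of 'number of uses per unit time'; a real
    tool may weight different operation types differently."""
    if total_cycles <= 0:
        return 0.0
    return busy_cycles / total_cycles

print(module_utilization(800, 1000))  # 0.8
```

The same ratio form applies to either module; only the cycle counters fed into it change.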
In step 603, the neural network model construction apparatus adjusts the first model generator according to the first theoretical performance index to obtain a second model generator.
After obtaining the first theoretical performance index, the neural network model construction apparatus adjusts the first model generator according to the first theoretical performance index to obtain a second model generator. Compared with the first model generator, the second model generator can construct a neural network whose theoretical performance indexes are better when running on the target chip.
In a possible implementation, the corresponding weight factor in the first model generator may be adjusted according to the first theoretical performance index, so that the first model generator can construct a second neural network model with a better performance index when running on the target chip, where the second theoretical performance index of the second neural network model is better than the first theoretical performance index. For example, when the theoretical performance index is a theoretical time consumption, the corresponding time-consumption weight factor is adjusted when the model generator is adjusted, so that the model generator constructs a neural network that takes less time when the target chip runs it. It can be understood that when there are multiple theoretical performance indexes, multiple weight factors are adjusted correspondingly, so that the first model generator constructs a neural network whose performance indexes are all better when running on the target chip.
In a possible implementation, the first neural network is trained, and when the first neural network tends to converge, a fifth neural network model is obtained and the performance parameters of the fifth neural network are acquired. The performance parameters of the fifth neural network may include accuracy, the peak signal-to-noise ratio used to describe a picture, and the like, which are not limited here. The weight factors in the first model generator are then adjusted according to the performance parameters of the fifth neural network and the first theoretical performance index. In this way, the model generator is guaranteed to generate neural networks with good performance parameters that also achieve better performance indexes when running on the target chip.
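The weight-factor adjustment described above can be pictured as optimizing a score that combines model-quality parameters (e.g. accuracy) with theoretical performance indexes (e.g. latency). The linear combination below is a hypothetical sketch, not the actual mechanism of the embodiment:

```python
def generator_score(accuracy, theoretical_latency_us, w_acc, w_lat):
    """Higher is better: reward accuracy, penalize theoretical latency.

    w_acc and w_lat play the role of the per-index weight factors the
    text describes; raising w_lat steers the generator toward networks
    that run faster on the target chip."""
    return w_acc * accuracy - w_lat * theoretical_latency_us

# With a nonzero latency weight, a faster but slightly less accurate
# network can score higher than a slower, more accurate one.
slow = generator_score(accuracy=0.95, theoretical_latency_us=200.0,
                       w_acc=100.0, w_lat=0.5)
fast = generator_score(accuracy=0.93, theoretical_latency_us=80.0,
                       w_acc=100.0, w_lat=0.5)
print(fast > slow)  # True
```

With multiple performance indexes, one weighted term per index would be added to the score in the same way.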
It should be noted that, as shown in FIG. 7, steps 601 to 603 are one iteration of the first model generator according to the first theoretical performance index. In practice, after one or more iterations, a model generator that can construct a neural network with optimal theoretical performance indexes, that is, the second model generator, is finally generated.
In step 604, the neural network model construction apparatus obtains a measured performance index.
After obtaining the second model generator, the neural network model construction apparatus obtains a first measured performance index, which represents the actual performance index when the target chip runs the second neural network.
After the neural network model construction apparatus obtains the second model generator, the second neural network generated by the second model generator can be used directly, or a second round of improvement can be performed on the second model generator. The second round of improvement is an adjustment based on the performance indexes obtained by running the second neural network on the actual target chip.
In a possible implementation, a second neural network model is constructed by the second model generator, the second neural network model is placed on the target chip to run, and the actual performance indexes corresponding to the target chip running the second neural network model are obtained.
The target chip may be a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), a field programmable gate array (FPGA), an application-specific integrated circuit (ASIC), a general-purpose processor, or the like.
In a possible implementation, the first measured performance index may include at least one of the following: a measured vector bound, a measured memory bound, a measured cube-module utilization, a measured multiply-accumulate (MAC) unit utilization, a measured number of cube-module operations (cube cycles), the measured L1/L2 memory fusion effect, the measured compute-batch effect, the measured tiling strategy and its performance, the measured performance under mixed precision, the measured performance under different data-flow modes, the measured latency of each operator or network layer in the neural network model, and the measured total number of cycles or latency of the entire neural network model. It can be understood that, in practice, more chip performance indexes may be included, for example a measured number of vector-module operations (vector cycles), as long as the index reflects the performance of the target chip running the second network; this is not limited here.
In a possible implementation, the system where the target chip is located may obtain, through software or hardware, the performance index corresponding to the target chip running the second neural network; the specific way of obtaining the performance index is not limited here.
In step 605, the neural network model construction apparatus adjusts the second model generator according to the first measured performance index to obtain a third model generator.
After obtaining the first measured performance index, the neural network model construction apparatus adjusts the second model generator according to the first measured performance index to obtain a third model generator, which can construct a third neural network model whose second measured performance index is better than the first measured performance index.
In a possible implementation, the corresponding weight factor in the second model generator may be adjusted according to the measured performance index, so that the second model generator can construct a neural network with a better performance index when running on the target chip. For example, when the first measured performance index is the measured time consumption of the second neural network running on the target chip, the corresponding time-consumption weight factor is adjusted when the second model generator is adjusted, so that the model generator constructs a neural network that takes less time to run on the target chip. It can be understood that when the first measured performance index consists of multiple indexes, multiple weight factors are adjusted correspondingly, so that the second model generator constructs a neural network whose performance indexes are all better when running on the target chip.
In a possible implementation, the second neural network is trained, and when the second neural network tends to converge, a fourth neural network model is obtained and the performance parameters of the fourth neural network are acquired. The performance parameters of the fourth neural network may include accuracy, the peak signal-to-noise ratio used to describe a picture, and the like, which are not limited here. The weight factors in the second model generator are then adjusted according to the performance parameters of the fourth neural network and the first measured performance index. In this way, the model generator is guaranteed to generate neural networks with good performance parameters that also achieve better performance indexes when running on the target chip.
It should be noted that, as shown in FIG. 8, steps 604 to 605 are one iteration of the second model generator according to the measured performance index. In practice, after one or more iterations, a model generator that can construct a neural network with optimal measured performance indexes, that is, the third model generator, is finally generated.
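The two feedback loops (steps 601 to 603 with theoretical indexes, steps 604 to 605 with measured indexes) share the same shape: build a model, obtain a performance index, adjust the generator, and repeat. A generic sketch follows, with all four callables hypothetical placeholders:

```python
def refine_generator(generator, build, evaluate, adjust, iterations=3):
    """One iteration = construct a model from the generator, obtain its
    performance index (theoretical or measured), and adjust the
    generator with that index."""
    for _ in range(iterations):
        model = build(generator)
        index = evaluate(model)
        generator = adjust(generator, index)
    return generator

# Toy demonstration: the "generator" is a latency budget that halves
# whenever the evaluated latency still exceeds a target of 10.
g = refine_generator(
    generator=100.0,
    build=lambda gen: gen,          # the model "is" its latency here
    evaluate=lambda model: model,
    adjust=lambda gen, idx: gen / 2 if idx > 10 else gen,
    iterations=5,
)
print(g)  # 6.25
```

In the embodiment's terms, running the loop once with a theoretical evaluator yields the second model generator, and running it again with a measured evaluator yields the third.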
In the embodiments of the present application, steps 604 to 605 are optional. When steps 604 to 605 are not performed, the second neural network model constructed by the second model generator is used as the neural network model on the target chip.
It should be noted that, in the embodiments of the present application, the theoretical performance index and the measured performance index may be obtained by the neural network model construction apparatus itself, or may be obtained by other computer equipment and then sent to the neural network model construction apparatus; this is not limited here.
In the embodiments of the present application, the theoretical performance index of the target chip running the first neural network is obtained, and the corresponding weights in the first model generator are adjusted according to the theoretical performance index, thereby constructing a neural network model with better hardware performance indexes when running on the target chip.
The embodiment shown in FIG. 6 above is one application scenario of the embodiments of the present application. Another application scenario of the neural network model construction method of the embodiments of the present application is described below.
Please refer to FIG. 9, which is another schematic flowchart of the neural network model construction method provided by an embodiment of the present application.
In this embodiment, a structure search space is taken as an example of the model generator for description.
According to the task requirements of the application scenario, a search space needs to be constructed according to the indexes before the neural network model is constructed. Each constituent unit in the search space is an encoding that indicates how to construct a neural network structure. During the structure search, each sampling uses the distribution relationship of the unevaluated constituent units in the search space and the network model built from the evaluated constituent units. Specifically, as shown in FIG. 9, the part in the dashed box is the step of constructing the initial search space. The search space consists of multiple network structures, each composed of multiple constituent units. After the initial search space of multiple network structures is constructed, unreasonable network structures are filtered out based on preset rules or the application scenario task to obtain a second search space. A new third search space is then formed through network structure clustering. Based on the third search space, the cluster-center structures are trained, the unevaluated structures are modeled according to the training loss values of the evaluated network structures, several network structures are then selected for training based on Bayesian optimization, and the unevaluated structures are modeled again according to the training loss values; this loop continues until the search space is constructed.
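The first two stages of the pipeline above, rule-based filtering and clustering, can be sketched as follows. The validity predicate and clustering key are placeholders; a real system would use the task's preset rules and a distance between structure encodings:

```python
def prune_and_cluster(structures, is_valid, cluster_key):
    """Drop structures ruled out by preset rules / the application task
    (yielding the second search space), then group the survivors; one
    member per group would serve as a cluster-center candidate for
    training (yielding the third search space)."""
    second_space = [s for s in structures if is_valid(s)]
    clusters = {}
    for s in second_space:
        clusters.setdefault(cluster_key(s), []).append(s)
    return clusters

# Toy structure encodings (depth, width): rule out over-deep
# structures and cluster the rest by depth.
space = [(8, 64), (8, 128), (16, 64), (64, 64)]
clusters = prune_and_cluster(space,
                             is_valid=lambda s: s[0] <= 16,
                             cluster_key=lambda s: s[0])
print(sorted(clusters))  # [8, 16]
```

The Bayesian-optimization stage that follows would then select structures from these clusters for training, which this sketch does not attempt to model.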
After the initial search space is constructed, the search space is further adjusted using the neural network model construction method of the embodiment shown in FIG. 6, which is not described again here.
In the embodiments of the present application, the performance of the search space is improved through the initial construction and training of the search space.
The neural network model construction method in the embodiments of the present application has been described above; the neural network model construction apparatus in the embodiments of the present application is described below.
Please refer to FIG. 10, which is a schematic structural diagram of the neural network model construction apparatus provided by an embodiment of the present application.
A neural network model construction apparatus includes:
a construction unit 1001, configured to construct a first neural network model through a first model generator;
an obtaining unit 1002, configured to obtain, according to the first neural network model, a first performance index of the first neural network model when running on the target chip;
a processing unit 1003, configured to adjust the first model generator according to the first performance index to obtain a second model generator;
the construction unit 1001 is further configured to construct a second neural network model through the second model generator, where the second performance index of the second neural network model is better than the first performance index.
In this embodiment, the operations performed by the units of the neural network model construction apparatus are similar to those described in the embodiments shown in FIG. 6 and FIG. 7 and are not described again here.
Please refer to FIG. 11, which is another schematic structural diagram of the neural network model construction apparatus provided by an embodiment of the present application.
A neural network model construction apparatus includes:
a construction unit 1101, configured to construct a first neural network model through a first model generator;
an obtaining unit 1102, configured to obtain, according to the first neural network model, a first performance index of the first neural network model when running on the target chip;
a processing unit 1103, configured to adjust the first model generator according to the first performance index to obtain a second model generator;
the construction unit 1101 is further configured to construct a second neural network model through the second model generator, where the second performance index of the second neural network model is better than the first performance index.
Optionally, the first performance index is a first theoretical performance index, which represents the theoretical value of the performance index of the first neural network model when running on the target chip, and the second performance index is a second theoretical performance index, where the second theoretical performance index is better than the first theoretical performance index.
Optionally, the obtaining unit 1102 is specifically configured to obtain a first measured performance index, which represents the measured value of the performance index of the second neural network model when running on the target chip;
the processing unit 1103 is further configured to adjust the second model generator according to the first measured performance index to obtain a third model generator;
the construction unit 1101 is further configured to construct a third neural network model through the third model generator, where the second measured performance index of the third neural network model is better than the first measured performance index.
Optionally, the neural network model construction apparatus further includes:
a training unit 1104, configured to train the second neural network model to obtain a fourth neural network model;
that the processing unit 1103 is further configured to adjust the second model generator according to the first measured performance index to obtain the third model generator includes:
the processing unit 1103 is further configured to adjust the second model generator according to the first measured performance index and the model performance of the fourth neural network model to obtain the third model generator.
Optionally, the first performance index is a first theoretical performance index, and the obtaining unit is specifically configured to obtain the first theoretical performance index through a performance evaluation tool, where the performance evaluation tool includes a calculation function, and the calculation function is used to perform calculation on the first neural network model to obtain the first theoretical performance index.
可选地,神经网络模型构建装置还包括:Optionally, the apparatus for constructing the neural network model further includes:
确定单元1105,用于通过性能评估工具确定第一神经网络模型的第一构建单元,第一构建单元包括以下至少一个:第一神经网络模型的卷积层、第一神经网络模型的池化层、第一神经网络模型的激活函数、第一神经网络模型的归一化层;Determining unit 1105, configured to determine a first construction unit of the first neural network model by using a performance evaluation tool, the first construction unit includes at least one of the following: a convolutional layer of the first neural network model, a pooling layer of the first neural network model , the activation function of the first neural network model, the normalization layer of the first neural network model;
处理单元1103还用于根据第一构建单元进行计算,以得到第一理论性能指标。The processing unit 1103 is further configured to perform calculation according to the first construction unit to obtain the first theoretical performance index.
可选地,训练单元1104还用于训练第一神经网络模型得到第五神经网络模型;Optionally, the training unit 1104 is further configured to train the first neural network model to obtain the fifth neural network model;
处理单元1103还用于根据第一理论性能指标和第五神经网络模型的模型性能调整第一模型生成器,得到第二模型生成器。The processing unit 1103 is further configured to adjust the first model generator according to the first theoretical performance index and the model performance of the fifth neural network model to obtain the second model generator.
Optionally, the first theoretical performance index and the second theoretical performance index each include at least one of the following: a theoretical vector-module bound, a theoretical memory bound, a theoretical cube-module utilization rate, a theoretical utilization rate of a high-speed parallel multiply-accumulate (MAC) unit, a theoretical number of cube-module operations, and a theoretical number of vector-module operations.
Optionally, the first measured performance index and the second measured performance index each include at least one of the following: a measured vector-module bound, a measured memory bound, a measured cube-module utilization rate, a measured utilization rate of a high-speed parallel multiply-accumulate (MAC) unit, a measured number of cube-module operations, and a measured number of vector-module operations.
In this embodiment, the operations performed by the units of the apparatus for constructing a neural network model are similar to those described in the foregoing embodiments shown in FIG. 6 and FIG. 7, and details are not repeated here.
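The generate–evaluate–adjust flow performed by these units can be sketched end to end as follows. The stand-in performance index and the generator adjustment rule are illustrative assumptions, not a real chip measurement or the application's specific generator.

```python
def build(generator_param):
    # Smaller parameter -> "lighter" hypothetical model
    return {"width": generator_param}

def perf_index(model):
    # Stand-in for the performance index on the target chip (lower is better)
    return model["width"] * 1.5

def construct(generator_param, steps=5):
    """Iteratively construct models, each with a better index than the last."""
    history = []
    for _ in range(steps):
        model = build(generator_param)           # construct a model
        history.append(perf_index(model))        # obtain its performance index
        generator_param = int(generator_param * 0.8)  # adjust the generator
    return history

history = construct(64)
# Each successive model's index improves on the previous one
assert all(a > b for a, b in zip(history, history[1:]))
```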
Refer to FIG. 12, which is another schematic structural diagram of the apparatus for constructing a neural network model according to an embodiment of this application.
The apparatus includes a processor 1201, a memory 1202, a bus 1205, and an interface 1204. The processor 1201 is connected to the memory 1202 and the interface 1204; the bus 1205 connects the processor 1201, the memory 1202, and the interface 1204; and the interface 1204 is configured to receive or send data. The processor 1201 is a single-core or multi-core central processing unit, an application-specific integrated circuit, or one or more integrated circuits configured to implement the embodiments of the present invention. The memory 1202 may be a random access memory (RAM), or may be a non-volatile memory, for example, at least one hard disk memory. The memory 1202 is configured to store computer-executable instructions. Specifically, the computer-executable instructions may include a program 1203.
In this embodiment, when the processor 1201 invokes the program 1203, the apparatus for constructing a neural network model in FIG. 12 can perform the operations performed by the apparatus for constructing a neural network model in the embodiments shown in FIG. 6 or FIG. 9, and details are not repeated here.
It should be understood that the processor mentioned in the apparatus for constructing a neural network model in the foregoing embodiments of this application, or the processor provided in the foregoing embodiments of this application, may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
It should also be understood that there may be one or more processors in the apparatus for constructing a neural network model in the foregoing embodiments of this application, which may be adjusted according to actual application scenarios; this is merely an example and is not limiting. Likewise, there may be one or more memories in the embodiments of this application, which may be adjusted according to actual application scenarios; this is merely an example and is not limiting.
It should also be noted that, when the apparatus for constructing a neural network model includes a processor (or processing unit) and a memory, the processor in this application may be integrated with the memory, or the processor and the memory may be connected through an interface; this may be adjusted according to actual application scenarios and is not limited.
An embodiment of this application further provides a computer program, or a computer program product including a computer program. When the computer program is executed on a computer, the computer is enabled to implement the method procedures related to the apparatus for constructing a neural network model in any one of the foregoing method embodiments.
An embodiment of this application further provides a computer-readable storage medium storing a computer program. When the computer program is executed by a computer, the method procedures related to the apparatus for constructing a neural network model in any one of the foregoing method embodiments are implemented.
The embodiments shown in FIG. 6 to FIG. 9 may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented by software, they may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of this application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid-state disk (SSD)), or the like.
The terms "first", "second", and the like in the specification, claims, and accompanying drawings of this application are used to distinguish between similar objects, and are not necessarily used to describe a specific order or sequence. It should be understood that terms used in this way are interchangeable in appropriate circumstances; this is merely the manner of distinguishing between objects of the same attribute when describing the embodiments of this application. In addition, the terms "include" and "have" and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, system, product, or device that includes a series of units is not necessarily limited to those units, but may include other units that are not expressly listed or that are inherent to such a process, method, product, or device.
The names of the messages/frames/information, modules, or units provided in the embodiments of this application are merely examples; other names may be used, provided that the functions of the messages/frames/information, modules, or units are the same.
The terms used in the embodiments of this application are merely for the purpose of describing specific embodiments, and are not intended to limit the present invention. The singular forms "a", "the", and "this" used in the embodiments of this application are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that, in the descriptions of this application, unless otherwise specified, "/" indicates an "or" relationship between the associated objects; for example, A/B may indicate A or B. "And/or" in this application merely describes an association relationship between associated objects, and indicates that three relationships may exist; for example, A and/or B may indicate the following three cases: only A exists, both A and B exist, and only B exists, where A and B may be singular or plural.
Depending on the context, the word "if" used herein may be interpreted as "at the time of", "when", "in response to determining", or "in response to detecting". Similarly, depending on the context, the phrase "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined", "in response to determining", "when (the stated condition or event) is detected", or "in response to detecting (the stated condition or event)".
The foregoing embodiments are merely intended to describe the technical solutions of this application, rather than to limit them. Although this application is described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements to some technical features thereof; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of this application.

Claims (19)

  1. A method for constructing a neural network model, comprising:
    constructing a first neural network model by a first model generator;
    obtaining, according to the first neural network model, a first performance index of the first neural network model when running on a target chip;
    adjusting the first model generator according to the first performance index to obtain a second model generator; and
    constructing a second neural network model by the second model generator, wherein a second performance index of the second neural network model is better than the first performance index.
  2. The method according to claim 1, wherein the first performance index is a first theoretical performance index, the first theoretical performance index represents a theoretical value of a performance index of the first neural network model when running on the target chip, the second performance index is a second theoretical performance index, and the second theoretical performance index is better than the first theoretical performance index.
  3. The method according to claim 2, wherein after the second neural network model is constructed by the second model generator, the method further comprises:
    obtaining a first measured performance index, wherein the first measured performance index represents a measured value of a performance index of the second neural network model when running on the target chip;
    adjusting the second model generator according to the first measured performance index to obtain a third model generator; and
    constructing a third neural network model by the third model generator, wherein a second measured performance index of the third neural network model is better than the first measured performance index.
  4. The method according to claim 3, wherein before the first measured performance index is obtained, the method further comprises:
    training the second neural network model to obtain a fourth neural network model; and
    the adjusting the second model generator according to the first measured performance index to obtain a third model generator comprises:
    adjusting the second model generator according to the first measured performance index and a model performance of the fourth neural network model to obtain the third model generator.
  5. The method according to any one of claims 2 to 4, wherein the first performance index is a first theoretical performance index, and the obtaining, according to the first neural network model, a first performance index of the first neural network model when running on a target chip comprises:
    obtaining the first theoretical performance index through a performance evaluation tool, wherein the performance evaluation tool comprises a calculation function, and the calculation function is used to perform calculation on the first neural network model to obtain the first theoretical performance index.
  6. The method according to claim 5, wherein the obtaining the first theoretical performance index through a performance evaluation tool comprises:
    determining a first construction unit of the first neural network model through the performance evaluation tool, wherein the first construction unit comprises at least one of the following: a convolutional layer of the first neural network model, a pooling layer of the first neural network model, an activation function of the first neural network model, and a normalization layer of the first neural network model; and
    performing calculation according to the first construction unit to obtain the first theoretical performance index.
  7. The method according to any one of claims 2 to 5, wherein after the first neural network model is constructed by the first model generator, the method further comprises:
    training the first neural network model to obtain a fifth neural network model; and
    the adjusting the first model generator according to the first performance index to obtain a second model generator comprises:
    adjusting the first model generator according to the first theoretical performance index and a model performance of the fifth neural network model to obtain the second model generator.
  8. The method according to any one of claims 2 to 6, wherein the first theoretical performance index and the second theoretical performance index each comprise at least one of the following: a theoretical vector-module bound, a theoretical memory bound, a theoretical cube-module utilization rate, a theoretical utilization rate of a high-speed parallel multiply-accumulate (MAC) unit, a theoretical number of cube-module operations, and a theoretical number of vector-module operations.
  9. The method according to any one of claims 3 to 6, wherein the first measured performance index and the second measured performance index each comprise at least one of the following: a measured vector-module bound, a measured memory bound, a measured cube-module utilization rate, a measured utilization rate of a high-speed parallel multiply-accumulate (MAC) unit, a measured number of cube-module operations, and a measured number of vector-module operations.
  10. An apparatus for constructing a neural network model, comprising:
    a construction unit, configured to construct a first neural network model by a first model generator;
    an obtaining unit, configured to obtain, according to the first neural network model, a first performance index of the first neural network model when running on a target chip; and
    a processing unit, configured to adjust the first model generator according to the first performance index to obtain a second model generator;
    wherein the construction unit is further configured to construct a second neural network model by the second model generator, and a second performance index of the second neural network model is better than the first performance index.
  11. The apparatus for constructing a neural network model according to claim 10, wherein the first performance index is a first theoretical performance index, the first theoretical performance index represents a theoretical value of a performance index of the first neural network model when running on the target chip, the second performance index is a second theoretical performance index, and the second theoretical performance index is better than the first theoretical performance index.
  12. The apparatus for constructing a neural network model according to claim 11, wherein the obtaining unit is specifically configured to obtain a first measured performance index, and the first measured performance index represents a measured value of a performance index of the second neural network model when running on the target chip;
    the processing unit is further configured to adjust the second model generator according to the first measured performance index to obtain a third model generator; and
    the construction unit is further configured to construct a third neural network model by the third model generator, wherein a second measured performance index of the third neural network model is better than the first measured performance index.
  13. The apparatus for constructing a neural network model according to claim 12, further comprising:
    a training unit, configured to train the second neural network model to obtain a fourth neural network model;
    wherein the processing unit is further configured to adjust the second model generator according to the first measured performance index and a model performance of the fourth neural network model to obtain the third model generator.
  14. The apparatus for constructing a neural network model according to any one of claims 11 to 13, wherein the first performance index is a first theoretical performance index, and the obtaining unit is specifically configured to obtain the first theoretical performance index through a performance evaluation tool, wherein the performance evaluation tool comprises a calculation function, and the calculation function is used to perform calculation on the first neural network model to obtain the first theoretical performance index.
  15. The apparatus for constructing a neural network model according to claim 14, further comprising:
    a determining unit, configured to determine a first construction unit of the first neural network model through the performance evaluation tool, wherein the first construction unit comprises at least one of the following: a convolutional layer of the first neural network model, a pooling layer of the first neural network model, an activation function of the first neural network model, and a normalization layer of the first neural network model;
    wherein the processing unit is further configured to perform calculation according to the first construction unit to obtain the first theoretical performance index.
  16. The apparatus for constructing a neural network model according to any one of claims 11 to 15, wherein the training unit is further configured to train the first neural network model to obtain a fifth neural network model; and
    the processing unit is further configured to adjust the first model generator according to the first theoretical performance index and a model performance of the fifth neural network model to obtain the second model generator.
  17. The apparatus for constructing a neural network model according to any one of claims 11 to 16, wherein the first theoretical performance index and the second theoretical performance index each comprise at least one of the following: a theoretical vector-module bound, a theoretical memory bound, a theoretical cube-module utilization rate, a theoretical utilization rate of a high-speed parallel multiply-accumulate (MAC) unit, a theoretical number of cube-module operations, and a theoretical number of vector-module operations.
  18. The apparatus for constructing a neural network model according to any one of claims 12 to 16, wherein the first measured performance index and the second measured performance index each comprise at least one of the following: a measured vector-module bound, a measured memory bound, a measured cube-module utilization rate, a measured utilization rate of a high-speed parallel multiply-accumulate (MAC) unit, a measured number of cube-module operations, and a measured number of vector-module operations.
  19. A computer storage medium, wherein the computer storage medium stores instructions, and when the instructions are executed on a computer, the computer is enabled to perform the method according to any one of claims 1 to 9.

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2020/105773 WO2022021199A1 (en) 2020-07-30 2020-07-30 Neural network model construction method and device therefor
CN202080104556.9A CN116261729A (en) 2020-07-30 2020-07-30 Neural network model construction method and equipment thereof

Publications (1)

Publication Number Publication Date
WO2022021199A1 (en)


Also Published As

Publication number Publication date
CN116261729A (en) 2023-06-13


Legal Events

- 121: EP — the EPO has been informed by WIPO that EP was designated in this application (ref document number 20946627, country of ref document: EP, kind code of ref document: A1)
- NENP: Non-entry into the national phase (ref country code: DE)
- 122: EP — PCT application non-entry in European phase (ref document number 20946627, country of ref document: EP, kind code of ref document: A1)