WO2023071793A1 - Method and apparatus for constructing a neural network - Google Patents

Method and apparatus for constructing a neural network

Info

Publication number
WO2023071793A1
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
network
parameter generation
layer
target neural
Prior art date
Application number
PCT/CN2022/124843
Other languages
English (en)
French (fr)
Inventor
吴佳骏
孙乘坚
杨晨阳
王坚
李榕
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2023071793A1 publication Critical patent/WO2023071793A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present application relates to the field of artificial intelligence, and more specifically, to a method and device for constructing a neural network.
  • Neural networks (for example, deep neural networks) have achieved great success in many fields of artificial intelligence.
  • The parameters of a neural network are generally trained and updated through gradient backpropagation, and multiple rounds of gradient backpropagation incur a huge computational and energy overhead.
  • To reduce this overhead, a method of constructing a neural network (the target neural network) through a parameter generation network has been proposed; that is, the parameters of the target neural network are generated by the parameter generation network.
  • In other words, one network (the parameter generation network) is used to generate the parameters of another network (the target neural network).
  • However, in a wireless communication system, the number of users participating in resource allocation varies over time; for example, some users join the wireless communication system while others leave it.
  • If a fixed-scale target neural network is used to solve the problem, multiple neural networks of different sizes must be trained for the various possible input dimensions (for example, numbers of users), which increases computational and storage overhead.
  • In view of this, the present application provides a method and device for constructing a neural network that take the scale of the neural network's processing task into account, so that the resulting target neural network has better generalization.
  • In a first aspect, a method for constructing a neural network is provided, comprising: generating parameters of a target neural network according to a parameter generation network, where the input of the parameter generation network includes information about the relative labels of neurons in the target neural network, the relative label of a neuron representing the relative position of the neuron in the first neural network layer, and the first neural network layer being the layer of the target neural network in which the neuron is located; and constructing the target neural network according to the parameters of the target neural network.
  • Because the parameter generation network generates the parameters of the target neural network from the relative labels of the neurons in the target neural network, the parameter generation network can have better generalization for constructing target neural networks of different scales.
  • In some implementations, N parameter generation networks are obtained, where N is determined by the number of parameter types M of the target neural network and the number of hidden layers L of the target neural network, and M and L are positive integers.
  • In some implementations, the target neural network includes a second neural network layer and a third neural network layer; the second neural network layer includes N_1 neurons and the third neural network layer includes N_2 neurons. The relative label of the i-th neuron in the second neural network layer and the relative label of the j-th neuron in the third neural network layer are input into a first parameter generation network to generate the weight parameter of the connection between the i-th neuron in the second neural network layer and the j-th neuron in the third neural network layer. The first parameter generation network is the parameter generation network, among the N parameter generation networks, used to generate the weight parameters between the second and third neural network layers, where 1 ≤ i ≤ N_1 and 1 ≤ j ≤ N_2.
  • In some implementations, the target neural network includes a second neural network layer that includes N_1 neurons. The relative label of the i-th neuron in the second neural network layer is input into a second parameter generation network to generate the bias parameter on the i-th neuron in the second neural network layer. The second parameter generation network is the parameter generation network, among the N parameter generation networks, used to generate the bias parameters of the second neural network layer, where 1 ≤ i ≤ N_1.
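  • To make the mapping above concrete, the following is a minimal sketch in PyTorch of a weight parameter generation network (input: a pair of relative labels) and a bias parameter generation network (input: a single relative label). The class names, hidden sizes, and choice of framework are illustrative assumptions, not prescribed by the application.

```python
import torch
import torch.nn as nn

class WeightHypernet(nn.Module):
    """Maps the relative labels (r_i, r_j) of two connected neurons to the
    weight of the connection between them (input dim 2, output dim 1)."""
    def __init__(self, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, r_i: torch.Tensor, r_j: torch.Tensor) -> torch.Tensor:
        return self.net(torch.stack([r_i, r_j], dim=-1)).squeeze(-1)

class BiasHypernet(nn.Module):
    """Maps the relative label r_j of one neuron to its bias
    (input dim 1, output dim 1)."""
    def __init__(self, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, r_j: torch.Tensor) -> torch.Tensor:
        return self.net(r_j.unsqueeze(-1)).squeeze(-1)
```

  • One such pair of hypernetworks would serve one pair of adjacent layers of the target neural network; the detailed description below makes the per-layer bookkeeping precise.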
  • In some implementations, the number L of hidden layers of the target neural network and the number of neurons in each of the L hidden layers are determined according to the scale of the processing task of the target neural network, where L is a positive integer.
  • In some implementations, the parameter generation network is trained according to the training data of the processing task of the target neural network.
  • In some implementations, the training data of the processing task of the target neural network is sorted; the sorted training data is input into the target neural network to obtain the output of the target neural network; and the parameters of the parameter generation network are updated according to a loss function to train the parameter generation network, the loss function being computed from the output of the target neural network and the label of the training data.
  • In some implementations, if the scale of the processing task changes, the number of neurons in each of the L hidden layers is re-determined according to the new scale of the processing task, and the trained parameter generation network is used to update the parameters of the target neural network.
  • In a second aspect, a device for constructing a neural network is provided, comprising a processing unit configured to generate parameters of a target neural network according to a parameter generation network, where the input of the parameter generation network includes information about the relative labels of neurons in the target neural network, the relative label of a neuron indicating the relative position of the neuron in the first neural network layer, and the first neural network layer being the layer of the target neural network in which the neuron is located; the processing unit is further configured to construct the target neural network according to the parameters of the target neural network.
  • In some implementations, the device further includes an acquisition unit configured to acquire N parameter generation networks, where N is determined by the number of parameter types M of the target neural network and the number of hidden layers L of the target neural network, and M and L are positive integers.
  • In some implementations, the target neural network includes a second neural network layer and a third neural network layer; the second neural network layer includes N_1 neurons and the third neural network layer includes N_2 neurons. The processing unit is specifically configured to input the relative labels of the i-th neuron in the second neural network layer and the j-th neuron in the third neural network layer into a first parameter generation network to generate the weight parameter of the connection between those two neurons, the first parameter generation network being the one among the N parameter generation networks used to generate the weight parameters between the second and third neural network layers.
  • In some implementations, the target neural network includes a second neural network layer that includes N_1 neurons. The processing unit is specifically configured to input the relative label of the i-th neuron in the second neural network layer into a second parameter generation network to generate the bias parameter on the i-th neuron, the second parameter generation network being the one among the N parameter generation networks used to generate the bias parameters of the second neural network layer.
  • In some implementations, the processing unit is further configured to determine, according to the scale of the processing task of the target neural network, the number L of hidden layers of the target neural network and the number of neurons in each of the L hidden layers, where L is a positive integer.
  • In some implementations, the processing unit is further configured to train the parameter generation network according to the training data of the processing task of the target neural network.
  • In some implementations, the processing unit is specifically configured to sort the training data of the processing task of the target neural network; input the sorted training data into the target neural network to obtain its output; and update the parameters of the parameter generation network according to a loss function to train the parameter generation network, the loss function being computed from the output of the target neural network and the label of the training data.
  • In some implementations, the processing unit is further configured to, if the scale of the processing task of the target neural network changes, re-determine the number of neurons in each of the L hidden layers according to the new scale of the processing task, and use the trained parameter generation network to update the parameters of the target neural network.
  • In a third aspect, a device for constructing a neural network is provided, comprising: a memory for storing a program; and a processor for executing the program stored in the memory, where, when the program stored in the memory is executed, the processor is configured to execute the method in any one implementation of the first aspect.
  • In a fourth aspect, a computer-readable medium is provided that stores program code for execution by a device, where the program code includes instructions for executing the method in any one implementation of the first aspect.
  • In a fifth aspect, a computer program product containing instructions is provided; when the computer program product is run on a computer, it causes the computer to execute the method in any one implementation of the first aspect.
  • In a sixth aspect, a chip is provided, including a processor and a data interface; the processor reads instructions stored in a memory through the data interface and executes the method in any one implementation of the first aspect.
  • Optionally, the chip may further include a memory that stores instructions; the processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the processor is configured to execute the method in any one implementation of the first aspect.
  • FIG. 1 is a schematic diagram of an artificial intelligence subject framework provided by an embodiment of the present application.
  • Fig. 2 is a schematic structural diagram of a parameter generation network.
  • FIG. 3 is a schematic structural diagram of a system architecture provided by an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of a method for constructing a neural network provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a parameter generation network provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of constructing a target neural network by a parameter generation network provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a training parameter generation network provided by an embodiment of the present application.
  • FIG. 8 plots the bias parameter values of target neural networks of different dimensions as a function of the relative labels of the neurons, provided by an embodiment of the present application.
  • FIG. 9 is a schematic flowchart of another neural network construction method provided by the embodiment of the present application.
  • FIG. 10 is a schematic flowchart of another neural network construction method provided by the embodiment of the present application.
  • FIG. 11 is a schematic flowchart of another neural network construction method provided by the embodiment of the present application.
  • Fig. 12 is a schematic block diagram of an apparatus for constructing a neural network provided by an embodiment of the present application.
  • FIG. 13 is a schematic diagram of a hardware structure of a neural network construction device provided by an embodiment of the present application.
  • Figure 1 shows a schematic diagram of the artificial intelligence main framework, which describes the overall workflow of an artificial intelligence system and is applicable to general requirements in the artificial intelligence field.
  • The "intelligent information chain" reflects a series of processes from data acquisition to processing, for example, the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, the data undergoes a condensation process of "data - information - knowledge - wisdom".
  • The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (provided and processed by technology) to the industrial ecology of the system.
  • the infrastructure provides computing power support for the artificial intelligence system, realizes communication with the outside world, and realizes support through the basic platform.
  • the infrastructure can communicate with the outside through sensors, and the computing power of the infrastructure can be provided by smart chips.
  • The smart chip here can be a central processing unit (CPU), a neural-network processing unit (NPU), a graphics processing unit (GPU), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA), or another hardware acceleration chip.
  • The basic platform of the infrastructure can include distributed computing frameworks, networks, and other related platform guarantees and support, and can include cloud storage and computing, interconnection networks, etc.
  • For example, data can be obtained through sensors and external communication and then provided to smart chips in the distributed computing system provided by the basic platform for computation.
  • Data at the layer above the infrastructure indicates the data sources in the field of artificial intelligence.
  • The data involve graphics, images, voice, and text, as well as IoT data from traditional equipment, including business data of existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.
  • the above data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making and other processing methods.
  • Machine learning and deep learning can perform symbolic and formalized intelligent information modeling, extraction, preprocessing, training, and so on, on data.
  • Reasoning refers to the process of simulating human intelligent reasoning in a computer or intelligent system, and using formalized information to carry out machine thinking and solve problems according to reasoning control strategies.
  • the typical functions are search and matching.
  • Decision-making refers to the process of decision-making after intelligent information is reasoned, and usually provides functions such as classification, sorting, and prediction.
  • Based on the results of data processing, some general capabilities can be formed, such as algorithms or a general system, for example, translation, text analysis, computer vision processing, speech recognition, and image recognition.
  • Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields; they package the overall artificial intelligence solution, commercializing intelligent information decision-making and realizing practical applications. The application fields mainly include intelligent manufacturing, intelligent transportation, smart home, smart medical care, smart security, automatic driving, safe city, smart terminals, etc.
  • the embodiments of the present application can be applied in many fields of artificial intelligence, for example, intelligent manufacturing, intelligent transportation, intelligent home, intelligent medical care, intelligent security, automatic driving, safe city and other fields.
  • the embodiments of the present application can be specifically applied in fields that require the use of (deep) neural networks, such as automatic driving, image classification, image retrieval, image semantic segmentation, image quality enhancement, image super-resolution, and natural language processing.
  • Neural network (NN): a neural network can be an algorithmic mathematical model composed of neurons.
  • A neuron refers to an operation unit that takes $x_s$ and an intercept of 1 as input. The output of the operation unit can be expressed by formula (1): $h_{W,b}(x) = f(W^\top x) = f\left(\sum_{s=1}^{n} W_s x_s + b\right)$, where $s = 1, 2, \ldots, n$, $n$ is a natural number greater than 1, $W_s$ is the weight of $x_s$, and $b$ is the bias of the neuron.
  • Here $f$ represents the activation function, which performs a nonlinear transformation on the features in the neural network, converting the input signal of the neuron into an output signal.
  • The output signal of the activation function can be used as the input of the next layer of neurons; for example, the activation function can be a sigmoid function.
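  • As a small numerical illustration of formula (1) with a sigmoid activation (the weight and input values below are arbitrary example values):

```python
import numpy as np

def neuron(x: np.ndarray, W: np.ndarray, b: float) -> float:
    """Formula (1): output = f(sum_s W_s * x_s + b), with f = sigmoid."""
    z = np.dot(W, x) + b
    return 1.0 / (1.0 + np.exp(-z))

print(neuron(np.array([0.5, -1.0]), np.array([2.0, 0.5]), b=0.1))  # ~0.646
```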
  • a neural network is a network formed by connecting multiple above-mentioned single neurons, that is, the output of one neuron can be the input of another neuron.
  • the input of each neuron can be connected with the local receptive field of the previous layer to extract the features of the local receptive field.
  • a local receptive field may be an area composed of several neurons.
  • A deep neural network (DNN), also known as a multilayer neural network, can be understood as a neural network with multiple hidden layers.
  • The layers inside a DNN can be divided into three categories according to their positions: the input layer, hidden layers, and the output layer. Generally, the first layer is the input layer, the last layer is the output layer, and the layers in between are all hidden layers.
  • Layers can be fully connected; that is, any neuron in the i-th layer is connected to any neuron in the (i+1)-th layer. In simple terms, each layer computes the linear relationship $\vec{y} = \alpha(W\vec{x} + \vec{b})$, where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset (bias) vector, $W$ is the weight matrix (also called the coefficients), and $\alpha(\cdot)$ is the activation function.
  • The definition of these parameters in a DNN is as follows, taking the coefficients as an example. Suppose that in a three-layer DNN, the linear coefficient from the fourth neuron of the second layer to the second neuron of the third layer is defined as $W^3_{24}$: the superscript 3 represents the layer index of the coefficient $W$, and the subscripts correspond to the output index 2 of the third layer and the input index 4 of the second layer. In general, the coefficient from the k-th neuron of the (L-1)-th layer to the j-th neuron of the L-th layer is defined as $W^L_{jk}$. Note that the input layer has no $W$ parameters.
  • DNN can fit any function with arbitrary precision.
  • more hidden layers can make the network more capable of describing complex situations in the real world.
  • The more parameters, the higher the complexity of the model and the greater its "capacity", which means it can complete more complex learning tasks.
  • Training a deep neural network is the process of learning the weight matrices; the ultimate goal is to obtain the weight matrices of all layers of the trained deep neural network (the weight matrices formed by the vectors W of many layers).
  • a convolutional neural network is a neural network with a convolutional structure.
  • the convolutional neural network contains a feature extractor composed of a convolutional layer and a subsampling layer, which can be regarded as a filter.
  • the convolutional layer refers to the neuron layer that performs convolution processing on the input signal in the convolutional neural network.
  • In the convolutional layer, a neuron can be connected to only some of the neurons in adjacent layers.
  • A convolutional layer usually contains several feature planes, and each feature plane can be composed of rectangularly arranged neural units. Neural units of the same feature plane share weights, and the shared weights are the convolution kernel.
  • Shared weights can be understood as a way to extract image information that is independent of location.
  • the convolution kernel can obtain reasonable weights through learning.
  • the direct benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, while reducing the risk of overfitting.
  • The specific way to train a neural network is to use a loss function to evaluate the output of the neural network, backpropagate the error, and iteratively optimize W and b by gradient descent until the loss function reaches its minimum value.
  • The neural network can use the gradient back propagation (BP) algorithm to adjust the values of the parameters in the neural network model during training, so that the reconstruction error loss of the neural network model becomes smaller and smaller.
  • Specifically, the forward transmission of the input signal up to the output produces an error loss, and the parameters in the neural network model are updated by backpropagating the error loss information, so that the error loss converges.
  • The backpropagation algorithm is a backpropagation process dominated by the error loss, aiming to obtain the optimal parameters of the neural network model, such as the weight matrices.
  • The loss value generated by each training pass of the neural network model is transmitted layer by layer from back to front in the model, and the update amount of each layer's parameters (a partial derivative operation) is calculated at the same time; this update amount is related to the gradient.
  • In the above resource allocation problem, the optimal allocation strategy p* can be regarded as a function of the environment state h.
  • If the elements of the environment state h are permuted (for example, the states of the objects in h change positions), the elements of the optimal strategy p* undergo the same permutation. That is to say, the above resource allocation problem in the wireless communication system has permutation equivariance: permuting the elements of the environment state h causes the same permutation of the elements of the strategy p* obtained by solving problem (1).
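  • The following toy check illustrates what permutation equivariance means; the proportional-allocation policy here is a stand-in chosen for simplicity, not the strategy obtained by solving problem (1):

```python
import numpy as np

rng = np.random.default_rng(0)
h = rng.random(5)                  # environment state, one element per object

def policy(h: np.ndarray) -> np.ndarray:
    """A stand-in permutation-equivariant policy: allocate power
    proportionally to each object's state."""
    return h / h.sum()

perm = rng.permutation(len(h))
# Permutation equivariance: permuting the state permutes the strategy identically.
assert np.allclose(policy(h[perm]), policy(h)[perm])
```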
  • FIG. 2 is a schematic structural diagram of a parameter generation network (hypernetwork).
  • In FIG. 2, h_1 and h_2 represent the hidden layer nodes of the fully connected (target) network; W_1, W_2, and W_3 represent the model parameters of each layer of the fully connected network; g_1 and g_2 represent the parameters of the parameter generation network; and X_2 and X_1 denote the inputs of the target neural network and the parameter generation network, respectively. W_1, W_2, and W_3 are obtained by the parameter generation network from information about the hidden layers and weights.
  • the input of the parameter generation network is the label information of the model parameters of the fully connected network.
  • In another scheme, the input of the parameter generation network is an embedding vector of the parameters of the CNN model, obtained through learning.
  • However, the above parameter generation networks do not consider changes in the scale of the target neural network when generating its parameters; that is, they have no generalization ability across problems of different scales.
  • For example, the number of users participating in wireless resource allocation changes over time (e.g., some users join the wireless communication system while others leave it).
  • If a fixed-dimensional neural network is used to solve the resource allocation problem, multiple neural networks of different dimensions must be trained for the various possible numbers of users, which increases computing and storage overhead.
  • In view of this, the embodiment of the present application provides a method for constructing a neural network in which a single set of parameter generation networks can generate, with relatively low complexity, the parameters of target neural networks for processing tasks of different scales; this parameter generation network therefore has better generalization for generating target neural networks of different scales.
  • As shown in FIG. 3, the embodiment of the present application provides a system architecture 100.
  • the system architecture 100 includes an execution device 110 , a training device 120 , a database 130 , a client device 140 , a data storage system 150 , and a data collection device 160 .
  • the execution device 110 includes a computing module 111 , an I/O interface 112 , a preprocessing module 113 and a preprocessing module 114 .
  • the calculation module 111 may include the target model/rule 101, and the preprocessing module 113 and the preprocessing module 114 are optional.
  • the data collection device 160 is used to collect training data.
  • For image processing, the training data may include training images (for example, images containing objects) and label data, where the label data gives the category of the objects in the training images.
  • Alternatively, the training data may be channel data between a terminal device and a base station, with the label data being the optimal power allocation result under the conditions of that channel data.
  • the training device 120 obtains the target model/rule 101 based on the training data.
  • Specifically, the training device 120 processes the input raw data and compares the output value with the target value until the difference between the value output by the training device 120 and the target value is less than a certain threshold, thereby completing the training of the target model/rule 101.
  • The target model/rule 101 in the embodiment of the present application may specifically be a neural network model, for example, a BP neural network.
  • It should be noted that the training data maintained in the database 130 is not necessarily all collected by the data collection device 160; it may also be received from other devices.
  • In addition, the training device 120 does not necessarily train the target model/rule 101 entirely based on the training data maintained in the database 130; it may also obtain training data from the cloud or elsewhere for model training. The above description should not be taken as a limitation on the embodiments of the present application.
  • The target model/rule 101 trained by the training device 120 can be applied to different systems or devices, such as the execution device 110 shown in FIG. 3, which can be a terminal, such as a mobile phone terminal, a tablet computer, a notebook computer, an augmented reality (AR)/virtual reality (VR) device, or a vehicle terminal, or can be a server, a cloud, etc.
  • The execution device 110 is configured with an input/output (I/O) interface 112 for data interaction with external devices; a user can input data to the I/O interface 112 through the client device 140. The input data in this embodiment of the application may include an image to be processed input by the client device.
  • the client device 140 here may specifically be a terminal device.
  • When the execution device 110 preprocesses the input data, or when the computing module 111 of the execution device 110 performs calculation and other related processing, the execution device 110 can call data, code, etc. in the data storage system 150 for the corresponding processing, and the correspondingly processed data and instructions may also be stored in the data storage system 150.
  • Finally, the I/O interface 112 presents the processing result, such as the object detection result calculated by the target model/rule 101, to the client device 140, thereby providing it to the user.
  • It is worth noting that the training device 120 can generate corresponding target models/rules 101 based on different training data for different goals or different tasks, and the corresponding target models/rules 101 can be used to achieve those goals or complete those tasks, thereby providing the user with the desired result.
  • the target model/rule 101 obtained through training according to the training device 120 may be a target neural network and/or a parameter generation network.
  • the user can manually specify the input data, and the manual specification can be operated through the interface provided by the I/O interface 112 .
  • In another case, the client device 140 can automatically send the input data to the I/O interface 112. If automatically sending the input data requires the user's authorization, the user can set the corresponding permission in the client device 140.
  • The user can view the results output by the execution device 110 on the client device 140, in specific forms such as display, sound, and action.
  • the client device 140 can also be used as a data collection terminal, collecting the input data input to the I/O interface 112 as shown in the figure and the output results of the output I/O interface 112 as new sample data, and storing them in the database 130 .
  • Of course, the client device 140 may not perform the collection; instead, the I/O interface 112 may directly store, as new sample data, the input data input to the I/O interface 112 and the output results of the I/O interface 112, as shown in the figure, in the database 130.
  • It is worth noting that FIG. 3 is only a schematic diagram of a system architecture provided by the embodiment of the present application, and the positional relationships among the devices, components, modules, etc. shown in the figure do not constitute any limitation.
  • For example, in FIG. 3, the data storage system 150 is an external memory relative to the execution device 110; in other cases, the data storage system 150 may also be placed in the execution device 110.
  • Fig. 4 is a schematic flowchart of a method for constructing a neural network provided by the present application, and the method includes at least the following steps.
  • the scale of the processing task may represent the complexity of the target neural network processing task.
  • the scale of the processing task can be measured by the input scale of the processing task.
  • For example, when processing an image processing task, the scale of the processing task can be measured by the size of the image: the larger the size of the image, the larger the scale of the image processing task; for example, the size of the input image may be 32*32. For another example, when processing a wireless resource allocation task, the scale of the processing task can be measured by the number of users participating in wireless resource allocation: the more users participate, the larger the scale of the wireless resource allocation problem.
  • The learning ability of the target neural network is limited by its number of parameters, and the number of parameters depends on the number of neurons. Therefore, when the scale of the processing task of the target neural network becomes larger, the number of neurons in the neural network needs to be increased to improve its learning ability and to avoid the problem that too few parameters prevent correct results from being obtained.
  • The target neural network can be written as $\hat{p} = F(h; W, b)$, where $h$ represents the input of the target neural network, $\hat{p}$ represents the output of the target neural network, and $W$ and $b$ represent the weight parameters and bias parameters of the target neural network, respectively.
  • The scale of the target neural network can be represented by the number L of hidden layers of the neural network and the number $N_l$ of neurons in each hidden layer, where 1 ≤ l ≤ L.
  • the number of hidden layers can be set to a fixed number of layers based on experience.
  • When the scale of the processing task changes, for example, when the number of users participating in wireless resource allocation changes, the change in the scale of the target neural network can be realized by changing the number of neurons in each hidden layer of the target neural network. In the embodiment of the present application, the number of hidden layers of the target neural network is fixed at L, and the number of neurons in each hidden layer is determined by the scale of the processing task; that is, the scale of the target neural network is determined by the scale of the processing task.
  • the mapping relationship between the scale of the processing task and the scale of the target neural network may be preconfigured, and the number of neurons in each hidden layer of the target neural network is determined according to the mapping relationship and the scale of the processing task.
  • For example, the number of neurons in the target neural network can be proportional to the scale of the problem to be processed, i.e., a linear relationship of the form $N_l = a \cdot P + b_l$, where $P$ represents the scale of the task to be processed, $N_l$ represents the number of neurons in the l-th hidden layer of the target neural network, $a$ is a linear proportionality parameter, and $b_l$ is a positive integer offset.
  • Parameters such as a, Q, and b_l can be determined in advance through offline trials; that is, given different combinations of the parameters a, Q, and b_l, the target neural network is built and trained offline, its performance is checked, and finally the combination of parameters with better performance is selected.
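  • As an illustration, assuming the linear rule takes the form $N_l = \lceil a \cdot P \rceil + b_l$ discussed above (the exact rule and the role of Q are offline design choices), the mapping from task scale to hidden layer sizes might look like:

```python
import math

def hidden_layer_sizes(P: int, a: float, b: list[int]) -> list[int]:
    """Map task scale P to per-hidden-layer neuron counts, assuming the
    linear rule N_l = ceil(a * P) + b_l discussed above."""
    return [math.ceil(a * P) + b_l for b_l in b]

# e.g. P = 10 users, a = 4, per-layer offsets b = [0, 16]:
print(hidden_layer_sizes(10, a=4, b=[0, 16]))  # [40, 56]
```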
  • The parameter generation networks are used to generate the parameters (e.g., weights and biases) of the target neural network. If the number of hidden layers of the target neural network is L, m(L+1) parameter generation networks are obtained, where m represents the number of types of layer parameters of the neural network; usually m = 2 (the two parameter types being weights and biases), so 2(L+1) parameter generation networks are obtained. The 2(L+1) parameter generation networks are used to generate the weights and biases of the connections between each pair of adjacent layers of the neural network, with L+1 parameter generation networks used to generate weights (weight parameter generation networks) and L+1 parameter generation networks used to generate biases (bias parameter generation networks).
  • The l-th weight parameter generation network among the L+1 weight parameter generation networks can be written as $G_W^l(X_W^l; \theta_W^l)$, where the value range of l is [1, L+1], $\theta_W^l$ represents the parameters of the l-th weight parameter generation network, and $X_W^l$ represents its input. The l-th weight parameter generation network is used to generate the weight parameters of the connections between the (l-1)-th layer (an example of the second neural network layer) and the l-th layer (an example of the third neural network layer) of the target neural network; when l = 1, the (l-1)-th layer can be understood as the input layer of the target neural network, and when l = L+1, the l-th layer can be understood as the output layer. The input $X_W^l$ consists of the relative label of the i-th neuron among the $N_{l-1}$ neurons of the (l-1)-th layer and the relative label of the j-th neuron among the $N_l$ neurons of the l-th layer, which can be defined as $i/(N_{l-1}+1)$ and $j/(N_l+1)$ respectively, where $N_{l-1}$ and $N_l$ respectively represent the number of neurons in the (l-1)-th and l-th layers of the target neural network, the value range of i is $[1, N_{l-1}]$, and the value range of j is $[1, N_l]$.
  • Similarly, the l-th bias parameter generation network can be written as $G_b^l(X_b^l; \theta_b^l)$ and is used to generate the bias parameter on the j-th neuron of the l-th layer of the target neural network, where $\theta_b^l$ represents the parameters of the l-th bias parameter generation network and $X_b^l$ represents its input, which can be defined as $j/(N_l+1)$, the relative label of the j-th neuron among the $N_l$ neurons of the l-th layer, with the value range of j being $[1, N_l]$.
  • That is, the input dimension of a weight parameter generation network is 2 and its output dimension is 1: given the relative label information of the i-th neuron of the (l-1)-th layer and the j-th neuron of the l-th layer, it outputs the weight parameter of the corresponding connection between layer l-1 and layer l. The input dimension of a bias parameter generation network is 1 and its output dimension is 1: given the relative label of the j-th neuron among the neurons of the l-th layer, it outputs the bias parameter on the j-th neuron of the l-th layer.
  • This application does not limit the structure of the parameter generation network (number of hidden layers, connection method, etc.).
  • the method may further include S421, initializing the parameter generation network.
  • Initializing the parameter generation networks means initializing their parameters $\theta_W^l$ and $\theta_b^l$; for example, each parameter may be initialized randomly. The present application does not limit the specific manner of initializing the parameter generation networks.
  • the acquired parameter generation network is shown in (b) in FIG. 5 .
  • In FIG. 5, W and b respectively represent the weight and bias parameters of the target neural network. From the calculation of the number of parameter generation networks above, the number of parameter generation networks is 4 (L = 1), and each parameter generation network can adopt a fully connected network structure containing one hidden layer; the parameters $\theta_W^l$ and $\theta_b^l$ can be randomly initialized.
  • Obtaining the target neural network means generating the parameters of the connections between the layers of the target neural network. It can be seen from the above steps that the number of hidden layers of the target neural network is L, and the number of neurons in each hidden layer is determined by the scale of the processing task. Therefore, generating the parameters of the connections between the layers of the target neural network yields the target neural network, and those parameters can be determined by the above-mentioned parameter generation networks.
  • Specifically, each parameter generation network obtains one parameter (a weight or a bias) of the target neural network per inference, and the parameter generation networks are used multiple times to obtain all parameters of the target neural network.
  • FIG. 6 is a schematic diagram of using the parameter generation network to obtain the target neural network.
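  • Continuing the earlier sketch, one possible way to realize the construction in FIG. 6 is to query the (assumed) `WeightHypernet` and `BiasHypernet` once per parameter, over the grid of relative labels $i/(N_{l-1}+1)$ and $j/(N_l+1)$; the helper name `generate_layer_params` is illustrative.

```python
import torch

def generate_layer_params(w_net, b_net, n_in: int, n_out: int):
    """Query the parameter generation networks once per parameter to build
    the weight matrix and bias vector of one target-network layer.
    Relative labels follow the definition i/(N+1) used above."""
    r_in = torch.arange(1, n_in + 1, dtype=torch.float32) / (n_in + 1)
    r_out = torch.arange(1, n_out + 1, dtype=torch.float32) / (n_out + 1)
    # Enumerate all (i, j) label pairs: i varies fastest, j slowest.
    ri = r_in.repeat(n_out)
    rj = r_out.repeat_interleave(n_in)
    W = w_net(ri, rj).reshape(n_out, n_in)   # W[j, i]: weight from neuron i to j
    b = b_net(r_out)                          # one bias per output neuron
    return W, b
```

  • Calling this helper for l = 1, …, L+1 yields all weight matrices and bias vectors of the target neural network.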
  • The training data can also be used to train the parameter generation networks, so as to obtain parameter generation networks (models of the parameter generation networks) that can generate a target neural network model performing a specific task: for example, a parameter generation network for an image classification model, or a parameter generation network for a neural network model that can be used for wireless resource allocation, and the like.
  • the parameter generation network can be trained using the training data of the target neural network processing task.
  • For example, for an image processing task (image classification, image detection, image recognition, etc.), the training data can be images, and the label corresponding to the training data may be the category corresponding to the image, etc.
  • For a wireless resource allocation task, the training data can be the channel data between the terminal and the base station, and the label corresponding to the training data can be the optimal power allocation result under the conditions of that channel data.
  • the embodiment of the present application does not limit the type of training data.
  • During training, the weight parameter generation networks can be written as $\{G_W^l(\cdot; \theta_W^l)\}_{l=1}^{L+1}$, where $\theta_W^l$ indicates the parameters of the parameter generation network used to generate the weight parameters between layer l-1 and layer l of the target neural network. When l = 1, $\theta_W^1$ indicates the parameters of the network generating the weight parameters between the input layer and the first hidden layer of the target neural network; when l = L+1, $\theta_W^{L+1}$ indicates the parameters of the network generating the weight parameters between the L-th hidden layer and the output layer. Similarly, the bias parameter generation networks can be written as $\{G_b^l(\cdot; \theta_b^l)\}_{l=1}^{L+1}$, where $\theta_b^l$ indicates the parameters of the network generating the bias parameters of the l-th layer of the target neural network.
  • The parameters $\theta_W$ and $\theta_b$ of the parameter generation networks can be trained by means of supervised or unsupervised learning: for supervised learning, the loss can be written as $L(\hat{p}, p^*(h))$, where $L(\cdot)$ denotes the supervised loss function and $p^*(h)$ denotes the label of the training data; for unsupervised learning, the loss can be written as $J(\hat{p})$, where $J(\cdot)$ denotes the unsupervised loss function.
  • the schematic diagram of gradient return during the training process is shown in Figure 7.
  • Specifically, the inputs $X_W^l$ and $X_b^l$ are first fed into the parameter generation networks to generate the parameters W and b of the target neural network; then the training data h is input into the target neural network, and the output $\hat{p}$ of the target neural network is inferred, this output being a result of the task of the target neural network. By comparing the inferred $\hat{p}$ with the optimal $p^*$ (that is, the label corresponding to the training data), the loss function is computed, and the gradients with respect to the parameters of all parameter generation networks are calculated through backpropagation using the chain rule: $\frac{\partial L}{\partial \theta_W} = \frac{\partial L}{\partial W} \cdot \frac{\partial W}{\partial \theta_W}$ and $\frac{\partial L}{\partial \theta_b} = \frac{\partial L}{\partial b} \cdot \frac{\partial b}{\partial \theta_b}$, where $\partial L/\partial W$ and $\partial L/\partial b$ are the gradients of the loss function $L(\cdot)$ with respect to the parameters W and b of the target neural network, and $\partial W/\partial \theta_W$ and $\partial b/\partial \theta_b$ are the gradients of the generated parameters with respect to the parameters of the parameter generation networks. An optimization algorithm, such as gradient descent, is then used to update the parameters of the parameter generation networks.
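  • A minimal supervised training step, under the same illustrative assumptions as the earlier sketches (autograd propagates the loss through the generated W and b into $\theta_W$ and $\theta_b$, which is exactly the chain rule written out above):

```python
import torch

def train_step(w_nets, b_nets, sizes, h, p_star, opt,
               loss_fn=torch.nn.functional.mse_loss):
    """One supervised update of the parameter generation networks.
    Assumes generate_layer_params from the earlier sketch.
    sizes = [input_dim, N_1, ..., N_L, output_dim]."""
    x = h
    for l, (w_net, b_net) in enumerate(zip(w_nets, b_nets)):
        W, b = generate_layer_params(w_net, b_net, sizes[l], sizes[l + 1])
        x = x @ W.T + b
        if l < len(w_nets) - 1:       # hidden layers use a nonlinearity
            x = torch.relu(x)
    loss = loss_fn(x, p_star)          # compare inferred output with label p*
    opt.zero_grad()
    loss.backward()                    # chain rule: dL/dtheta = dL/dW * dW/dtheta
    opt.step()
    return loss.item()
```

  • Here `opt` would be an optimizer (for example, `torch.optim.Adam`) over the parameters of all 2(L+1) hypernetworks; the target network itself holds no trainable parameters.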
  • a convergence judgment can also be performed.
  • The criterion for judging convergence can be whether the maximum number of training rounds has been reached, or whether the target neural network has reached a preset performance.
  • The preset performance can be judged by a neural network evaluation index, for example, the inference accuracy of the target neural network. If convergence is judged to have been reached, the training ends; otherwise, the process returns to S430, the target neural network is re-initialized with the updated parameter generation networks, and the next round of training and convergence judgment is performed. The above process is repeated until the training of the parameter generation networks is complete.
  • It should be noted that the parameters (weights and biases) of the target neural network are generated by the parameter generation networks, so the target neural network itself does not need to be trained.
  • The above process of training the parameter generation networks is the process of learning the parameters of the parameter generation networks from the training data, that is, the above $\theta_W^l$ and $\theta_b^l$, where 1 ≤ l ≤ L+1.
  • Before training, the training data may also be sorted. Sorting the training data can be understood as sorting the data of each sample in the training data, for example, sorting the values of the data in descending order. Sorting the training data reduces the number of samples required to train the parameter generation networks.
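  • For instance, a per-sample descending sort might look as follows (assuming each row of the array is one training sample; by the permutation equivariance discussed above, the elements of the corresponding label would be permuted with the same per-sample order):

```python
import numpy as np

data = np.array([[0.2, 0.9, 0.5],
                 [0.7, 0.1, 0.4]])
order = np.argsort(-data, axis=1)                      # per-sample descending order
sorted_data = np.take_along_axis(data, order, axis=1)
# sorted_data: [[0.9, 0.5, 0.2], [0.7, 0.4, 0.1]]
# The same `order` can be applied to permutation-equivariant labels.
```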
  • Specifically, sorting the training data makes the connection parameters between neurons with similar relative labels in two adjacent layers similar across two target neural networks of different scales.
  • For example, suppose the weight of the connection between the $i_A$-th neuron of the (l-1)-th layer of target neural network A and the $j_A$-th neuron of its l-th layer is $W^l_{j_A i_A,A}$, the bias on the $j_A$-th neuron of the l-th layer is $b^l_{j_A,A}$, and the numbers of neurons in the (l-1)-th and l-th layers are $N_{l-1,A}$ and $N_{l,A}$ respectively; likewise, for target neural network B, the weight of the connection between the $i_B$-th neuron of the (l-1)-th layer and the $j_B$-th neuron of the l-th layer is $W^l_{j_B i_B,B}$, the bias on the $j_B$-th neuron of the l-th layer is $b^l_{j_B,B}$, and the numbers of neurons in the (l-1)-th and l-th layers are $N_{l-1,B}$ and $N_{l,B}$ respectively. If $i_A/(N_{l-1,A}+1) \approx i_B/(N_{l-1,B}+1)$ and $j_A/(N_{l,A}+1) \approx j_B/(N_{l,B}+1)$, that is, the relative labels of the corresponding neurons in the two networks are similar, then after sorting the training data, $W^l_{j_A i_A,A} \approx W^l_{j_B i_B,B}$ and $b^l_{j_A,A} \approx b^l_{j_B,B}$.
  • This is because, when the distributions of the different environment states are identical and independent of each other, sorted elements at similar relative positions have approximately equal means.
  • the input dimension of the target neural network in scenario #1 is 10 (for example, there are 10 users in this scenario), and the input dimension of the target neural network in scenario #2 is 20.
  • In scenario #1, the 10 users may have a variety of different deployment locations, and 1000 deployment sub-scenarios are randomly generated by scattering points; similarly, 1000 sub-scenarios are also randomly generated in scenario #2.
  • For example, the environment state of the second user in scenario #1 (assuming its environment state is represented by one element) takes 1000 values (corresponding to the 1000 sub-scenarios); averaging these 1000 values gives the mean environment state of user 2, which is approximately equal to the mean state of the fourth user over the 1000 sub-scenarios with 20 users, since their relative labels are similar: 2/(10+1) ≈ 0.18 and 4/(20+1) ≈ 0.19.
  • Fig. 8 plots the bias parameter values of target neural networks of different scales as a function of the relative labels of the neurons. It can be seen that if the relative label of a neuron is used as the input of the function (the parameter generation network) and the bias parameter value of the target neural network is used as its output, the parameter values of target neural networks of different dimensions can be obtained through interpolation; that is, the parameter generation network in this application has good generalization for generating target neural networks of different dimensions.
  • After the above training process, a set of parameter generation network models is obtained, which can be deployed in actual application scenarios (for example, wireless communication systems) for inference to obtain the desired target neural network model.
  • the reasoning process of the parameter generation network model is introduced below in conjunction with FIG. 9 .
  • The scale of the target neural network can be represented by the number $N_l$ of neurons in each hidden layer, where 1 ≤ l ≤ L; that is, according to the scale of the processing task, the number of neurons in each hidden layer of the target neural network is determined.
  • the above-mentioned trained parameter generation network can be used to generate parameters (including weights and biases) of the target neural network, so that the target neural network can be determined.
  • Optionally, the elements in the inference data can also be sorted before being input into the target neural network to obtain the inference result.
  • In the embodiment of the present application, when the scale of the processing task changes, the parameter generation networks can generate target neural networks of different scales with relatively low complexity; that is, the parameter generation networks have generalization.
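  • For example, under the assumptions of the earlier sketches, the same trained parameter generation networks can instantiate target networks of two different scales with no retraining (`w_net` and `b_net` stand for hypothetical trained instances of `WeightHypernet` and `BiasHypernet`, and the sizes are illustrative):

```python
# The same trained hypernetworks serve both scales.
W10, b10 = generate_layer_params(w_net, b_net, n_in=10, n_out=40)  # 10-user task
W20, b20 = generate_layer_params(w_net, b_net, n_in=20, n_out=80)  # 20-user task
# Only the grid of relative labels changes between the two calls.
```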
  • To facilitate understanding, the solution of the wireless downlink power control problem for a single cell is taken as an example below, with reference to FIGS. 10 and 11.
  • the method at least includes the following steps.
  • the total number of users K may represent the number of users participating in wireless downlink power control in the cell.
  • The target neural network is used to solve the wireless downlink power control problem; that is, the target neural network takes downlink channel state information as input and outputs the optimal transmit power for the base station to send data to the K users.
  • The target neural network may be a fully connected neural network, and it has 2 parameter types, namely weight parameters and bias parameters.
  • the parameter generator network is used to generate the weights and biases of the target neural network.
  • Among them, L+1 parameter generation networks can be used to generate weights (weight parameter generation networks), and L+1 parameter generation networks are used to generate biases (bias parameter generation networks). For a specific introduction to the parameter generation networks, refer to S420.
  • Initial values of the weight parameters and bias parameters of the target neural network can be generated by the 2(L+1) parameter generation networks; subsequently, these initial parameters can be updated by the trained parameter generation networks.
  • the process of training the parameter generation network is similar to that in S440 and will not be repeated here.
  • After training is completed, 2(L+1) trained parameter generation networks are obtained. The parameter generation networks can be deployed on the base station side and used to generate target neural networks of different scales, which is the inference process of the parameter generation networks.
  • the reasoning process of the parameter generating network is shown in FIG. 11 , and the reasoning process includes at least the following steps.
  • This step is similar to that in S410 and will not be repeated here.
  • the downlink channel state information can be obtained by the user through channel estimation and fed back to the base station.
  • S1150, perform downlink communication according to the power control result.
  • the apparatus for constructing the neural network in the embodiment of the present application will be introduced below with reference to FIG. 12 .
  • the neural network construction device shown in FIG. 12 can be used to execute each step of the neural network construction method of the embodiment of the present application, and the neural network construction device can be a computer, server, or other device with sufficient computing power to construct a neural network.
  • Fig. 12 is a schematic block diagram of an apparatus for constructing a neural network according to an embodiment of the present application.
  • the apparatus 1200 shown in FIG. 12 includes a processing unit 1210 , and optionally, the apparatus 1200 may further include an acquiring unit 1220 .
  • The apparatus 1200 may be used to execute the steps of the neural network construction method of the embodiment of the present application.
  • The processing unit 1210 may be used to execute steps S410 to S440 in the method shown in FIG. 4; or steps S910 to S940 in the method shown in FIG. 9; or steps S1010 to S1050 in the method shown in FIG. 10; or steps S1110, S1120, and S1140 in the method shown in FIG. 11.
  • The device 1200 can also be used to obtain a target neural network with a specific function by using a trained parameter generation network; this can be understood as using a trained parameter generation network to generate a model of a target neural network capable of performing a specific task, for example, a target neural network model for processing the wireless downlink power control task, or a model for other specific tasks.
  • The acquiring unit can be used to acquire training data or inference data: the training data is used to train the parameter generation network and obtain a model of the parameter generation network, and the inference data is used, together with the model of the target neural network, to obtain the result of the processing task.
  • the acquiring unit 1220 may be configured to execute step S1130 in the method shown in FIG. 11 .
  • In addition, the device 1200 can obtain the trained parameter generation network model through the obtaining unit 1220. The obtaining unit 1220 may be equivalent to the communication interface 1330 in the device 1300 shown in FIG. 13, and the trained parameter generation network can be obtained through the communication interface 1330 (the training of the parameter generation network can be completed through offline training); alternatively, the obtaining unit 1220 may be equivalent to the processor 1320 in the device 1300 shown in FIG. 13, in which case the trained parameter generation network can be obtained from the memory 1310 through the processor 1320.
  • processing unit 1210 in the apparatus 1200 shown in FIG. 12 may be equivalent to the processor 1320 in the apparatus 1300 shown in FIG. 13 .
  • apparatus 1200 is embodied in the form of functional units.
  • unit here may be implemented in the form of software and/or hardware, which is not specifically limited.
  • a "unit” may be a software program, a hardware circuit or a combination of both to realize the above functions.
  • The hardware circuitry may include application-specific integrated circuits (ASICs), electronic circuits, processors for executing one or more software or firmware programs (such as shared processors, dedicated processors, or group processors) and memory, combinational logic circuits, and/or other suitable components that support the described functions.
  • Therefore, the units of each example described in the embodiments of the present application can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are executed by hardware or software depends on the specific application and the design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementation should not be regarded as exceeding the scope of the present application.
  • FIG. 13 is a schematic diagram of the hardware structure of the neural network construction apparatus of an embodiment of the present application. The apparatus 1300 shown in FIG. 13 includes a memory 1310, a processor 1320, a communication interface 1330, and a bus 1340; the memory 1310, the processor 1320, and the communication interface 1330 are communicatively connected to one another through the bus 1340.
  • The memory 1310 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 1310 may store a program; when the program stored in the memory 1310 is executed by the processor 1320, the processor 1320 and the communication interface 1330 are used to execute the steps of the neural network construction method of the embodiments of the present application.
  • The processor 1320 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, used to realize the functions required by the units in the neural network construction apparatus of the embodiments of the present application, or to perform the steps of the neural network construction method of the embodiments of the present application. The processor 1320 may also be an integrated circuit chip with signal processing capability; during implementation, each step of the method for constructing a neural network may be completed by an integrated logic circuit of hardware in the processor 1320 or by instructions in the form of software.
  • The above-mentioned processor 1320 may also be a general-purpose processor, a digital signal processor (DSP), an ASIC, a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method for constructing a neural network provided in the embodiments of the present application may be directly executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1310; the processor 1320 reads the information in the memory 1310 and, in combination with its hardware, completes the functions required by the units included in the neural network construction apparatus of the embodiments of the present application, or executes the steps of the neural network construction method of the embodiments of the present application.
  • The communication interface 1330 uses a transceiver apparatus, such as but not limited to a transceiver, to implement communication between the apparatus 1300 and other devices or communication networks. For example, the control parameters corresponding to the inference results may be sent through the communication interface 1330.
  • The bus 1340 may include a pathway for communicating information between the various components of the apparatus 1300 (e.g., the memory 1310, the processor 1320, and the communication interface 1330).
  • It should be noted that although the apparatus 1300 only shows a memory, a processor, and a communication interface, in specific implementation the apparatus 1300 may further include other devices necessary for normal operation, and may also include hardware devices for implementing other additional functions. Moreover, the apparatus 1300 may include only the devices necessary to realize the embodiments of the present application, and does not necessarily include all the devices shown in FIG. 13.
  • This application does not limit the specific structure of the execution subject of the methods provided by the embodiments of this application, as long as it can communicate according to those methods by running a program recording their code. For example, the execution subject may be a network device, or a functional module in the network device capable of invoking and executing a program.
  • Computer-readable media may include, but are not limited to, magnetic storage devices (such as hard disks, floppy disks, or tapes), optical disks (such as compact discs (CD) and digital versatile discs (DVD)), smart cards, and flash memory devices (such as erasable programmable read-only memory (EPROM), cards, sticks, or key drives). The various storage media described herein may represent one or more devices and/or other machine-readable media for storing information. The term "machine-readable medium" may include, but is not limited to, wireless channels and various other media capable of storing, containing, and/or carrying instructions and/or data.
  • It should be noted that when the processor is a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, the memory (storage module) may be integrated in the processor. It should also be noted that the memories described herein are intended to include, but are not limited to, these and any other suitable types of memories.
  • In the several embodiments provided in this application, it should be understood that the disclosed apparatuses and methods may be implemented in other ways. For example, the apparatus embodiments described above are only illustrative: the division into units is only a division by logical function, and there may be other division methods in actual implementation; multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be implemented through some interfaces, and the indirect couplings or communication connections between apparatuses or units may be electrical, mechanical, or in other forms.
  • The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments. In addition, the functional units in the embodiments of the present application may be integrated into one unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • If the functions described above are realized in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on such an understanding, the essence of the technical solution of this application, the part contributing to the prior art, or a part of the technical solution may be embodied in the form of a computer software product. The computer software product is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium may include, but is not limited to, various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

This application provides a method for constructing a neural network. The method includes: generating parameters of a target neural network according to a parameter generation network, where the input of the parameter generation network includes information about the relative labels of neurons in the target neural network, the relative label of a neuron represents the relative position of the neuron in a first neural network layer, and the first neural network layer is the layer of the target neural network in which the neuron is located; and constructing the target neural network according to the parameters of the target neural network. By generating the parameters of the target neural network from the relative labels of its neurons, the parameter generation network generalizes better to constructing target neural networks of different dimensions.

Description

Method and apparatus for constructing a neural network

This application claims priority to Chinese Patent Application No. 202111271593.1, filed with the China National Intellectual Property Administration on October 29, 2021 and entitled "Method and apparatus for constructing a neural network", which is incorporated herein by reference in its entirety.
Technical Field

The present application relates to the field of artificial intelligence and, more specifically, to a method and apparatus for constructing a neural network.
Background

With the rapid development of artificial intelligence (AI) technology, neural networks (for example, deep neural networks) have achieved great success in recent years in the processing and analysis of various media signals such as images, video, and speech. The parameters of a neural network are generally updated and trained by gradient backpropagation, and many rounds of gradient backpropagation incur a huge computation and energy overhead. To construct neural networks more efficiently, a method of constructing a neural network (the target neural network) through a parameter generation network has been proposed, i.e., the parameters of the target neural network are generated by the parameter generation network; simply put, one network (the parameter generation network) is used to generate the parameters of another network (the target neural network).

When a target neural network is used to solve problems such as resource allocation in a wireless communication system, a parameter generation network can be used to generate its parameters. However, owing to the dynamic nature of wireless communication systems, the number of users participating in resource allocation changes over time; for example, some users join the wireless communication system while others leave it. In this case, if a target neural network of fixed size is used, multiple neural networks of different sizes would have to be trained for every possible input dimension (for example, number of users), increasing the computation and storage overhead.

Therefore, how to take the scale of the processing task into account when using a parameter generation network to generate the parameters of a target neural network is an urgent problem to be solved.
Summary

This application provides a method and apparatus for constructing a neural network, which can take into account the scale of the task processed by the neural network, so that the target neural network has better generalization.

In a first aspect, a method for constructing a neural network is provided. The method includes: generating parameters of a target neural network according to a parameter generation network, where the input of the parameter generation network includes information about the relative labels of neurons in the target neural network, the relative label of a neuron represents the relative position of the neuron in a first neural network layer, and the first neural network layer is the layer of the target neural network in which the neuron is located; and constructing the target neural network according to the parameters of the target neural network.

According to the method provided by the embodiments of this application, the parameter generation network generates the parameters of the target neural network by taking as input the information about the relative labels of the neurons of the target neural network, so that the parameter generation network generalizes better to constructing target neural networks of different scales.

With reference to the first aspect, in some implementations of the first aspect, N parameter generation networks are obtained, where N is determined by the number of parameter types M of the target neural network and the number of hidden layers L of the target neural network, M and L being positive integers.

With reference to the first aspect, in some implementations of the first aspect, the target neural network includes a second neural network layer and a third neural network layer, the second neural network layer including N_1 neurons and the third neural network layer including N_2 neurons. The relative label of the i-th neuron in the second neural network layer and the relative label of the j-th neuron in the third neural network layer are input into a first parameter generation network to generate the weight parameter of the connection between the i-th neuron in the second layer and the j-th neuron in the third layer; the first parameter generation network is the one of the N parameter generation networks used to generate the weight parameters between the second and third neural network layers, where 1 ≤ i ≤ N_1 and 1 ≤ j ≤ N_2.

With reference to the first aspect, in some implementations of the first aspect, the target neural network includes a second neural network layer including N_1 neurons. The relative label of the i-th neuron in the second neural network layer is input into a second parameter generation network to generate the bias parameter of the i-th neuron in the second layer; the second parameter generation network is the one of the N parameter generation networks used to generate the bias parameters of the second neural network layer, where 1 ≤ i ≤ N_1.

With reference to the first aspect, in some implementations of the first aspect, the number L of hidden layers of the target neural network and the number of neurons in each of the L hidden layers are determined according to the scale of the task processed by the target neural network, L being a positive integer.

With reference to the first aspect, in some implementations of the first aspect, the parameter generation network is trained according to the training data of the task processed by the target neural network.

With reference to the first aspect, in some implementations of the first aspect, the training data, which is the training data of the task processed by the target neural network, is sorted; the sorted training data is input into the target neural network to obtain the output of the target neural network; and the parameters of the parameter generation network are updated according to a loss function to train the parameter generation network, the loss function being used to update the parameters of the parameter generation network according to the output of the target neural network and the labels of the training data.

With reference to the first aspect, in some implementations of the first aspect, if the scale of the task processed by the target neural network changes, the number of neurons in each of the L hidden layers is re-determined according to the scale of the processing task, and the parameters of the target neural network are updated using the trained parameter generation network.
In a second aspect, an apparatus for constructing a neural network is provided. The apparatus includes a processing unit configured to generate parameters of a target neural network according to a parameter generation network, where the input of the parameter generation network includes information about the relative labels of neurons in the target neural network, the relative label of a neuron represents the relative position of the neuron in a first neural network layer, and the first neural network layer is the layer of the target neural network in which the neuron is located; the processing unit is further configured to construct the target neural network according to the parameters of the target neural network.

With reference to the second aspect, in some implementations of the second aspect, the apparatus further includes an acquiring unit configured to obtain N parameter generation networks, where N is determined by the number of parameter types M of the target neural network and the number of hidden layers L of the target neural network, M and L being positive integers.

With reference to the second aspect, in some implementations of the second aspect, the target neural network includes a second neural network layer and a third neural network layer, the second including N_1 neurons and the third including N_2 neurons. The processing unit is specifically configured to input the relative label of the i-th neuron in the second neural network layer and the relative label of the j-th neuron in the third neural network layer into a first parameter generation network to generate the weight parameter of the connection between the i-th neuron in the second layer and the j-th neuron in the third layer; the first parameter generation network is the one of the N parameter generation networks used to generate the weight parameters between the second and third neural network layers, where 1 ≤ i ≤ N_1 and 1 ≤ j ≤ N_2.

With reference to the second aspect, in some implementations of the second aspect, the target neural network includes a second neural network layer including N_1 neurons. The processing unit is specifically configured to input the relative label of the i-th neuron in the second neural network layer into a second parameter generation network to generate the bias parameter of the i-th neuron in the second layer; the second parameter generation network is the one of the N parameter generation networks used to generate the bias parameters of the second neural network layer, where 1 ≤ i ≤ N_1.

With reference to the second aspect, in some implementations of the second aspect, the processing unit is further configured to determine, according to the scale of the task processed by the target neural network, the number L of hidden layers of the target neural network and the number of neurons in each of the L hidden layers, L being a positive integer.

With reference to the second aspect, in some implementations of the second aspect, the processing unit is further configured to train the parameter generation network according to the training data of the task processed by the target neural network.

With reference to the second aspect, in some implementations of the second aspect, the processing unit is specifically configured to: sort the training data, the training data being the training data of the task processed by the target neural network; input the sorted training data into the target neural network to obtain the output of the target neural network; and update the parameters of the parameter generation network according to a loss function to train the parameter generation network, the loss function being used to update the parameters of the parameter generation network according to the output of the target neural network and the labels of the training data.

With reference to the second aspect, in some implementations of the second aspect, the processing unit is further configured to: if the scale of the task processed by the target neural network changes, re-determine the number of neurons in each of the L hidden layers according to the scale of the processing task; and update the parameters of the target neural network using the trained parameter generation network.

In a third aspect, an apparatus for constructing a neural network is provided, including: a memory for storing a program; and a processor for executing the program stored in the memory, where when the program stored in the memory is executed, the processor is configured to execute the method in any implementation of the first aspect.

In a fourth aspect, a computer-readable medium is provided, which stores program code for execution by a device, the program code including instructions for executing the method in any implementation of the first aspect.

In a fifth aspect, a computer program product containing instructions is provided; when the computer program product runs on a computer, the computer is caused to execute the method in any implementation of the first aspect.

In a sixth aspect, a chip is provided, including a processor and a data interface, where the processor reads, through the data interface, instructions stored in a memory to execute the method in any implementation of the first aspect. Optionally, as an implementation, the chip may further include a memory storing instructions; the processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the processor is configured to execute the method in any implementation of the first aspect.
Brief Description of the Drawings

FIG. 1 is a schematic diagram of an artificial intelligence main framework provided by an embodiment of the present application.
FIG. 2 is a schematic structural diagram of a parameter generation network.
FIG. 3 is a schematic structural diagram of a system architecture provided by an embodiment of the present application.
FIG. 4 is a schematic flowchart of a neural network construction method provided by an embodiment of the present application.
FIG. 5 is a schematic structural diagram of a parameter generation network provided by an embodiment of the present application.
FIG. 6 is a schematic diagram of a parameter generation network constructing a target neural network, provided by an embodiment of the present application.
FIG. 7 is a schematic diagram of training a parameter generation network, provided by an embodiment of the present application.
FIG. 8 is a plot of target neural network bias parameter values versus neuron relative labels for target neural networks of different dimensions, provided by an embodiment of the present application.
FIG. 9 is a schematic flowchart of another neural network construction method provided by an embodiment of the present application.
FIG. 10 is a schematic flowchart of another neural network construction method provided by an embodiment of the present application.
FIG. 11 is a schematic flowchart of another neural network construction method provided by an embodiment of the present application.
FIG. 12 is a schematic block diagram of a neural network construction apparatus provided by an embodiment of the present application.
FIG. 13 is a schematic diagram of the hardware structure of a neural network construction apparatus provided by an embodiment of the present application.
Detailed Description

The technical solutions in this application are described below with reference to the accompanying drawings.

FIG. 1 shows a schematic diagram of an artificial intelligence main framework, which describes the overall workflow of an artificial intelligence system and is applicable to general requirements in the field of artificial intelligence.

The above artificial intelligence framework is elaborated below along two dimensions: the "intelligent information chain" (horizontal axis) and the "information technology (IT) value chain" (vertical axis).

The "intelligent information chain" reflects the process from data acquisition to processing. For example, it may be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, the data undergoes a refinement of "data - information - knowledge - wisdom".

The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (providing and processing technical implementations) to the industrial ecology of the system.
(1) Infrastructure:

The infrastructure provides computing power support for the artificial intelligence system, enables communication with the outside world, and is supported by a base platform.

The infrastructure can communicate with the outside world through sensors, and its computing power can be provided by intelligent chips.

The intelligent chips here may be hardware acceleration chips such as central processing units (CPU), neural-network processing units (NPU), graphics processing units (GPU), application-specific integrated circuits (ASIC), and field-programmable gate arrays (FPGA).

The base platform of the infrastructure may include distributed computing frameworks, networks, and other related platform guarantees and support, and may include cloud storage and computing, interconnection networks, and so on.

For example, for the infrastructure, data may be obtained through sensors communicating with the outside world, and the data is then provided to the intelligent chips in the distributed computing system provided by the base platform for computation.

(2) Data:

The data at the layer above the infrastructure indicates the data sources in the field of artificial intelligence. The data involves graphics, images, speech, and text, and also involves Internet-of-Things data of conventional devices, including business data of existing systems and sensed data such as force, displacement, liquid level, temperature, and humidity.

(3) Data processing:

The above data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making, and other processing methods.

Machine learning and deep learning can perform symbolic and formalized intelligent information modeling, extraction, preprocessing, training, and so on, on data.

Reasoning refers to the process of simulating the intelligent reasoning of humans in a computer or intelligent system, using formalized information to carry out machine thinking and solve problems according to reasoning control strategies; the typical functions are search and matching.

Decision-making refers to the process of making decisions on intelligent information after reasoning, and usually provides functions such as classification, ranking, and prediction.

(4) General capabilities:

After the data is processed as mentioned above, some general capabilities can further be formed based on the results of the data processing, for example, algorithms or a general system, such as translation, text analysis, computer vision processing, speech recognition, image recognition, and so on.

(5) Intelligent products and industry applications:

Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields; they encapsulate the overall artificial intelligence solution, productize intelligent information decision-making, and realize practical applications. The application fields mainly include smart manufacturing, smart transportation, smart home, smart healthcare, smart security, autonomous driving, safe city, smart terminals, and so on.

The embodiments of the present application can be applied to many fields of artificial intelligence, such as smart manufacturing, smart transportation, smart home, smart healthcare, smart security, autonomous driving, and safe city.

Specifically, the embodiments of this application can be applied to fields that require the use of (deep) neural networks, such as autonomous driving, image classification, image retrieval, image semantic segmentation, image quality enhancement, image super-resolution, and natural language processing.
Since the embodiments of this application involve neural networks, for ease of understanding, related terms and concepts of neural networks that may be involved in the embodiments are first introduced below.

(1) Neural network (NN)

A neural network may be an algorithmic mathematical model composed of neurons. A neuron is an operational unit that takes $x_s$ and an intercept of 1 as input, and the output of the operational unit can be expressed by formula (1):

$$h_{W,b}(x) = f(W^{T}x) = f\Big(\sum_{s=1}^{n} W_s x_s + b\Big) \tag{1}$$

where s denotes the dimension of the input $x_s$, s = 1, 2, ..., n, n is a natural number greater than 1, $W_s$ is the weight of $x_s$, and b denotes the bias. f denotes the activation function, which performs a nonlinear transformation on the features in the neural network, converting the input signal of the neuron into an output signal. The output signal of the activation function can serve as the input of the next layer of neural units; for example, the activation function may be the sigmoid function. A neural network is a network formed by joining many of the above single neurons together, i.e., the output of one neuron can be the input of another neuron. The input of each neuron can be connected to the local receptive field of the previous layer to extract the features of the local receptive field; the local receptive field may be a region composed of several neural units.
(2) Deep neural network (DNN)

A deep neural network, also called a multi-layer neural network, can be understood as a neural network with multiple hidden layers. According to the positions of the different layers, the layers inside a DNN can be divided into three categories: the input layer, the hidden layers, and the output layer. Generally, the first layer is the input layer, the last layer is the output layer, and the layers in between are all hidden layers. The layers may be fully connected, i.e., any neuron of the i-th layer is connected to any neuron of the (i+1)-th layer. Simply put, each layer applies the linear relation

$$\vec{y} = \alpha(W\vec{x} + \vec{b}) \tag{2}$$

where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the bias vector, W is the weight matrix (also called the coefficients), and α is the activation function; each hidden layer transforms its input vector $\vec{x}$ into the output vector $\vec{y}$ through this operation. Since a DNN has many layers, the numbers of coefficients W and bias vectors $\vec{b}$ are also large. These parameters are defined in the DNN as follows. Taking the coefficients as an example, in a three-layer DNN the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as $W^{3}_{24}$, where the superscript 3 denotes the layer of the coefficient W, and the subscripts correspond to the output third-layer index 2 and the input second-layer index 4.

In summary, the coefficient from the k-th neuron of the (L-1)-th layer to the j-th neuron of the L-th layer is defined as $W^{L}_{jk}$. Note that the input layer has no W parameters. In deep neural networks, by the universal approximation theorem, given enough hidden layers a DNN can fit any function to arbitrary precision. That is, more hidden layers allow the network to better portray complex situations in the real world. In theory, a model with more parameters has higher complexity and greater "capacity", which means it can accomplish more complex learning tasks. Training a deep neural network is the process of learning the weight matrices; the ultimate goal is to obtain the weight matrices of all layers of the trained deep neural network (the weight matrices formed by the vectors W of many layers).
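As an illustration, the single fully connected layer of formula (2) can be written in a few lines of Python (a minimal sketch; the layer sizes and the ReLU activation are arbitrary choices for the example):

```python
import torch

x = torch.randn(4)         # input vector x of layer l-1 (4 neurons)
W = torch.randn(3, 4)      # W[j, k]: coefficient from neuron k of layer l-1 to neuron j of layer l
b = torch.randn(3)         # bias vector of layer l
y = torch.relu(W @ x + b)  # y = alpha(W x + b), with alpha = ReLU here
print(y.shape)             # torch.Size([3])
```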
(3) Convolutional neural network (CNN)

A convolutional neural network is a neural network with a convolutional structure. A CNN contains a feature extractor composed of convolutional layers and sub-sampling layers, and this feature extractor can be regarded as a filter. A convolutional layer is a layer of neurons that performs convolution processing on the input signal in a CNN. In a convolutional layer of a CNN, a neuron may be connected only to some of the neurons of adjacent layers. A convolutional layer usually contains several feature planes, and each feature plane may be composed of neural units arranged in a rectangle. Neural units of the same feature plane share weights, and the shared weights here are the convolution kernel. Weight sharing can be understood as extracting image information in a way that is independent of position. During the training of a CNN, the convolution kernel can obtain reasonable weights through learning. In addition, a direct benefit of weight sharing is reducing the connections between the layers of the CNN while also reducing the risk of overfitting.
(4) Loss function

The specific way of training a neural network is to evaluate the output of the neural network with a loss function and backpropagate the error; W and b can then be iteratively optimized by gradient descent until the loss function reaches its minimum. In training a deep neural network, because we want the output of the network to be as close as possible to the value we actually want to predict, we can compare the current prediction of the network with the truly desired target value, and update the weight vector of each layer according to the difference between the two (of course, there is usually an initialization process before the first update, i.e., parameters are pre-configured for each layer of the deep neural network). For example, if the network's prediction is too high, the weight vectors are adjusted to predict lower, and the adjustment continues until the deep neural network can predict the truly desired target value or a value very close to it. Therefore, it is necessary to define in advance "how to compare the difference between the predicted value and the target value"; this is the role of the loss function or objective function, which measures the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a greater difference, so training the deep neural network becomes the process of reducing this loss as much as possible.
(5) Backpropagation algorithm

A neural network can use the gradient backpropagation (BP) algorithm to revise the parameters of the neural network model during training, so that the reconstruction error loss of the model becomes smaller and smaller. Specifically, forward-propagating the input signal up to the output produces an error loss, and the parameters of the neural network model are updated by backpropagating the error loss information, so that the error loss converges. The backpropagation algorithm is a backpropagation movement dominated by the error loss, aiming to obtain the parameters of the optimal neural network model, for example, the weight matrices.

For example, the loss value produced by each training pass of the neural network model is passed layer by layer, from back to front, in the model. When the loss reaches each layer, the update amount of that layer's parameters (a partial derivative operation) is calculated at the same time; this update amount is related to the gradient.
Terms that may be involved in this application are introduced below.

(1) Permutation equivariance (PE)

Given an input $X = [x_1, x_2, ..., x_k]$, if the function $Y = f(X) = [y_1, y_2, ..., y_k]$ satisfies

$$f(\pi(X)) = \pi(f(X))$$

for any permutation π, then f is permutation-equivariant with respect to X. In other words, permutation equivariance makes the order of the function's outputs correspond to the order of its inputs.
For example, consider the resource allocation problem in a wireless communication system, as shown in formula (3):

$$p^{\star} = \arg\max_{p}\; J(p, h), \quad \text{s.t. } C(p, h) \tag{3}$$

where $p = [p_1, ..., p_K]$ denotes the resource allocation policy, $h = [h_1, ..., h_K]$ denotes the environment state, $p_k$ and $h_k$ denote the policy and state of object k, respectively, and C(·) denotes the constraints. The optimal allocation policy $p^{\star}$ can be regarded as a function of the environment state h; when the positions of the states of the objects in h are permuted, the positions of the optimal policies of the objects in $p^{\star}$ undergo the same permutation. That is, the above resource allocation problem in the wireless communication system is permutation-equivariant: if the elements of the environment state h are permuted, the elements of the policy $p^{\star}$ obtained by solving problem (3) undergo the same permutation.
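As a quick numerical check of this property (a hypothetical example; any element-wise mapping is permutation-equivariant, so one is used as f):

```python
import torch

h = torch.tensor([0.9, 0.1, 0.5])   # environment state
pi = torch.tensor([2, 0, 1])        # an arbitrary permutation
f = lambda x: 2.0 * x + 1.0         # element-wise mapping, hence permutation-equivariant

# f(pi(X)) == pi(f(X)): permuting the inputs permutes the outputs identically
assert torch.equal(f(h[pi]), f(h)[pi])
```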
(2) Parameter generation network

Traditional methods usually train the parameters of a neural network (the target neural network) directly on training data, whereas a parameter generation network outputs the parameters of the target model. Simply put, one network (the parameter generation network) is used to generate the parameters of another network (the target neural network).

FIG. 2 is a schematic structural diagram of a parameter generation network (hypernetwork), where $h_1$ and $h_2$ denote the hidden-layer nodes of a fully connected network (the neural network), $W_1$, $W_2$, and $W_3$ denote the model parameters of the layers of the fully connected network, $g_1$ and $g_2$ denote the parameters of the parameter generation network, and $X_2$ and $X_1$ denote the inputs of the target neural network and of the parameter generation network, respectively. Here, $W_1$, $W_2$, and $W_3$ are obtained by the parameter generation network from information about the hidden layers and weights. When learning a fully connected network, the input of the parameter generation network is the label information of the model parameters of the fully connected network; when learning a CNN, the input of the parameter generation network is a learned embedding vector of the CNN model parameters.

The above parameter generation network does not consider changes in the scale of the target neural network when generating its parameters, i.e., it has no ability to generalize across problems of different scales. In some application scenarios of neural networks, for example, when using a neural network to solve problems such as resource allocation in a wireless communication system, the number of users participating in radio resource allocation changes over time owing to the dynamic nature of the system (for example, some users join the wireless communication system and others leave it). In this case, if a neural network of fixed dimension is used to solve the resource allocation problem, multiple neural networks of different dimensions would have to be trained for every possible number of users, increasing the computation and storage overhead.

In view of this, the embodiments of this application provide a method for constructing a neural network that can use one set of parameter generation networks to generate, with low complexity, the parameters of a set of target neural networks for processing tasks of different scales. The parameter generation network therefore generalizes better to generating target neural networks of different scales.
Before introducing the neural network construction method provided by the embodiments of this application, the system architecture applicable to the embodiments is first introduced. As shown in FIG. 3, an embodiment of this application provides a system architecture 100. The system architecture 100 includes an execution device 110, a training device 120, a database 130, a client device 140, a data storage system 150, and a data collection device 160.

In addition, the execution device 110 includes a computing module 111, an I/O interface 112, a preprocessing module 113, and a preprocessing module 114. The computing module 111 may include a target model/rule 101; the preprocessing modules 113 and 114 are optional.

The data collection device 160 is used to collect training data. For the method of generating neural network parameters of the embodiments of this application, the training data may include training images (for example, images containing objects) and annotation data, where the annotation data gives the categories of the objects present in the training images. As another example, in a power control task of a wireless communication system, the training data may be channel data between terminal devices and a base station, and the annotation data is the optimal power allocation result under that channel data. After collecting the training data, the data collection device 160 stores it in the database 130, and the training device 120 trains the target model/rule 101 based on the training data maintained in the database 130.

The training device 120 obtains the target model/rule 101 based on the training data as follows: the training device 120 processes the input raw data and compares the output value with the target value, until the difference between the value output by the training device 120 and the target value is smaller than a certain threshold, thereby completing the training of the target model/rule 101.

The target model/rule 101 in the embodiments of this application may specifically be a neural network model, for example, a BP neural network. It should be noted that, in practical applications, the training data maintained in the database 130 does not necessarily all come from the collection of the data collection device 160; it may also be received from other devices. It should also be noted that the training device 120 does not necessarily train the target model/rule 101 entirely based on the training data maintained in the database 130; it may also obtain training data from the cloud or elsewhere for model training. The above description should not be taken as a limitation on the embodiments of this application.

The target model/rule 101 trained by the training device 120 can be applied to different systems or devices, for example, to the execution device 110 shown in FIG. 3. The execution device 110 may be a terminal, such as a mobile phone terminal, a tablet, a laptop, an augmented reality (AR)/virtual reality (VR) device, or an in-vehicle terminal, or it may be a server, the cloud, and so on. In FIG. 3, the execution device 110 is configured with an input/output (I/O) interface 112 for data interaction with external devices; a user can input data to the I/O interface 112 through the client device 140. In the embodiments of this application, the input data may include an image to be processed input by the client device. The client device 140 here may specifically be a terminal device.

When the execution device 110 preprocesses the input data, or when the computing module 111 of the execution device 110 performs computation or other related processing, the execution device 110 may call data, code, and the like in the data storage system 150 for the corresponding processing, and may also store the data, instructions, and the like obtained by the corresponding processing into the data storage system 150.

Finally, the I/O interface 112 presents the processing result, for example the target detection result calculated by the target model/rule 101, to the client device 140, thereby providing it to the user.

It is worth noting that the training device 120 can generate corresponding target models/rules 101 based on different training data for different goals or different tasks; the corresponding target models/rules 101 can then be used to achieve the above goals or complete the above tasks, thereby providing the user with the desired results. In the embodiments of this application, the target model/rule 101 trained by the training device 120 may be the target neural network and/or the parameter generation network.

In FIG. 3, the user can manually give the input data, and this manual giving can be operated through the interface provided by the I/O interface 112. In another case, the client device 140 can automatically send input data to the I/O interface 112; if requiring the client device 140 to automatically send input data needs the user's authorization, the user can set the corresponding permissions in the client device 140. The user can view the results output by the execution device 110 at the client device 140, and the specific presentation form may be display, sound, action, or another specific manner. The client device 140 can also serve as a data collection end, collecting the input data of the I/O interface 112 and the output results of the I/O interface 112 shown in the figure as new sample data and storing them in the database 130. Of course, the collection may also bypass the client device 140: the I/O interface 112 directly stores the input data of the I/O interface 112 and the output results of the I/O interface 112 shown in the figure into the database 130 as new sample data.

It should be noted that FIG. 3 is only a schematic diagram of a system architecture provided by an embodiment of this application, and the positional relationships among the devices, components, modules, and so on shown in the figure do not constitute any limitation. For example, in FIG. 3, the data storage system 150 is external memory relative to the execution device 110; in other cases, the data storage system 150 may also be placed in the execution device 110.
FIG. 4 is a schematic flowchart of a neural network construction method provided by this application. The method includes at least the following steps.

S410: Determine the scale of the target neural network according to the scale of the processing task.

The scale of the processing task can represent the complexity of the task processed by the target neural network, and can be measured by the scale of the task's input. For example, in an image processing task, the scale can be measured by the size of the image: the larger the image (for example, an input image of size 32×32), the larger the scale of the image processing task. As another example, in a radio resource allocation task, the scale can be measured by the number of users participating in radio resource allocation: the more users participate, the larger the scale of the radio resource allocation problem. Generally, the learning capacity of the target neural network is limited by its number of parameters, which in turn depends on the number of neurons. Therefore, when the scale of the task processed by the target neural network grows, the number of neurons in the network needs to be increased to improve its learning capacity, avoiding the problem that a network with too few parameters cannot learn the correct result.

The target neural network can be denoted as $\hat{p} = \hat{p}(h; W, b)$, where h denotes the input of the target neural network, $\hat{p}$ denotes its output, and W and b denote the weight and bias parameters of the target neural network, respectively. The scale of the target neural network can be expressed by the number L of hidden layers and the number of neurons $N_l$ of each hidden layer, where 1 ≤ l ≤ L. For a target neural network solving a specific problem, the number of hidden layers can be set empirically to a fixed value; when the scale of the processing task changes, for example, when the number of users participating in radio resource allocation changes, the scale of the target neural network can be changed by changing the number of neurons of each hidden layer. In other words, the number of hidden layers of the target neural network is fixed at L, and the number of neurons of each hidden layer is determined by the scale of the processing task; i.e., the scale of the target neural network is determined by the scale of the processing task.

Specifically, a mapping between the scale of the processing task and the scale of the target neural network can be preconfigured, and the number of neurons of each hidden layer of the target neural network is determined from this mapping and the scale of the processing task. For example, the number of neurons of the target neural network can be set proportional to the scale of the problem to be processed, e.g., the linear relation N = aP, where P denotes the scale of the task to be processed (which can also be understood as the input dimension of the target neural network), N denotes the number of neurons of the target neural network, and a is a positive-integer proportionality parameter. In determining the scale of the target neural network, the number of hidden layers Q can first be fixed, and the number of neurons of each hidden layer can then be expressed as $N_l = \lceil aP/(b_l Q)\rceil$, where $\lceil\cdot\rceil$ denotes the round-up (ceil) function and $b_l$ is the proportion parameter of the l-th hidden layer, taking values such that the hidden-layer sizes sum to the desired total, i.e., $\sum_{l=1}^{Q} 1/b_l = Q$. Parameters such as a, Q, and $b_l$ can be determined in advance by offline trials: given different combinations of a, Q, and $b_l$, construct the target neural network, train it offline, examine its performance, and finally select the combination with better performance. Once the combination of a, Q, and $b_l$ is selected, the number of hidden layers used subsequently is fixed at L = Q, and the number of neurons $N_l$ of each hidden layer can be calculated by the above formula.
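As a minimal sketch of this sizing rule (the values a = 4, Q = 2, and b_l = 1 are hypothetical choices for the example, not values fixed by this application):

```python
import math

def hidden_layer_sizes(P, a=4, Q=2, b=None):
    """Compute the hidden layer sizes N_l = ceil(a*P / (b_l * Q)).

    P: scale of the processing task (input dimension, e.g. number of users)
    a: proportionality parameter (hypothetical example value)
    Q: fixed number of hidden layers
    b: per-layer proportion parameters b_l (defaults to all ones)
    """
    b = b or [1.0] * Q
    return [math.ceil(a * P / (b[l] * Q)) for l in range(Q)]

# The same rule yields differently sized networks as the task scale changes.
print(hidden_layer_sizes(P=10))  # [20, 20] for 10 users
print(hidden_layer_sizes(P=20))  # [40, 40] for 20 users
```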
S420: Obtain the parameter generation networks according to the number of hidden layers of the target neural network.

The parameter generation networks are used to generate the parameters (for example, weights and biases) of the target neural network. If the number of hidden layers of the target neural network is L, m(L+1) parameter generation networks are generated, where m denotes the number of types of hidden-layer parameters; usually m takes the value 2 (two types of parameters: weights and biases), i.e., 2(L+1) parameter generation networks are generated. These 2(L+1) parameter generation networks are used to generate the weights and biases of the connections between the layers of the neural network: L+1 of them are used to generate the weights (weight parameter generation networks), and the other L+1 are used to generate the biases (bias parameter generation networks).

Specifically, the l-th of the L+1 weight parameter generation networks can be written as

$$W^{(l)}_{ij} = g_W\Big(\tfrac{i}{N_{l-1}+1},\ \tfrac{j}{N_l+1};\ \theta_W^{(l)}\Big)$$

where l ranges over [1, L+1], $\theta_W^{(l)}$ denotes the parameters of the l-th weight parameter generation network, and $\big(\tfrac{i}{N_{l-1}+1}, \tfrac{j}{N_l+1}\big)$ denotes its input. The l-th weight parameter generation network is used to generate the weight parameters of the connections between the (l-1)-th layer (an example of the second neural network layer) and the l-th layer (an example of the third neural network layer) of the target neural network; when l takes the value 1, the (l-1)-th layer can be understood as the input layer of the target neural network, and when l takes the value L+1, the l-th layer can be understood as the output layer of the target neural network.

Here, $\tfrac{i}{N_{l-1}+1}$ and $\tfrac{j}{N_l+1}$ denote, respectively, the relative label of the i-th neuron of the (l-1)-th layer among the $N_{l-1}$ neurons of the (l-1)-th layer, and the relative label of the j-th neuron of the l-th layer among the $N_l$ neurons of the l-th layer. $N_{l-1}$ and $N_l$ denote the numbers of neurons of the (l-1)-th and l-th layers of the target neural network; i ranges over $[1, N_{l-1}]$ and j ranges over $[1, N_l]$.

Similarly, the l-th bias parameter generation network can be written as

$$b^{(l)}_{j} = g_b\Big(\tfrac{j}{N_l+1};\ \theta_b^{(l)}\Big)$$

and is used to generate the bias parameter of the j-th neuron of the l-th layer of the target neural network, where $\theta_b^{(l)}$ denotes the parameters of the l-th bias parameter generation network, and its input $\tfrac{j}{N_l+1}$ denotes the relative label of the j-th neuron of the l-th layer among the $N_l$ neurons of that layer, j ranging over $[1, N_l]$.

From the above expressions of the parameter generation networks, the input dimension of a weight parameter generation network is 2 and its output dimension is 1: it takes as input the relative labels of the i-th neuron of the (l-1)-th layer and the j-th neuron of the l-th layer, and outputs the weight parameter of the connection between the (l-1)-th and l-th layers. The input dimension of a bias parameter generation network is 1 and its output dimension is 1: it takes as input the relative label of the j-th neuron of the l-th layer among the neurons of that layer, and outputs the bias parameter of the j-th neuron of the l-th layer. This application places no restriction on the structure of the parameter generation networks (number of hidden layers, connection method, etc.).
Optionally, the method may further include S421: initialize the parameter generation networks.

Initializing the parameter generation networks means initializing their parameters $\theta_W^{(l)}$ and $\theta_b^{(l)}$. For example, each parameter may be randomly initialized; this application does not limit the specific initialization method of the parameter generation networks.

For example, take a target neural network including L = 1 hidden layer, as shown in (a) of FIG. 5; the obtained parameter generation networks are as shown in (b) of FIG. 5, where W and b denote the weight and bias parameters of the target neural network, respectively. From the above count of parameter generation networks, the number of parameter generation networks is 4. Each parameter generation network may adopt a fully connected structure containing one hidden layer, and its parameters $\theta_W$ and $\theta_b$ may be randomly initialized.
S430: Obtain the target neural network from the parameters generated by the parameter generation networks.

Obtaining the target neural network means generating the parameters of the connections between the layers of the target neural network. From the above steps, the number of hidden layers of the target neural network is L, and the number of neurons $N_l$ of each hidden layer is determined by the scale of the processing task; therefore, generating the parameters of the connections between the layers of the target neural network amounts to obtaining the target neural network. The parameters of the target neural network can be determined by the above parameter generation networks.

The parameter generation networks are obtained through the above steps, as shown in FIG. 5. Each inference of a parameter generation network yields one parameter (a weight or a bias) of the target neural network; using the parameter generation network repeatedly yields all the parameters of the target neural network. FIG. 6 is a schematic diagram of obtaining the target neural network using the parameter generation networks.
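To make S420 and S430 concrete, here is a minimal PyTorch-style sketch of one weight parameter generation network and one bias parameter generation network filling in the parameters of a target layer from relative labels. The two-layer MLP structure of the generators and the hidden width are illustrative assumptions, since this application places no restriction on the structure of the parameter generation networks:

```python
import torch
import torch.nn as nn

class WeightGenerator(nn.Module):
    """g_W: maps a pair of relative labels (i/(N_in+1), j/(N_out+1)) to one weight."""
    def __init__(self, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(), nn.Linear(hidden, 1))
    def forward(self, labels):
        return self.net(labels)

class BiasGenerator(nn.Module):
    """g_b: maps one relative label j/(N_out+1) to one bias."""
    def __init__(self, hidden=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1))
    def forward(self, labels):
        return self.net(labels)

def generate_layer_params(g_w, g_b, n_in, n_out):
    """Fill the (n_out x n_in) weight matrix and the n_out bias vector of one target layer."""
    in_labels = torch.arange(1, n_in + 1).float() / (n_in + 1)     # i/(N_{l-1}+1)
    out_labels = torch.arange(1, n_out + 1).float() / (n_out + 1)  # j/(N_l+1)
    jj, ii = torch.meshgrid(out_labels, in_labels, indexing="ij")  # each of shape (n_out, n_in)
    pairs = torch.stack([ii, jj], dim=-1).reshape(-1, 2)           # (input label, output label)
    W = g_w(pairs).reshape(n_out, n_in)
    b = g_b(out_labels.unsqueeze(-1)).squeeze(-1)
    return W, b

# The same generators produce layers of any size, e.g. 10 -> 20 or 20 -> 40.
g_w, g_b = WeightGenerator(), BiasGenerator()
W1, b1 = generate_layer_params(g_w, g_b, n_in=10, n_out=20)
W2, b2 = generate_layer_params(g_w, g_b, n_in=20, n_out=40)
```

Repeated calls of this kind, one per parameter (or here, batched per layer), correspond to FIG. 6's use of the parameter generation networks to fill in the whole target network.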
S440: Train the parameter generation networks.

After the target neural network is obtained, the parameter generation networks can further be trained with training data, so as to obtain parameter generation networks (models of the parameter generation networks) that can generate target neural network models for executing specific tasks; for example, a parameter generation network that generates an image classification model for image classification, or a parameter generation network that generates a neural network model for radio resource allocation, and so on.

Specifically, the parameter generation networks can be trained with the training data of the task processed by the target neural network. For example, if the target neural network is used for an image processing task, the training data may be images; specifically, image processing tasks include image classification, image detection, image recognition, and so on, and the labels corresponding to the training data may be the categories corresponding to the images. As another example, when the target neural network is used for a power control task of a wireless communication system, the training data may be channel data between terminals and the base station, and the labels corresponding to the training data may be the optimal power allocation results under that channel data. The embodiments of this application do not limit the type of the training data.
For convenience in describing the process of training the parameter generation networks, layer information is added to their parameters. That is, the weight parameter generation networks can be written as $g_W(\cdot;\ \theta_W^{(l)})$, where $\theta_W^{(l)}$ denotes the parameters of the parameter generation network used to generate the weight parameters between the (l-1)-th and l-th layers of the target neural network. When l takes the value 1, $\theta_W^{(1)}$ denotes the parameters of the network generating the weight parameters between the input layer and the first hidden layer; when l takes the value L+1, it denotes the parameters of the network generating the weight parameters between the L-th hidden layer and the output layer. Likewise, the bias parameter generation networks can be written as $g_b(\cdot;\ \theta_b^{(l)})$, where $\theta_b^{(l)}$ denotes the parameters of the network generating the bias parameters of the l-th layer of the target neural network. The parameters $\theta_W$ and $\theta_b$ of the parameter generation networks can be trained by supervised or unsupervised learning.

Training by supervised learning means solving formula (4):

$$\min_{\theta_W,\ \theta_b}\ \mathbb{E}_h\Big[L\big(\hat{p}(h;\ W(\theta_W),\ b(\theta_b)),\ p^{*}(h)\big)\Big] \tag{4}$$

where L(·) denotes the supervised loss function and $p^{*}(h)$ denotes the labels of the training data.

Training by unsupervised learning means solving formula (5):

$$\min_{\theta_W,\ \theta_b}\ \mathbb{E}_h\Big[J\big(\hat{p}(h;\ W(\theta_W),\ b(\theta_b))\big)\Big] \tag{5}$$

where J(·) denotes the unsupervised loss function.
Taking supervised learning as an example, FIG. 7 is a schematic diagram of the gradient backpropagation during training. In one forward pass of the training stage of the parameter generation networks, the inputs $\big(\tfrac{i}{N_{l-1}+1}, \tfrac{j}{N_l+1}\big)$ or $\tfrac{j}{N_l+1}$ are first fed into the parameter generation networks to generate the parameters W and b of the target neural network. The training data h is then input into the target neural network, whose inference yields the output $\hat{p}$; this output is one result of the target neural network's task. By comparing the inferred $\hat{p}$ with the optimal $p^{*}$ (i.e., the label corresponding to the training data), the loss function is differentiated, through backpropagation, with respect to the parameters of all the parameter generation networks: $\nabla_{\hat{p}} L$ denotes the gradient of the loss function L(·) with respect to $\hat{p}$; $\nabla_W \hat{p}$ and $\nabla_b \hat{p}$ denote the gradients of $\hat{p}$ with respect to the parameters W and b of the target neural network, respectively; and $\nabla_{\theta_W} W$ and $\nabla_{\theta_b} b$ denote the gradients of W with respect to the parameters $\theta_W$ of the parameter generation networks and of b with respect to $\theta_b$, respectively. An optimization algorithm, for example gradient descent, is then used to update the parameters of the parameter generation networks. The above forward-propagation and backpropagation process is repeated until the parameter generation networks converge, which completes their training.
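A compressed sketch of this supervised training loop follows, reusing the hypothetical WeightGenerator, BiasGenerator, and generate_layer_params from the earlier example; the single-hidden-layer target of size [10, 20, 10], the MSE loss, the Adam optimizer, and the "loader" of sorted training batches are all illustrative assumptions:

```python
import torch
import torch.nn.functional as F

layer_dims = [10, 20, 10]        # input, one hidden layer (L = 1), output
g_ws = [WeightGenerator() for _ in range(len(layer_dims) - 1)]   # L+1 weight generators
g_bs = [BiasGenerator() for _ in range(len(layer_dims) - 1)]     # L+1 bias generators

def target_forward(h, layer_dims, g_ws, g_bs):
    """One forward pass of FIG. 7: regenerate W, b, then run the target network."""
    x = h
    for k, (n_in, n_out) in enumerate(zip(layer_dims[:-1], layer_dims[1:])):
        W, b = generate_layer_params(g_ws[k], g_bs[k], n_in, n_out)  # differentiable in theta
        x = x @ W.T + b
        if k < len(layer_dims) - 2:
            x = torch.relu(x)
    return x

params = [p for g in g_ws + g_bs for p in g.parameters()]
opt = torch.optim.Adam(params, lr=1e-3)

for h, p_star in loader:                 # sorted training samples and their labels (assumed)
    p_hat = target_forward(h, layer_dims, g_ws, g_bs)
    loss = F.mse_loss(p_hat, p_star)     # supervised loss L(p_hat, p*) of formula (4)
    opt.zero_grad()
    loss.backward()                      # gradients flow through W, b into theta_W, theta_b
    opt.step()
```

Note that only the generator parameters are registered with the optimizer; W and b are recreated in every forward pass, which is exactly why the target network itself never needs training.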
Optionally, after the parameter generation networks have been trained for a certain number of epochs, a convergence check can also be performed. The convergence criterion may be whether the maximum number of training epochs has been reached, or whether the target neural network has reached a preset performance; the preset performance can be judged by neural network evaluation metrics, for example, the inference accuracy of the target neural network. If convergence is judged to hold, training ends; otherwise, return to S430, re-initialize the target neural network with the updated parameter generation networks, and perform the next round of training and convergence checking. Repeating the above process completes the training of the parameter generation networks.
It can be understood that, in the embodiments of this application, the parameters (weights and biases) of the target neural network are generated by the parameter generation networks and therefore do not need to be trained. The above process of training the parameter generation networks is the process of training their parameters on the training data, i.e., the above $\theta_W^{(l)}$ and $\theta_b^{(l)}$, where 0 < l ≤ L+1.
Optionally, before training the parameter generation networks, for neural network tasks satisfying the PE property, for example, radio resource allocation tasks, the training data can also be sorted. Sorting the training data can be understood as sorting the data of each sample of the training data, for example, sorting the values in descending order. Sorting the training data can reduce the number of samples required to train the parameter generation networks.
Furthermore, if two target neural networks with the same number of hidden layers (both L) have identical and mutually independent input distributions (for example, the channel moduli all obey a Rayleigh distribution), sorting the training data makes the connection parameters between neurons with similar relative labels in adjacent layers of the two target neural networks similar. That is, suppose target neural network A and target neural network B have identical and mutually independent input distributions; denote by $W^{(l)}_{i_A j_A, A}$ the weight of the connection between the $i_A$-th neuron of layer l-1 and the $j_A$-th neuron of layer l of network A, by $b^{(l)}_{j_A, A}$ the bias of the $j_A$-th neuron of layer l, and by $N_{l-1,A}$ and $N_{l,A}$ the numbers of neurons of layers l-1 and l of network A; likewise, denote by $W^{(l)}_{i_B j_B, B}$ the weight of the connection between the $i_B$-th neuron of layer l-1 and the $j_B$-th neuron of layer l of network B, by $b^{(l)}_{j_B, B}$ the bias of the $j_B$-th neuron of layer l, and by $N_{l-1,B}$ and $N_{l,B}$ the numbers of neurons of layers l-1 and l of network B. If $i_A/(N_{l-1,A}+1) \approx i_B/(N_{l-1,B}+1)$, i.e., the relative label of the $i_A$-th neuron in layer l-1 of network A approximately equals the relative label of the $i_B$-th neuron in layer l-1 of network B, and the relative label of the $j_A$-th neuron in layer l of network A approximately equals the relative label of the $j_B$-th neuron in layer l of network B, then $W^{(l)}_{i_A j_A, A} \approx W^{(l)}_{i_B j_B, B}$ and $b^{(l)}_{j_A, A} \approx b^{(l)}_{j_B, B}$.

Taking the radio resource allocation problem as an example: under different scales (for example, different numbers of users participating in radio resource allocation), the distributions of the different environment states are identical and mutually independent, and after sorting, elements with similar relative labels in the environment states of different scales have approximately equal means. For example, consider radio resource allocation scenarios of two different scales: in scenario #1 the input dimension of the target neural network is 10 (for example, there are 10 users in the scenario), and in scenario #2 the input dimension of the target neural network is 20. In scenario #1, the 10 users may have many different deployment positions, and 1000 deployment sub-scenarios are generated by random dropping; similarly, 1000 sub-scenarios are also randomly generated in scenario #2. Then among the 1000 sub-scenarios with 10 users, the environment state of the 2nd user (assuming its environment state is characterized by one element) has 1000 values in total (corresponding to the 1000 sub-scenarios); averaging these 1000 values gives the mean of the environment state of user 2, and this mean is approximately equal to the mean of the state of the 4th user in the 1000 sub-scenarios with 20 users.
FIG. 8 plots the bias parameter values of target neural networks of different scales as a function of the neuron relative labels. It can be seen that if the relative label of a neuron is taken as the input of a function (the parameter generation network) and the bias parameter value of the target neural network as the output of the function, the parameter values of target neural networks of different dimensions can be obtained by interpolation. That is, the parameter generation network of this application generalizes well to generating target neural networks of different dimensions.
The above has introduced how the parameter generation networks are obtained and trained. Completing their training yields a set of parameter generation network models, which can be deployed in an actual application scenario (for example, a wireless communication system) to perform inference and obtain the desired target neural network model. The inference process of the parameter generation network model is introduced below with reference to FIG. 9.
S910: Determine the scale of the target neural network according to the scale of the processing task.

The scale of the target neural network can be expressed by the numbers of neurons $N_l$ of its hidden layers, where 0 < l ≤ L+1; i.e., the numbers of neurons of the hidden layers of the target neural network are determined according to the scale of the processing task.

S920: Generate the parameters of the target neural network to obtain the target neural network model.

After the numbers of neurons of the hidden layers of the target neural network are determined, the trained parameter generation networks described above can be used to generate the parameters of the target neural network (including the weights and biases), thereby determining the target neural network.
S930: Input the inference data into the target neural network to obtain the inference result.

Optionally, the elements of the inference data may also be sorted before being input into the target neural network to obtain the inference result.
S940: Determine whether the scale of the processing task has changed.

If the scale of the processing task has changed, S910 to S930 are executed again.

According to the method of the embodiments of this application, when the scale of the processing task changes, the parameter generation networks can generate target neural networks of different scales with low complexity; that is, the parameter generation networks possess generalization.
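Using the hypothetical helpers sketched earlier, the inference loop of FIG. 9 can be summarized as follows (the dimensions and the single-hidden-layer sizing are illustrative; len(g_ws) must equal the number of generated connections):

```python
import torch

def build_target_for_scale(P, g_ws, g_bs, a=4, Q=1):
    """S910/S920: size the target network for task scale P, then generate its parameters."""
    dims = [P] + hidden_layer_sizes(P, a=a, Q=Q) + [P]   # input, hidden layer(s), output
    return [generate_layer_params(g_ws[k], g_bs[k], dims[k], dims[k + 1])
            for k in range(len(dims) - 1)]

def infer(h, layers):
    """S930: run (optionally sorted) inference data through the generated target network."""
    x, idx = torch.sort(h, descending=True)              # optional sorting of the input
    for k, (W, b) in enumerate(layers):
        x = x @ W.T + b
        if k < len(layers) - 1:
            x = torch.relu(x)
    return x, idx                                        # idx allows un-permuting the outputs

# S940: when the task scale changes (say 10 -> 20 users), simply regenerate the target network.
layers_10 = build_target_for_scale(10, g_ws, g_bs)
layers_20 = build_target_for_scale(20, g_ws, g_bs)
```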
To facilitate a better understanding of the neural network construction method provided by the embodiments of this application, the single-cell wireless downlink power control problem is taken as an example below with reference to FIGS. 10 and 11. As shown in FIG. 10, the method includes at least the following steps.

S1010: Determine the total number of users K.

The total number of users K can represent the number of users in the cell participating in wireless downlink power control.

S1020: Determine the scale of the target neural network (the number of hidden layers L and the number of neurons $N_l$ of each hidden layer) according to the total number of users K.

The target neural network is used to solve the wireless downlink power control problem: it can take the downlink channel state information as input and output the optimal transmit power for the base station to send data to the K users. The target neural network may be a fully connected neural network, in which case its number of parameter types is 2, i.e., weight parameters and bias parameters.

S1030: Obtain 2(L+1) parameter generation networks.

The parameter generation networks are used to generate the weights and biases of the target neural network: L+1 of them can be used to generate the weights (weight parameter generation networks), and the other L+1 are used to generate the biases (bias parameter generation networks). For a detailed description of the parameter generation networks, refer to S420.

S1040: Obtain the parameters of the target neural network using the 2(L+1) parameter generation networks.

The 2(L+1) parameter generation networks can generate the initial values of the weight and bias parameters of the target neural network; subsequently, the trained parameter generation networks can update these initial parameters.
S1050: Sort the training data, and train the parameter generation networks using the sorted training data.

Each data sample $(h, p^{*})$ of the training data includes channel state information $h = [h_1, ..., h_K]$ and the optimal power control policy $p^{*} = [p^{*}_1, ..., p^{*}_K]$ under that channel state information, where $h_k$ is the downlink channel state information between the k-th user and the base station, and $p^{*}_k$ is the optimal transmit power for the base station to send data to the k-th user. Sorting the training data may include sorting the K elements of $h = [h_1, ..., h_K]$ in descending order of their absolute values, i.e., $h_1 \ge h_2 \ge h_3 \ge ... \ge h_K$. The process of training the parameter generation networks is similar to that in S440 and is not repeated here.
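A minimal version of this preprocessing step is sketched below (assuming h is a real-valued tensor of per-user channel gains; permuting the label with the same index, which preserves the input-output correspondence of the PE property, is an assumption consistent with the permutation-equivariance discussion above):

```python
import torch

def sort_sample(h, p_star):
    """Sort one training sample by |h_k| in descending order, permuting the label identically."""
    idx = torch.argsort(h.abs(), descending=True)
    return h[idx], p_star[idx]
```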
The above steps yield 2(L+1) parameter generation networks, which can be deployed on the base station side and used to generate target neural networks of different scales, i.e., the inference process of the parameter generation networks. As shown in FIG. 11, the inference process includes at least the following steps.

S1110: Determine the number of users $K_1$ in the current power control period, and determine the scale of target neural network #1 according to the number of users $K_1$.

This step is similar to S410 and is not repeated here.

S1120: Generate the weight parameters and bias parameters of target neural network #1 from the trained 2(L+1) parameter generation networks, obtaining target neural network #1.

S1130: Obtain the downlink channel state information $h = [h_1, ..., h_{K_1}]$ of the $K_1$ users.

The downlink channel state information can be obtained by the users through channel estimation and fed back to the base station.

S1140: Sort the downlink channel state information of the $K_1$ users in descending order and input it into target neural network #1 to obtain the power control result $p = [p_1, ..., p_{K_1}]$.

Optionally, S1150: Perform downlink communication according to the power control result.

Optionally, if the next power control period begins, repeat S1110 to S1150.
The neural network construction apparatus of the embodiments of this application is introduced below with reference to FIG. 12. The neural network construction apparatus shown in FIG. 12 can be used to execute the steps of the neural network construction method of the embodiments of this application; it may be a computer, a server, or another apparatus with computing power sufficient for constructing a neural network.

FIG. 12 is a schematic block diagram of the neural network construction apparatus of an embodiment of this application. The apparatus 1200 shown in FIG. 12 includes a processing unit 1210 and, optionally, an acquiring unit 1220.

The apparatus 1200 can be used to execute the steps of the neural network construction method of the embodiments of this application. For example, the processing unit 1210 can be used to execute steps S410 to S440 of the method shown in FIG. 4, steps S910 to S940 of the method shown in FIG. 9, steps S1010 to S1050 of the method shown in FIG. 10, or steps S1110, S1120, and S1140 of the method shown in FIG. 11.

Optionally, the apparatus 1200 can also be used to obtain a target neural network with a specific function by using the trained parameter generation networks; this can be understood as using the trained parameter generation networks to generate a model of a target neural network capable of executing a specific task, for example a target neural network model for processing the wireless downlink power control task, or a model for another specific task.

The acquiring unit can be used to acquire training data or inference data: the training data is used to train the parameter generation network to obtain a model of the parameter generation network, and the inference data, together with the model of the target neural network, is used to obtain the result of the processing task. For example, the acquiring unit 1220 can be used to execute step S1130 of the method shown in FIG. 11.

The apparatus 1200 can also obtain the trained model of the parameter generation network through the acquiring unit 1220. The acquiring unit 1220 may correspond to the communication interface 1330 in the apparatus 1300 shown in FIG. 13, through which the trained parameter generation network can be obtained, its training having been completed offline; alternatively, the acquiring unit 1220 may correspond to the processor 1320 in the apparatus 1300 shown in FIG. 13, in which case the trained parameter generation network can be obtained from the memory 1310 through the processor 1320.

In addition, the processing unit 1210 in the apparatus 1200 shown in FIG. 12 may correspond to the processor 1320 in the apparatus 1300 shown in FIG. 13.
It should be noted that the above apparatus 1200 is embodied in the form of functional units. The term "unit" here may be implemented in software and/or hardware, which is not specifically limited. For example, a "unit" may be a software program, a hardware circuit, or a combination of the two that realizes the above functions. The hardware circuit may include an application-specific integrated circuit (ASIC), an electronic circuit, a processor (such as a shared processor, a dedicated processor, or a group processor) and memory for executing one or more software or firmware programs, a merged logic circuit, and/or other suitable components that support the described functions.

Therefore, the units of the examples described in the embodiments of this application can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are executed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled artisans may use different methods for each specific application to implement the described functions, but such implementations should not be considered beyond the scope of this application.
FIG. 13 is a schematic diagram of the hardware structure of the neural network construction apparatus of an embodiment of this application. The apparatus 1300 shown in FIG. 13 includes a memory 1310, a processor 1320, a communication interface 1330, and a bus 1340; the memory 1310, the processor 1320, and the communication interface 1330 are communicatively connected to one another through the bus 1340.

The memory 1310 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 1310 may store a program; when the program stored in the memory 1310 is executed by the processor 1320, the processor 1320 and the communication interface 1330 are used to execute the steps of the neural network construction method of the embodiments of this application.

The processor 1320 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, to realize the functions required by the units in the neural network construction apparatus of the embodiments of this application, or to execute the steps of the neural network construction method of the embodiments of this application.

The processor 1320 may also be an integrated circuit chip with signal processing capability. During implementation, the steps of the neural network construction method of the embodiments of this application can be completed by integrated logic circuits of hardware in the processor 1320 or by instructions in the form of software.

The above processor 1320 may also be a general-purpose processor, a digital signal processor (DSP), an ASIC, an FPGA or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor, or the processor may also be any conventional processor. The steps of the neural network construction method provided by the embodiments of this application may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 1310; the processor 1320 reads the information in the memory 1310 and, in combination with its hardware, completes the functions required by the units included in the neural network construction apparatus of the embodiments of this application, or executes the steps of the neural network construction method of the embodiments of this application.

The communication interface 1330 uses a transceiver apparatus such as, but not limited to, a transceiver to implement communication between the apparatus 1300 and other devices or communication networks. For example, the control parameters corresponding to the inference results may be sent through the communication interface 1330.

The bus 1340 may include a pathway for transferring information between the components of the apparatus 1300 (for example, the memory 1310, the processor 1320, and the communication interface 1330).

It should be noted that although the above apparatus 1300 only shows a memory, a processor, and a communication interface, in the specific implementation process, those skilled in the art should understand that the apparatus 1300 may also include other devices necessary for normal operation. Meanwhile, according to specific needs, those skilled in the art should understand that the apparatus 1300 may also include hardware devices for realizing other additional functions. In addition, those skilled in the art should understand that the apparatus 1300 may also include only the devices necessary to realize the embodiments of this application, and need not include all the devices shown in FIG. 13.
This application does not limit the specific structure of the execution subject of the methods provided by the embodiments of this application, as long as it can communicate according to the methods provided by the embodiments of this application by running a program recording the code of those methods. For example, the execution subject of the methods provided by the embodiments of this application may be a network device, or a functional module in the network device capable of invoking and executing a program.

Various aspects or features of this application can be implemented as methods, apparatuses, or articles of manufacture using standard programming and/or engineering techniques. The term "article of manufacture" used herein may cover a computer program accessible from any computer-readable device, carrier, or medium. For example, computer-readable media may include, but are not limited to, magnetic storage devices (for example, hard disks, floppy disks, or magnetic tapes), optical disks (for example, compact discs (CD) and digital versatile discs (DVD)), smart cards, and flash memory devices (for example, erasable programmable read-only memory (EPROM), cards, sticks, or key drives).

The various storage media described herein may represent one or more devices and/or other machine-readable media for storing information. The term "machine-readable media" may include, but is not limited to, wireless channels and various other media capable of storing, containing, and/or carrying instructions and/or data.

It should be noted that when the processor is a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, the memory (storage module) may be integrated in the processor. It should also be noted that the memories described herein are intended to include, but are not limited to, these and any other suitable types of memories.

Those of ordinary skill in the art may realize that the units and steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are executed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled artisans may use different methods for each specific application to implement the described functions, but such implementations should not be considered beyond the protection scope of this application.

Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the apparatuses and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed apparatuses and methods may be implemented in other ways. For example, the apparatus embodiments described above are only illustrative; for example, the division of the units is only a division by logical function, and there may be other division methods in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.

The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, i.e., they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.

In addition, the functional units in the embodiments of this application may be integrated into one unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.

If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of this application in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a computer software product. The computer software product is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or some of the steps of the methods described in the embodiments of this application. The aforementioned storage media may include, but are not limited to, various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.

The above is only the specific implementation of this application, but the protection scope of this application is not limited thereto. Any person skilled in the art can easily think of changes or substitutions within the technical scope disclosed in this application, which should all be covered by the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (20)

  1. A method for constructing a neural network, characterized by comprising:
    generating parameters of a target neural network according to a parameter generation network, wherein an input of the parameter generation network comprises information about relative labels of neurons in the target neural network, the relative label of a neuron represents the relative position of the neuron in a first neural network layer, and the first neural network layer is the layer of the target neural network in which the neuron is located; and
    constructing the target neural network according to the parameters of the target neural network.
  2. The method according to claim 1, characterized in that, before the parameters of the target neural network are generated according to the parameter generation network, the method further comprises:
    obtaining N parameter generation networks;
    wherein N is determined by the number of parameter types M of the target neural network and the number of hidden layers L of the target neural network, M and L being positive integers.
  3. The method according to claim 1 or 2, characterized in that the target neural network comprises a second neural network layer and a third neural network layer, the second neural network layer comprises N_1 neurons, the third neural network layer comprises N_2 neurons, and generating the parameters of the target neural network according to the parameter generation network comprises:
    inputting the relative label of the i-th neuron in the second neural network layer and the relative label of the j-th neuron in the third neural network layer into a first parameter generation network to generate a weight parameter of the connection between the i-th neuron in the second neural network layer and the j-th neuron in the third neural network layer, the first parameter generation network being the parameter generation network, among the N parameter generation networks, used to generate the weight parameters between the second neural network layer and the third neural network layer, wherein 1 ≤ i ≤ N_1 and 1 ≤ j ≤ N_2.
  4. The method according to claim 1 or 2, characterized in that the target neural network comprises a second neural network layer, the second neural network layer comprises N_1 neurons, and generating the parameters of the target neural network according to the parameter generation network comprises:
    inputting the relative label of the i-th neuron in the second neural network layer into a second parameter generation network to generate a bias parameter of the i-th neuron in the second neural network layer, the second parameter generation network being the parameter generation network, among the N parameter generation networks, used to generate the bias parameters of the second neural network layer, wherein 1 ≤ i ≤ N_1.
  5. The method according to any one of claims 1 to 4, characterized in that, before the parameters of the target neural network are generated according to the parameter generation network, the method further comprises:
    determining, according to the scale of the task processed by the target neural network, the number L of hidden layers of the target neural network and the number of neurons in each of the L hidden layers, L being a positive integer.
  6. The method according to any one of claims 1 to 5, characterized in that, before the parameters of the target neural network are generated according to the parameter generation network, the method further comprises:
    training the parameter generation network according to training data of the task processed by the target neural network.
  7. The method according to claim 6, characterized in that training the parameter generation network according to the training data of the task processed by the target neural network comprises:
    sorting the training data, the training data being the training data of the task processed by the target neural network;
    inputting the sorted training data into the target neural network to obtain an output of the target neural network; and
    updating the parameters of the parameter generation network according to a loss function to train the parameter generation network, the loss function being used to update the parameters of the parameter generation network according to the output of the target neural network and labels of the training data.
  8. The method according to claim 7, characterized in that, after the parameters of the target neural network are generated according to the parameter generation network, the method further comprises:
    if the scale of the task processed by the target neural network changes, re-determining the number of neurons in each of the L hidden layers according to the scale of the processing task; and
    updating the parameters of the target neural network by using the trained parameter generation network.
  9. An apparatus for constructing a neural network, characterized by comprising:
    a processing unit configured to generate parameters of a target neural network according to a parameter generation network, wherein an input of the parameter generation network comprises information about relative labels of neurons in the target neural network, the relative label of a neuron represents the relative position of the neuron in a first neural network layer, and the first neural network layer is the layer of the target neural network in which the neuron is located;
    the processing unit being further configured to construct the target neural network according to the parameters of the target neural network.
  10. The apparatus according to claim 9, characterized in that the apparatus further comprises:
    an acquiring unit configured to obtain N parameter generation networks;
    wherein N is determined by the number of parameter types M of the target neural network and the number of hidden layers L of the target neural network, M and L being positive integers.
  11. The apparatus according to claim 9 or 10, characterized in that the target neural network comprises a second neural network layer and a third neural network layer, the second neural network layer comprises N_1 neurons, the third neural network layer comprises N_2 neurons, and the processing unit is specifically configured to:
    input the relative label of the i-th neuron in the second neural network layer and the relative label of the j-th neuron in the third neural network layer into a first parameter generation network to generate a weight parameter of the connection between the i-th neuron in the second neural network layer and the j-th neuron in the third neural network layer, the first parameter generation network being the parameter generation network, among the N parameter generation networks, used to generate the weight parameters between the second neural network layer and the third neural network layer, wherein 1 ≤ i ≤ N_1 and 1 ≤ j ≤ N_2.
  12. The apparatus according to claim 9 or 10, characterized in that the target neural network comprises a second neural network layer, the second neural network layer comprises N_1 neurons, and the processing unit is specifically configured to:
    input the relative label of the i-th neuron in the second neural network layer into a second parameter generation network to generate a bias parameter of the i-th neuron in the second neural network layer, the second parameter generation network being the parameter generation network, among the N parameter generation networks, used to generate the bias parameters of the second neural network layer, wherein 1 ≤ i ≤ N_1.
  13. The apparatus according to any one of claims 9 to 12, characterized in that the processing unit is further configured to:
    determine, according to the scale of the task processed by the target neural network, the number L of hidden layers of the target neural network and the number of neurons in each of the L hidden layers, L being a positive integer.
  14. The apparatus according to any one of claims 9 to 13, characterized in that the processing unit is further configured to:
    train the parameter generation network according to training data of the task processed by the target neural network.
  15. The apparatus according to claim 14, characterized in that the processing unit is specifically configured to:
    sort the training data, the training data being the training data of the task processed by the target neural network;
    input the sorted training data into the target neural network to obtain an output of the target neural network; and
    update the parameters of the parameter generation network according to a loss function to train the parameter generation network, the loss function being used to update the parameters of the parameter generation network according to the output of the target neural network and labels of the training data.
  16. The apparatus according to claim 15, characterized in that the processing unit is further configured to:
    if the scale of the task processed by the target neural network changes, re-determine the number of neurons in each of the L hidden layers according to the scale of the processing task; and
    update the parameters of the target neural network by using the trained parameter generation network.
  17. A computing apparatus for a neural network model, characterized by
    comprising a processor and a memory, wherein the memory is configured to store program instructions, and the processor is configured to invoke the program instructions to execute the method according to any one of claims 1 to 8.
  18. A computer-readable storage medium, characterized in that
    the computer-readable storage medium is configured to store program code for execution by a device, the program code comprising instructions for executing the method according to any one of claims 1 to 8.
  19. A computer program product containing instructions, characterized in that,
    when the computer program product runs on a computer, the computer is caused to execute the method according to any one of claims 1 to 8.
  20. A chip, characterized by
    comprising a processor and a data interface, wherein the processor reads, through the data interface, instructions stored in a memory to execute the method according to any one of claims 1 to 8.
PCT/CN2022/124843 2021-10-29 2022-10-12 神经网络的构建方法和装置 WO2023071793A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111271593.1 2021-10-29
CN202111271593.1A CN116090512A (zh) 2021-10-29 2021-10-29 神经网络的构建方法和装置

Publications (1)

Publication Number Publication Date
WO2023071793A1 true WO2023071793A1 (zh) 2023-05-04

Family

ID=86159126

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/124843 WO2023071793A1 (zh) 2021-10-29 2022-10-12 神经网络的构建方法和装置

Country Status (2)

Country Link
CN (1) CN116090512A (zh)
WO (1) WO2023071793A1 (zh)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229526A (zh) * 2017-06-16 2018-06-29 北京市商汤科技开发有限公司 网络训练、图像处理方法、装置、存储介质和电子设备
US20190385704A1 (en) * 2018-06-19 2019-12-19 Orbai Technologies, Inc. Apparatus and method for utilizing a parameter genome characterizing neural network connections as a building block to construct a neural network with feedforward and feedback paths
CN109063829A (zh) * 2018-06-22 2018-12-21 泰康保险集团股份有限公司 神经网络构建方法、装置、计算机设备和存储介质
CN110163345A (zh) * 2019-05-09 2019-08-23 腾讯科技(深圳)有限公司 一种神经网络处理方法、装置、设备及介质
CN112418392A (zh) * 2020-10-21 2021-02-26 华为技术有限公司 一种神经网络构建方法以及装置

Also Published As

Publication number Publication date
CN116090512A (zh) 2023-05-09

Similar Documents

Publication Publication Date Title
WO2020221200A1 (zh) 神经网络的构建方法、图像处理方法及装置
WO2022083536A1 (zh) 一种神经网络构建方法以及装置
WO2020228376A1 (zh) 文本处理方法、模型训练方法和装置
WO2021043193A1 (zh) 神经网络结构的搜索方法、图像处理方法和装置
CN111695415B (zh) 图像识别方法及相关设备
CN111507378A (zh) 训练图像处理模型的方法和装置
WO2022042713A1 (zh) 一种用于计算设备的深度学习训练方法和装置
WO2021218517A1 (zh) 获取神经网络模型的方法、图像处理方法及装置
CN110659723B (zh) 基于人工智能的数据处理方法、装置、介质及电子设备
US20220215259A1 (en) Neural network training method, data processing method, and related apparatus
US20230153615A1 (en) Neural network distillation method and apparatus
WO2022156561A1 (zh) 一种自然语言处理方法以及装置
CN111382868A (zh) 神经网络结构搜索方法和神经网络结构搜索装置
CN113011282A (zh) 图数据处理方法、装置、电子设备及计算机存储介质
CN112990211A (zh) 一种神经网络的训练方法、图像处理方法以及装置
WO2020112189A1 (en) Computer architecture for artificial image generation using auto-encoder
WO2023093724A1 (zh) 神经网络模型的处理方法及装置
WO2022007867A1 (zh) 神经网络的构建方法和装置
CN113159283A (zh) 一种基于联邦迁移学习的模型训练方法及计算节点
CN111612215A (zh) 训练时间序列预测模型的方法、时间序列预测方法及装置
WO2022012668A1 (zh) 一种训练集处理方法和装置
WO2023280113A1 (zh) 数据处理方法、神经网络模型的训练方法及装置
WO2022161387A1 (zh) 一种神经网络的训练方法及相关设备
CN114492723A (zh) 神经网络模型的训练方法、图像处理方法及装置
CN114004383A (zh) 时间序列预测模型的训练方法、时间序列预测方法及装置

Legal Events

Date Code Title Description
121 — Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 22885672; Country of ref document: EP; Kind code of ref document: A1)
WWE — Wipo information: entry into national phase (Ref document number: 2022885672; Country of ref document: EP)
ENP — Entry into the national phase (Ref document number: 2022885672; Country of ref document: EP; Effective date: 20240508)