CN115358390A - Neural network training method and device, electronic equipment and storage medium - Google Patents

Neural network training method and device, electronic equipment and storage medium

Info

Publication number
CN115358390A
Authority
CN
China
Prior art keywords
neural network
network
training
sub
target neural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211078131.2A
Other languages
Chinese (zh)
Inventor
邓辰辰
郑纪元
王钰言
林珠
吴嘉敏
范静涛
方璐
戴琼海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN202211078131.2A
Publication of CN115358390A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/067Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using optical means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/067Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using optical means
    • G06N3/0675Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using optical means using electro-optical, acousto-optical or opto-electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to the field of neural network technology, and in particular to a neural network training method, apparatus, electronic device, and storage medium, wherein the method includes: determining a target neural network to be trained; splitting the target neural network into a plurality of sub-networks with the same structure according to a preset splitting strategy, and performing logic operation training on any one sub-network to obtain the weight parameter of that sub-network; and splicing the sub-networks into an overall network using a preset splicing strategy and implementing the corresponding logic operation with the overall network, and/or training the target neural network with the weight parameters of the overall network as initial weight parameters and implementing the corresponding logic operation with the trained target neural network. This solves the problems of the related-art whole-network parameter initialization method, which fails to fully exploit the reusability of logic operation tasks and suffers from high training complexity, long training time, and low efficiency.

Description

Neural network training method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of neural network technologies, and in particular, to a method and an apparatus for training a neural network, an electronic device, and a storage medium.
Background
Integrated circuit electronic chips are the infrastructure of the information age, and progress in integrated circuit process technology has been one of the main means of improving the performance and energy efficiency of computing chips over the past decades. However, as Moore's law and Dennard scaling slow down or even end, this approach is gradually failing, and the performance and energy efficiency of computing chips face bottlenecks. Light offers the fastest propagation speed attainable in physical space as well as multi-dimensional and multi-scale degrees of freedom, and optical computing, which processes information with photons instead of traditional electronics, is expected to enable a new generation of high-performance computing chips and become an important support of the next-generation information industry. In particular, with the continued development of artificial intelligence algorithms, the mathematical description of the physical process of constrained light propagation in a medium closely resembles deep neural network algorithms, and using optoelectronic neural networks to implement logic operations is expected to break through the energy-efficiency bottleneck of traditional electronic chips, forming an important basis for realizing optoelectronic computers.
In the related art, a neural network can be trained to implement a corresponding logic operation, and the essence of neural network training is the tuning of the network weight parameters. The training process consists of two stages: forward propagation of the signal and back propagation of the error. Typically, the parameters of the whole network are initialized to all zeros or to random values at the start of training, and these values are used to compute the first forward propagation. This approach is general enough for intelligent inference tasks such as image recognition.
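For concreteness, the following is a minimal NumPy sketch of the related-art whole-network initialization described above, contrasting all-zero and random starting weights for a fully connected network; the layer sizes and the helper name init_whole_network are illustrative assumptions, not anything prescribed by this application.

```python
import numpy as np

def init_whole_network(layer_sizes, mode="random", seed=0):
    # Related-art style: initialize every layer of the whole network at once,
    # either to all zeros or to small random values, before the first
    # forward propagation. Layer sizes here are purely illustrative.
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        if mode == "zeros":
            w = np.zeros((n_in, n_out))
        else:
            w = rng.normal(0.0, 0.1, size=(n_in, n_out))
        params.append({"W": w, "b": np.zeros(n_out)})
    return params

# e.g. two 16-bit operands (32 input neurons) feeding two hidden layers
# and a 16-bit output layer
params = init_whole_network([32, 64, 64, 16], mode="random")
```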
However, when the neural network is trained to implement logic operations, especially when the input layer is large, that is, when the bit width of the logic operation is wide, whole-network parameter initialization does not fully exploit the reusability of the logic operation task and suffers from high training complexity, long training time, and low efficiency.
Disclosure of Invention
The present application provides a neural network training method and apparatus, an electronic device, and a storage medium, aiming to solve the problems of the related-art whole-network parameter initialization method, which fails to fully exploit the reusability of logic operation tasks and suffers from high training complexity, long training time, and low efficiency.
An embodiment of a first aspect of the present application provides a training method for a neural network, including the following steps: determining a target neural network to be trained; splitting the target neural network into a plurality of sub-networks with the same structure according to a preset splitting strategy, and carrying out logic operation training on any one sub-network to obtain a weight parameter of the sub-network; and splicing each sub-network into an integral network by using a preset splicing strategy, and realizing corresponding logic operation by using the integral network, and/or training the target neural network by using the weight parameter of the integral network as an initial weight parameter, and realizing corresponding logic operation by using the trained target neural network.
Optionally, in an embodiment of the present application, the splitting the target neural network into a plurality of sub-networks according to a preset splitting policy includes: splitting the target neural network into a plurality of sub-networks according to a structure in which each sub-network includes a first input node, a second input node, and an output node.
Optionally, in an embodiment of the present application, performing logic operation training on any one of the sub-networks to obtain the weight parameter of the sub-network includes: inputting training data into the first input node and the second input node, performing training of one or more combinational logic operations among AND, OR, and NOT through the first input node and the second input node until a preset condition is met, then stopping training and obtaining the weight parameter of each sub-network.
Optionally, in one embodiment of the present application, the target neural network comprises one or more of an optical neural network, an electrical neural network, and an opto-electrical hybrid neural network.
The embodiment of the second aspect of the present application provides a training apparatus for a neural network, including: the determining module is used for determining a target neural network to be trained; the processing module is used for splitting the target neural network into a plurality of sub-networks with the same structure according to a preset splitting strategy, and performing logic operation training on any one of the sub-networks to obtain a weight parameter of the sub-network; and the training module is used for splicing each sub-network into an integral network by using a preset splicing strategy, realizing corresponding logic operation by using the integral network, and/or training the target neural network by using the weight parameter of the integral network as an initial weight parameter, and realizing corresponding logic operation by using the trained target neural network.
Optionally, in an embodiment of the present application, the processing module is further configured to split the target neural network into a plurality of sub-networks according to a structure in which each sub-network includes a first input node, a second input node, and an output node.
Optionally, in an embodiment of the present application, the processing module is further configured to input training data into the first input node and the second input node, perform training of one or more combinational logic operations among AND, OR, and NOT through the first input node and the second input node until a preset condition is met, stop training, and obtain the weight parameter of each sub-network.
Optionally, in one embodiment of the present application, the target neural network comprises one or more of an optical neural network, an electrical neural network, and an opto-electrical hybrid neural network.
An embodiment of a third aspect of the present application provides an electronic device, including: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor executing the program to implement the training method of the neural network as described in the above embodiments.
A fourth aspect of the present application provides a computer-readable storage medium on which a computer program is stored, the program being executed by a processor to implement the training method for a neural network as described in the foregoing embodiments.
Therefore, the application has at least the following beneficial effects:
the target neural network is split into two-input, one-output sub-networks; the weight parameters obtained by performing logic operation training on any one sub-network can be used as initial values for the network, and the overall network or its initial training weight parameters are obtained by splicing, which greatly reduces training complexity and increases training speed. This solves the problems of the related-art whole-network parameter initialization method, which fails to fully exploit the reusability of logic operation tasks and suffers from high training complexity, long training time, and low efficiency.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a flowchart of a training method of a neural network according to an embodiment of the present application;
FIG. 2 is a schematic diagram of logic operation oriented neural network training provided in accordance with an embodiment of the present application;
FIG. 3 is a block diagram of a training apparatus for neural networks according to an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Description of reference numerals: a determination module-100, a processing module-200, a training module-300, a memory-401, a processor-402, and a communication interface-403.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative and intended to explain the present application and should not be construed as limiting the present application.
A neural network training method, apparatus, electronic device, and storage medium according to embodiments of the present application are described below with reference to the accompanying drawings. In view of the problems mentioned in the background art, the present application provides a neural network training method in which the target neural network is split into two-input, one-output sub-networks, the weight parameters obtained by performing logic operation training on any one sub-network can be used as initial values for the network, and the target neural network or its initial training weight parameters are obtained by splicing, which greatly reduces training complexity and increases training speed. This solves the problems of the related-art whole-network parameter initialization method, which fails to fully exploit the reusability of logic operation tasks and suffers from high training complexity, long training time, and low efficiency.
Specifically, fig. 1 is a schematic flow chart of a training method of a neural network according to an embodiment of the present disclosure.
As shown in fig. 1, the training method of the neural network includes the following steps:
in step S101, a target neural network to be trained is determined.
Embodiments of the present application can train a neural network to implement logic operations, effectively improving system performance, reducing power consumption, and achieving an exponential increase in energy efficiency. The target neural network includes, but is not limited to, one or more of an optical neural network, an electrical neural network, and an opto-electrical hybrid neural network.
In step S102, the target neural network is split into a plurality of sub-networks with the same structure according to a preset splitting strategy, and logic operation training is performed on any one of the sub-networks to obtain a weight parameter of the sub-network.
The preset splitting strategy splits the target neural network into a plurality of sub-networks such that each sub-network comprises a first input node, a second input node, and an output node.
It can be understood that the essence of neural network training is the tuning of the network weight parameters. When the training target of the neural network is to implement logic operations, especially when the input layer is large, that is, when the bit width of the logic operation is wide, embodiments of the present application can split the target neural network to be trained, and the weight parameters obtained by training a sub-network can be used as initial values for the network. Compared with initializing the whole-network parameters to all zeros or random values at the start of training, this greatly reduces training complexity.
In an embodiment of the present application, performing logic operation training on any one sub-network to obtain the weight parameter of that sub-network includes: inputting training data into the first input node and the second input node, training the sub-network to perform one or more combinational logic operations among AND, OR, and NOT on those inputs until a preset condition is met, then stopping training and obtaining the weight parameter of the sub-network.
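As an illustration of this step, the sketch below trains a two-input, one-output sub-network on the truth table of a single logic operation using plain NumPy gradient descent. The small hidden layer, the learning rate, and the mean-squared-error threshold standing in for the unspecified "preset condition" are assumptions made for the example, not settings taken from this application.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_subnetwork(truth_table, hidden=4, lr=0.5, tol=1e-3, max_epochs=50000, seed=0):
    # Train one two-input, one-output sub-network on a logic truth table.
    # Training stops when the mean squared error drops below `tol`
    # (a stand-in for the patent's unspecified "preset condition").
    rng = np.random.default_rng(seed)
    X = np.array([row[:2] for row in truth_table], dtype=float)   # first/second input node
    y = np.array([row[2] for row in truth_table], dtype=float).reshape(-1, 1)
    W1 = rng.normal(0.0, 1.0, (2, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0, (hidden, 1))
    b2 = np.zeros(1)
    for _ in range(max_epochs):
        h = sigmoid(X @ W1 + b1)            # forward propagation of the signal
        out = sigmoid(h @ W2 + b2)
        err = out - y
        if float(np.mean(err ** 2)) < tol:  # preset stopping condition met
            break
        # back propagation of the error
        d_out = 2.0 * err * out * (1.0 - out) / len(X)
        d_h = (d_out @ W2.T) * h * (1.0 - h)
        W2 -= lr * (h.T @ d_out)
        b2 -= lr * d_out.sum(0)
        W1 -= lr * (X.T @ d_h)
        b1 -= lr * d_h.sum(0)
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2}

# Example: AND operation on one bit pair (A0, B0) -> C0
and_table = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
sub_params = train_subnetwork(and_table)
```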
The neural network oriented to arithmetic logic operations in the embodiments of the present application differs from traditional neural networks oriented to intelligent inference tasks. Logic operations are generally performed bitwise, so each output node depends on only two input nodes. The target network can therefore be divided into a plurality of two-input, one-output sub-networks with identical structure and parameters, and the sub-networks are trained to obtain the weight parameters, thereby realizing the corresponding logic operation.
Specifically, as shown in FIG. 2, the operands of the logic operation are Bx…B1B0 and Ax…A1A0, and the output is Cx…C1C0. A neural network that takes B0 and A0 as inputs and C0 as output can be trained directly to realize the corresponding logic operation, which effectively improves system performance, reduces power consumption, and increases energy efficiency exponentially.
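Building on the sketch above (and reusing its assumed sub_params result), the following illustrates the bitwise property just described: because each output bit Ci depends only on Ai and Bi, the one trained sub-network can be applied independently at every bit position.

```python
import numpy as np

def sub_op(a, b, p=sub_params):
    # Evaluate the trained two-input, one-output sub-network on one bit pair.
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    h = sigmoid(np.array([a, b], dtype=float) @ p["W1"] + p["b1"])
    return int(round(float(sigmoid(h @ p["W2"] + p["b2"])[0])))

def bitwise_apply(op, a_bits, b_bits):
    # Output bit C_i depends only on A_i and B_i, so the same sub-network
    # serves every bit position.
    return [op(a, b) for a, b in zip(a_bits, b_bits)]

A = [1, 0, 1, 1]   # A3..A0, listed from high bit to low bit
B = [1, 1, 0, 1]   # B3..B0
C = bitwise_apply(sub_op, A, B)   # per-bit result C3..C0 of the trained operation
```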
In step S103, each sub-network is spliced into an overall network by using a preset splicing strategy, and a corresponding logical operation is implemented by using the overall network, and/or a target neural network is trained by using a weight parameter of the overall network as an initial weight parameter, and a corresponding logical operation is implemented by using the trained target neural network.
The preset splicing strategy can be selected according to actual conditions, and is not particularly limited.
It can be understood that, in the embodiments of the present application, after the weight parameter of any sub-network is obtained, a plurality of identical sub-networks can be spliced into the overall target neural network using a preset splicing strategy, so that the corresponding logic operation can be implemented directly. And/or, the weight parameters obtained by training the sub-networks are spliced to obtain initial weight parameters for the whole neural network, and the target neural network is then trained from these initial weight parameters, which greatly increases training speed. A large-scale network can thus be realized by splicing a plurality of sub-networks. Compared with directly training the whole neural network from random initial weights, this is more efficient: it reduces training complexity and shortens training time, and the performance gain is especially pronounced when training large-scale neural networks.
Specifically, since logic operations are performed bitwise, a neural network for a multi-bit logic operation can be divided into a plurality of sub-networks each performing a one-bit logic operation, and the resulting two-input, one-output sub-networks all have the same structure and parameters, thereby realizing the corresponding logic operation. In embodiments of the present application, after one sub-network is trained to obtain its weight parameters, a plurality of identical sub-networks can be spliced into the overall target neural network. If the spliced neural network is an electrical neural network or one of some optical neural networks, the corresponding logic operation can be realized directly; for the other optical neural networks, the spliced weights can be used as initial parameter values for training to increase training speed.
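One plausible reading of the "preset splicing strategy" is to tile the trained sub-network weights along the diagonal of the overall network's weight matrices, one block per bit position, and to use the result either directly or as the initial weights for further training. The application does not fix a particular strategy, so the block-diagonal layout, the interleaved input ordering, and the reuse of sub_params from the earlier sketch are assumptions.

```python
import numpy as np

def splice_block_diagonal(sub_W, n_bits):
    # Place one copy of the sub-network weight matrix per bit position along
    # the diagonal of the overall weight matrix; off-diagonal blocks stay zero
    # because each output bit depends only on its own input bit pair.
    # Assumes the overall inputs are ordered as interleaved pairs (A0, B0, A1, B1, ...).
    rows, cols = sub_W.shape
    W = np.zeros((rows * n_bits, cols * n_bits))
    for i in range(n_bits):
        W[i * rows:(i + 1) * rows, i * cols:(i + 1) * cols] = sub_W
    return W

# Overall initial weights for an 8-bit bitwise operation, spliced from the
# two-input, one-output sub-network trained earlier.
W1_init = splice_block_diagonal(sub_params["W1"], n_bits=8)   # shape (16, 32)
W2_init = splice_block_diagonal(sub_params["W2"], n_bits=8)   # shape (32, 8)
b1_init = np.tile(sub_params["b1"], 8)
b2_init = np.tile(sub_params["b2"], 8)
```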
According to the neural network training method provided by the embodiments of the present application, the target neural network is split into two-input, one-output sub-networks, the weight parameters obtained by performing logic operation training on any one sub-network can be used as initial values for the network, and the initial weight parameters of the target neural network are obtained by splicing, which greatly reduces training complexity and increases training speed. This solves the problems of the related-art whole-network parameter initialization method, which fails to fully exploit the reusability of logic operation tasks and suffers from high training complexity, long training time, and low efficiency.
Next, a training apparatus for a neural network according to an embodiment of the present application will be described with reference to the drawings.
Fig. 3 is a block diagram illustrating a training apparatus for a neural network according to an embodiment of the present disclosure.
As shown in fig. 3, the training apparatus 10 for neural network includes: a determination module 100, a processing module 200 and a training module 300.
The determining module 100 is configured to determine a target neural network to be trained; the processing module 200 is configured to split the target neural network into a plurality of sub-networks with the same structure according to a preset splitting strategy, and perform logic operation training on any one of the sub-networks to obtain a weight parameter of the sub-network; the training module 300 is configured to splice each sub-network into an overall network by using a preset splicing strategy, and implement corresponding logical operations by using the overall network, and/or train a target neural network by using weight parameters of the overall network as initial weight parameters, and implement corresponding logical operations by using the trained target neural network.
In an embodiment of the application, the processing module 200 is further configured to split the target neural network into a plurality of sub-networks according to a structure in which each sub-network includes a first input node, a second input node, and an output node.
In an embodiment of the present application, the processing module 200 is further configured to input training data into the first input node and the second input node, perform training of one or more combinational logic operations among AND, OR, and NOT through the first input node and the second input node until a preset condition is met, stop training, and obtain the weight parameter of each sub-network.
In one embodiment of the present application, the target neural network includes one or more of an optical neural network, an electrical neural network, and an opto-electrical hybrid neural network.
It should be noted that the foregoing explanation of the embodiment of the neural network training method is also applicable to the neural network training apparatus of this embodiment, and details are not repeated here.
According to the neural network training apparatus provided by the embodiments of the present application, the target neural network is split into two-input, one-output sub-networks, the weight parameters obtained by performing logic operation training on any one sub-network can be used as initial values for the network, and the initial weight parameters of the target neural network are obtained by splicing, which greatly reduces training complexity and increases training speed. This solves the problems of the related-art whole-network parameter initialization method, which fails to fully exploit the reusability of logic operation tasks and suffers from high training complexity, long training time, and low efficiency.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device may include:
memory 401, processor 402, and computer programs stored on memory 401 and executable on processor 402.
The processor 402, when executing the program, implements the neural network training method provided in the above-described embodiment.
Further, the electronic device further includes:
a communication interface 403 for communication between the memory 401 and the processor 402.
A memory 401 for storing computer programs executable on the processor 402.
The memory 401 may include high-speed RAM (Random Access Memory) and may also include non-volatile memory, such as at least one disk storage.
If the memory 401, the processor 402 and the communication interface 403 are implemented independently, the communication interface 403, the memory 401 and the processor 402 may be connected to each other through a bus and perform communication with each other. The bus may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 4, but this does not indicate only one bus or one type of bus.
Optionally, in a specific implementation, if the memory 401, the processor 402, and the communication interface 403 are integrated on a chip, the memory 401, the processor 402, and the communication interface 403 may complete mutual communication through an internal interface.
The processor 402 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present application.
Embodiments of the present application also provide a computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, the computer program implements the method for training a neural network as above.
In the description of the present specification, reference to "one embodiment," "some embodiments," "an example," "a specific example," or "some examples" or the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, such schematic representations of these terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or N embodiments or examples. Furthermore, the various embodiments or examples and the features of different embodiments or examples described in this specification can be combined by those skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "N" means at least two, e.g., two, three, etc., unless explicitly defined otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more N executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present application.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the N steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a programmable gate array, a field programmable gate array, or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.

Claims (10)

1. A training method of a neural network is characterized by comprising the following steps:
determining a target neural network to be trained;
splitting the target neural network into a plurality of sub-networks with the same structure according to a preset splitting strategy, and carrying out logic operation training on any one sub-network to obtain a weight parameter of the sub-network;
and splicing each sub-network into an integral network by using a preset splicing strategy, and realizing corresponding logic operation by using the integral network, and/or training the target neural network by using the weight parameter of the integral network as an initial weight parameter, and realizing corresponding logic operation by using the trained target neural network.
2. The method according to claim 1, wherein the splitting the target neural network into a plurality of sub-networks according to a preset splitting strategy comprises:
splitting the target neural network into a plurality of sub-networks according to a structure in which each sub-network includes a first input node, a second input node, and an output node.
3. The method of claim 2, wherein training the logic operation of any one sub-network to obtain the weight parameter of the sub-network comprises:
inputting training data into the first input node and the second input node, performing training of one or more combinational logic operations among AND, OR, and NOT through the first input node and the second input node until a preset condition is met, stopping training, and obtaining the weight parameter of each sub-network.
4. The method of any one of claims 1-3, wherein the target neural network comprises one or more of an optical neural network, an electrical neural network, and an opto-electrical hybrid neural network.
5. An apparatus for training a neural network, comprising:
the determining module is used for determining a target neural network to be trained;
the processing module is used for splitting the target neural network into a plurality of sub-networks with the same structure according to a preset splitting strategy, and performing logic operation training on any one of the sub-networks to obtain a weight parameter of the sub-network;
and the training module is used for splicing each sub-network into an integral network by using a preset splicing strategy, realizing corresponding logic operation by using the integral network, and/or training the target neural network by using the weight parameter of the integral network as an initial weight parameter, and realizing corresponding logic operation by using the trained target neural network.
6. The apparatus of claim 5, wherein the processing module is further configured to:
splitting the target neural network into a plurality of sub-networks according to a structure in which each sub-network includes a first input node, a second input node, and an output node.
7. The apparatus of claim 6, wherein the processing module is further configured to:
inputting training data into the first input node and the second input node, performing training of one or more combinational logic operations among AND, OR, and NOT through the first input node and the second input node until a preset condition is met, stopping training, and obtaining the weight parameter of each sub-network.
8. The apparatus of any one of claims 5-7, wherein the target neural network comprises one or more of an optical neural network, an electrical neural network, and an opto-electrical hybrid neural network.
9. An electronic device, comprising: memory, a processor and a computer program stored on the memory and executable on the processor, the processor executing the program to implement the method of training a neural network as claimed in any one of claims 1 to 4.
10. A computer-readable storage medium, on which a computer program is stored, characterized in that the program is executed by a processor for implementing the method of training a neural network according to any one of claims 1 to 4.
CN202211078131.2A 2022-09-05 2022-09-05 Neural network training method and device, electronic equipment and storage medium Pending CN115358390A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211078131.2A CN115358390A (en) 2022-09-05 2022-09-05 Neural network training method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211078131.2A CN115358390A (en) 2022-09-05 2022-09-05 Neural network training method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115358390A true CN115358390A (en) 2022-11-18

Family

ID=84005692

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211078131.2A Pending CN115358390A (en) 2022-09-05 2022-09-05 Neural network training method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115358390A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116384460A (en) * 2023-03-29 2023-07-04 清华大学 Robust optical neural network training method and device, electronic equipment and medium
CN116384460B (en) * 2023-03-29 2024-06-11 清华大学 Robust optical neural network training method and device, electronic equipment and medium

Similar Documents

Publication Publication Date Title
CN108701250B (en) Data fixed-point method and device
US11321625B2 (en) Quantum circuit optimization using machine learning
US20230128529A1 (en) Acceleration system, method and storage medium based on convolutional neural network
US20200226458A1 (en) Optimizing artificial neural network computations based on automatic determination of a batch size
CN113711246A (en) Multi-control quantum state inversion gate
CN114819116A (en) Hierarchical hybrid network-on-chip architecture
CN114511094A (en) Quantum algorithm optimization method and device, storage medium and electronic device
CN115358390A (en) Neural network training method and device, electronic equipment and storage medium
US11494624B2 (en) Accelerating neuron computations in artificial neural networks with dual sparsity
CN115277540B (en) Method, device, electronic equipment and computer readable storage medium for optimizing structured P2P network
US10990525B2 (en) Caching data in artificial neural network computations
Zhan et al. Field programmable gate array‐based all‐layer accelerator with quantization neural networks for sustainable cyber‐physical systems
US20220391744A1 (en) Systems and methods for embedding graphs using systolic algorithms
CN115358391A (en) Neural network scale expansion method and device, electronic equipment and storage medium
CN115809707A (en) Quantum comparison operation method and device, electronic device and basic arithmetic assembly
CN115358381B (en) Optical full adder and neural network design method, equipment and medium thereof
CN115879562A (en) Quantum program initial mapping determination method and device and quantum computer
Yan et al. S-GAT: Accelerating Graph Attention Networks Inference on FPGA Platform with Shift Operation
CN116384460B (en) Robust optical neural network training method and device, electronic equipment and medium
US20240119110A1 (en) Method, apparatus, electronic device and computer-readablestorage medium for computational flow graph schedulingscheme generation
CN113159290B (en) Neural network model network reasoning optimization method
Chen et al. Approximate Network-on-Chips with Application to Image Classification
CN111353587B (en) Interpretable generation method of deep neural network
US20230334319A1 (en) Adaptive gradient compressor for federated learning with connected vehicles under constrained network conditions
CN116542305A (en) Robust optical neural network design method and device for resisting incident signal errors

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination