CN115358391A - Neural network scale expansion method and device, electronic equipment and storage medium


Info

Publication number
CN115358391A
CN115358391A
Authority
CN
China
Prior art keywords
network
neural network
scale
training
bit width
Prior art date
Legal status
Pending
Application number
CN202211078157.7A
Other languages
Chinese (zh)
Inventor
邓辰辰
郑纪元
王钰言
林珠
吴嘉敏
范静涛
方璐
戴琼海
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University
Priority date
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN202211078157.7A
Publication of CN115358391A
Status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation using electronic means
    • G06N 3/067 Physical realisation using optical means
    • G06N 3/0675 Physical realisation using electro-optical, acousto-optical or opto-electronic means

Abstract

The present application relates to the field of neural network technology, and in particular to a neural network scale expansion method and apparatus, an electronic device, and a storage medium. The method includes: determining the actual operation bit width of a target neural network; training, with an arithmetic logic operation training strategy, a first-scale sub-network whose operation bit width is smaller than the actual operation bit width of the target neural network; using the resulting weight parameters as the initial weight parameters of a second-scale sub-network whose operation bit width is larger than that of the first-scale sub-network, and training the second-scale sub-network; and expanding the network scale step by step from low bit width to high bit width until the operation bit width reaches the actual operation bit width, thereby obtaining the initial weight parameters of the target neural network and training the target neural network with them. This solves the problems of the whole-network parameter initialization method in the related art, which cannot fully exploit the reusability of arithmetic logic operation tasks and suffers from high training complexity, long training time, and low efficiency.

Description

Neural network scale expansion method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of neural network technologies, and in particular, to a method and an apparatus for scale expansion of a neural network, an electronic device, and a storage medium.
Background
Integrated-circuit electronic chips are the infrastructure of the information age, and for the past decades advances in integrated-circuit process technology have been one of the main means of improving the performance and energy efficiency of computing chips. However, as Moore's law and Dennard scaling slow down or even come to an end, this approach is gradually failing, and the performance and energy efficiency of computing chips face bottlenecks. Light has the fastest propagation speed in physical space and offers multi-dimensional, multi-scale degrees of freedom; optical computing, which processes information with photons instead of conventional electrons, is expected to enable a new generation of high-performance computing chips and to become an important pillar of the next-generation information industry. In particular, with the further development of artificial intelligence algorithms, the mathematical expression of the physical process of confined light propagation in a medium is highly similar to that of deep neural network algorithms, so implementing arithmetic logic operations with optoelectronic neural networks is expected to break through the energy-efficiency bottleneck of traditional electronic chips and is an important foundation for realizing optoelectronic computers.
In the related art, a neural network can be trained to implement a corresponding arithmetic logic operation; the essence of neural network training is the tuning of the network weight parameters. The training process consists of two stages: forward propagation of the signal and backward propagation of the error. Usually, all network parameters are initialized to zero or to random values at the start of training; this initialization serves the first forward pass and is universally applicable to intelligent inference tasks such as image recognition. However, when the neural network is trained to implement arithmetic logic operations, and especially when the input layer has many neurons, that is, when the bit width of the arithmetic logic operation is large, this whole-network parameter initialization method does not fully exploit the reusability of the arithmetic logic operation task, and suffers from high training complexity, long training time, and low efficiency.
Disclosure of Invention
The present application provides a neural network scale expansion method and apparatus, an electronic device, and a storage medium, aiming to solve the problems of the whole-network parameter initialization method in the related art: the reusability of arithmetic logic operation tasks cannot be fully exploited, training complexity is high, training time is long, and efficiency is low.
An embodiment of the first aspect of the present application provides a neural network scale expansion method, including the following steps: determining the actual operation bit width of a target neural network; training, with a preset arithmetic logic operation training strategy, a first-scale sub-network of the target neural network whose operation bit width is smaller than the actual operation bit width, to obtain the weight parameters of the first-scale sub-network; and using the weight parameters of the first-scale sub-network as the initial weight parameters of a second-scale sub-network whose operation bit width is larger than that of the first-scale sub-network, training the second-scale sub-network based on these initial weight parameters, and expanding the network scale step by step from low bit width to high bit width until the operation bit width of the expanded network scale reaches the actual operation bit width, thereby obtaining the initial weight parameters of the target neural network and training the target neural network with them.
Optionally, in an embodiment of the present application, using the weight parameters of the first-scale sub-network as the initial weight parameters of the second-scale sub-network whose operation bit width is larger than that of the first-scale sub-network, and training the second-scale sub-network based on these initial weight parameters, includes: using the weight parameters of the first-scale sub-network as the weight parameters of the part of the second-scale sub-network corresponding to the low-bit operations, and training the weight parameters of the part of the second-scale sub-network corresponding to the added bit width, to obtain the weight parameters of the second-scale sub-network.
Optionally, in an embodiment of the present application, after the target neural network is trained with its initial weight parameters, the method includes: implementing a preset arithmetic logic operation with the trained target neural network, where the preset arithmetic logic operation includes one or more of AND, OR, NOT, combinational logic operations, addition, subtraction, and multiplication.
Optionally, in one embodiment of the present application, the target neural network comprises one or more of an optical neural network, an electrical neural network, and an opto-electrical hybrid neural network.
An embodiment of the second aspect of the present application provides a neural network scale expansion apparatus, including: a determining module, configured to determine the actual operation bit width of a target neural network; a training module, configured to train, with a preset arithmetic logic operation training strategy, a first-scale sub-network of the target neural network whose operation bit width is smaller than the actual operation bit width, to obtain the weight parameters of the first-scale sub-network; and an expanding module, configured to use the weight parameters of the first-scale sub-network as the initial weight parameters of a second-scale sub-network whose operation bit width is larger than that of the first-scale sub-network, train the second-scale sub-network based on these initial weight parameters, and expand the network scale step by step from low bit width to high bit width until the operation bit width of the expanded network scale reaches the actual operation bit width, thereby obtaining the initial weight parameters of the target neural network and training the target neural network with them.
Optionally, in an embodiment of the present application, the expanding module is further configured to use the weight parameters of the first-scale sub-network as the weight parameters of the part of the second-scale sub-network corresponding to the low-bit operations, and to train the weight parameters of the part of the second-scale sub-network corresponding to the added bit width, to obtain the weight parameters of the second-scale sub-network.
Optionally, in an embodiment of the present application, the apparatus further includes: a processing module, configured to implement a preset arithmetic logic operation with the trained target neural network after the target neural network is trained with its initial weight parameters, where the preset arithmetic logic operation includes one or more of AND, OR, NOT, combinational logic operations, addition, subtraction, and multiplication.
Optionally, in one embodiment of the present application, the target neural network comprises one or more of an optical neural network, an electrical neural network, and an opto-electrical hybrid neural network.
An embodiment of the third aspect of the present application provides an electronic device, including: a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the program to implement the neural network scale expansion method described in the above embodiments.
An embodiment of the fourth aspect of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the neural network scale expansion method described in the above embodiments.
Therefore, the application has at least the following beneficial effects:
by determining the actual operation bit width of the target neural network, training starts from a small-scale neural network with low bit-width input; its weights serve as initial values for training the larger-scale neural network and are fixed during training, so only the weights of the added bit-width portion of the network need to be trained, until the operation bit width of the expanded network scale reaches the actual operation bit width. This yields the initial weight parameters of the target neural network, with which the target neural network is trained. It thereby solves the problems of the whole-network parameter initialization method in the related art: the reusability of arithmetic logic operation tasks cannot be fully exploited, training complexity is high, training time is long, and efficiency is low.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The above and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a flowchart of a method for scaling up a neural network according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of training of a neural network extending from a low bit to a high bit according to an embodiment of the present application;
fig. 3 is a block diagram illustrating a scale expansion apparatus for a neural network according to an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Description of reference numerals: determination module 100, training module 200, expansion module 300, memory 401, processor 402, communication interface 403.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application.
A neural network scale expansion method and apparatus, an electronic device, and a storage medium according to embodiments of the present application are described below with reference to the drawings. To solve the problems mentioned in the background, the present application provides a neural network scale expansion method in which the actual operation bit width of a target neural network is determined, a small-scale neural network with low bit-width input provides the initial values for training the larger-scale neural network, these initial values are fixed during training, and only the weights of the added bit-width portion of the network are trained, until the operation bit width of the expanded network scale reaches the actual operation bit width; the initial weight parameters of the target neural network are thus obtained and used to train the target neural network. This solves the problems of the whole-network parameter initialization method in the related art: the reusability of arithmetic logic operation tasks cannot be fully exploited, training complexity is high, training time is long, and efficiency is low.
Specifically, fig. 1 is a schematic flow chart of a method for expanding the scale of a neural network provided in an embodiment of the present application.
As shown in fig. 1, the neural network scale expansion method includes the following steps:
in step S101, an actual operation bit width of the target neural network is determined.
According to the embodiment of the present application, part of the network weight parameters for a high bit-width arithmetic logic operation can be initialized by reusing the network weight parameters of a low bit-width arithmetic logic operation, thereby reducing training complexity and shortening training time. Therefore, before arithmetic logic operation training is carried out, the embodiment of the present application first determines the actual operation bit width of the target neural network. The target neural network includes, but is not limited to, one or more of an optical neural network, an electrical neural network, and an optoelectronic hybrid neural network.
In step S102, a first-scale sub-network of the target neural network whose operation bit width is smaller than the actual operation bit width is trained with a preset arithmetic logic operation training strategy, to obtain the weight parameters of the first-scale sub-network.
It can be understood that when neural network training aims to implement arithmetic logic operations, and especially when the input layer has many neurons, that is, when the bit width of the arithmetic logic operation is large, the embodiment of the present application may split the network into sub-networks whose training iterations are faster, and train a first-scale sub-network whose operation bit width is smaller than the actual operation bit width of the target neural network to obtain the corresponding weight parameters. By training sub-networks, using the weight parameters obtained from sub-network training as initial values for the network, and increasing the number of input-layer neurons step by step, the training speed is greatly increased, as in the sketch below.
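As an illustration of step S102, the following is a minimal sketch, assuming PyTorch and a 1-bit full adder as the first-scale sub-network (matching the example of FIG. 2 discussed below); the input bit ordering, the layer sizes, and the hyperparameters are illustrative assumptions rather than details taken from this application.

    import torch
    import torch.nn as nn

    def adder_dataset(n_bits):
        # Truth table of an n-bit adder. Input layout: [c_in, a0, b0, a1, b1, ...]
        # (low-order bits first), so the input vector of an (n-1)-bit adder is a
        # prefix of the n-bit one. Output layout: [s0, ..., s_{n-1}, c_out].
        xs, ys = [], []
        for a in range(2 ** n_bits):
            for b in range(2 ** n_bits):
                for cin in (0, 1):
                    total = a + b + cin
                    x = [cin]
                    for i in range(n_bits):
                        x += [(a >> i) & 1, (b >> i) & 1]
                    y = [(total >> i) & 1 for i in range(n_bits + 1)]
                    xs.append(x)
                    ys.append(y)
        return (torch.tensor(xs, dtype=torch.float32),
                torch.tensor(ys, dtype=torch.float32))

    def make_adder_net(n_bits, hidden):
        # Single hidden layer; sigmoid outputs are read as bits (threshold 0.5).
        return nn.Sequential(
            nn.Linear(2 * n_bits + 1, hidden), nn.Sigmoid(),
            nn.Linear(hidden, n_bits + 1), nn.Sigmoid(),
        )

    def train(net, xs, ys, epochs=3000, lr=0.05):
        opt = torch.optim.Adam([p for p in net.parameters() if p.requires_grad], lr=lr)
        loss_fn = nn.BCELoss()
        for _ in range(epochs):
            opt.zero_grad()
            loss_fn(net(xs), ys).backward()
            opt.step()
        return net

    # First-scale sub-network: a 1-bit full adder (inputs c_in, a0, b0).
    net1 = train(make_adder_net(1, hidden=8), *adder_dataset(1))

The low-order-bits-first input layout is chosen so that the input vector of a smaller adder is a prefix of a larger one, which turns the weight reuse of step S103 into a simple block copy.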
In step S103, the weight parameters of the first-scale sub-network are used as the initial weight parameters of a second-scale sub-network whose operation bit width is larger than that of the first-scale sub-network; the second-scale sub-network is trained based on these initial weight parameters, and the network scale is expanded step by step from low bit width to high bit width until the operation bit width of the expanded network scale reaches the actual operation bit width, so as to obtain the initial weight parameters of the target neural network, which are then used to train the target neural network.
It can be understood that a neural network can be trained to implement a corresponding arithmetic logic operation, and the essence of neural network training is the tuning of the network weight parameters. The arithmetic-logic neural network in the embodiment of the present application differs from a traditional neural network for intelligent inference tasks in that its high-order inputs do not influence its low-order outputs. Therefore, training can start from a small-scale neural network with low bit-width input, whose trained parameters serve as initial values for training the larger-scale neural network and are fixed during training; only the weights of the added bit-width portion of the network need to be trained, until the initial weight parameters of the target neural network are obtained and the target neural network is trained. This greatly reduces training complexity, shortens training time, improves network training efficiency, and realizes energy-efficient arithmetic logic operations.
In one embodiment of the present application, using the weight parameters of the first-scale sub-network as the initial weight parameters of the second-scale sub-network whose operation bit width is larger than that of the first-scale sub-network, and training the second-scale sub-network based on these initial weight parameters, includes: using the weight parameters of the first-scale sub-network as the weight parameters of the part of the second-scale sub-network corresponding to the low-bit operations, and training the weight parameters of the part of the second-scale sub-network corresponding to the added bit width, to obtain the weight parameters of the second-scale sub-network, as sketched below.
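Continuing the sketch above, one possible realization of this reuse is to copy the trained small-network weights into the low-bit rows of the larger network and freeze them with gradient masks, so that backpropagation updates only the added portion; the row layout (reused neurons first) and the gradient-mask mechanism are assumptions of this sketch, not necessarily the exact mechanism of this application. The low-bit sum outputs are wired only to the reused hidden neurons, consistent with high-order inputs not influencing low-order outputs.

    def expand_and_freeze(small, large, n_small):
        # Copy the trained small-network weights into the low-bit block of the
        # larger network; the small network's carry-out row is dropped, since the
        # carry information reaches the new outputs through retrained weights.
        h, i = small[0].weight.shape      # reused hidden neurons / reused inputs
        o = n_small                       # reused sum outputs s0..s_{n_small-1}
        with torch.no_grad():
            large[0].weight[:h] = 0.0
            large[0].weight[:h, :i] = small[0].weight
            large[0].bias[:h] = small[0].bias
            large[2].weight[:o] = 0.0
            large[2].weight[:o, :h] = small[2].weight[:o]
            large[2].bias[:o] = small[2].bias[:o]

        def freeze_rows(param, n):
            # Zero the gradient of the first n rows so they stay fixed.
            mask = torch.ones_like(param)
            mask[:n] = 0.0
            param.register_hook(lambda g: g * mask)

        freeze_rows(large[0].weight, h)
        freeze_rows(large[0].bias, h)
        freeze_rows(large[2].weight, o)
        freeze_rows(large[2].bias, o)
        return large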
As shown in FIG. 2, for the training of a 2-bit full adder, the inputs of the full adder are B2B1, A2A1, and the carry input Cin, and the outputs are S2S1 and the carry output Cout. The training mechanism can start from a 1-bit full adder of smaller training scale; because the network scale is small, the weight parameters of the solid-line neurons in the hidden layer can be obtained quickly. Once the weight parameters of the solid-line neurons in the hidden layer of the 1-bit full adder are obtained, they can be used as the weight parameters of the lower half of the hidden layer of the 2-bit full adder's network, with their values fixed, and only the weight parameters of the dashed-line neurons in the hidden layer need to be trained. Likewise, the weight parameters of neural networks with 3-bit and higher bit widths can be trained quickly, which is more efficient than training the whole neural network directly from random initial weights. A driver loop combining the previous sketches is shown below.
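Putting the two sketches together, the step-by-step expansion from a 1-bit full adder to the target bit width might look as follows; the per-stage hidden-layer size is an illustrative assumption, and in practice each stage may need more hidden neurons or training epochs to converge.

    def progressive_adder(target_bits, hidden_per_bit=8):
        # Step S103: grow the network from a 1-bit adder to the target bit
        # width, training only the added portion at each stage.
        net = train(make_adder_net(1, hidden_per_bit), *adder_dataset(1))
        for n in range(2, target_bits + 1):
            larger = make_adder_net(n, n * hidden_per_bit)
            larger = expand_and_freeze(net, larger, n - 1)
            net = train(larger, *adder_dataset(n))
        return net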
In one embodiment of the present application, after the target neural network is trained with its initial weight parameters, the method includes: implementing a preset arithmetic logic operation with the trained target neural network, where the preset arithmetic logic operation includes one or more of AND, OR, NOT, combinational logic operations, addition, subtraction, and multiplication.
It can be understood that when the input layer has many neurons, that is, when the bit width of the arithmetic logic operation is large, the embodiment of the present application may implement the arithmetic logic operation with the trained neural network, thereby fully exploiting the reusability of the arithmetic logic operation task itself. Arithmetic logic operations include, but are not limited to, addition, subtraction, multiplication, AND, OR, NOT, and combinations of these basic operations. A usage sketch follows.
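For completeness, here is how a network grown by the sketches above could be used as an adder, under the same assumed input and output layouts: encode the operands as bits, threshold the sigmoid outputs at 0.5, and read the result back as an integer.

    def add_with_network(net, a, b, n_bits, cin=0):
        # Encode (a, b, c_in) with the low-order-bits-first layout assumed above.
        x = [cin]
        for i in range(n_bits):
            x += [(a >> i) & 1, (b >> i) & 1]
        out = net(torch.tensor([x], dtype=torch.float32))[0]
        bits = (out > 0.5).int()          # [s0, ..., s_{n-1}, c_out]
        return sum(int(bit) << i for i, bit in enumerate(bits))

    adder = progressive_adder(3)                      # grow a 3-bit adder
    print(add_with_network(adder, 5, 6, n_bits=3))    # expected: 11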
According to the neural network scale expansion method provided by the embodiment of the present application, the actual operation bit width of the target neural network is determined, a small-scale neural network with low bit-width input provides the initial values for training the larger-scale neural network, these initial values are fixed during training, and only the weights of the added bit-width portion of the network are trained, until the operation bit width of the expanded network scale reaches the actual operation bit width; the initial weight parameters of the target neural network are thus obtained and used to train the target neural network. This solves the problems of the whole-network parameter initialization method in the related art: the reusability of arithmetic logic operation tasks cannot be fully exploited, training complexity is high, training time is long, and efficiency is low.
Next, a scale expansion apparatus of a neural network proposed according to an embodiment of the present application will be described with reference to the drawings.
Fig. 3 is a block diagram illustrating a scale expansion apparatus for a neural network according to an embodiment of the present disclosure.
As shown in fig. 3, the neural network scale expansion apparatus 10 includes: a determination module 100, a training module 200, and an expansion module 300.
The determining module 100 is configured to determine the actual operation bit width of the target neural network; the training module 200 is configured to train, with a preset arithmetic logic operation training strategy, a first-scale sub-network of the target neural network whose operation bit width is smaller than the actual operation bit width, to obtain the weight parameters of the first-scale sub-network; and the expanding module 300 is configured to use the weight parameters of the first-scale sub-network as the initial weight parameters of a second-scale sub-network whose operation bit width is larger than that of the first-scale sub-network, train the second-scale sub-network based on these initial weight parameters, and expand the network scale step by step from low bit width to high bit width until the operation bit width of the expanded network scale reaches the actual operation bit width, thereby obtaining the initial weight parameters of the target neural network and training the target neural network with them.
In an embodiment of the present application, the expanding module 300 is further configured to use the weight parameters of the first-scale sub-network as the weight parameters of the part of the second-scale sub-network corresponding to the low-bit operations, and to train the weight parameters of the part of the second-scale sub-network corresponding to the added bit width, to obtain the weight parameters of the second-scale sub-network.
In an embodiment of the present application, the apparatus 10 further includes: a processing module, configured to implement a preset arithmetic logic operation with the trained target neural network after the target neural network is trained with its initial weight parameters, where the preset arithmetic logic operation includes one or more of AND, OR, NOT, combinational logic operations, addition, subtraction, and multiplication.
In one embodiment of the present application, the target neural network includes one or more of an optical neural network, an electrical neural network, and an opto-electrical hybrid neural network.
It should be noted that the foregoing explanation on the embodiment of the scale expanding method for the neural network is also applicable to the scale expanding apparatus for the neural network of this embodiment, and details are not described here.
According to the neural network scale expansion apparatus provided by the embodiment of the present application, the actual operation bit width of the target neural network is determined, a small-scale neural network with low bit-width input provides the initial values for training the larger-scale neural network, these initial values are fixed during training, and only the weights of the added bit-width portion of the network are trained, until the operation bit width of the expanded network scale reaches the actual operation bit width; the initial weight parameters of the target neural network are thus obtained and used to train the target neural network. This solves the problems of the whole-network parameter initialization method in the related art: the reusability of arithmetic logic operation tasks cannot be fully exploited, training complexity is high, training time is long, and efficiency is low.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device may include:
memory 401, processor 402, and computer programs stored on memory 401 and operable on processor 402.
The processor 402, when executing the program, implements the method of scale expansion of the neural network provided in the above-described embodiments.
Further, the electronic device further includes:
a communication interface 403 for communication between the memory 401 and the processor 402.
A memory 401 for storing computer programs executable on the processor 402.
The Memory 401 may include a high-speed RAM (Random Access Memory) Memory, and may also include a non-volatile Memory, such as at least one disk Memory.
If the memory 401, the processor 402 and the communication interface 403 are implemented independently, the communication interface 403, the memory 401 and the processor 402 may be connected to each other through a bus and perform communication with each other. The bus may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 4, but that does not indicate only one bus or one type of bus.
Optionally, in a specific implementation, if the memory 401, the processor 402, and the communication interface 403 are integrated on a chip, the memory 401, the processor 402, and the communication interface 403 may complete mutual communication through an internal interface.
The processor 402 may be a CPU (Central Processing Unit), an ASIC (Application-Specific Integrated Circuit), or one or more integrated circuits configured to implement the embodiments of the present application.
Embodiments of the present application also provide a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the neural network scale expansion method described above.
In the description herein, reference to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, the schematic representations of these terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples, and the features of the different embodiments or examples, described in this specification, provided they do not contradict each other.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method descriptions in flowcharts or otherwise described herein may be understood as representing modules, segments, or portions of code that include one or more executable instructions for implementing specific logic functions or steps of the process, and the scope of the preferred embodiments of the present application includes alternative implementations in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functionality involved, as should be understood by those skilled in the art to which the embodiments of the present application belong.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, a plurality of steps or methods may be implemented by software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or a combination of the following techniques known in the art may be used: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGA), field-programmable gate arrays (FPGA), and the like.
It will be understood by those skilled in the art that all or part of the steps carried by the methods of the above embodiments may be implemented by a program instructing relevant hardware; the program may be stored in a computer-readable storage medium and, when executed, performs one of or a combination of the steps of the method embodiments.
While embodiments of the present application have been shown and described above, it will be understood that the above embodiments are exemplary and should not be construed as limiting the present application and that changes, modifications, substitutions and alterations in the above embodiments may be made by those of ordinary skill in the art within the scope of the present application.

Claims (10)

1. A scale expansion method of a neural network is characterized by comprising the following steps:
determining the actual operation bit width of the target neural network;
training, with a preset arithmetic logic operation training strategy, a first-scale sub-network of the target neural network whose operation bit width is smaller than the actual operation bit width, to obtain the weight parameters of the first-scale sub-network;
and using the weight parameters of the first-scale sub-network as the initial weight parameters of a second-scale sub-network whose operation bit width is larger than that of the first-scale sub-network, training the second-scale sub-network based on the initial weight parameters of the second-scale sub-network, and expanding the network scale step by step from low bit width to high bit width until the operation bit width of the expanded network scale reaches the actual operation bit width, to obtain the initial weight parameters of the target neural network and train the target neural network with the initial weight parameters of the target neural network.
2. The method of claim 1, wherein using the weight parameters of the first-scale sub-network as the initial weight parameters of the second-scale sub-network whose operation bit width is larger than that of the first-scale sub-network, and training the second-scale sub-network based on the initial weight parameters of the second-scale sub-network, comprises:
using the weight parameters of the first-scale sub-network as the weight parameters of the part of the second-scale sub-network corresponding to the low-bit operations, and training the weight parameters of the part of the second-scale sub-network corresponding to the added bit width, to obtain the weight parameters of the second-scale sub-network.
3. The method of claim 1, wherein after the target neural network is trained with the initial weight parameters of the target neural network, the method comprises:
implementing a preset arithmetic logic operation with the trained target neural network, wherein the preset arithmetic logic operation comprises one or more of AND, OR, NOT, combinational logic operations, addition, subtraction, and multiplication.
4. The method of any one of claims 1-3, wherein the target neural network comprises one or more of an optical neural network, an electrical neural network, and an opto-electrical hybrid neural network.
5. An apparatus for scaling up a neural network, comprising:
the determining module is used for determining the actual operation bit width of the target neural network;
a training module, configured to train, with a preset arithmetic logic operation training strategy, a first-scale sub-network of the target neural network whose operation bit width is smaller than the actual operation bit width, to obtain the weight parameters of the first-scale sub-network;
and an expanding module, configured to use the weight parameters of the first-scale sub-network as the initial weight parameters of a second-scale sub-network whose operation bit width is larger than that of the first-scale sub-network, train the second-scale sub-network based on the initial weight parameters of the second-scale sub-network, and expand the network scale step by step from low bit width to high bit width until the operation bit width of the expanded network scale reaches the actual operation bit width, to obtain the initial weight parameters of the target neural network and train the target neural network with the initial weight parameters of the target neural network.
6. The apparatus of claim 5, wherein the expansion module is further configured to:
use the weight parameters of the first-scale sub-network as the weight parameters of the part of the second-scale sub-network corresponding to the low-bit operations, and train the weight parameters of the part of the second-scale sub-network corresponding to the added bit width, to obtain the weight parameters of the second-scale sub-network.
7. The apparatus of claim 5, further comprising: a processing module, configured to implement a preset arithmetic logic operation with the trained target neural network after the target neural network is trained with the initial weight parameters of the target neural network, wherein the preset arithmetic logic operation comprises one or more of AND, OR, NOT, combinational logic operations, addition, subtraction, and multiplication.
8. The apparatus of any one of claims 5-7, wherein the target neural network comprises one or more of an optical neural network, an electrical neural network, and an opto-electrical hybrid neural network.
9. An electronic device, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the neural network scale expansion method as claimed in any one of claims 1-4.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the neural network scale expansion method as claimed in any one of claims 1-4.
CN202211078157.7A, filed 2022-09-05, priority 2022-09-05: Neural network scale expansion method and device, electronic equipment and storage medium; published as CN115358391A (pending)

Priority Applications (1)

Application Number: CN202211078157.7A; Priority Date: 2022-09-05; Filing Date: 2022-09-05; Title: Neural network scale expansion method and device, electronic equipment and storage medium

Publications (1)

Publication Number: CN115358391A; Publication Date: 2022-11-18

Family ID: 84007279

Country Status (1)

CN: CN115358391A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination