CN112529162B - Neural network model updating method, device, equipment and storage medium - Google Patents

Neural network model updating method, device, equipment and storage medium

Info

Publication number
CN112529162B
CN112529162B, CN202011481689.6A, CN202011481689A
Authority
CN
China
Prior art keywords
model
neural network
network module
student
student model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011481689.6A
Other languages
Chinese (zh)
Other versions
CN112529162A (en)
Inventor
杨馥魁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202011481689.6A priority Critical patent/CN112529162B/en
Publication of CN112529162A publication Critical patent/CN112529162A/en
Application granted
Publication of CN112529162B publication Critical patent/CN112529162B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Abstract

The application discloses a neural network model updating method, apparatus, device, and storage medium, relating to the fields of computer vision and deep learning. The specific implementation is a method for updating a neural network model that includes the following steps: acquiring an input image; adding a neural network module to a trained student model; extracting features from the input image using the student model with the added neural network module and a teacher model, respectively; and adjusting parameters of the neural network module added to the student model based on the difference between the feature extraction results of the student model and the teacher model, to obtain an updated model of the student model.

Description

Neural network model updating method, device, equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to computer vision and deep learning, and more particularly, to a method, apparatus, device, and medium for updating a neural network model.
Background
Typically, a neural network model is trained by adjusting its model parameters, and these parameters are fixed once training is complete. As a result, if a problem with the model surfaces during use, the model cannot be adapted.
Disclosure of Invention
Provided are a neural network model updating method, device, equipment and storage medium.
According to a first aspect, there is provided a method for updating a neural network model, including:
acquiring an input image;
adding a neural network module to the trained student model;
extracting features from the input image using the student model to which the neural network module has been added and a teacher model, respectively; and
based on the difference between the feature extraction result of the student model and the feature extraction result of the teacher model, parameters of the neural network module added in the student model are adjusted to obtain an updated model of the student model.
In some embodiments, the method further comprises:
acquiring a currently used neural network model as a trained student model before adding the neural network module to the trained student model; and
after obtaining an updated model of the student model, the currently used neural network model is replaced with the updated model.
In some embodiments, the method further comprises:
after obtaining an updated model of the student model, testing the feature extraction accuracy of the obtained updated model by using preset test data;
the step of replacing the currently used neural network model with the updated model is performed in case the feature extraction accuracy of the updated model is higher than the feature extraction accuracy of the currently used neural network model.
In some embodiments, the method further comprises:
after obtaining the updated model of the student model, the step of acquiring the input image is returned.
In some embodiments, the neural network module includes one or more neurons, and adding the neural network module to the student model includes:
the neural network module is added to at least one convolutional layer of a plurality of convolutional layers of a student model.
In some embodiments, when the number of neurons that can still be added to the at least one convolutional layer is smaller than the number of neurons contained in the neural network module, at least one other convolutional layer is selected from the plurality of convolutional layers, either according to a preset rule or randomly, and the neural network module is added to that other convolutional layer.
In some embodiments, the neural network module includes one or more convolutional layers, and adding the neural network module to the student model includes:
the neural network module is added to the student model, wherein the addition of the neural network module is stopped in the event that the number of addable convolutional layers in the student model is less than the number of convolutional layers contained in the neural network module.
In some embodiments, the input image comprises: a plurality of historical input images processed by the neural network model currently in use.
In some embodiments, the differences between the feature extraction results of the student model and the feature extraction results of the teacher model comprise: distillation loss function value between the feature extraction result of the student model and the feature extraction result of the teacher model.
In some embodiments, the student model is a MobileNet series neural network model.
In some embodiments, the teacher model is a ResNet-152 series neural network model.
According to a second aspect, there is provided an updating apparatus of a neural network model, including:
the acquisition module is used for acquiring an input image;
the expansion module is used for adding a neural network module into the trained student model;
the extraction module is used for extracting the characteristics of the input image by using the student model added with the neural network module and the teacher model respectively; and
and the training module is used for adjusting parameters of the neural network module added in the student model based on the difference between the feature extraction result of the student model and the feature extraction result of the teacher model to obtain an updated model of the student model.
According to a third aspect, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method described above.
According to a fourth aspect, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the above method.
According to a fifth aspect, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the above method.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for better understanding of the present solution and do not constitute a limitation of the present application. Wherein:
FIG. 1 is a flow chart of a method of updating a neural network model according to an embodiment of the present application;
FIG. 2 is a flow chart of a method of updating a neural network model according to another embodiment of the present application;
FIG. 3 is a schematic diagram of a method of updating a neural network model according to another embodiment of the present application;
FIG. 4A is a flowchart of an example method of adding a neural network module, according to an embodiment of the present application;
FIG. 4B is a schematic diagram of the method of adding a neural network module of FIG. 4A;
FIG. 5A is a flowchart of another example method of adding a neural network module, according to an embodiment of the present application;
FIG. 5B is a schematic diagram of the method of adding a neural network module of FIG. 5A;
FIG. 6A is a flowchart of yet another example method of adding a neural network module, according to an embodiment of the present application;
FIG. 6B is a schematic diagram of the method of adding a neural network module of FIG. 6A;
FIG. 7 is a block diagram of an updating apparatus of a neural network model according to an embodiment of the present application;
fig. 8 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present application to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a flowchart of a method 100 of updating a neural network model according to an embodiment of the present application.
In step S110, an input image and a trained student model are acquired.
In step S120, a neural network module is added to the trained student model.
In step S130, feature extraction is performed on the input image using the student model to which the neural network module has been added and the teacher model, respectively.
In step S140, parameters of the neural network module added in the student model are adjusted based on the difference between the feature extraction result of the student model and the feature extraction result of the teacher model, so as to obtain an updated model of the student model.
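Purely as an illustration of steps S110 to S140, the following Python sketch (using PyTorch) distills the teacher into a newly added module while leaving the originally trained student parameters untouched. The names `student`, `teacher`, and `added_module`, the MSE distillation loss, and the SGD settings are assumptions made for the sketch; the patent does not prescribe a particular framework or loss form.

```python
import torch
import torch.nn as nn


def update_student(student: nn.Module, teacher: nn.Module,
                   added_module: nn.Module, images: torch.Tensor) -> nn.Module:
    """Sketch of S110-S140: `added_module` is assumed to already be attached
    inside `student`, so its output flows through the student's forward pass."""
    # Freeze the parameters that were trained before the update ...
    for p in student.parameters():
        p.requires_grad_(False)
    # ... and train only the newly added neural network module (S140).
    for p in added_module.parameters():
        p.requires_grad_(True)

    optimizer = torch.optim.SGD(added_module.parameters(), lr=1e-3)
    distill_loss = nn.MSELoss()  # assumed distillation loss

    # S130: extract features with the expanded student and with the teacher.
    student_feat = student(images)
    with torch.no_grad():
        teacher_feat = teacher(images)

    # S140: adjust only the added module based on the feature difference.
    loss = distill_loss(student_feat, teacher_feat)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return student  # the updated model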
Fig. 2 is a flowchart of a method of updating a neural network model according to another embodiment of the present application.
In step S210, the neural network model currently in use is acquired as a trained student model.
For example, a client may perform feature extraction on input images using a neural network model trained in advance, and a server may acquire the neural network model currently used by the client as the student model. The neural network model may be trained prior to use; for example, a MobileNet series neural network model may be used as an initial model and trained using sample images and the feature data of those sample images, resulting in a trained neural network model. In some embodiments the initial model may also be trained by way of distillation training; the invention is not limited in this regard. The student model may include a plurality of convolutional layers, each convolutional layer including a plurality of neurons.
In step S220, in response to receiving a set of historical input images processed by the currently used neural network model, step S230 is performed; otherwise, the flow returns to step S220 to continue waiting for historical input images.
For example, each time a client processes a batch of input images, the batch may be provided to a server, and the server may use it to update the neural network model used by the client. Of course, embodiments of the present disclosure are not limited in this regard: in some embodiments the input images may be acquired periodically from the client, while in other embodiments the client may push each processed batch of input images to the server.
In step S230, a neural network module is added to the student model acquired in step S210.
The neural network module may include one or more neurons, or one or more convolutional layers. The neural network module may be added to or between layers of the student model in a variety of ways, as will be described in further detail below. After the neural network module is added, the student model to which the neural network module has been added may be subjected to distillation training, which is described below with reference to steps S240 to S260.
In step S240, feature extraction is performed on the input images obtained in step S220 using the student model to which the neural network module has been added and the teacher model, respectively. The teacher model may be a complex model trained on a server, with 100 or more layers and a higher generalization capability than the student model. For example, the teacher model may be a ResNet-152 series neural network model.
In this step, the student model may be used to extract features from each of the plurality of input images to obtain a first set of feature data, and the teacher model may be used to extract features from the same input images to obtain a second set of feature data. The feature data obtained with the student model and with the teacher model may both be 128-dimensional.
In step S250, a distillation loss function value between the feature extraction result of the student model and the feature extraction result of the teacher model is calculated.
The distillation loss function value between the feature extraction result of the student model and the feature extraction result of the teacher model may be calculated in accordance with a distillation loss function set in advance.
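The patent leaves the exact form of the preset distillation loss open. As one hedged example, a mean-squared error between L2-normalized 128-dimensional feature vectors could serve as such a loss; this is an assumption for illustration, not the loss mandated by the application.

```python
import torch
import torch.nn.functional as F


def distillation_loss(student_feat: torch.Tensor,
                      teacher_feat: torch.Tensor) -> torch.Tensor:
    """One possible preset distillation loss: MSE between normalized
    feature vectors of shape (batch, 128). Illustrative only."""
    s = F.normalize(student_feat, dim=1)
    t = F.normalize(teacher_feat, dim=1)
    return F.mse_loss(s, t)
```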
In step S260, parameters of the neural network module added in step S230 are adjusted according to the distillation loss function value, resulting in an updated model.
In step S270, the feature extraction accuracy of the obtained updated model is tested by using the preset test data.
The preset test data may include a set of test images and feature extraction results for the test images. The updated model may be used to perform feature extraction on each test image, and the feature extraction result of each test image may be compared with the corresponding feature extraction result in the test data, thereby obtaining a comparison result for each test image. The feature extraction accuracy of the updated model may then be calculated from the comparison results over all the test images.
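A minimal sketch of this accuracy test and the subsequent comparison (steps S270/S280) follows. It assumes the preset test data stores one reference feature vector per test image and that a cosine-similarity threshold decides whether an extraction counts as correct; both are assumptions, since the patent does not define the comparison rule.

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def feature_accuracy(model: torch.nn.Module,
                     test_images: torch.Tensor,
                     reference_feats: torch.Tensor,
                     threshold: float = 0.9) -> float:
    """Fraction of test images whose extracted features match the stored
    reference features closely enough (cosine similarity >= threshold)."""
    feats = model(test_images)
    sims = F.cosine_similarity(feats, reference_feats, dim=1)
    return (sims >= threshold).float().mean().item()


def maybe_replace(current: torch.nn.Module, updated: torch.nn.Module,
                  test_images: torch.Tensor, reference_feats: torch.Tensor):
    """S280/S290 sketch: keep the deployed model unless the update is better."""
    acc_updated = feature_accuracy(updated, test_images, reference_feats)
    acc_current = feature_accuracy(current, test_images, reference_feats)
    return updated if acc_updated > acc_current else current
```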
In step S280, the feature extraction accuracy of the updated model is compared with that of the currently used neural network model. If the former is higher than the latter, step S290 is performed to replace the model; otherwise, the flow returns to step S210 to acquire the student model again. Of course, the embodiments of the present disclosure are not limited thereto; for example, in the case where the feature extraction accuracy of the updated model is lower than that of the currently used neural network model, the neural network module added to the student model may be deleted and the flow may return to step S220 to wait for the next set of input images. The feature extraction accuracy of the currently used neural network model may be obtained in a manner similar to step S270, which is not repeated here.
In step S290, the currently used neural network model is replaced with the updated model obtained in step S260. In some embodiments, after step S290 is performed, step S220 may also be returned to wait for the arrival of the next set of history input images.
For example, in the case where the server determines that the feature extraction accuracy of the updated model is higher than that of the neural network model currently used by the client, the updated model is considered superior to the currently used model, so the former replaces the latter, thereby updating the neural network model.
Embodiments of the present disclosure make it possible to obtain an updated model structure automatically during use of the model, by adding neural network modules to a trained student model and distillation-training the added modules with a teacher model.
According to embodiments of the present disclosure, the currently used neural network model is obtained and expanded, and the expanded model is trained to obtain an updated model that replaces the currently used model, so that the neural network model is updated without affecting its ongoing use.
Embodiments of the present disclosure provide a variety of ways to implement the evolution of the neural network model, including but not limited to expansion by individual neurons, by modules, and by convolutional layers. These diversified expansion modes help improve the efficiency of model updating.
According to embodiments of the present disclosure, performing an accuracy test on the updated model ensures that its feature extraction accuracy is higher than that of the currently used neural network model, so that the update is an optimization.
According to embodiments of the present disclosure, the expanded model is distillation-trained using the historical input images of the currently used network model, so the entire updating process can be completed automatically by a computer without additional sample data, which improves model updating efficiency.
Fig. 3 is a schematic diagram of a method for updating a neural network model according to another embodiment of the present application.
As shown in fig. 3, the student model 310 has a neural network structure including a plurality of neurons (shown as open circles in fig. 3) connected to each other.
After the first set of input images F1 is acquired, a neural network module (one neuron, shown as a filled circle in FIG. 3) may be added to the student model 310. The first set of input images F1 is then input to the student model 310 and the teacher model 320, respectively, for feature extraction. The feature extraction result from the student model 310 and the feature extraction result from the teacher model 320 are used as inputs to the distillation loss function 330, thereby obtaining a distillation loss function value. Parameters of the added neural network module (as shown by the dashed line in FIG. 3) may be adjusted based on the distillation loss function value, resulting in an updated model. If the updated model passes the accuracy test, it can be used to replace the neural network model currently in use in the client.
If the updated model fails the test, then after the second set of input images F2 is acquired, the addition of neural network modules to the student model 310 may continue and the training operation described above may be repeated, followed by the acquisition of a third set of input images, and so on, until the student model 310 is filled.
Fig. 4A is a flowchart of an example method of adding a neural network module, according to an embodiment of the present application. In this example, the neural network module includes a single neuron.
In step S431, a convolutional layer to which the neuron is to be added is determined among the plurality of convolutional layers of the student model. The student model may include an input layer, an output layer, and a plurality of convolutional layers between them, from which the convolutional layer to receive the neuron is selected. The selection may be made in various ways. For example, the convolutional layer to which the neural network module is to be added may be determined according to a preset rule, such as proceeding from lower layers to higher layers: the first convolutional layer (i.e., the second layer of the entire network) is selected first; after the first convolutional layer is filled, the second convolutional layer is selected; and so on. In other embodiments, one of the plurality of convolutional layers may be selected randomly as the convolutional layer to which the neuron is to be added.
In step S432, it is determined whether adding the neural network module would cause the number of neurons contained in the convolutional layer to exceed that layer's upper limit. For example, an upper limit on the number of neurons may be set in advance for each convolutional layer. If the upper limit is not exceeded (for example, the number of neurons that can still be added to the current convolutional layer is greater than or equal to the number of neurons included in the neural network module), step S433 is performed; otherwise step S434 is performed.
In step S433, a neural network module (i.e., one neuron) is added to the convolutional layer determined in step S431. The specific position added to the convolution layer can be determined according to a preset rule or can be determined randomly.
In step S434, the next layer after the current convolutional layer is selected as the convolutional layer to which the neural network module is to be added, and the flow returns to step S432 to determine whether the next layer has enough room to add the neuron. The "next layer" may be determined in the predetermined order described above (for example, the second convolutional layer is the next layer after the first), or it may be determined randomly, e.g., by randomly selecting one of the plurality of convolutional layers as the next layer after the current one.
Fig. 4B is a schematic diagram of the method of adding a neural network module of fig. 4A. As shown in fig. 4B, the student model includes a plurality of layers L41, L42, L43, and L44, each of which includes a plurality of neurons (shown as open circles in fig. 4B). When a neuron is added for the first time, the first convolutional layer L42 (i.e., the second layer of the entire network) may be taken as the convolutional layer to which the neuron is to be added. The next time a neuron is added, the second convolutional layer L43 (i.e., the third layer of the entire network) may be taken as the convolutional layer to which the neuron is to be added, and so on. After a neuron is added, it establishes connections with other layers in the student model (as indicated by the arrows connected to the solid circle in fig. 4B), thereby completing the topological extension of the model. The extended student model may thereafter be subjected to the distillation training described above.
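The layer-selection logic of steps S431 to S434 can be sketched without committing to how a "neuron" is physically grafted onto a convolutional layer (in practice this might mean adding an output channel, which the description does not specify). The per-layer capacities, the low-to-high preset rule, and the random fallback below are assumptions used only to illustrate the control flow.

```python
import random
from typing import List, Optional


def pick_layer_for_neuron(neuron_counts: List[int],
                          upper_limits: List[int],
                          use_preset_rule: bool = True) -> Optional[int]:
    """Return the index of the convolutional layer that should receive one
    new neuron, or None if every layer is already full (fig. 4A sketch)."""
    candidates = [i for i, (n, cap) in enumerate(zip(neuron_counts, upper_limits))
                  if n + 1 <= cap]          # S432: capacity check
    if not candidates:
        return None
    if use_preset_rule:
        return candidates[0]                # preset rule: lowest unfilled layer
    return random.choice(candidates)        # alternative: random selection


# Usage sketch: three convolutional layers, the first one already full.
layer = pick_layer_for_neuron(neuron_counts=[8, 5, 5], upper_limits=[8, 8, 8])
# layer == 1, i.e. the second convolutional layer receives the neuron (S433).
```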
Fig. 5A is a flowchart of another example method of adding a neural network module, according to an embodiment of the present application. In this example, the neural network module includes a plurality of neurons connected in a multi-layer structure.
In step S531, a set of convolutional layers to which the neural network module is to be added is determined among a plurality of convolutional layers included in the student model. For example, similar to the above, the plurality of convolutional layers to which the neural network module is to be added may be determined according to a preset rule or randomly.
In step S532, it is determined whether adding a neural network module would result in at least one of the set of convolutional layers exceeding an upper limit, and if not (e.g., the number of addable neurons in each of the set of convolutional layers is greater than the number of neurons in the neural network module to be added to that layer), step S533 is performed, and if so, step S534 is performed.
In step S533, the neural network module is added to the set of convolutional layers determined in step S531. The specific position added to the convolution layer can be determined according to a preset rule or can be determined randomly.
In step S534, the next set of convolutional layers after the current set is selected as the convolutional layers to which the neural network module is to be added, and the flow returns to step S532 to determine whether this next set of convolutional layers has enough room to add the neurons. Similarly, the "next set of convolutional layers" here may be determined according to a preset rule or randomly.
Fig. 5B is a schematic diagram of the method of adding a neural network module of fig. 5A. As shown in fig. 5B, the student model includes a plurality of layers L51, L52, L53, and L54, each of which includes a plurality of neurons (shown as open circles in fig. 5B). The neural network module to be added includes a plurality of neurons connected in a two-layer structure (shown as interconnected solid circles in fig. 5B). In fig. 5B, the first convolutional layer L52 and the second convolutional layer L53 are taken as the convolutional layers to which neurons are to be added: the first two neurons of the neural network module are added to the first convolutional layer L52, and the other two neurons are added to the second convolutional layer L53. Of course, this is merely an example, and the structure and placement of the neural network module may be set as desired.
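For the multi-neuron module of fig. 5A/5B, the only extra step relative to the single-neuron case is checking capacity for every layer in the chosen set (step S532). The sketch below reuses the same hypothetical per-layer bookkeeping as the previous example.

```python
from typing import List


def can_host_module(layer_indices: List[int],
                    neurons_per_layer: List[int],
                    neuron_counts: List[int],
                    upper_limits: List[int]) -> bool:
    """S532: True if each selected convolutional layer can take its share of
    the module's neurons (e.g. two into L52 and two into L53 in fig. 5B)."""
    return all(neuron_counts[i] + k <= upper_limits[i]
               for i, k in zip(layer_indices, neurons_per_layer))


# Usage sketch: a two-layer module adding 2 neurons to each of layers 1 and 2.
ok = can_host_module(layer_indices=[1, 2], neurons_per_layer=[2, 2],
                     neuron_counts=[8, 5, 5], upper_limits=[8, 8, 8])
# ok is True, so the module is added (S533); otherwise another set of layers
# would be selected according to a preset rule or randomly (S534).
```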
Fig. 6A is a flowchart of yet another example method of adding a neural network module, according to an embodiment of the present application. In this example, the neural network module includes one or more convolutional layers.
In step S631, the position in the student model at which the convolutional layer is to be added is determined. For example, in a manner similar to that described above, it may be determined, according to a preset rule or randomly, between which two layers of the student model the neural network module is to be added.
In step S632, it is determined whether adding the neural network module will cause the number of convolution layers of the student model to exceed the upper limit, and if not (for example, the number of the convolution layers that can be added in the student model is greater than or equal to the number of the convolution layers included in the neural network module), step S633 is performed, and if so, step S634 is performed.
In step S633, the neural network module is added to the position determined in step S631. The specific position added to the student model can be determined according to a preset rule or can be determined randomly.
In step S634, the addition of the neural network module is ended. At this point, the neural network model may be considered to have been extended to its maximum, and the entire update flow may end.
Fig. 6B is a schematic diagram of the method of adding a neural network module of fig. 6A. As shown in fig. 6B, the student model includes a plurality of layers L61, L62, L63, and L65, each of which includes a plurality of neurons (shown as open circles in fig. 6B). The neural network module to be added includes one convolutional layer (shown as interconnected solid circles in fig. 6B). In fig. 6B, the neural network module is added above the second convolutional layer L63, i.e., between layers L63 and L65, thereby forming the third convolutional layer L64 of the student model (i.e., the fourth layer of the entire network).
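A sketch of fig. 6A/6B in PyTorch terms: the student's convolutional stack is assumed to be an nn.Sequential, a new convolutional layer is spliced in at a chosen position, and the insertion is refused once a preset maximum depth is reached. The channel sizes, kernel size, and maximum depth are placeholders, not values from the application.

```python
import torch.nn as nn


def insert_conv_layer(stack: nn.Sequential, position: int,
                      channels: int = 64, max_layers: int = 12) -> nn.Sequential:
    """S631-S634 sketch: splice one new convolutional layer into the student's
    layer stack, or return it unchanged if the depth limit is reached."""
    if len(stack) + 1 > max_layers:        # S632/S634: no room left, stop adding
        return stack
    new_layer = nn.Sequential(             # the module to add (one conv layer)
        nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        nn.ReLU(),
    )
    layers = list(stack.children())
    layers.insert(position, new_layer)     # S633: e.g. between L63 and L65
    return nn.Sequential(*layers)
```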
Fig. 7 is a block diagram of an updating apparatus of a neural network model according to an embodiment of the present application. As shown in fig. 7, the updating apparatus 700 of the neural network model includes an acquisition module 710, an expansion module 720, an extraction module 730, and a training module 740. The acquisition module 710 is used for acquiring an input image. The expansion module 720 is used to add neural network modules to the trained student model. The extraction module 730 is used for extracting features of the input image by using the student model and the teacher model to which the neural network module is added, respectively. The training module 740 is configured to adjust parameters of the neural network module added in the student model based on a difference between the feature extraction result of the student model and the feature extraction result of the teacher model, so as to obtain an updated model of the student model.
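Mapping the four modules of the updating apparatus 700 onto code, a minimal skeleton might look as follows; the class and method names are hypothetical, and each module simply delegates to the corresponding step of the method described above.

```python
class NeuralNetworkModelUpdater:
    """Skeleton of apparatus 700: acquisition, expansion, extraction, training."""

    def acquire(self):
        """Acquisition module 710: fetch a batch of (historical) input images."""
        raise NotImplementedError

    def expand(self, student):
        """Expansion module 720: add a neural network module to the student."""
        raise NotImplementedError

    def extract(self, student, teacher, images):
        """Extraction module 730: features from the expanded student and teacher."""
        return student(images), teacher(images)

    def train(self, added_module, student_feat, teacher_feat):
        """Training module 740: adjust only the added module's parameters."""
        raise NotImplementedError
```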
According to embodiments of the present application, there is also provided an electronic device, a readable storage medium and a computer program product.
Fig. 8 is a block diagram of an electronic device 800 according to an embodiment of the present application. Electronic device 800 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic device 800 may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the application described and/or claimed herein.
As shown in fig. 8, the electronic device 800 includes: one or more processors 801, a memory 802, and interfaces for connecting the components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to the interface. In other embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple electronic devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 801 is illustrated in fig. 8.
Memory 802 is a non-transitory computer-readable storage medium provided herein. The memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method for updating the neural network model provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the method of updating the neural network model provided by the present application.
The memory 802 is used as a non-transitory computer readable storage medium, and may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules (e.g., the acquisition module 710, the expansion module 720, the extraction module 730, and the training module 740 shown in fig. 7) corresponding to the method for updating a neural network model in the embodiments of the present application. The processor 801 executes various functional applications of the server and data processing, that is, implements the update method of the neural network model in the above-described method embodiment, by running non-transitory software programs, instructions, and modules stored in the memory 802.
Memory 802 may include a storage program area that may store an operating system, at least one application program required for functionality, and a storage data area; the storage data area may store data created by the use of the electronic device 800, and the like. In addition, memory 802 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory 802 may optionally include memory located remotely from processor 801, which may be connected to electronic device 800 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device 800 may further include: an input device 803 and an output device 804. The processor 801, memory 802, input devices 803, and output devices 804 may be connected by a bus or other means, for example in fig. 8.
The input device 803 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device 800; examples include a touch screen, keypad, mouse, trackpad, pointing stick, one or more mouse buttons, trackball, and joystick. The output device 804 may include a display apparatus, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibration motors), among others. The display device may include, but is not limited to, a liquid crystal display (LCD), a light-emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computing programs (also referred to as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The method for updating the neural network model according to the embodiment of the present disclosure may be performed by a server or a client.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions disclosed in the present application can be achieved, and are not limited herein.
The above embodiments do not limit the scope of the application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims (12)

1. A method of updating a neural network model, the method comprising:
acquiring an input image and a trained student model;
adding a neural network module to a trained student model, comprising: adding the neural network module to at least one convolutional layer of a plurality of convolutional layers of a student model; wherein, in the case that the number of addable neurons in the at least one convolution layer is smaller than the number of neurons included in the neural network module, selecting another at least one convolution layer from the plurality of convolution layers according to a preset rule or randomly, and adding the neural network module to the another at least one convolution layer; the neural network module includes one or more neurons;
respectively extracting the characteristics of the input image by using a student model and a teacher model added with the neural network module; and
based on the difference between the feature extraction result of the student model and the feature extraction result of the teacher model, parameters of the neural network module added in the student model are adjusted to obtain an updated model of the student model.
2. The method of claim 1, further comprising:
acquiring a currently used neural network model as a trained student model before adding the neural network module to the trained student model; and
after obtaining an updated model of the student model, the currently used neural network model is replaced with the updated model.
3. The method of claim 2, further comprising:
after obtaining an updated model of the student model, testing the feature extraction accuracy of the obtained updated model by using preset test data;
the step of replacing the currently used neural network model with the updated model is performed in case the feature extraction accuracy of the updated model is higher than the feature extraction accuracy of the currently used neural network model.
4. The method of claim 1, further comprising:
after obtaining the updated model of the student model, the step of acquiring the input image is returned.
5. The method of claim 1, wherein the neural network module comprises one or more convolutional layers, the adding the neural network module to the student model comprising:
the neural network module is added to the student model, wherein the addition of the neural network module is stopped in the event that the number of addable convolutional layers in the student model is less than the number of convolutional layers contained in the neural network module.
6. The method of claim 1, wherein the input image comprises: a plurality of historical input images processed by the neural network model currently in use.
7. The method of any one of claims 1 to 6, wherein the difference between the feature extraction result of the student model and the feature extraction result of the teacher model comprises: distillation loss function value between the feature extraction result of the student model and the feature extraction result of the teacher model.
8. The method of any one of claims 1 to 6, wherein the student model is a MobileNet series neural network model.
9. The method of any one of claims 1 to 6, wherein the teacher model is a ResNet-152 series neural network model.
10. An updating apparatus of a neural network model, comprising:
the acquisition module is used for acquiring an input image;
an expansion module for adding a neural network module to a trained student model, comprising: adding the neural network module to at least one convolutional layer of a plurality of convolutional layers of a student model; wherein, in the case that the number of addable neurons in the at least one convolution layer is smaller than the number of neurons included in the neural network module, selecting another at least one convolution layer from the plurality of convolution layers according to a preset rule or randomly, and adding the neural network module to the another at least one convolution layer; the neural network module includes one or more neurons;
the extraction module is used for extracting the characteristics of the input image by using the student model and the teacher model added with the neural network module respectively; and
and the training module is used for adjusting parameters of the neural network module added in the student model based on the difference between the feature extraction result of the student model and the feature extraction result of the teacher model to obtain an updated model of the student model.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.
12. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-9.
CN202011481689.6A 2020-12-15 2020-12-15 Neural network model updating method, device, equipment and storage medium Active CN112529162B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011481689.6A CN112529162B (en) 2020-12-15 2020-12-15 Neural network model updating method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011481689.6A CN112529162B (en) 2020-12-15 2020-12-15 Neural network model updating method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112529162A CN112529162A (en) 2021-03-19
CN112529162B true CN112529162B (en) 2024-02-27

Family

ID=75000307

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011481689.6A Active CN112529162B (en) 2020-12-15 2020-12-15 Neural network model updating method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112529162B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113553939A (en) * 2021-07-19 2021-10-26 中国工商银行股份有限公司 Point cloud classification model training method and device, electronic equipment and storage medium
CN113657467B (en) * 2021-07-29 2023-04-07 北京百度网讯科技有限公司 Model pre-training method and device, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111105008A (en) * 2018-10-29 2020-05-05 富士通株式会社 Model training method, data recognition method and data recognition device
CN111160409A (en) * 2019-12-11 2020-05-15 浙江大学 Heterogeneous neural network knowledge reorganization method based on common feature learning
CN111582479A (en) * 2020-05-09 2020-08-25 北京百度网讯科技有限公司 Distillation method and device of neural network model

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10929755B2 (en) * 2019-04-08 2021-02-23 Advanced New Technologies Co., Ltd. Optimization processing for neural network model

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111105008A (en) * 2018-10-29 2020-05-05 富士通株式会社 Model training method, data recognition method and data recognition device
CN111160409A (en) * 2019-12-11 2020-05-15 浙江大学 Heterogeneous neural network knowledge reorganization method based on common feature learning
CN111582479A (en) * 2020-05-09 2020-08-25 北京百度网讯科技有限公司 Distillation method and device of neural network model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Fish recognition method based on the FTVGG16 convolutional neural network; Chen Yingyi; Gong Chuanyang; Liu Yeqi; Fang Xiaomin; Transactions of the Chinese Society for Agricultural Machinery (Issue 05); full text *

Also Published As

Publication number Publication date
CN112529162A (en) 2021-03-19

Similar Documents

Publication Publication Date Title
CN111667054B (en) Method, device, electronic equipment and storage medium for generating neural network model
JP7042897B2 (en) Model parameter update method and equipment
CN111275190B (en) Compression method and device of neural network model, image processing method and processor
CN111738414B (en) Recommendation model generation method, content recommendation method, device, equipment and medium
CN112529162B (en) Neural network model updating method, device, equipment and storage medium
EP3872763A1 (en) Point cloud data processing method, apparatus, electronic device and computer readable storage medium
CN111667057B (en) Method and apparatus for searching model structures
CN110795569B (en) Method, device and equipment for generating vector representation of knowledge graph
CN111582479B (en) Distillation method and device for neural network model
CN111914994B (en) Generation method and device of multi-layer perceptron, electronic equipment and storage medium
CN111582375A (en) Data enhancement strategy searching method, device, equipment and storage medium
CN111461343B (en) Model parameter updating method and related equipment thereof
CN111950293B (en) Semantic representation model generation method and device, electronic equipment and storage medium
CN112560499B (en) Pre-training method and device for semantic representation model, electronic equipment and storage medium
CN111859907B (en) Text error correction method and device, electronic equipment and storage medium
CN111652354B (en) Method, apparatus, device and storage medium for training super network
CN114492788A (en) Method and device for training deep learning model, electronic equipment and storage medium
CN112100466A (en) Method, device and equipment for generating search space and storage medium
CN112580723B (en) Multi-model fusion method, device, electronic equipment and storage medium
CN111680599B (en) Face recognition model processing method, device, equipment and storage medium
CN110598629B (en) Super-network search space construction method and device and electronic equipment
CN111783872B (en) Method, device, electronic equipment and computer readable storage medium for training model
CN111340222B (en) Neural network model searching method and device and electronic equipment
CN111461340B (en) Weight matrix updating method and device and electronic equipment
CN111753955A (en) Model parameter adjusting method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant