CN115205094A - Neural network training method, image detection method and equipment thereof - Google Patents

Neural network training method, image detection method and equipment thereof

Info

Publication number
CN115205094A
CN115205094A
Authority
CN
China
Prior art keywords
neural network
loss value
image
network
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210632202.2A
Other languages
Chinese (zh)
Inventor
王超运
孙鹤
潘华东
殷俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202210632202.2A priority Critical patent/CN115205094A/en
Publication of CN115205094A publication Critical patent/CN115205094A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053 Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a neural network training method, an image detection method, a terminal device and a computer storage medium. The method includes: acquiring a training image set comprising a first-resolution image and a second-resolution image, where the resolution of the first-resolution image is higher than that of the second-resolution image; inputting the first-resolution image into a first neural network for training to obtain a first loss value; inputting the second-resolution image into a second neural network for training to obtain a second loss value; calculating an overall loss value based on the first loss value and the second loss value; and updating the network parameters of the second neural network with the overall loss value to obtain a final neural network. With this method, the high-resolution images and the training results of the high-resolution neural network assist the training of the low-resolution neural network: through distillation learning, the resolution of the input image is reduced, the overall time cost of neural network training is cut, and the training accuracy stays consistent with that of the high-resolution neural network.

Description

Neural network training method, image detection method and equipment thereof
Technical Field
The present application relates to the field of neural network application technologies, and in particular, to a neural network training method, an image detection method, a terminal device, and a computer storage medium.
Background
Deep learning has been widely adopted in recent years. Because of its strong feature-extraction capability, it is widely applied in computer-vision tasks such as target detection, target classification and target segmentation. However, because deep models have huge numbers of parameters, most applications are deployed in the cloud and rely on GPU (Graphics Processing Unit) computation, so in many application fields they are difficult to deploy in practice, or the excellent performance of a large network is hard to realize.
To address this problem, many researchers have studied network-compression methods such as distillation learning and model pruning, which reduce the computation of the network so that it can be deployed on lightweight devices. In addition, reducing the resolution of the input image directly reduces the overall network computation, but it also causes a drop in accuracy; how to keep accuracy from dropping while the resolution is reduced remains an open research problem.
Disclosure of Invention
The application provides a neural network training method, an image detection method, a terminal device and a computer storage medium.
One technical solution adopted by the present application is to provide a neural network training method, including:
acquiring a training image set, wherein the training image set comprises a first resolution image and a second resolution image, and the resolution of the first resolution image is higher than that of the second resolution image;
inputting the first-resolution image into a first neural network for training to obtain a first loss value;
inputting the second resolution image into a second neural network for training to obtain a second loss value;
calculating an overall loss value based on the first loss value and the second loss value;
and updating network parameters of the second neural network by using the total loss value to obtain a final neural network.
The first-resolution image and the second-resolution image are obtained by processing the same image at different resolutions; the first neural network at least includes the network structure of the second neural network.
Wherein said calculating an overall loss value based on said first loss value and said second loss value comprises:
setting a balance parameter;
and weighting and adding the first loss value and the second loss value by using the balance parameter to obtain the total loss value.
Wherein, the updating the network parameters of the second neural network by using the total loss value to obtain a final neural network comprises:
updating network parameters of the first neural network by using the total loss value to obtain first network parameters of the first neural network;
and sharing the first network parameters to the second neural network so as to update the network parameters of the second neural network to obtain a final neural network.
Wherein, the updating the network parameters of the second neural network by using the total loss value to obtain a final neural network comprises:
updating network parameters of the first neural network by using the total loss value to obtain third network parameters of the first neural network;
updating network parameters of the second neural network by using the total loss value to obtain second network parameters of the second neural network;
and calculating to obtain a final network parameter and a final neural network formed by the final network parameter based on the second network parameter and the third network parameter.
Wherein, after inputting the first resolution image into a first neural network for training and obtaining a first loss value, the neural network training method further comprises:
comparing the first loss value with a preset loss threshold value;
and when the first loss value is less than or equal to the preset loss threshold value, fixing the network parameters of the first neural network.
Another technical solution adopted by the present application is to provide an image detection method, including:
acquiring a pre-trained neural network, wherein the neural network is obtained by training through the neural network training method;
acquiring a video to be detected, processing the video to be detected, and extracting at least one image to be detected which accords with the input of the neural network from the video to be detected;
and inputting the at least one image to be detected into the neural network, and detecting a target object in the image to be detected.
The video to be detected is a video of a monitored area in a kitchen captured by an acquisition device, and the target object is a mouse.
Another technical solution adopted by the present application is to provide a terminal device, where the terminal device includes a memory and a processor coupled to the memory;
wherein the memory is configured to store program data and the processor is configured to execute the program data to implement a neural network training method and/or an image detection method as described above.
Another technical solution adopted by the present application is to provide a computer storage medium for storing program data, which when executed by a computer, is used to implement the neural network training method and/or the image detection method as described above.
The beneficial effect of this application is: the terminal device acquires a training image set comprising a first-resolution image and a second-resolution image, where the resolution of the first-resolution image is higher than that of the second-resolution image; inputs the first-resolution image into a first neural network for training to obtain a first loss value; inputs the second-resolution image into a second neural network for training to obtain a second loss value; calculates an overall loss value based on the first loss value and the second loss value; and updates the network parameters of the second neural network with the overall loss value to obtain a final neural network. This training method lets the high-resolution images and the training results of the high-resolution neural network assist the training of the low-resolution neural network: through distillation learning, the resolution of the input image is reduced, the overall time cost of neural network training is cut, and the training accuracy stays consistent with that of the high-resolution neural network.
Drawings
To more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flow chart diagram illustrating an embodiment of a neural network training method provided herein;
FIG. 2 is a schematic diagram of a general flow of a neural network training method provided in the present application;
FIG. 3 is a schematic flow chart of the distillation learning of the high and low resolution model provided in the present application;
FIG. 4 is a schematic flowchart of an embodiment of an image detection method provided in the present application;
fig. 5 is a schematic structural diagram of an embodiment of a terminal device provided in the present application;
fig. 6 is a schematic structural diagram of another embodiment of a terminal device provided in the present application;
fig. 7 is a schematic structural diagram of a terminal device according to another embodiment of the present disclosure;
FIG. 8 is a schematic structural diagram of an embodiment of a computer storage medium provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.
This application mainly designs a method for reducing the time a device-side deep learning model spends processing images. Compared with conventional methods, it starts from the angle of image resolution and combines distillation learning, so it can cut time cost from two angles, network computation and image preprocessing, while keeping accuracy consistent.
Referring to fig. 1 and fig. 2, fig. 1 is a schematic flowchart of an embodiment of a neural network training method provided in the present application, and fig. 2 is a schematic flowchart of a general flow of the neural network training method provided in the present application.
As shown in fig. 1, the neural network training method according to the embodiment of the present application may specifically include the following steps:
step S11: acquiring a training image set, wherein the training image set comprises a first-resolution image and a second-resolution image, and the resolution of the first-resolution image is higher than that of the second-resolution image.
In the embodiment of the application, the terminal device obtains a training image set. Taking an image-classification application as an example, the terminal device may be connected to a monitoring camera arranged in a monitored scene and acquire a video stream or a number of video frames through the camera to form the training image set. Alternatively, it may be connected to a storage device and directly extract a number of images from it to form the training image set, or training images suited to the image-classification application may be obtained from a public training set to form the training image set used for neural network training.
Further, the training image set includes images at two or more resolutions, i.e., a first-resolution image and a second-resolution image, where the resolution of the first-resolution image is higher than that of the second-resolution image. In one possible embodiment, the resolution of the first-resolution image may be 224 × 224, and the resolution of the second-resolution image may be 112 × 112.
In other embodiments, as shown in fig. 2, the terminal device may also acquire a training image data set in which all images share the same resolution, and then generate a high-resolution image data set and a low-resolution image data set from it through image preprocessing, i.e., resolution preprocessing. Specifically, the terminal device may apply a scaling (size) transform to the training image data set; to distinguish the two outputs, two size transforms with different target sizes may be used, producing a high-resolution image and a low-resolution image respectively. Alternatively, the terminal device may use only one size transform: the training image data set is used directly as the high-resolution image set, and a size transform of it generates the low-resolution image set.
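As an illustration, the resolution preprocessing above can be sketched as follows. The target sizes follow the 224 × 224 / 112 × 112 example from the embodiment; the function name and the use of the Pillow library are assumptions made for illustration, not part of the patent.

```python
from PIL import Image

# Assumed target sizes, matching the embodiment's example resolutions.
HIGH_SIZE = (224, 224)
LOW_SIZE = (112, 112)

def make_resolution_pair(img):
    """Resize one training image into a high- and a low-resolution copy.

    Two size transforms of different target sizes are applied to the same
    image, yielding the high-resolution and low-resolution training inputs.
    """
    high = img.resize(HIGH_SIZE)  # default resampling filter
    low = img.resize(LOW_SIZE)
    return high, low
```

A dataset-level preprocessing pass would simply apply this function to every image in the training image data set, collecting the first outputs as the high-resolution set and the second outputs as the low-resolution set.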
Step S12: and inputting the first resolution ratio image into a first neural network for training to obtain a first loss value.
In the embodiment of the application, the terminal device inputs the first-resolution image into the first neural network, i.e., the high-resolution neural network, to obtain the predicted label the high-resolution network assigns to the high-resolution image. The terminal device then calculates the first loss value of the high-resolution neural network from the difference between the real label and the predicted label of the high-resolution image.
Step S13: and inputting the second resolution image into a second neural network for training to obtain a second loss value.
In the embodiment of the application, the terminal device inputs the second-resolution image into the second neural network, i.e., the low-resolution neural network, to obtain the predicted label the low-resolution network assigns to the low-resolution image. The terminal device then calculates the second loss value of the low-resolution neural network from the difference between the real label and the predicted label of the low-resolution image.
The terminal device inputs the high-resolution image into the high-resolution neural network and the low-resolution image into the low-resolution neural network for joint training. The better-performing result of the joint training is the high-resolution neural network whose input is the high-resolution image, namely Model 1 (the high-resolution model) shown in fig. 2.
Step S14: based on the first loss value and the second loss value, an overall loss value is calculated.
In the embodiment of the present application, the terminal device uses the high-resolution neural network trained in step S12 as the teacher model for distillation learning, and uses the low-resolution neural network, the network this embodiment ultimately needs, as the student model.
Specifically, please refer to fig. 3, a schematic flow chart of the distillation learning between the high- and low-resolution models provided in the present application. Through the size transforms described in step S11, i.e., two different image-preprocessing operations, the original image is converted into two images of different resolutions: the low-resolution image and the high-resolution image shown in fig. 3.
After size conversion, the feature information of the low-resolution image is less than that of the high-resolution image, but the whole semantic information is kept unchanged.
As shown in fig. 3, during training Model 2 (the low-resolution model) is the target model, whose parameters need to be updated, while Model 1 (the high-resolution model) contributes only its output result to calculate the loss value. The formula for the overall loss used to train the low-resolution model is:
Loss=a*loss1+(1-a)*loss2
where loss2 is the output loss value of Model 2 (the low-resolution model), loss1 is the output loss value of Model 1 (the high-resolution model), and a is a balance parameter that balances loss1 against loss2. In a specific embodiment, the terminal device selects a = 0.9 in actual use and applies the resulting overall Loss to update the parameters of Model 2 (the low-resolution model). In other embodiments, the terminal device may select other values of the balance parameter, which are not listed here.
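A minimal sketch of the overall-loss formula above, with a = 0.9 as in the embodiment (the function name is illustrative):

```python
def overall_loss(loss1, loss2, a=0.9):
    """Overall loss for training the low-resolution model.

    Implements Loss = a * loss1 + (1 - a) * loss2, where loss1 comes from
    Model 1 (high-resolution teacher) and loss2 from Model 2 (low-resolution
    student); a is the balance parameter, 0.9 in the embodiment.
    """
    return a * loss1 + (1 - a) * loss2
```

With a = 0.9 the teacher's loss dominates, so the student is pulled toward the teacher's behaviour while still being corrected by its own supervised loss.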
Step S15: and updating network parameters of the second neural network by using the total loss value to obtain a final neural network.
In the embodiment of the application, through the joint distillation learning of the first and second neural networks, the trained second neural network, i.e., Model 2 (the low-resolution model), performs comparably to the first neural network, i.e., Model 1 (the high-resolution model). However, because the resolution of the second network's input image is lower, both the image-preprocessing time and the model-computation time are effectively reduced.
The network parameter updating method of the second neural network includes but is not limited to the following three methods:
First: the first and second neural networks each directly update their own network parameters according to the overall loss value.
Second: the first neural network updates its own network parameters according to the overall loss value. Since the first neural network is the teacher model, the second neural network is the student model, and the first neural network at least includes the network structure of the second neural network, the first neural network can share with the second the parameters of the network structure they have in common, and the second neural network updates its own parameters from them.
Third: the first neural network updates its own network parameters according to the overall loss value, and the second neural network does likewise. The second neural network then fuses the network parameters of the first neural network with its own according to a preset rule to obtain its final network parameters.
The preset rules include but are not limited to: taking the average of the network parameters of the first and second neural networks as the final network parameters of the second neural network; or weighting and fusing the network parameters of the two networks, with the fused result used as the final network parameters of the second neural network, where the weight can follow the setting of the balance parameter, not repeated here.
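The preset fusion rules can be sketched as follows, representing each network's parameters as a name-to-value mapping for illustration (a real implementation would fuse the corresponding tensors layer by layer; the function name and dict representation are assumptions):

```python
def fuse_parameters(first_params, second_params, weight=0.5):
    """Fuse the two networks' parameters per the preset rule (third method).

    weight=0.5 reproduces the plain-average rule; other weights mirror the
    balance-parameter setting. Only parameters present in the second
    (student) network are fused, since it is the network being updated.
    """
    return {
        name: weight * first_params[name] + (1 - weight) * second_params[name]
        for name in second_params
    }
```

Averaging (weight = 0.5) treats teacher and student equally, while a teacher-leaning weight keeps the fused student closer to the high-resolution model.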
Under all three update methods, over multiple iterations the network parameters of the second neural network are continuously updated and its training loss value keeps approaching the preset threshold. The network parameters of the first neural network are also continuously updated; once the training loss value of the first neural network is less than or equal to the preset threshold, the terminal device may fix the first network's parameters, or even stop feeding the high-resolution image into the first neural network and only execute the step of inputting the low-resolution image into the second neural network, using the fixed parameters and their training loss value to help the second neural network continue training.
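The iterate-then-freeze behaviour described above can be sketched as follows. Here `teacher_step` and `student_step` are placeholder callables standing in for the real forward/backward passes of the two networks, and the function name, loop shape and default threshold are illustrative assumptions:

```python
def joint_train(teacher_step, student_step, iters, a=0.9, threshold=0.05):
    """Run joint training; fix (freeze) the teacher once its loss value
    drops to or below the preset threshold, after which only the student
    continues to be updated from the overall loss."""
    teacher_frozen = False
    totals = []
    for _ in range(iters):
        loss1 = teacher_step(update=not teacher_frozen)  # high-res network
        loss2 = student_step()                           # low-res network
        totals.append(a * loss1 + (1 - a) * loss2)       # overall loss
        if not teacher_frozen and loss1 <= threshold:
            teacher_frozen = True  # teacher parameters fixed from now on
    return totals, teacher_frozen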
In the embodiment of the application, the terminal device acquires a training image set comprising a first-resolution image and a second-resolution image, where the resolution of the first-resolution image is higher than that of the second-resolution image; inputs the first-resolution image into a first neural network for training to obtain a first loss value; inputs the second-resolution image into a second neural network for training to obtain a second loss value; calculates an overall loss value based on the first loss value and the second loss value; and updates the network parameters of the second neural network with the overall loss value to obtain a final neural network. This training method lets the high-resolution images and the training results of the high-resolution neural network assist the training of the low-resolution neural network: through distillation learning, the resolution of the input image is reduced, the overall time cost of neural network training is cut, and the training accuracy stays consistent with that of the high-resolution neural network.
In this method, the high-resolution network model serves as the teacher model and the low-resolution network model as the student model, and distillation learning is carried out after applying different preprocessing transforms to the same image. The resolution of the input image can thus be reduced without changing the deep-learning network, shortening the overall time cost from both preprocessing and network computation while keeping accuracy consistent with the high-resolution model.
Referring to fig. 4, fig. 4 is a schematic flowchart illustrating an embodiment of an image detection method provided in the present application.
As shown in fig. 4, the image detection method according to the embodiment of the present application may specifically include the following steps:
step S21: a pre-trained neural network is obtained.
In the embodiment of the present application, the pre-trained neural network may be the low-resolution neural network shown in fig. 1 to 3, and the training process thereof is not described herein again.
Step S22: and acquiring a video to be detected, processing the video to be detected, and extracting at least one image to be detected which accords with the input of the neural network from the video to be detected.
In the embodiment of the application, the terminal device uses the acquisition device to capture a video to be detected of the monitored area in a kitchen scene, and then extracts at least one image to be detected from the video. Further, the terminal device needs to process each image to be detected according to the input requirement of the neural network; for example, it may apply a size transform to a high-resolution image to be detected to obtain the low-resolution image the neural network expects.
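One possible frame-extraction policy is sketched below. The fixed sampling interval is an assumption made for illustration; the patent only requires extracting at least one image that matches the network input.

```python
def sample_frame_indices(total_frames, fps, every_n_seconds=1.0):
    """Choose which frames of the monitored video to extract as images to
    be detected, taking one frame every `every_n_seconds` of video.

    Each selected frame would then be size-transformed to the low
    resolution the trained network expects before detection.
    """
    step = max(1, int(fps * every_n_seconds))
    return list(range(0, total_frames, step))
```

For a 25 fps video, sampling one frame per second yields indices 0, 25, 50, …, trading detection latency against per-frame computation.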
Similarly, the image detection method of the embodiment of the present application is also applicable to other application scenarios, such as a classroom, a subway, and the like, which are not listed here.
Step S23: and inputting at least one image to be detected into the neural network, and detecting the target object in the image to be detected.
In the embodiment of the present application, the terminal device inputs the at least one image to be detected extracted in step S22 into a neural network, and detects a target object in the image to be detected, such as a mouse in a kitchen scene, by using the neural network.
For humans, a mouse in a kitchen scene only needs to be detected as present or absent; its specific form does not need to be recognized. The low-resolution neural network is therefore sufficient to detect and mark a mouse in the kitchen scene, and high-resolution detection of the mouse is unnecessary. For such scenes and target objects, using the low-resolution neural network guarantees detection precision while effectively reducing the overall time cost of neural network prediction and the amount of network computation, improving the image detection method as a whole.
The above embodiments are only one of the common cases of the present application and do not limit the technical scope of the present application, so that any minor modifications, equivalent changes or modifications made to the above contents according to the essence of the present application still fall within the technical scope of the present application.
Continuing to refer to fig. 5, fig. 5 is a schematic structural diagram of an embodiment of a terminal device provided in the present application. The terminal device 300 of the embodiment of the present application includes an obtaining module 31, a training module 32, a calculating module 33, and an updating module 34.
The acquiring module 31 is configured to acquire a training image set, where the training image set includes a first resolution image and a second resolution image, and a resolution of the first resolution image is higher than a resolution of the second resolution image.
A training module 32, configured to input the first resolution image into a first neural network for training, so as to obtain a first loss value; and inputting the second resolution image into a second neural network for training to obtain a second loss value.
A calculation module 33, configured to calculate an overall loss value based on the first loss value and the second loss value.
And an updating module 34, configured to update network parameters of the second neural network with the total loss value to obtain a final neural network.
With continuing reference to fig. 6, fig. 6 is a schematic structural diagram of another embodiment of the terminal device provided in the present application. The terminal device 400 of the embodiment of the present application includes a network module 41, an image module 42, and a detection module 43.
The network module 41 is configured to obtain a pre-trained neural network.
The image module 42 is configured to acquire a video to be detected, process the video to be detected, and extract at least one image to be detected that meets the input of the neural network from the video to be detected.
A detecting module 43, configured to input the at least one image to be detected into the neural network, and detect a target object in the image to be detected.
With continuing reference to fig. 7, fig. 7 is a schematic structural diagram of another embodiment of the terminal device provided in the present application. The terminal device 500 of the embodiment of the present application includes a processor 51, a memory 52, an input-output device 53, and a bus 54.
The processor 51, the memory 52, and the input/output device 53 are respectively connected to the bus 54, the memory 52 stores program data, and the processor 51 is configured to execute the program data to implement the neural network training method and/or the image detection method according to the above embodiments.
In the embodiment of the present application, the processor 51 may also be referred to as a CPU (Central Processing Unit). The processor 51 may be an integrated circuit chip having signal processing capabilities. The processor 51 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components. A general purpose processor may be a microprocessor or the processor 51 may be any conventional processor or the like.
Fig. 8 is a schematic structural diagram of an embodiment of the computer storage medium provided in the present application, and the computer storage medium 600 stores program data 61, and when the program data 61 is executed by a processor, the program data is used to implement the neural network training method and/or the image detection method of the foregoing embodiment.
The embodiments of the present application may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above description is only an embodiment of the present application and is not intended to limit the scope of the present application. Any equivalent structure or equivalent process transformation made using the description and drawings of the present application, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of protection of the present application.

Claims (10)

1. A neural network training method, comprising:
acquiring a training image set, wherein the training image set comprises a first resolution image and a second resolution image, and the resolution of the first resolution image is higher than that of the second resolution image;
inputting the first resolution image into a first neural network for training to obtain a first loss value;
inputting the second resolution image into a second neural network for training to obtain a second loss value;
calculating an overall loss value based on the first loss value and the second loss value;
and updating network parameters of the second neural network by using the total loss value to obtain a final neural network.
2. The neural network training method of claim 1,
the first resolution image and the second resolution image are obtained by processing the same image at different resolutions; the first neural network comprises at least the network structure of the second neural network.
3. The neural network training method of claim 1,
said calculating an overall loss value based on said first loss value and said second loss value, comprising:
setting a balance parameter;
and weighting and adding the first loss value and the second loss value by using the balance parameter to obtain the total loss value.
4. The neural network training method of claim 1,
the updating network parameters of the second neural network by using the total loss value to obtain a final neural network comprises the following steps:
updating network parameters of the first neural network by using the total loss value to obtain first network parameters of the first neural network;
and sharing the first network parameters to the second neural network so as to update the network parameters of the second neural network to obtain a final neural network.
5. The neural network training method of claim 4,
the updating network parameters of the second neural network by using the total loss value to obtain a final neural network comprises the following steps:
updating network parameters of the first neural network by using the total loss value to obtain third network parameters of the first neural network;
updating network parameters of the second neural network by using the total loss value to obtain second network parameters of the second neural network;
and calculating to obtain a final network parameter and a final neural network formed by the final network parameter based on the second network parameter and the third network parameter.
6. The neural network training method of claim 1,
after the first resolution image is input into a first neural network for training and a first loss value is obtained, the neural network training method further includes:
comparing the first loss value with a preset loss threshold value;
and when the first loss value is less than or equal to the preset loss threshold value, fixing the network parameters of the first neural network.
7. An image detection method, characterized in that the image detection method comprises:
obtaining a pre-trained neural network, wherein the neural network is obtained by training through the neural network training method of any one of claims 1 to 6;
acquiring a video to be detected, processing the video to be detected, and extracting at least one image to be detected which accords with the input of the neural network from the video to be detected;
and inputting the at least one image to be detected into the neural network, and detecting a target object in the image to be detected.
8. The image detection method according to claim 7,
the video to be detected is a video, captured by an acquisition device, of a monitored area in a kitchen, and the target object is a mouse.
9. A terminal device, comprising a memory and a processor coupled to the memory;
wherein the memory is configured to store program data, and the processor is configured to execute the program data to implement the neural network training method of any one of claims 1 to 6 and/or the image detection method of any one of claims 7 to 8.
10. A computer storage medium for storing program data which, when executed by a computer, is adapted to implement the neural network training method of any one of claims 1 to 6 and/or the image detection method of any one of claims 7 to 8.
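The training procedure recited in claims 1 to 5 can be sketched numerically as follows. This is a minimal plain-Python illustration, not the implementation disclosed in the application: losses are scalars, "network parameters" are flat lists of floats, and the learning rate, balance parameter, and parameter-averaging weight are all assumed values.

```python
def total_loss(first_loss, second_loss, balance):
    """Claim 3: weighted sum of the first and second loss values
    using a balance parameter."""
    return balance * first_loss + (1.0 - balance) * second_loss

def sgd_step(params, grads, lr=0.1):
    """Schematic parameter update driven by the total loss (claim 4):
    one gradient-descent step on a flat parameter list."""
    return [p - lr * g for p, g in zip(params, grads)]

def merge_params(first_params, second_params, alpha=0.5):
    """Claim 5: combine the updated parameters of the two networks into
    the final network parameters (simple averaging is assumed here)."""
    return [alpha * a + (1.0 - alpha) * b
            for a, b in zip(first_params, second_params)]

# Claim 1, schematically: the first (high-resolution) and second
# (low-resolution) networks each yield a loss value.
loss = total_loss(first_loss=0.8, second_loss=1.2, balance=0.5)  # -> 1.0

# The total loss updates the first network's parameters (claim 4) ...
first = sgd_step([1.0, 2.0], grads=[0.5, -0.5])

# ... and the final network parameters combine both networks (claim 5).
final = merge_params(first, [1.0, 2.0])
```

Note that averaging parameters element-wise only makes sense because, per claim 2, the first neural network contains at least the network structure of the second, so the shared layers align.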
CN202210632202.2A 2022-05-31 2022-05-31 Neural network training method, image detection method and equipment thereof Pending CN115205094A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210632202.2A CN115205094A (en) 2022-05-31 2022-05-31 Neural network training method, image detection method and equipment thereof


Publications (1)

Publication Number Publication Date
CN115205094A true CN115205094A (en) 2022-10-18

Family

ID=83575288

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210632202.2A Pending CN115205094A (en) 2022-05-31 2022-05-31 Neural network training method, image detection method and equipment thereof

Country Status (1)

Country Link
CN (1) CN115205094A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115761448A (en) * 2022-12-02 2023-03-07 美的集团(上海)有限公司 Training method and device for neural network and readable storage medium
CN115761448B (en) * 2022-12-02 2024-03-01 美的集团(上海)有限公司 Training method, training device and readable storage medium for neural network


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination