CN114359618A - Training method of neural network model, electronic equipment and computer program product - Google Patents


Info

Publication number
CN114359618A
Authority
CN
China
Prior art keywords
pseudo
prediction
initial
neural network
network model
Prior art date
Legal status
Pending
Application number
CN202111420295.4A
Other languages
Chinese (zh)
Inventor
杨越
Current Assignee
Beijing Kuangshi Technology Co Ltd
Beijing Megvii Technology Co Ltd
Original Assignee
Beijing Kuangshi Technology Co Ltd
Beijing Megvii Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Kuangshi Technology Co Ltd, Beijing Megvii Technology Co Ltd filed Critical Beijing Kuangshi Technology Co Ltd
Priority to CN202111420295.4A priority Critical patent/CN114359618A/en
Publication of CN114359618A publication Critical patent/CN114359618A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The application provides a training method of a neural network model, an electronic device, and a computer program product, and relates to the technical field of image processing. The method comprises the following steps: acquiring an initial image set corresponding to a target scene; performing target prediction on the initial images in the initial image set according to an initial neural network model to obtain first prediction information corresponding to each initial image; determining pseudo-annotation information corresponding to the initial image according to the first prediction score and a pseudo-annotation threshold corresponding to the target scene; and training the initial neural network model according to the initial image set and the pseudo-annotation information corresponding to the initial images in the initial image set to obtain a target neural network model. Because the pseudo-annotation information is obtained according to scene-specific pseudo-annotation thresholds, it is more consistent with the current application scenario, missed and wrong labels are avoided, and the accuracy of target detection by the target neural network model in each application scenario is improved.

Description

Training method of neural network model, electronic equipment and computer program product
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a training method for a neural network model, an electronic device, and a computer program product.
Background
Deep learning is the process of labeling samples and then training and optimizing a neural network model based on the labeled information; the optimization effect of the model is directly related to the number of samples.
However, it is difficult to obtain a large number of samples with annotation information, and therefore detection methods based on semi-supervised learning have emerged in the deep learning field. A semi-supervised learning method generally trains a model on a portion of labeled samples so that the model gains the ability to predict labels for a large amount of unlabeled data, and then uses high-quality pseudo labels for further iterative optimization of the model. The advantage of this approach is that a small amount of labeled data and a large amount of unlabeled data are effectively utilized to improve the performance of the model while reducing the cost of manual labeling.
To give a network model strong generalization capability, existing semi-supervised detection techniques usually focus on the diversity and richness of data, i.e., training data are obtained from different scenes. However, the pseudo-labeling performed on unlabeled data from different scenes is not accurate enough, so a large amount of wrong pseudo-labeling information appears in some scenes, which degrades the accuracy of target detection.
Disclosure of Invention
In view of the above, an object of the present application is to provide a training method for a neural network model, an electronic device, and a computer program product, so as to improve the accuracy of target detection by the trained neural network model.
In a first aspect, an embodiment of the present application provides a training method for a neural network model, where the method is applied to an electronic device, and an initial neural network model obtained based on training of an artificial labeling sample image corresponding to a target scene and a pseudo labeling threshold corresponding to the target scene are stored in the electronic device; the method comprises the following steps: acquiring an initial image set corresponding to a target scene; performing target prediction on the initial image in the initial image set according to the initial neural network model to obtain first prediction information corresponding to the initial image; the first prediction information comprises a first prediction frame and a first prediction score corresponding to the first prediction frame; determining pseudo-annotation information corresponding to the initial image according to the first prediction score and a pseudo-annotation threshold corresponding to the target scene; the pseudo labeling information comprises a first prediction frame and a pseudo labeling score corresponding to the first prediction frame; and training the initial neural network model according to the initial image set and the pseudo-annotation information corresponding to the initial images in the initial image set to obtain the target neural network model.
Further, the pseudo labeling threshold is determined by the following method: acquiring a plurality of historical images containing a target scene; predicting each historical image through an initial neural network model to obtain a second prediction frame corresponding to each historical image and a second prediction score corresponding to the second prediction frame; determining a predictive score distribution function according to the second predictive score; and determining a pseudo-labeling threshold value of a pseudo sample corresponding to the target scene according to the prediction score distribution function.
Further, the step of determining a prediction score distribution function according to the second prediction scores includes: fitting the second prediction scores corresponding to all the second prediction frames to obtain the prediction score distribution function.
Further, the step of determining the pseudo-labeling threshold of the pseudo sample corresponding to the target scene according to the prediction score distribution function includes: determining a trough position between a first peak and a second peak of the prediction score distribution function, where the first peak is the peak corresponding to the positive samples of the target scene and the second peak is the peak corresponding to the negative samples of the target scene; and determining the horizontal coordinate (the score value) corresponding to the trough position as the pseudo-labeling threshold of the pseudo sample corresponding to the target scene.
Furthermore, the number of the target scenes is at least two, and the difference value between the number of the initial images corresponding to different target scenes is smaller than a first number difference threshold value.
Further, the initial image set comprises initial images of the target scene shot under a first illumination condition and initial images of the target scene shot under a second illumination condition, where the first illumination condition and the second illumination condition involve different amounts of light.
Further, the step of determining the pseudo-annotation information corresponding to the initial image according to the first prediction score corresponding to the prediction frame in the initial image and the pseudo-annotation threshold corresponding to the target scene includes: judging whether the first prediction score is larger than the pseudo-annotation threshold; if so, determining that the pseudo-annotation score of the prediction frame is a first target value; otherwise, determining that the pseudo-annotation score of the prediction frame is a second target value.
In a second aspect, an embodiment of the present application further provides an electronic device, which includes a processor and a memory, where the memory stores computer-executable instructions capable of being executed by the processor, and the processor executes the computer-executable instructions to implement the method for training a neural network model according to the first aspect.
In a third aspect, embodiments of the present application further provide a computer-readable storage medium storing computer-executable instructions, which, when invoked and executed by a processor, cause the processor to implement the method for training a neural network model of the first aspect.
In a fourth aspect, an embodiment of the present application further provides a computer program product, where the computer program product includes a computer program, and when the computer program is executed by a processor, the method for training a neural network model in the first aspect is implemented.
Compared with the prior art, the method has the following beneficial effects:
According to the training method of the neural network model, the electronic device, and the computer program product provided by the embodiments of the present application, an initial image set corresponding to a target scene is first obtained; target prediction is performed on the initial images in the initial image set according to the initial neural network model to obtain a first prediction frame corresponding to each initial image and a first prediction score corresponding to the first prediction frame; pseudo-annotation information corresponding to the initial image is determined according to the first prediction score and the pseudo-annotation threshold corresponding to the target scene; and finally, the initial neural network model is trained according to the initial images in the initial image set and the pseudo-annotation information corresponding to the initial images to obtain a target neural network model. Because the pseudo-annotation information is obtained according to a scene-specific pseudo-annotation threshold, it better matches the current application scenario, missed and wrong labels are avoided, and the accuracy of target detection by the target neural network model in each application scenario is improved.
Additional features and advantages of the disclosure will be set forth in the description which follows, or in part may be learned by the practice of the above-described techniques of the disclosure, or may be learned by practice of the disclosure.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the detailed description of the present application or the technical solutions in the prior art, the drawings needed to be used in the detailed description of the present application or the prior art description will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a schematic structural diagram of an electronic system according to an embodiment of the present disclosure;
fig. 2 is a flowchart of a training method of a neural network model according to an embodiment of the present disclosure;
fig. 3 is a flowchart of a method for determining a pseudo-annotation threshold according to an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of a training apparatus for a neural network model according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions of the present application will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In recent years, technical research based on artificial intelligence, such as computer vision, deep learning, machine learning, image processing, and image recognition, has developed rapidly. Artificial Intelligence (AI) is an emerging scientific technology that studies and develops theories, methods, techniques, and application systems for simulating and extending human intelligence. Artificial intelligence is a comprehensive discipline involving many technical categories, such as chips, big data, cloud computing, the Internet of Things, distributed storage, deep learning, machine learning, and neural networks. Computer vision is an important branch of artificial intelligence whose particular aim is to enable machines to recognize the world; computer vision technologies generally include face recognition, liveness detection, fingerprint recognition and anti-counterfeiting verification, biometric recognition, face detection, pedestrian detection, target detection, pedestrian recognition, image processing, image recognition, image semantic understanding, image retrieval, character recognition, video processing, video content recognition, behavior recognition, three-dimensional reconstruction, virtual reality, augmented reality, simultaneous localization and mapping (SLAM), computational photography, robot navigation and positioning, and the like.
With the research and progress of artificial intelligence technology, the technology is applied to various fields, such as security, city management, traffic management, building management, park management, face passage, face attendance, logistics management, warehouse management, robots, intelligent marketing, computational photography, mobile phone images, cloud services, smart homes, wearable equipment, unmanned driving, automatic driving, smart medical treatment, face payment, face unlocking, fingerprint unlocking, testimony verification, smart screens, smart televisions, cameras, mobile internet, live webcasts, beauty treatment, medical beauty treatment, intelligent temperature measurement and the like.
Because the existing semi-supervised detection technology adopts a uniform pseudo-labeling threshold across different application scenarios, a large number of missed or wrong labels appear in some of those scenarios. The embodiments of the present application therefore provide a training method for a neural network model, an electronic device, and a computer program product to improve the accuracy of target detection by the trained neural network model.
Referring to fig. 1, a schematic diagram of an electronic system 100 is shown. The electronic system can be used for implementing the training method of the neural network model of the embodiment of the application.
As shown in FIG. 1, an electronic system 100 includes one or more processing devices 102, one or more memory devices 104, an input device 106, an output device 108, and one or more image capture devices 110, which are interconnected via a bus system 112 and/or other type of connection mechanism (not shown). It should be noted that the components and structure of the electronic system 100 shown in fig. 1 are exemplary only, and not limiting, and that the electronic system may have other components and structures as desired.
The processing device 102 may be a server, a smart terminal, or a device containing a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, may process data for other components in the electronic system 100, and may control other components in the electronic system 100 to perform training functions of the neural network model.
Storage 104 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory may include, for example, Random Access Memory (RAM) and/or cache memory. Non-volatile memory may include, for example, Read-Only Memory (ROM), a hard disk, flash memory, and the like. One or more computer program instructions may be stored on a computer-readable storage medium and executed by processing device 102 to implement the client functionality (implemented by the processing device) of the embodiments of the present application described below and/or other desired functionality. Various applications and various data, such as data used and/or generated by those applications, may also be stored in the computer-readable storage medium.
The input device 106 may be a device used by a user to input instructions and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like.
The output device 108 may output various information (e.g., images or sounds) to the outside (e.g., a user), and may include one or more of a display, a speaker, and the like.
Image capture device 110 may acquire the image to be processed and store the image to be processed in storage 104 for use by other components.
For example, the devices in the training method, the electronic device and the computer program product for implementing the neural network model according to the embodiments of the present application may be integrally disposed, or may be disposed in a decentralized manner, such as integrally disposing the processing device 102, the storage device 104, the input device 106 and the output device 108, and disposing the image capturing device 110 at a designated position where an image can be captured. When the above-described devices in the electronic system are integrally provided, the electronic system may be implemented as an intelligent terminal such as a camera, a smart phone, a tablet computer, a vehicle-mounted terminal, and the like.
Fig. 2 is a flowchart of a training method for a neural network model according to an embodiment of the present application. The method is applied to an electronic device in which are stored an initial neural network model, obtained by training on manually labeled sample images corresponding to a target scene, and a pseudo-labeling threshold corresponding to the target scene. When training an initial neural network model, the manually labeled sample images in each scene are often not sufficient to train a neural network model with high precision, so the unlabeled images need to be pseudo-labeled: prediction information for the unlabeled images is predicted by the lower-precision neural network model. This prediction information can be called pseudo-labeling information, and it comprises a pseudo-labeled prediction frame and the prediction score corresponding to that frame. The prediction scores corresponding to different images differ, and images with relatively high prediction scores are expected to be used as positive samples with which the neural network model is then trained. A pseudo-labeling threshold is therefore set to represent the minimum prediction score a pseudo-labeled prediction frame must have for its image to be used as a positive sample.
Referring to fig. 2, the method comprises the steps of:
S202: acquiring an initial image set corresponding to a target scene;
in this step, an initial image set corresponding to a target scene is first obtained, where the content of initial images included in the initial image set is the target scene, for example, if the target scene is a building, the initial image set is a plurality of initial images including the building. The initial image set may be obtained by shooting the target scene through a shooting device, or may be an existing image of the target scene obtained through an electronic device. The imaging device may be a device independent of the electronic device, or may be a functional module provided in the electronic device, for example, a camera in the electronic device with a camera.
S204: performing target prediction on the initial image in the initial image set according to the initial neural network model to obtain first prediction information corresponding to the initial image; the first prediction information comprises a first prediction frame and a first prediction score corresponding to the first prediction frame;
the initial neural network model is obtained by pre-training a sample image. The structure of the model may adopt a target monitoring model in the related art, such as a Retina Net model, and the embodiment of the present application does not limit the specific structure of the neural network model. The pre-training method adopts a fully supervised training method, and the specific process of the pre-training method will be described in detail below, which is not described herein again.
S206: determining pseudo-annotation information corresponding to the initial image according to the first prediction score and a pseudo-annotation threshold corresponding to the target scene; the pseudo labeling information comprises a first prediction frame and a pseudo labeling score corresponding to the first prediction frame;
S208: training the initial neural network model according to the initial image set and the pseudo-annotation information corresponding to the initial images in the initial image set to obtain the target neural network model.
Specifically, the initial image is input into the initial neural network model again to obtain a prediction score, the loss value between the prediction score and the pseudo-annotation score in the pseudo-annotation information is calculated, and the neural network model is trained according to the calculation result.
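To make the loss computation in this step concrete, the sketch below performs one gradient update that treats the pseudo-annotation scores as training targets. It is a minimal illustration only: the patent does not fix a model or loss, so a single logistic unit and a binary cross-entropy loss, both hypothetical choices, stand in for the unspecified detection network.

```python
import numpy as np

def train_step_on_pseudo_labels(weights, images, pseudo_scores, lr=0.1):
    """One gradient step of step S208 on pseudo-labeled images.

    `weights`, `images` (one feature row per image), and `pseudo_scores`
    (0/1 pseudo-annotation scores) are hypothetical stand-ins for the
    patent's unspecified detection network and data.
    """
    logits = images @ weights
    predicted = 1.0 / (1.0 + np.exp(-logits))  # re-predicted scores, in (0, 1)
    eps = 1e-12                                # numerical safety for log()
    # Binary cross-entropy between re-predicted scores and pseudo-annotation scores
    loss = -np.mean(pseudo_scores * np.log(predicted + eps)
                    + (1 - pseudo_scores) * np.log(1 - predicted + eps))
    grad = images.T @ (predicted - pseudo_scores) / len(images)  # dBCE/dweights
    return weights - lr * grad, loss
```

Iterating this step over the whole pseudo-labeled initial image set corresponds to the training that yields the target neural network model.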
According to the above training method of the neural network model, an initial image set corresponding to a target scene is obtained; target prediction is performed on the initial images in the initial image set according to the initial neural network model to obtain a first prediction frame corresponding to each initial image and a first prediction score corresponding to the first prediction frame; pseudo-annotation information corresponding to the initial image is determined according to the first prediction score and the pseudo-annotation threshold corresponding to the target scene; and finally, the initial neural network model is trained according to the initial images in the initial image set and the pseudo-annotation information corresponding to the initial images to obtain the target neural network model. Because the pseudo-annotation threshold is specific to the target scene, the resulting pseudo-annotation information matches the current application scenario, which improves the accuracy of target detection.
In some possible embodiments, there are at least two target scenes, and the difference between the numbers of initial images corresponding to different target scenes is smaller than a first number difference threshold. It is understood that the first number difference threshold may be a small value or 0; when it is 0, the numbers of initial images corresponding to the target scenes are equal. For example, if target scene 1 is building A and target scene 2 is building B, a first number of initial images is acquired for target scene 1 and a second number for target scene 2; if the first number difference threshold is set to 10, the difference between the first and second numbers must be smaller than 10, and when the threshold is set to 0, the first and second numbers must be equal. By setting multiple target scenes, the target neural network model can adapt to various scenes, and by keeping the difference between the numbers of initial images per scene below the first number difference threshold, the trained model achieves the same degree of fitting in each scene.
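The balance constraint described above can be checked mechanically. The helper below is a hypothetical utility, not part of the patent; since the text requires the difference to be "smaller than" the threshold yet equal counts when the threshold is 0, it reads the constraint as "at most the threshold".

```python
def scenes_balanced(counts, diff_threshold):
    """True when the initial-image counts of all target scenes differ by at
    most `diff_threshold` (the first number difference threshold); with a
    threshold of 0 all counts must be equal, matching the example above."""
    return max(counts) - min(counts) <= diff_threshold
```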
To ensure the effect of neural network model training and to avoid degradation of the model's fitting caused by overly large differences between images of the same scene due to factors such as illumination changes, in some examples the initial image set includes initial images of the target scene shot under a first illumination condition and initial images of the target scene shot under a second illumination condition, where the first and second illumination conditions involve different amounts of light. For example, the initial images for a certain target scene may include both images taken during the day and images taken at night.
In the process of pseudo-labeling an initial image, the determination of the pseudo-labeling threshold is a key factor in the quality of the pseudo-labeling. The current approach sets a uniform pseudo-labeling threshold for different application scenarios, and a uniform threshold cannot properly screen prediction scores in different scenes. On the basis of the above embodiment, the present application therefore provides a method for determining the pseudo-labeling threshold, which, as shown in fig. 3, comprises the following steps:
S302: acquiring a plurality of historical images containing a target scene;
the historical images are images corresponding to the target scene acquired before model training.
S304: predicting each historical image through the initial neural network model to obtain second prediction information corresponding to each historical image; the second prediction information comprises a second prediction frame and a second prediction score corresponding to the second prediction frame;
it should be noted that before the determination of the pseudo-labeling threshold, the initial neural network model needs to be pre-trained, so that the initial neural network model has the capability of performing target detection on the input image. The training process of the initial neural network model is as follows:
(1) acquiring a sample image set formed by a plurality of sample image pairs, where each sample image pair comprises an unannotated image and its corresponding annotation information;
the marking information may be obtained by manually marking.
(2) Inputting unmarked images in the sample image set into a neural network model to be trained to obtain a prediction frame and a prediction score corresponding to the unmarked images;
(3) calculating the loss value between the predicted scores and the scores in the annotation information, and adjusting the parameters of the neural network model to be trained according to the calculation result;
(4) and when the training stopping condition is met, finishing the training of the neural network model to be trained, and taking the neural network model to be trained when the training stopping condition is met as an initial neural network model.
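Steps (1)-(4) can be sketched as a small fully supervised loop. As before, a single logistic unit is a hypothetical stand-in for the detection network, and a loss tolerance plus an epoch cap stand in for the unspecified training-stop condition.

```python
import numpy as np

def pretrain_initial_model(sample_images, label_scores, lr=0.5, max_epochs=200, tol=1e-3):
    """Fully supervised pre-training of the initial neural network model.

    sample_images: one feature row per sample image (step (1))
    label_scores:  manual 0/1 annotation scores (step (1))
    Returns the trained weights, i.e. the initial neural network model.
    """
    weights = np.zeros(sample_images.shape[1])
    for _ in range(max_epochs):
        # Step (2): predict scores for the images
        predicted = 1.0 / (1.0 + np.exp(-(sample_images @ weights)))
        # Step (3): loss between predicted scores and annotation scores
        loss = -np.mean(label_scores * np.log(predicted + 1e-12)
                        + (1 - label_scores) * np.log(1 - predicted + 1e-12))
        if loss < tol:          # Step (4): stop condition met
            break
        weights -= lr * sample_images.T @ (predicted - label_scores) / len(sample_images)
    return weights
```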
S306: determining a predictive score distribution function according to the second predictive score;
specifically, in some possible embodiments, the second prediction scores corresponding to all the second prediction boxes may be fitted to obtain a prediction score distribution function;
the fitting process may be polynomial fitting, and the obtained score distribution function may be an approximate polynomial function.
S308: and determining a pseudo-labeling threshold value of a pseudo sample corresponding to the target scene according to the prediction score distribution function.
Specifically, a trough position between a first peak and a second peak of the prediction score distribution function may be determined; the first peak is the peak corresponding to the positive samples of the target scene, and the second peak is the peak corresponding to the negative samples of the target scene; the horizontal coordinate (the score value) corresponding to the trough position is then determined as the pseudo-labeling threshold of the pseudo sample corresponding to the target scene.
For example, the prediction score produced by the neural network model is a value between 0 and 1, and pseudo-labeling judges the prediction as 0 (negative sample) or 1 (positive sample); specifically, positive and negative samples may be divided manually or by setting a threshold. The positive and negative samples in the distribution function are concentrated in the regions close to 1 and 0 respectively, so two peaks appear in those two regions with a trough between them, and the regions where high-scoring negative samples and low-scoring positive samples occur most frequently lie near the trough. The position of the trough is related to the characteristics of the scene, so to avoid serious missed and wrong labels at some points, the embodiment of the present application sets the pseudo-labeling threshold of each target application scene at its trough.
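The trough search described above can be sketched as follows, given any fitted distribution function (such as the polynomial from step S306): scan the score axis, keep the two highest local maxima as the positive- and negative-sample peaks, and return the score at the minimum between them. The grid resolution is an illustrative choice.

```python
import numpy as np

def pseudo_label_threshold(dist_fn, n_points=1001):
    """Step S308: locate the trough between the two peaks of the fitted
    score distribution and return its score value as the scene-specific
    pseudo-labeling threshold."""
    xs = np.linspace(0.0, 1.0, n_points)
    ys = np.asarray([dist_fn(x) for x in xs])
    # Interior local maxima of the distribution function
    peaks = [i for i in range(1, n_points - 1) if ys[i] >= ys[i - 1] and ys[i] >= ys[i + 1]]
    lo, hi = sorted(sorted(peaks, key=lambda i: ys[i], reverse=True)[:2])
    trough = lo + int(np.argmin(ys[lo:hi + 1]))    # minimum between the two peaks
    return xs[trough]
```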
It will be appreciated that the corresponding pseudo-annotation threshold is not the same for different target scenarios. The pseudo-annotation threshold is obtained before the neural network model is trained, and the pseudo-annotation threshold can be obtained based on historical images or a small number of initial images in the initial image set.
In some possible embodiments, after obtaining the pseudo-annotation threshold, in step S206, that is, the step of determining the pseudo-annotation information corresponding to the initial image according to the first prediction score corresponding to the prediction frame in the initial image and the pseudo-annotation threshold corresponding to the target scene may specifically include:
(1) judging whether the first prediction score is larger than a pseudo-labeling threshold value or not;
(2) if so, determining that the pseudo-labeling score of the prediction box is a first target numerical value;
(3) otherwise, determining the pseudo-label score of the prediction box as a second target numerical value.
It should be noted that the first target value characterizes an initial image that can be used as a positive sample, and the second target value characterizes an initial image that cannot; therefore, the first target value is larger than the second target value. Specifically, the first target value can be set between 0.5 and 1 and the second target value between 0 and 0.5; for example, the first target value may be set to 1 and the second target value to 0.
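Steps (1)-(3) amount to a single comparison per prediction frame. The sketch below uses the example values 1 and 0 from the text for the first and second target values; the function name is hypothetical.

```python
def pseudo_annotation_scores(first_prediction_scores, threshold,
                             first_target_value=1.0, second_target_value=0.0):
    """Step S206: a prediction frame whose first prediction score is larger
    than the scene's pseudo-labeling threshold gets the first target value
    (positive sample); otherwise it gets the second target value."""
    return [first_target_value if score > threshold else second_target_value
            for score in first_prediction_scores]
```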
For convenience of understanding, the following describes in detail a training method of the neural network model provided in the embodiment of the present application, with reference to a practical application scenario, where the training method includes:
Step 1: acquiring an initial image set F1 in a scene A and an initial image set F2 in a scene B;
The number of initial images included in F1 is 2000, and the number of initial images included in F2 is 2000. F1 includes 1000 images of scene A taken during the day and 1000 images of scene A taken at night; F2 includes 500 images of scene B taken during the day and 1500 images of scene B taken at night.
Step 2: setting initial parameters of the neural network model;
Step 3: extracting 500 images from the initial image set F1 for manual annotation to obtain manual annotation information S1, and extracting 800 images from the initial image set F2 for manual annotation to obtain manual annotation information S2;
Step 4: pre-training the neural network model according to the manual annotation information S1, and pre-training the neural network model according to the manual annotation information S2;
Step 5: inputting all images in the initial image set F1 into the pre-trained neural network model to obtain a first prediction score corresponding to each image; setting the images whose first prediction scores are larger than the pseudo-labeling threshold corresponding to scene A as positive samples (that is, the pseudo-labeling score is 1), and setting the images whose first prediction scores are smaller than the pseudo-labeling threshold corresponding to scene A as negative samples (that is, the pseudo-labeling score is 0);
Step 6: inputting all images in the initial image set F2 into the pre-trained neural network model to obtain a second prediction score corresponding to each image; setting the images whose second prediction scores are larger than the pseudo-labeling threshold corresponding to scene B as positive samples (that is, the pseudo-labeling score is 1), and setting the images whose second prediction scores are smaller than the pseudo-labeling threshold corresponding to scene B as negative samples (that is, the pseudo-labeling score is 0);
Step 7: for the initial image set F1 of scene A, inputting the initial images into the neural network model again to obtain third prediction scores, and continuing to train the neural network model according to the third prediction scores and the pseudo-labeling scores;
Step 8: for the initial image set F2 of scene B, inputting the initial images into the neural network model again to obtain fourth prediction scores, and continuing to train the neural network model according to the fourth prediction scores and the pseudo-labeling scores.
Based on the above method embodiment, an embodiment of the present application further provides a training device for a neural network model, as shown in fig. 4. The device stores an initial neural network model obtained by training on manually annotated sample images corresponding to a target scene, and a pseudo-labeling threshold corresponding to the target scene, where the pseudo-labeling threshold is determined based on a distribution function of prediction scores of a plurality of historical images corresponding to the target scene. The device includes:
an image obtaining module 402, configured to obtain an initial image set corresponding to a target scene;
a first prediction score determining module 404, configured to perform target prediction on an initial image in the initial image set according to the initial neural network model, so as to obtain first prediction information corresponding to the initial image; the first prediction information comprises a first prediction frame and a first prediction score corresponding to the first prediction frame;
a pseudo annotation information determining module 406, configured to determine, according to the first prediction score and a pseudo annotation threshold corresponding to the target scene, pseudo annotation information corresponding to the initial image; the pseudo labeling information comprises a first prediction frame and a pseudo labeling score corresponding to the first prediction frame;
the training module 408 is configured to train the initial neural network model according to the initial image set and the pseudo-annotation information corresponding to the initial images in the initial image set, so as to obtain a target neural network model.
The training device of the neural network model provided in the embodiment of the present application first obtains an initial image set corresponding to a target scene; then performs target prediction on the initial images in the initial image set according to the initial neural network model to obtain a first prediction frame corresponding to each initial image and a first prediction score corresponding to the first prediction frame, and determines the pseudo-annotation information corresponding to the initial image according to the first prediction score and the pseudo-annotation threshold corresponding to the target scene; and finally trains the initial neural network model according to the initial images in the initial image set and their corresponding pseudo-annotation information to obtain the target neural network model.
The pseudo labeling threshold is determined by the following method: acquiring a plurality of historical images containing a target scene; predicting each historical image through the initial neural network model to obtain second prediction information corresponding to each historical image; the second prediction information comprises a second prediction frame and a second prediction score corresponding to the second prediction frame; determining a predictive score distribution function according to the second predictive score; and determining a pseudo-labeling threshold value of a pseudo sample corresponding to the target scene according to the prediction score distribution function.
The process of determining the distribution function of the prediction scores according to the second prediction scores includes: and fitting the second prediction scores corresponding to all the second prediction frames to obtain a prediction score distribution function.
The process of determining the pseudo-labeling threshold of the pseudo sample corresponding to the target scene according to the prediction score distribution function includes: determining a trough position between a first peak and a second peak of the prediction score distribution function, where the first peak is the peak corresponding to the positive samples of the target scene and the second peak is the peak corresponding to the negative samples of the target scene; and determining the score value (the abscissa) corresponding to the trough position as the pseudo-labeling threshold of the pseudo sample corresponding to the target scene.
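As an illustrative sketch of this trough search, assuming the fitted prediction-score distribution function is a two-component Gaussian mixture with peaks near 0 and 1 (the parameters are invented for illustration):

```python
import math

def density(x: float) -> float:
    """Assumed fitted distribution: two equal-weight Gaussian modes near 0 and 1."""
    def gauss(x: float, mu: float, s: float) -> float:
        return math.exp(-((x - mu) ** 2) / (2 * s * s)) / (s * math.sqrt(2 * math.pi))
    return 0.5 * gauss(x, 0.15, 0.08) + 0.5 * gauss(x, 0.85, 0.08)

def trough_threshold(first_peak: float, second_peak: float, steps: int = 1000) -> float:
    """Return the score (abscissa) of the density minimum between the two peaks."""
    lo, hi = sorted((first_peak, second_peak))
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(xs, key=density)

threshold = trough_threshold(0.15, 0.85)
print(round(threshold, 2))  # 0.5 for this symmetric two-peak example
```

With asymmetric peaks or unequal mixture weights, the trough (and hence the threshold) shifts toward the weaker mode, which is why each target scene gets its own threshold.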
The number of the target scenes is at least two, and the difference value between the number of the initial images corresponding to different target scenes is smaller than a first number difference threshold value.
The initial image set comprises an initial image which is shot under a first lighting condition and contains a target scene, and an initial image which is shot under a second lighting condition and contains the target scene; wherein the first illumination condition and the second illumination condition have different light quantity.
The process of determining the pseudo-annotation information corresponding to the initial image according to the first prediction score corresponding to the prediction frame in the initial image and the pseudo-annotation threshold corresponding to the target scene includes: judging whether the first prediction score is larger than a pseudo-labeling threshold value or not, and if so, determining that the pseudo-labeling score of the prediction box is a first target numerical value; otherwise, determining the pseudo-label score of the prediction box as a second target numerical value.
The implementation principle and technical effects of the training device of the neural network model provided in the embodiment of the present application are the same as those of the foregoing method embodiment. For brevity, for anything not mentioned in the device embodiment, reference may be made to the corresponding contents in the foregoing embodiment of the training method of the neural network model.
An electronic device is further provided in an embodiment of the present application, as shown in fig. 5, which is a schematic structural diagram of the electronic device, where the electronic device includes a processor 501 and a memory 502, where the memory 502 stores computer-executable instructions capable of being executed by the processor 501, and the processor 501 executes the computer-executable instructions to implement the training method for the neural network model.
In the embodiment shown in fig. 5, the electronic device further comprises a bus 503 and a communication interface 504, wherein the processor 501, the communication interface 504 and the memory 502 are connected by the bus 503.
The Memory 502 may include a high-speed Random Access Memory (RAM) and may also include a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. The communication connection between the network element of the system and at least one other network element is realized through at least one communication interface 504 (which may be wired or wireless), and the internet, a wide area network, a local network, a metropolitan area network, and the like can be used. The bus 503 may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 503 may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 5, but this does not indicate only one bus or one type of bus.
The processor 501 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or instructions in the form of software in the processor 501. The Processor 501 may be a general-purpose Processor, and includes a Central Processing Unit (CPU), a Network Processor (NP), and the like; the device can also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, a discrete Gate or transistor logic device, or a discrete hardware component. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in ram, flash memory, rom, prom, or eprom, registers, etc. storage media as is well known in the art. The storage medium is located in the memory, and the processor 501 reads the information in the memory and completes the steps of the training method of the neural network model of the foregoing embodiment in combination with the hardware thereof.
The embodiment of the present application further provides a computer-readable storage medium, where the computer-readable storage medium stores computer-executable instructions, and when the computer-executable instructions are called and executed by a processor, the computer-executable instructions cause the processor to implement the above training method for a neural network model, and specific implementation may refer to the foregoing method embodiment, and details are not described herein again.
An embodiment of the present application further provides a computer program product, which includes a computer program, and when the computer program is executed by a processor, the computer program implements the above training method for the neural network model.
The training method of the neural network model, the electronic device, and the computer program product provided in the embodiments of the present application include a computer-readable storage medium storing program codes. The instructions included in the program codes may be used to execute the methods described in the foregoing method embodiments; for specific implementation, reference may be made to the method embodiments, and details are not repeated herein.
Unless specifically stated otherwise, the relative steps, numerical expressions, and values of the components and steps set forth in these embodiments do not limit the scope of the present application.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In the description of the present application, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplicity of description, and do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present application. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that: the above-mentioned embodiments are only specific embodiments of the present application, and are used for illustrating the technical solutions of the present application, but not limiting the same, and the scope of the present application is not limited thereto, and although the present application is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that: any person skilled in the art can modify or easily conceive the technical solutions described in the foregoing embodiments or equivalent substitutes for some technical features within the technical scope disclosed in the present application; such modifications, changes or substitutions do not depart from the spirit and scope of the exemplary embodiments of the present application, and are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A training method of a neural network model, characterized in that the method is applied to an electronic device, and the electronic device stores an initial neural network model obtained by training on manually annotated sample images corresponding to a target scene, and a pseudo-labeling threshold corresponding to the target scene; the method comprises the following steps:
acquiring an initial image set corresponding to the target scene;
performing target prediction on initial images in the initial image set according to the initial neural network model to obtain first prediction information corresponding to the initial images; the first prediction information comprises a first prediction frame and a first prediction score corresponding to the first prediction frame;
determining pseudo-annotation information corresponding to the initial image according to the first prediction score and a pseudo-annotation threshold corresponding to the target scene; the pseudo labeling information comprises the first prediction box and a pseudo labeling score corresponding to the first prediction box;
and training the initial neural network model according to the initial image set and the pseudo-annotation information corresponding to the initial images in the initial image set to obtain a target neural network model.
2. The method of claim 1, wherein the pseudo-annotation threshold is determined by:
acquiring a plurality of historical images containing the target scene;
predicting each historical image through the initial neural network model to obtain second prediction information corresponding to each historical image; the second prediction information comprises a second prediction frame and a second prediction score corresponding to the second prediction frame;
determining a predictive score distribution function according to the second predictive score;
and determining a pseudo-labeling threshold value of a pseudo sample corresponding to the target scene according to the prediction score distribution function.
3. The method of claim 2, wherein the step of determining a predictive score distribution function based on the second predictive score comprises:
and fitting the second prediction scores corresponding to all the second prediction frames to obtain a prediction score distribution function.
4. The method according to claim 2, wherein the step of determining the pseudo-labeling threshold of the pseudo-sample corresponding to the target scene according to the predictive score distribution function comprises:
determining a trough position between a first peak and a second peak of the predictive score distribution function; the first peak is a peak corresponding to a positive sample threshold of the target scene, and the second peak is a peak corresponding to a negative sample threshold of the target scene;
and determining the score value (abscissa) corresponding to the trough position as a pseudo-labeling threshold of a pseudo sample corresponding to the target scene.
5. The method of claim 1, wherein there are at least two target scenes, and a difference between the number of initial images corresponding to different target scenes is less than a first number difference threshold.
6. The method of claim 1, wherein the initial set of images comprises an initial image taken under a first lighting condition containing the subject scene and an initial image taken under a second lighting condition containing the subject scene; wherein the first lighting condition is different from the second lighting condition in light amount.
7. The method according to claim 1, wherein the step of determining the pseudo-label information corresponding to the initial image according to the first prediction score corresponding to the prediction frame in the initial image and the pseudo-label threshold corresponding to the target scene comprises:
judging whether the first prediction score is larger than the pseudo-labeling threshold value or not, and if so, determining that the pseudo-labeling score of the prediction box is a first target numerical value;
otherwise, determining that the pseudo-labeling score of the prediction box is a second target numerical value.
8. An electronic device comprising a processor and a memory, the memory storing computer-executable instructions executable by the processor, the processor executing the computer-executable instructions to implement the method of any one of claims 1-7.
9. A computer-readable storage medium having computer-executable instructions stored thereon that, when invoked and executed by a processor, cause the processor to implement the method of any one of claims 1-7.
10. A computer program product, characterized in that the computer program product comprises a computer program which, when being executed by a processor, carries out the method of any one of claims 1-7.
CN202111420295.4A 2021-11-26 2021-11-26 Training method of neural network model, electronic equipment and computer program product Pending CN114359618A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111420295.4A CN114359618A (en) 2021-11-26 2021-11-26 Training method of neural network model, electronic equipment and computer program product

Publications (1)

Publication Number Publication Date
CN114359618A true CN114359618A (en) 2022-04-15

Family

ID=81095626

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111420295.4A Pending CN114359618A (en) 2021-11-26 2021-11-26 Training method of neural network model, electronic equipment and computer program product

Country Status (1)

Country Link
CN (1) CN114359618A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115294505A (en) * 2022-10-09 2022-11-04 Ping An Bank Co Ltd Risk object detection and model training method and device and electronic equipment
CN115294505B (en) * 2022-10-09 2023-06-20 Ping An Bank Co Ltd Risk object detection and training method and device for model thereof and electronic equipment
CN116630367A (en) * 2023-07-25 2023-08-22 Suzhou Inspur Intelligent Technology Co Ltd Target tracking method, device, electronic equipment and storage medium
CN116630367B (en) * 2023-07-25 2023-11-03 Suzhou Inspur Intelligent Technology Co Ltd Target tracking method, device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN114424253B (en) Model training method and device, storage medium and electronic equipment
CN111783749A (en) Face detection method and device, electronic equipment and storage medium
CN114049512A (en) Model distillation method, target detection method and device and electronic equipment
CN112418195B (en) Face key point detection method and device, electronic equipment and storage medium
CN110660102B (en) Speaker recognition method, device and system based on artificial intelligence
CN114359618A (en) Training method of neural network model, electronic equipment and computer program product
CN112836625A (en) Face living body detection method and device and electronic equipment
CN112417970A (en) Target object identification method, device and electronic system
CN113673505A (en) Example segmentation model training method, device and system and storage medium
CN115100732A (en) Fishing detection method and device, computer equipment and storage medium
CN115082752A (en) Target detection model training method, device, equipment and medium based on weak supervision
CN113255685A (en) Image processing method and device, computer equipment and storage medium
CN113673308B (en) Object identification method, device and electronic system
CN116546304A (en) Parameter configuration method, device, equipment, storage medium and product
CN111476132A (en) Video scene recognition method and device, electronic equipment and storage medium
CN115049675A (en) Generation area determination and light spot generation method, apparatus, medium, and program product
CN114387496A (en) Target detection method and electronic equipment
CN111027434B (en) Training method and device of pedestrian recognition model and electronic equipment
CN115841605A (en) Target detection network training and target detection method, electronic device and storage medium
CN114373071A (en) Target detection method and device and electronic equipment
CN114387547A (en) Method, device, medium and program product for determining behavior video clip
CN116052225A (en) Palmprint recognition method, electronic device, storage medium and computer program product
CN112784691B (en) Target detection model training method, target detection method and device
CN111444803B (en) Image processing method, device, electronic equipment and storage medium
JP2024516642A (en) Behavior detection method, electronic device and computer-readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination