CN112434753A - Model training method, target detection method, device, equipment and storage medium - Google Patents

Model training method, target detection method, device, equipment and storage medium

Info

Publication number
CN112434753A
Authority
CN
China
Prior art keywords
image
reference image
loss value
fused
target detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011459601.0A
Other languages
Chinese (zh)
Inventor
丁子凡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Eye Control Technology Co Ltd
Original Assignee
Shanghai Eye Control Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Eye Control Technology Co Ltd filed Critical Shanghai Eye Control Technology Co Ltd
Priority to CN202011459601.0A priority Critical patent/CN112434753A/en
Publication of CN112434753A publication Critical patent/CN112434753A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention discloses a model training method, a target detection method, a device, equipment and a storage medium. The model training method comprises: inputting an acquired image training sample into a first target detection model to obtain an initial reference image, a first reference image and a second reference image; determining a first loss value between the second reference image and a first fused image and a second loss value between the first reference image and a second fused image; and training the first target detection model based on a comprehensive loss value formed by the first loss value and the second loss value. According to this scheme, the first reference image and the second reference image are fused to obtain the first fused image, the first fused image and the initial reference image are fused to obtain the second fused image, and the first target detection model is trained according to the first loss value between the second reference image and the first fused image and the second loss value between the first reference image and the second fused image, so that the trained model is more accurate and the accuracy of small target detection results can be improved.

Description

Model training method, target detection method, device, equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of image processing, and in particular to a model training method, a target detection method, a device, equipment and a storage medium.
Background
With the rapid development of deep learning and computer vision technologies, AI-based image technologies are widely used in many fields. Target detection, an important part of image understanding and a core problem in the field of machine vision, aims to find targets of interest in an image and to determine their positions and sizes.
At present, a Feature Pyramid Network (FPN) is mainly trained and target detection is performed using the trained FPN. Such a network model has a certain detection effect on large targets, but its detection effect on smaller targets is poor.
Disclosure of Invention
The embodiment of the invention provides a model training method, a target detection method, a device, equipment and a storage medium, which can optimize the existing model training scheme and thereby improve the accuracy of small target detection results.
In a first aspect, an embodiment of the present invention provides a model training method, including:
inputting an obtained image training sample into a first target detection model to obtain an initial reference image, a first reference image and a second reference image, wherein the resolution of the initial reference image is greater than that of the first reference image, and the resolution of the first reference image is greater than that of the second reference image;
determining a first loss value of the second reference image and the first fused image and a second loss value of the first reference image and the second fused image; the first fused image is obtained by fusing the first reference image and the second reference image, and the second fused image is obtained by fusing the first fused image and the initial reference image;
and performing back propagation on the first target detection model based on a comprehensive loss value formed by the first loss value and the second loss value to obtain a second target detection model so as to realize the training of the first target detection model.
In a second aspect, an embodiment of the present invention further provides a target detection method, including:
acquiring an original image to be detected;
inputting the original image into a preset target detection model, and obtaining position coordinates of a target object in the original image output by the preset target detection model, wherein the preset target detection model is obtained by training according to the method of the first aspect.
In a third aspect, an embodiment of the present invention further provides a model training apparatus, including:
the training sample input module is used for inputting the acquired image training samples into a first target detection model to obtain an initial reference image, a first reference image and a second reference image, wherein the resolution of the initial reference image is greater than that of the first reference image, and the resolution of the first reference image is greater than that of the second reference image;
a loss value determining module for determining a first loss value of the second reference image and the first fused image and a second loss value of the first reference image and the second fused image; the first fused image is obtained by fusing the first reference image and the second reference image, and the second fused image is obtained by fusing the first fused image and the initial reference image;
and the model training module is used for carrying out back propagation on the first target detection model based on a comprehensive loss value formed by the first loss value and the second loss value to obtain a second target detection model so as to realize the training of the first target detection model.
In a fourth aspect, an embodiment of the present invention further provides an object detection apparatus, including:
the image acquisition module is used for acquiring an original image to be detected;
an image input module, configured to input the original image into a preset target detection model, and obtain a position coordinate of a target object in the original image output by the preset target detection model, where the preset target detection model is obtained by training according to the method of the first aspect.
In a fifth aspect, an embodiment of the present invention further provides an electronic device, including:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, implement the method of the first aspect or the second aspect.
In a sixth aspect, the present invention further provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the method according to the first or second aspect.
According to the training scheme of the target detection model provided by the embodiment of the invention, the acquired image training sample is input into the first target detection model to obtain the initial reference image, the first reference image and the second reference image; the first loss value between the second reference image and the first fused image and the second loss value between the first reference image and the second fused image are determined; and the first target detection model is subjected to back propagation based on the comprehensive loss value formed by the first loss value and the second loss value to obtain the second target detection model, thereby realizing the training of the first target detection model. With this technical scheme, the first reference image and the second reference image obtained by the first target detection model are fused to obtain the first fused image, and the first fused image and the initial reference image are fused to obtain the second fused image; the first target detection model is then trained according to the first loss value between the second reference image and the first fused image and the second loss value between the first reference image and the second fused image. The trained model is therefore more accurate, and the accuracy of small target detection results can be improved when the obtained model is used for small target detection.
Drawings
Fig. 1 is a flowchart of a model training method according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a first target detection model according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a texture migration network according to an embodiment of the present invention;
FIG. 4 is a flowchart of a model training method according to a second embodiment of the present invention;
fig. 5 is a schematic diagram of a target object detection result according to a second embodiment of the present invention;
fig. 6 is a schematic diagram of a real result of a target object according to a second embodiment of the present invention;
fig. 7 is a flowchart of a target detection method according to a third embodiment of the present invention;
fig. 8 is a structural diagram of a model training apparatus according to a fourth embodiment of the present invention;
fig. 9 is a structural diagram of an object detection apparatus according to a fifth embodiment of the present invention;
fig. 10 is a structural diagram of an electronic device according to a sixth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently or simultaneously. In addition, the order of the operations may be re-arranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like. In addition, the embodiments and features of the embodiments in the present invention may be combined with each other without conflict.
Example one
Fig. 1 is a flowchart of a model training method according to an embodiment of the present invention, which is applicable to training a target detection model, where the target detection model may be used to detect a small target in an image, and the small target may be a traffic light at a traffic intersection or a pedestrian at a distance. The method may be performed by a model training apparatus, which may be implemented by software and/or hardware, and may be integrated in an electronic device, which may be an intelligent device with data processing function, such as a notebook, a desktop, a server, or a vehicle-mounted terminal. Referring to fig. 1, the method may include the steps of:
s110, inputting the acquired image training sample into a first target detection model to obtain an initial reference image, a first reference image and a second reference image.
Wherein the resolution of the initial reference picture is greater than the resolution of the first reference picture, which is greater than the resolution of the second reference picture. The image training samples are image samples that can be used for learning, and the present embodiment can train the first target detection model using the image training samples. Specifically, an image training sample can be obtained from a big data platform, and an image can also be collected by an image collecting device as the image training sample, wherein the image collecting device can be a camera, a video camera, or other devices capable of collecting images. The sizes of the selected image training samples can be the same or different.
The first target detection model is a network model for detecting the image to determine the type and the position of the target object in the image, and the structure of the first target detection model can be determined as appropriate, for example, the conventional FPN can be directly adopted, or the conventional FPN can be improved according to the requirements. The target object may be any object included in the image, in the embodiment, a small object included in the image is taken as an example of the target object, and the small object is an object occupying a small area in the image, for example, a traffic light or a pedestrian at a far distance may be taken as the target object. The small targets in the image can be detected by training the first target detection model, and the method can be applied to traffic light detection, for example, to assist a driver to make a decision in advance and reduce the occurrence of traffic accidents or violation incidents.
The initial reference image, the first reference image, and the second reference image are images of different scales obtained by the first target detection model based on the image training sample, where each scale corresponds to one resolution; in this embodiment, the resolution of the initial reference image > the resolution of the first reference image > the resolution of the second reference image. Taking a first target detection model constructed from an FPN and a texture migration network as an example, the determination process of the initial reference image, the first reference image, and the second reference image is described schematically below; it should be noted that other types of target detection models may also be adopted in the embodiment of the present invention. Fig. 2 is a schematic structural diagram of a first target detection model according to an embodiment of the present invention. As shown in fig. 2, the structure in the dashed box is a conventional FPN; C1-C5 are images at different scales whose resolutions decrease sequentially, and the resolutions of P5-P2 increase sequentially from top to bottom. P5 may be obtained by performing a convolution operation on C5, and each of P4-P2 may be obtained from the C-level image on its left and the adjacent higher-level P image; for example, P5 may be upsampled so that its size is consistent with that of C4, and P4 is then obtained by superimposing the upsampled P5 with C4. P3 and P2 are determined in a similar manner, as illustrated in the sketch below. The embodiment takes C1 as the initial reference image, C2 as the first reference image, and P3 as the second reference image.
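For illustration, the top-down pathway just described can be sketched in a few lines of PyTorch-style Python. This is a minimal sketch only: the backbone that produces C2-C5, the common channel width, the 1x1 lateral convolutions and the nearest-neighbour interpolation are assumptions introduced for the example and are not specified by the patent.

import torch.nn as nn
import torch.nn.functional as F

class SimpleFPNTopDown(nn.Module):
    """Illustrative top-down pathway: P5 from C5 by convolution, then each
    lower P level as the upsampled higher P level plus a lateral connection."""

    def __init__(self, in_channels=(256, 512, 1024, 2048), out_channels=256):
        super().__init__()
        # 1x1 convolutions projecting C2-C5 to a common channel width
        self.lateral = nn.ModuleList(
            [nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels]
        )

    def forward(self, c2, c3, c4, c5):
        p5 = self.lateral[3](c5)
        p4 = self.lateral[2](c4) + F.interpolate(p5, scale_factor=2, mode="nearest")
        p3 = self.lateral[1](c3) + F.interpolate(p4, scale_factor=2, mode="nearest")
        p2 = self.lateral[0](c2) + F.interpolate(p3, scale_factor=2, mode="nearest")
        return p2, p3, p4, p5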
And S120, determining a first loss value of the second reference image and the first fused image and a second loss value of the first reference image and the second fused image.
The first fused image is obtained by fusing the first reference image and the second reference image, and the second fused image is obtained by fusing the first fused image and the initial reference image. A fused image may be an image obtained by superimposing images of the same size. In this embodiment, the first fused image may be obtained by superimposing the first reference image and the second reference image; considering that the resolutions, and therefore the sizes, of the first reference image and the second reference image are different, the first reference image or the second reference image may be preprocessed before the superposition. For example, the first reference image may be downsampled so that its size is consistent with that of the second reference image, or the second reference image may be upsampled so that its size is consistent with that of the first reference image. Considering that a smaller image retains fewer image features, in order to improve the accuracy of the model, this embodiment upsamples the second reference image and superimposes the upsampled second reference image with the first reference image to obtain the first fused image. Similarly, the first fused image may be upsampled and superimposed with the initial reference image to obtain the second fused image, as sketched below.
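As a minimal sketch of the alignment step just described, and assuming the two feature maps already share the same number of channels (an assumption the patent does not state), the lower-resolution image can be upsampled to the spatial size of the higher-resolution image and the two can then be superimposed element-wise:

import torch.nn.functional as F

def fuse_by_superposition(low_res, high_res):
    # Upsample the lower-resolution image to the spatial size of the
    # higher-resolution image, then superimpose (element-wise add) the two.
    upsampled = F.interpolate(low_res, size=high_res.shape[-2:], mode="nearest")
    return upsampled + high_res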
The determination process of the first fused image and the second fused image is described schematically below with reference to the specific structure of the texture migration network. Fig. 3 is a schematic structural diagram of a texture migration network according to the first embodiment of the present invention. As shown in fig. 3, considering that the resolution of the second reference image is smaller than that of the first reference image, the texture migration network takes the first reference image and the second reference image, that is, C2 and P3 in fig. 2, as inputs. It first performs upsampling processing on the second reference image to obtain a first upsampled image, and then splices the first upsampled image with the first reference image to obtain a first spliced image. The first spliced image is input to a texture feature extraction module, which extracts texture features of the first spliced image to obtain a texture feature image; at the same time, the second reference image is input to a content feature extraction module, which extracts content features of the second reference image to obtain a content feature image. The content feature image is then upsampled to obtain a second upsampled image, and finally the second upsampled image and the texture feature image are superimposed to obtain the first fused image. The upsampling ensures that the sizes of the two images to be combined are consistent; the specific parameters can be set according to actual conditions, for example, sub-pixel convolution can be used to double the size and thereby obtain the first upsampled image and the second upsampled image. Splicing may also be understood as superimposing, i.e. splicing the first upsampled image and the first reference image corresponds to superimposing the first upsampled image and the first reference image. In this embodiment, C2 and P3 are fused, so that the finally obtained first fused image not only contains rich texture features but also contains rich semantic content, which can improve the accuracy of the model when the first target detection model is trained. The texture features may be contours, shapes, and the like of the object. As shown in fig. 2, the first fused image is upsampled to obtain a third upsampled image, and the third upsampled image and the initial reference image are superimposed to obtain the second fused image. Optionally, based on the above determination process, the first fused image and the second fused image can be expressed concisely by the following formulas:
M1=TE(C2||P3↑*2)+CE(P3)↑*2
M2=M1↑*2+C1
where M1 represents the first fused image, M2 represents the second fused image, TE represents texture feature extraction, CE represents content feature extraction, || represents splicing, ↑ represents upsampling, and *2 represents an upsampling coefficient of 2; for example, P3↑*2 represents P3 enlarged to twice its size.
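To make the two formulas concrete, the sketch below mirrors them in PyTorch-style Python: TE and CE are stand-in convolutional extractors, the ↑*2 operations are realised with sub-pixel (pixel-shuffle) convolutions, and C1, C2 and P3 are assumed to share one channel width. These layer definitions, names and widths are assumptions made for the example and are not taken from the patent.

import torch
import torch.nn as nn

class TextureMigrationFusion(nn.Module):
    """Sketch of M1 = TE(C2 || P3↑*2) + CE(P3)↑*2 and M2 = M1↑*2 + C1."""

    def __init__(self, channels=256):
        super().__init__()
        # Placeholder texture (TE) and content (CE) feature extractors
        self.te = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.ce = nn.Conv2d(channels, channels, 3, padding=1)
        # Sub-pixel (pixel-shuffle) x2 upsampling used for the "↑*2" operations
        self.up_p3 = nn.Sequential(nn.Conv2d(channels, 4 * channels, 3, padding=1),
                                   nn.PixelShuffle(2))
        self.up_ce = nn.Sequential(nn.Conv2d(channels, 4 * channels, 3, padding=1),
                                   nn.PixelShuffle(2))
        self.up_m1 = nn.Sequential(nn.Conv2d(channels, 4 * channels, 3, padding=1),
                                   nn.PixelShuffle(2))

    def forward(self, c1, c2, p3):
        p3_up = self.up_p3(p3)                   # first upsampled image, P3↑*2
        spliced = torch.cat([c2, p3_up], dim=1)  # first spliced image, C2 || P3↑*2
        texture = self.te(spliced)               # texture feature image
        content_up = self.up_ce(self.ce(p3))     # CE(P3)↑*2, second upsampled image
        m1 = texture + content_up                # first fused image
        m2 = self.up_m1(m1) + c1                 # second fused image
        return m1, m2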
The loss value generally refers to a difference between a predicted value and a true value of the model, and in this embodiment, a difference between the second reference image and the first fused image is denoted as a first loss value, and a difference between the first reference image and the second fused image is denoted as a second loss value. The size of the loss value may be determined by a loss function, and the loss function may be determined according to actual conditions, and the embodiment is not limited.
S130, performing back propagation on the first target detection model based on a comprehensive loss value formed by the first loss value and the second loss value to obtain a second target detection model so as to train the first target detection model.
The comprehensive loss value is a loss value obtained by accumulating the first loss value and the second loss value, and is used for performing back propagation on the first target detection model. In the training process of a network model, the back propagation method, an effective gradient calculation method, continuously updates and adjusts the network weights (also referred to as filters) until the output of the network is consistent with the target. In this embodiment, after the comprehensive loss value is determined, the first target detection model is subjected to back propagation using the comprehensive loss value, so as to obtain the second target detection model. This embodiment does not limit the specific back propagation process, which can be set according to specific situations.
The embodiment of the invention provides a model training method, which includes inputting an acquired image training sample into a first target detection model to obtain an initial reference image, a first reference image and a second reference image, determining a first loss value of the second reference image and a first fusion image and a second loss value of the first reference image and the second fusion image, and performing back propagation on the first target detection model based on a comprehensive loss value formed by the first loss value and the second loss value to obtain a second target detection model so as to realize training of the first target detection model. By adopting the technical scheme, the first reference image and the second reference image which are obtained by the first target detection model can be fused to obtain the first fused image, the first fused image and the initial reference image are fused to obtain the second fused image, and then the first target detection model is trained according to the first loss value of the second reference image and the first fused image and the second loss value of the first reference image and the second fused image, so that the trained model is more accurate, and the accuracy of a small target detection result can be improved when the obtained model is used for small target detection.
Example two
Fig. 4 is a flowchart of a model training method according to a second embodiment of the present invention, where the present embodiment is optimized based on the foregoing embodiment, and referring to fig. 4, the method may include the following steps:
s210, inputting the acquired image training sample into a first target detection model to obtain an initial reference image, a first reference image and a second reference image.
And S220, acquiring a detection result of the target object output by the first target detection model.
The detection result of the target object output by the first target detection model, that is, the result output by the Predict module in fig. 2, includes the type of the target object and the position coordinates of the target object. The position coordinates of the target object may be represented by the coordinates of the area where the target object is located, and that area may be identified in the form of a rectangular box, that is, the position coordinates of the rectangular box may be used as the position coordinates of the target object; of course, the target object may also be identified in other forms, which this embodiment does not specifically limit. The type of the target object can generally be obtained by means of image recognition, which is a mature technique at present; this embodiment is mainly concerned with the position coordinates of the target object.
S230, determining a balance loss function according to the detection result of the target object, the real result of the target object in the image training sample, the second fusion image and the corresponding image training sample.
The balance loss function is used to balance the area of the small target against the large background of the image; it can be understood that the small target occupies only a small area in the whole image, so its corresponding loss value is relatively small. Optionally, the balance loss function may be determined as follows:
s2301, determining a target foreground loss value of the target object according to the detection result of the target object and the real result of the target object in the image training sample by combining a target foreground loss function.
For example, referring to fig. 5 and fig. 6: fig. 5 is a schematic diagram of a target object detection result according to the second embodiment of the present invention, where the dashed box marks the position of the target object detected by the first target detection model, and fig. 6 is a schematic diagram of the real result of the target object according to the second embodiment of the present invention, where the dashed box marks the real position of the target object in the image. The target foreground loss function determines a loss value relating the dashed box region in fig. 5 and the dashed box region in fig. 6, and its form may be set according to the situation, for example as:
loss_obj(Fgt, Fgenerate) = (1/N) * Σ(x,y)∈Fobj_gt ||Fgt(x,y) - Fgenerate(x,y)||2
where loss_obj is the target foreground loss of the target object, Fgt is the image training sample, i.e. the original image, Fgenerate is the second fused image, i.e. M2 in fig. 2, Fobj_gt is the real result of the target object, i.e. the area corresponding to the dashed box in fig. 6, Fobj is the detection result of the target object, i.e. the area corresponding to the dashed box in fig. 5, N is the number of pixels contained in the dashed box corresponding to the real result of the target object, and (x, y) represents the coordinates of a pixel. The target foreground loss value of the target object can be determined by the above formula; of course, other types of target foreground loss functions may also be employed.
S2302, determining a global background loss value according to the second fusion image and the corresponding image training sample by combining a global background loss function.
The global background loss in this embodiment is a loss for the entire image corresponding to fig. 5 and the entire image corresponding to fig. 6, that is, a loss between the second fused image and the corresponding image training sample. Optionally, the global background loss function is of the form:
loss_glob(Fgt, Fgenerate) = ||Fgt - Fgenerate||2
where loss_glob is the global background loss and ||·||2 denotes the L2 norm; other types of global background loss functions may also be used.
S2303, determining a target weight corresponding to the target foreground loss function and a global weight corresponding to the global background loss function according to the target foreground loss value and the global background loss value.
In order to balance the areas of the small target and the large background, this embodiment sets weights for the target foreground loss function and the global background loss function based on the target foreground loss value and the global background loss value. For example, a larger target weight may be set for the target foreground loss function so that it matches the global background loss, with the global weight set to 1; or a smaller global weight may be set for the global background loss function so that it matches the target foreground loss, with the target weight set to 1; target weights and global weights that are both different from 1 may also be set separately. The specific sizes of the weights may be determined according to circumstances.
S2304, determining the balance loss function according to the target foreground loss function and the corresponding target weight as well as the global loss function and the corresponding global weight.
Illustratively, the form of the balance loss function is as follows:
loss_b(Fgt, Fgenerate) = loss_glob(Fgt, Fgenerate) + λ·loss_obj(Fgt, Fgenerate)
where loss_b is the balance loss and λ is the target weight, which may also be referred to as a balance factor. This balance loss function takes as an example the case where the target foreground loss is adjusted to match the global background loss.
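A minimal sketch of the three loss terms in PyTorch-style Python is given below. It follows the reconstruction used above: the foreground term is an L2 difference restricted to the target-object region, supplied here as a 0/1 mask and normalised by the number N of foreground pixels, the global term is the plain L2 difference over the whole image, and lam stands for the balance factor λ. The mask representation and the reductions are assumptions made for the example.

import torch

def loss_glob(f_gt, f_generate):
    # Global background loss: L2 distance over the whole image
    return torch.norm(f_gt - f_generate, p=2)

def loss_obj(f_gt, f_generate, obj_mask):
    # Target foreground loss: L2 difference restricted to the target-object
    # region (obj_mask is 1 inside the ground-truth box and 0 elsewhere),
    # normalised by the number of foreground pixels N
    n = obj_mask.sum().clamp(min=1)
    diff = (f_gt - f_generate) * obj_mask
    return torch.norm(diff, p=2) / n

def loss_b(f_gt, f_generate, obj_mask, lam=1.0):
    # Balance loss: global background loss plus weighted target foreground loss
    return loss_glob(f_gt, f_generate) + lam * loss_obj(f_gt, f_generate, obj_mask)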
S240, according to the balance loss function, combining the second reference image and the first fusion image to obtain a first loss value.
Optionally, the upsampled second reference image may be substituted for Fgt and the first fused image for Fgenerate; substituting them into the above formula gives the first loss value, which may also be referred to as a first balance loss value.
And S250, combining the first reference image and the second fusion image according to the balance loss function to obtain a second loss value.
Similarly, the upsampled first reference image may be substituted for Fgt and the second fused image for Fgenerate; substituting them into the above formula gives the second loss value, which may also be referred to as a second balance loss value.
S260, performing back propagation on the first target detection model based on a comprehensive loss value formed by the first loss value and the second loss value to obtain a second target detection model so as to train the first target detection model.
The comprehensive loss value may be obtained by adding the first loss value and the second loss value, and may be determined, for example, by the following comprehensive loss function:
loss_supervision=loss_b(P3↑*2,M1)+loss_b(C2↑*2,M2)
where loss_supervision is the comprehensive loss. Specifically, the first target detection model may be propagated backward using the comprehensive loss value until the obtained comprehensive loss value meets a preset condition; for example, when the comprehensive loss value converges, the training is finished and the current first target detection model is recorded as the second target detection model for subsequent small target detection.
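The comprehensive loss and the back-propagation step can be combined into a single training step as sketched below. The sketch reuses the loss_b helper from the previous example; the model's return signature, the optimizer and the handling of the object mask are assumptions made for illustration, not details fixed by the patent.

import torch.nn.functional as F

def training_step(model, optimizer, image, obj_mask):
    # Forward pass; the model is assumed to return C1, C2, P3 and the two
    # fused images M1 and M2 (this return signature is an assumption).
    _c1, c2, p3, m1, m2 = model(image)

    # loss_supervision = loss_b(P3 upsampled x2, M1) + loss_b(C2 upsampled x2, M2)
    p3_up = F.interpolate(p3, scale_factor=2, mode="nearest")
    c2_up = F.interpolate(c2, scale_factor=2, mode="nearest")
    mask_m1 = F.interpolate(obj_mask, size=m1.shape[-2:], mode="nearest")
    mask_m2 = F.interpolate(obj_mask, size=m2.shape[-2:], mode="nearest")
    loss_supervision = loss_b(p3_up, m1, mask_m1) + loss_b(c2_up, m2, mask_m2)

    # Back-propagate the comprehensive loss and update the network weights
    optimizer.zero_grad()
    loss_supervision.backward()
    optimizer.step()
    return loss_supervision.item()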
The second embodiment of the invention provides a model training method which, on the basis of the first embodiment, improves the FPN by introducing a texture migration network: texture features and content features of the image are fused, which enriches the image features and improves the accuracy of the model. A balance factor is also introduced to balance the areas of the small target and the large background, which further improves the accuracy of the model and enhances its robustness.
EXAMPLE III
Fig. 7 is a flowchart of a target detection method according to a third embodiment of the present invention, where the third embodiment of the present invention may be used to detect a small target in an image, where the small target may be a traffic light at a traffic intersection or a pedestrian at a distance. The method may be performed by an object detection apparatus, which may be implemented by software and/or hardware, and may be integrated in an electronic device, which may be an intelligent device with data processing function, such as a notebook, a desktop, or a server. Referring to fig. 7, the method may include the steps of:
s310, acquiring an original image to be detected.
S320, inputting the original image into a preset target detection model, and acquiring the position coordinates of the target object in the original image output by the preset target detection model.
The preset target detection model is obtained by training using any one of the model training methods provided by the embodiments of the present invention.
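As a usage illustration only, applying the trained model at inference time could look like the following sketch; the serialisation format, the pre-processing and the exact output structure are assumptions and are not prescribed by the patent.

import torch

def detect(model_path, image_tensor):
    # Load the trained (second) target detection model; how the model is
    # serialised is an assumption made for this example.
    model = torch.load(model_path)
    model.eval()
    with torch.no_grad():
        # The model is assumed to output, for each detected target object,
        # its type and the position coordinates of its bounding rectangle.
        detections = model(image_tensor.unsqueeze(0))
    return detections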
According to the target detection method provided by the embodiment of the invention, the preset target detection model is obtained by adopting the training method of the target detection model provided by the embodiment, and then the small target detection is carried out based on the preset target detection model, so that the accuracy of the small target detection result is improved.
Example four
Fig. 8 is a structural diagram of a model training apparatus according to a fourth embodiment of the present invention, which may execute the model training method according to the foregoing embodiment, and referring to fig. 8, the apparatus may include:
a training sample input module 41, configured to input an acquired image training sample into a first target detection model, so as to obtain an initial reference image, a first reference image, and a second reference image, where a resolution of the initial reference image is greater than a resolution of the first reference image, and a resolution of the first reference image is greater than a resolution of the second reference image;
a loss value determining module 42, configured to determine a first loss value of the second reference image and the first fused image and a second loss value of the first reference image and the second fused image; the first fused image is obtained by fusing the first reference image and the second reference image, and the second fused image is obtained by fusing the first fused image and the initial reference image;
a model training module 43, configured to perform back propagation on the first target detection model based on a comprehensive loss value formed by the first loss value and the second loss value, so as to obtain a second target detection model, so as to implement training of the first target detection model.
The fourth embodiment of the present invention provides a model training apparatus, where an acquired image training sample is input into a first target detection model to obtain an initial reference image, a first reference image, and a second reference image, a first loss value of the second reference image and a first fused image and a second loss value of the first reference image and a second fused image are determined, and the first target detection model is subjected to back propagation based on a comprehensive loss value formed by the first loss value and the second loss value to obtain a second target detection model, so as to implement training of the first target detection model. By adopting the technical scheme, the first reference image and the second reference image which are obtained by the first target detection model can be fused to obtain the first fused image, the first fused image and the initial reference image are fused to obtain the second fused image, and then the first target detection model is trained according to the first loss value of the second reference image and the first fused image and the second loss value of the first reference image and the second fused image, so that the trained model is more accurate, and the accuracy of a small target detection result can be improved when the obtained model is used for small target detection.
On the basis of the above embodiment, the determination process of the first fused image is as follows:
performing upsampling processing on the second reference image to obtain a first upsampled image;
performing texture feature extraction on a first spliced image obtained based on the first up-sampled image and the first reference image to obtain a texture feature image; performing content feature extraction on the second reference image to obtain a content feature image;
performing upsampling processing on the content characteristic image to obtain a second upsampled image;
and fusing the texture feature image and the second up-sampling image to obtain the first fused image.
On the basis of the above embodiment, the determination process of the second fused image is as follows:
performing upsampling processing on the first fusion image to obtain a third upsampled image;
and fusing the third up-sampling image and the initial reference image to obtain the second fused image.
On the basis of the above embodiment, the loss value determination module 42 includes:
a detection result acquisition unit configured to acquire a detection result of the target object output by the first target detection model;
a balance loss function determining unit, configured to determine a balance loss function according to the detection result of the target object, the real result of the target object in the image training sample, the second fusion image, and the corresponding image training sample;
a first loss value determining unit, configured to obtain a first loss value by combining the second reference image and the first fused image according to the balance loss function;
and the second loss value determining unit is used for combining the first reference image and the second fusion image according to the balance loss function to obtain a second loss value.
On the basis of the foregoing embodiment, the balance loss function determining unit is specifically configured to:
determining a target foreground loss value of the target object by combining a target foreground loss function according to the detection result of the target object and the real result of the target object in the image training sample;
determining a global background loss value according to the second fusion image and the corresponding image training sample by combining a global background loss function;
determining a target weight corresponding to the target foreground loss function and a global weight corresponding to the global background loss function according to the target foreground loss value and the global background loss value;
and determining the balance loss function according to the target foreground loss function and the corresponding target weight as well as the global loss function and the corresponding global weight.
On the basis of the above embodiment, the first target detection model is constructed based on a feature pyramid network and a texture migration network.
The model training device provided by the embodiment of the invention can be used for executing the model training method provided by the embodiment, and has corresponding functions and beneficial effects.
EXAMPLE five
Fig. 9 is a structural diagram of an object detection apparatus according to a fifth embodiment of the present invention, which may perform the object detection method according to the foregoing embodiment, and referring to fig. 9, the apparatus may include:
an image obtaining module 51, configured to obtain an original image to be detected;
the image input module 52 is configured to input the original image into a preset target detection model, and obtain a position coordinate of a target object in the original image output by the preset target detection model, where the preset target detection model is obtained by training using the training method of the target detection model provided in the embodiment of the present invention.
According to the target detection device provided by the embodiment of the invention, the preset target detection model is obtained by adopting the training method of the target detection model provided by the embodiment, and then the small target detection is carried out based on the preset target detection model, so that the accuracy of the small target detection result is improved.
EXAMPLE six
Fig. 10 is a structural diagram of an electronic device according to a sixth embodiment of the present invention. Referring to fig. 10, the electronic device may include a processor 61, a memory 62, an input device 63 and an output device 64. The number of processors 61 in the electronic device may be one or more; one processor 61 is taken as an example in fig. 10. The processor 61, the memory 62, the input device 63 and the output device 64 in the electronic device may be connected by a bus or in other ways; connection by a bus is taken as an example in fig. 10.
The memory 62 is a computer-readable storage medium for storing software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to the model training method or the object detection method in the embodiments of the present invention. The processor 61 executes various functional applications and data processing of the electronic device, namely, the model training method or the target detection method of the above-described embodiments, by executing the software programs, instructions, and modules stored in the memory 62.
The memory 62 mainly includes a program storage area and a data storage area, wherein the program storage area can store an operating system and an application program required by at least one function; the storage data area may store data created according to the use of the terminal, and the like. Further, the memory 62 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, the memory 62 may further include memory located remotely from the processor 61, which may be connected to the electronic device through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 63 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic apparatus. The output device 64 may include a display device such as a display screen, and an audio device such as a speaker and a buzzer.
The electronic device provided by the embodiment of the present invention belongs to the same inventive concept as the model training method or the target detection method provided by the above embodiments; for technical details not described in detail in this embodiment, reference may be made to the above embodiments.
EXAMPLE seven
A seventh embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program is used to execute a model training method when executed by a processor, and the method includes:
inputting an obtained image training sample into a first target detection model to obtain an initial reference image, a first reference image and a second reference image, wherein the resolution of the initial reference image is greater than that of the first reference image, and the resolution of the first reference image is greater than that of the second reference image;
determining a first loss value of the second reference image and the first fused image and a second loss value of the first reference image and the second fused image; the first fused image is obtained by fusing the first reference image and the second reference image, and the second fused image is obtained by fusing the first fused image and the initial reference image;
and performing back propagation on the first target detection model based on a comprehensive loss value formed by the first loss value and the second loss value to obtain a second target detection model so as to realize the training of the first target detection model.
The program, when executed by a processor, is further for performing an object detection method, the method comprising:
acquiring an original image to be detected;
inputting the original image into a preset target detection model, and obtaining the position coordinates of the target object in the original image output by the preset target detection model, wherein the preset target detection model is obtained by adopting the training method of the target detection model provided by the embodiment of the invention.
Storage media for embodiments of the present invention may take the form of any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read Only Memory (ROM), an Erasable Programmable Read Only Memory (EPROM), a flash Memory, an optical fiber, a portable CD-ROM, an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. A computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take a variety of forms, including, but not limited to: an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, Radio Frequency (RF), etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A method of model training, comprising:
inputting an obtained image training sample into a first target detection model to obtain an initial reference image, a first reference image and a second reference image, wherein the resolution of the initial reference image is greater than that of the first reference image, and the resolution of the first reference image is greater than that of the second reference image;
determining a first loss value of the second reference image and the first fused image and a second loss value of the first reference image and the second fused image; the first fused image is obtained by fusing the first reference image and the second reference image, and the second fused image is obtained by fusing the first fused image and the initial reference image;
and performing back propagation on the first target detection model based on a comprehensive loss value formed by the first loss value and the second loss value to obtain a second target detection model so as to realize the training of the first target detection model.
2. The method of claim 1, wherein the first fused image is determined as follows:
performing upsampling processing on the second reference image to obtain a first upsampled image;
performing texture feature extraction on a first spliced image obtained based on the first up-sampled image and the first reference image to obtain a texture feature image; performing content feature extraction on the second reference image to obtain a content feature image;
performing upsampling processing on the content characteristic image to obtain a second upsampled image;
and fusing the texture feature image and the second up-sampling image to obtain the first fused image.
3. The method of claim 1, wherein the second fused image is determined as follows:
performing upsampling processing on the first fusion image to obtain a third upsampled image;
and fusing the third up-sampling image and the initial reference image to obtain the second fused image.
4. The method of claim 1, wherein determining the first loss value for the second reference picture and the first fused picture and the second loss value for the first reference picture and the second fused picture comprises:
acquiring a detection result of the target object output by the first target detection model;
determining a balance loss function according to the detection result of the target object, the real result of the target object in the image training sample, the second fusion image and the corresponding image training sample;
according to the balance loss function, combining the second reference image and the first fusion image to obtain a first loss value;
and according to the balance loss function, combining the first reference image and the second fusion image to obtain a second loss value.
5. The method of claim 4, wherein determining the balance loss function according to the detection result of the target object, the real result of the target object in the image training sample, the second fused image and the corresponding image training sample comprises:
determining a target foreground loss value of the target object by combining a target foreground loss function according to the detection result of the target object and the real result of the target object in the image training sample;
determining a global background loss value according to the second fusion image and the corresponding image training sample by combining a global background loss function;
determining a target weight corresponding to the target foreground loss function and a global weight corresponding to the global background loss function according to the target foreground loss value and the global background loss value;
and determining the balance loss function according to the target foreground loss function and the corresponding target weight as well as the global loss function and the corresponding global weight.
6. A method of object detection, comprising:
acquiring an original image to be detected;
inputting the original image into a preset target detection model, and acquiring the position coordinates of a target object in the original image output by the preset target detection model, wherein the preset target detection model is obtained by training according to the method of any one of claims 1 to 5.
7. A model training apparatus, comprising:
the training sample input module is used for inputting the acquired image training samples into a first target detection model to obtain an initial reference image, a first reference image and a second reference image, wherein the resolution of the initial reference image is greater than that of the first reference image, and the resolution of the first reference image is greater than that of the second reference image;
a loss value determining module for determining a first loss value of the second reference image and the first fused image and a second loss value of the first reference image and the second fused image; the first fused image is obtained by fusing the first reference image and the second reference image, and the second fused image is obtained by fusing the first fused image and the initial reference image;
and the model training module is used for carrying out back propagation on the first target detection model based on a comprehensive loss value formed by the first loss value and the second loss value to obtain a second target detection model so as to realize the training of the first target detection model.
8. An object detection device, comprising:
the image acquisition module is used for acquiring an original image to be detected;
an image input module, configured to input the original image into a preset target detection model, and obtain position coordinates of a target object in the original image output by the preset target detection model, where the preset target detection model is obtained by training according to the method of any one of claims 1 to 5.
9. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, implement the method of any of claims 1-6.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-6.
CN202011459601.0A 2020-12-11 2020-12-11 Model training method, target detection method, device, equipment and storage medium Pending CN112434753A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011459601.0A CN112434753A (en) 2020-12-11 2020-12-11 Model training method, target detection method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011459601.0A CN112434753A (en) 2020-12-11 2020-12-11 Model training method, target detection method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112434753A true CN112434753A (en) 2021-03-02

Family

ID=74691570

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011459601.0A Pending CN112434753A (en) 2020-12-11 2020-12-11 Model training method, target detection method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112434753A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112927172A (en) * 2021-05-10 2021-06-08 北京市商汤科技开发有限公司 Training method and device of image processing network, electronic equipment and storage medium
CN113033715A (en) * 2021-05-24 2021-06-25 禾多科技(北京)有限公司 Target detection model training method and target vehicle detection information generation method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111046971A (en) * 2019-12-24 2020-04-21 上海眼控科技股份有限公司 Image recognition method, device, equipment and computer readable storage medium
US20200134772A1 (en) * 2018-10-31 2020-04-30 Kabushiki Kaisha Toshiba Computer vision system and method
CN111325681A (en) * 2020-01-20 2020-06-23 南京邮电大学 Image style migration method combining meta-learning mechanism and feature fusion

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200134772A1 (en) * 2018-10-31 2020-04-30 Kabushiki Kaisha Toshiba Computer vision system and method
CN111046971A (en) * 2019-12-24 2020-04-21 上海眼控科技股份有限公司 Image recognition method, device, equipment and computer readable storage medium
CN111325681A (en) * 2020-01-20 2020-06-23 南京邮电大学 Image style migration method combining meta-learning mechanism and feature fusion

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
DENG, CHUNFANG et al.: "Extended Feature Pyramid Network for Small Object Detection", arXiv:2003.07021v1, pages 1-16 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112927172A (en) * 2021-05-10 2021-06-08 北京市商汤科技开发有限公司 Training method and device of image processing network, electronic equipment and storage medium
CN112927172B (en) * 2021-05-10 2021-08-24 北京市商汤科技开发有限公司 Training method and device of image processing network, electronic equipment and storage medium
CN113033715A (en) * 2021-05-24 2021-06-25 禾多科技(北京)有限公司 Target detection model training method and target vehicle detection information generation method

Similar Documents

Publication Publication Date Title
CN111626208B (en) Method and device for detecting small objects
US20230394671A1 (en) Image segmentation method and apparatus, and device, and storage medium
CN109086668B (en) Unmanned aerial vehicle remote sensing image road information extraction method based on multi-scale generation countermeasure network
CN110378222B (en) Method and device for detecting vibration damper target and identifying defect of power transmission line
CN109003297B (en) Monocular depth estimation method, device, terminal and storage medium
CN111597953A (en) Multi-path image processing method and device and electronic equipment
CN104101348A (en) Navigation system and method for displaying map on navigation system
CN112434753A (en) Model training method, target detection method, device, equipment and storage medium
CN112308856A (en) Target detection method and device for remote sensing image, electronic equipment and medium
CN114612872A (en) Target detection method, target detection device, electronic equipment and computer-readable storage medium
CN116597413A (en) Real-time traffic sign detection method based on improved YOLOv5
CN112712036A (en) Traffic sign recognition method and device, electronic equipment and computer storage medium
CN116844129A (en) Road side target detection method, system and device for multi-mode feature alignment fusion
CN113743163A (en) Traffic target recognition model training method, traffic target positioning method and device
CN114385662A (en) Road network updating method and device, storage medium and electronic equipment
CN114529890A (en) State detection method and device, electronic equipment and storage medium
CN117058647B (en) Lane line processing method, device and equipment and computer storage medium
CN116258756B (en) Self-supervision monocular depth estimation method and system
CN116188587A (en) Positioning method and device and vehicle
CN112052863B (en) Image detection method and device, computer storage medium and electronic equipment
CN113225586B (en) Video processing method and device, electronic equipment and storage medium
CN114627400A (en) Lane congestion detection method and device, electronic equipment and storage medium
CN115546769B (en) Road image recognition method, device, equipment and computer readable medium
CN115049895B (en) Image attribute identification method, attribute identification model training method and device
CN116580384A (en) Target detection method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination