CN112288003B - Neural network training and target detection method and device

Neural network training and target detection method and device

Info

Publication number
CN112288003B
CN112288003B
Authority
CN
China
Prior art keywords
neural network
target
image
similarity
input image
Prior art date
Legal status
Active
Application number
CN202011174690.4A
Other languages
Chinese (zh)
Other versions
CN112288003A (en)
Inventor
赵翔
彭韬
陈甜甜
Current Assignee
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202011174690.4A priority Critical patent/CN112288003B/en
Publication of CN112288003A publication Critical patent/CN112288003A/en
Application granted granted Critical
Publication of CN112288003B publication Critical patent/CN112288003B/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/56 Extraction of image or video features relating to colour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention provides a neural network training and target detection method and device, wherein the method comprises the following steps: inputting an input image into a neural network to obtain a predicted position of a target in the input image; calculating the similarity between the image area at the predicted position and a target in a pre-acquired target type library; and judging whether the similarity meets a preset convergence condition. If the convergence condition is not met, parameters in the neural network are adjusted based on the similarity, the image area is taken as the new input image, and the method returns to the step of inputting the input image into the neural network, until the similarity meets the preset convergence condition and a trained neural network model is obtained. In this scheme, the training of the neural network is guided by the similarity between the image area at the predicted position and the targets in the pre-acquired target type library; no manual labeling is needed during training, so labor consumption is reduced.

Description

Neural network training and target detection method and device
Technical Field
The invention relates to the technical field of deep learning, in particular to a neural network training and target detection method and device.
Background
In some scenarios, it is desirable to identify targets in an image, e.g., to identify a station logo, a trademark, a car logo, etc. Currently, a recognition model is typically trained based on a neural network, and the recognition model is used to recognize the targets in the image. In general, a scheme for training the neural network includes: obtaining a sample image; manually labeling the sample image, e.g., with position labels, type labels, and the like; inputting the sample image into the neural network; and iteratively adjusting the neural network based on the output of the neural network and the manually labeled data.
However, in the above-mentioned scheme, the training process includes manual labeling, which consumes much manpower.
Disclosure of Invention
The embodiment of the invention aims to provide a neural network training and target detection method and device so as to reduce manpower consumption. The specific technical scheme is as follows:
to achieve the above object, an embodiment of the present invention provides a neural network training method, including:
inputting an input image into a neural network to obtain a predicted position of a target in the input image;
calculating the similarity between the image area at the predicted position and a target in a pre-acquired target type library;
Judging whether the similarity meets a preset convergence condition or not;
and if the convergence condition is not met, adjusting parameters in the neural network based on the similarity, taking the image area as the input image, and returning to the step of inputting the input image into the neural network until the similarity meets the preset convergence condition, so as to obtain a trained neural network model.
Optionally, the inputting the input image into the neural network to obtain the predicted position of the target in the input image includes:
extracting image features of the input image as first image features;
and inputting the first image characteristic into the neural network to obtain the position of a predicted target output by the neural network in the input image.
Optionally, the obtaining the position of the predicted target output by the neural network in the input image includes:
judging whether the neural network outputs the positions of a plurality of prediction targets in the input image or not;
if yes, judging the type and probability of the predicted target, and outputting the position of the predicted target with the highest probability in the input image;
if not, outputting the position of the prediction target in the input image.
Optionally, the calculating the similarity between the image area at the position and the target in the pre-acquired target type library includes:
based on the position, intercepting an image area where the prediction target is located in the input image;
extracting image features of the image area as second image features;
and calculating the similarity between the second image feature and the feature of the target in the pre-acquired target type library.
Optionally, after the input image is input to the neural network to obtain the predicted position of the target in the input image, the method further includes:
determining the number of predictions corresponding to the target;
setting a negative excitation for adjusting the neural network based on the number of predictions;
the adjusting parameters in the neural network based on the similarity includes:
adjusting parameters in the neural network based on the similarity and the negative excitation.
Optionally, before the calculating the similarity between the image area at the position and the target in the pre-acquired target type library, the method further includes:
judging whether the predicted position is matched with the position predicted last time;
If not, executing the step of calculating the similarity between the image area at the position and the target in the pre-acquired target type library;
if so, adjusting parameters in the neural network based on the similarity and the negative excitation.
In order to achieve the above object, an embodiment of the present invention further provides a target detection method, including:
acquiring an image to be detected;
and inputting the image to be detected into the neural network model obtained by any one of the above methods, so as to obtain a detection result output by the neural network model.
To achieve the above object, an embodiment of the present invention further provides a neural network training device, including:
the prediction module is used for inputting an input image into the neural network to obtain a predicted position of a target in the input image;
the calculation module is used for calculating the similarity between the image area at the predicted position and a target in a target type library acquired in advance;
the judging module is used for judging whether the similarity meets a preset convergence condition or not;
and the updating module is used for adjusting parameters in the neural network based on the similarity, taking the image area as an input image, and triggering the prediction module until the similarity meets a preset convergence condition to obtain a trained neural network model.
Optionally, the prediction module includes:
the first extraction submodule is used for extracting image features of the input image to serve as first image features;
and the prediction sub-module is used for inputting the first image characteristic into the neural network to obtain the position of a predicted target output by the neural network in the input image.
Optionally, the prediction submodule is specifically configured to:
judging whether the neural network outputs the positions of a plurality of prediction targets in the input image or not;
if yes, judging the type and probability of the predicted target, and outputting the position of the predicted target with the highest probability in the input image;
if not, outputting the position of the prediction target in the input image.
Optionally, the computing module includes:
the intercepting sub-module is used for intercepting an image area where the prediction target is located in the input image based on the position;
the second extraction submodule is used for extracting the image characteristics of the image area and taking the image characteristics as second image characteristics;
and the computing sub-module is used for computing the similarity between the second image characteristic and the characteristic of the target in the pre-acquired target type library.
Optionally, the apparatus further includes:
the determining module is used for determining the number of predictions corresponding to the target, and setting a negative excitation for adjusting the neural network based on the number of predictions;
the updating module is specifically used for: adjusting parameters in the neural network based on the similarity and the negative excitation.
Optionally, the apparatus further includes:
the selection module is used for judging whether the predicted position matches the position predicted last time; if they do not match, triggering the computing module; and if they match, triggering the updating module.
In order to achieve the above object, an embodiment of the present invention further provides an object detection apparatus, including:
the acquisition module is used for acquiring the image to be detected;
the detection module is used for inputting the image to be detected into the neural network model obtained by any one of the above devices, so as to obtain a detection result output by the neural network model.
By applying the embodiment of the invention, an input image is input into a neural network to obtain the predicted position of a target in the input image; calculating the similarity between an image area at a predicted position and a target in a target type library acquired in advance, and carrying out iterative adjustment on parameters in the neural network based on the similarity; therefore, in the scheme, the training of the neural network is guided through the similarity between the image area at the predicted position and the target in the pre-acquired target type library, and the manual labeling is not needed in the training process, so that the labor consumption is reduced.
Of course, it is not necessary for any one product or method of practicing the invention to achieve all of the advantages set forth above at the same time.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a neural network training method according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a structural model of a neural network training method according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a second flow of a neural network training method according to an embodiment of the present invention;
FIG. 4 is a schematic flow chart of a target detection method according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a neural network training device according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an object detection device according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The following description of the technical solutions according to the embodiments of the present invention will be given with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In order to achieve the above objective, embodiments of the present invention provide a method and apparatus for training a neural network and detecting a target, which can be applied to various electronic devices without particular limitation. The neural network to be trained is used for target detection, such as identifying station logos, trademarks, etc., which is likewise not particularly limited. The neural network training method will be described in detail first.
Fig. 1 is a schematic flow chart of a neural network training method according to an embodiment of the present invention, including:
S101: And inputting the input image into a neural network to obtain the predicted position of the target in the input image.
The input image is an image containing a station logo, a trademark, a car logo and the like, and the neural network is a neural network to be trained and is used for target detection by taking the station logo, the trademark and the like as targets.
In one embodiment, S101 may include: extracting image features of the input image as first image features, and inputting the first image features into the neural network to obtain the position, in the input image, of a predicted target output by the neural network. The image features may include, for example, any of the following: color features, texture features, shape features, spatial relationship features, and the like, which are not particularly limited. The image features may be extracted using a convolutional network, such as ResNeXt50 (a classification network); the particular convolutional network is not limited.
For example, the image features may be input to a fully connected layer of the neural network, and the position of the prediction target in the input image is obtained by a Sigmoid function (S-type function) in the fully connected layer.
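For illustration only, the following is a minimal sketch of such a prediction head in PyTorch, assuming a ResNeXt-50 backbone whose pooled feature is mapped by a fully connected layer and a Sigmoid to a box normalized to [0, 1]; the class name, feature size, and output layout are assumptions, not the exact model of this embodiment:

```python
# Minimal sketch (assumed names/sizes): ResNeXt-50 backbone + FC head with
# Sigmoid, producing a normalized (x1, y1, x2, y2) box as in S101.
import torch
import torch.nn as nn
import torchvision.models as models

class PositionPredictor(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = models.resnext50_32x4d(weights=None)
        # Drop the classification layer; keep the 2048-d pooled feature
        # as the "first image feature".
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(2048, 4),  # (x1, y1, x2, y2)
            nn.Sigmoid(),        # coordinates normalized to [0, 1]
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:  # image: (N, 3, H, W)
        first_image_feature = self.features(image)
        return self.head(first_image_feature)
```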
In one embodiment, the obtaining the position of the predicted target output by the neural network in the input image may include: judging whether the neural network outputs the positions of a plurality of prediction targets in the input image or not; if yes, judging the type and probability of the predicted target, and outputting the position of the predicted target with the highest probability in the input image; if not, directly outputting the position of the prediction target in the input image.
For example, suppose the input image contains one target. Image features are extracted from the input image and fed into the neural network, and the neural network outputs two predicted positions containing the target. The target class and probability are then determined for the two predicted positions; if the classes are the same and the probabilities are 0.8 and 0.9 respectively, the predicted position with probability 0.9 is output as the predicted target position. The number of predicted target positions in the input image may be 2, 3, 4, etc., and is not particularly limited. The multiple predicted positions may also be reduced to a single predicted target position by non-maximum suppression, i.e., suppressing elements other than the maximum; the specific processing mode is not limited.
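A minimal sketch of this selection step follows; the candidate layout (box, class, probability) is an assumption for illustration:

```python
# Hypothetical selection step: if several boxes are predicted, keep the one
# with the highest probability; a single prediction is output directly.
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)
Candidate = Tuple[Box, str, float]        # (box, target class, probability)

def select_prediction(candidates: List[Candidate]) -> Box:
    if len(candidates) == 1:
        return candidates[0][0]
    # Multiple predictions: judge class and probability, keep the most probable.
    best = max(candidates, key=lambda c: c[2])
    return best[0]

# Example from the text: same class, probabilities 0.8 and 0.9 -> the 0.9 box wins.
print(select_prediction([((0.1, 0.1, 0.4, 0.4), "logo", 0.8),
                         ((0.2, 0.2, 0.5, 0.5), "logo", 0.9)]))
```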
In one embodiment, it is determined whether the predicted position matches the position predicted last time. If they do not match, the target at the current predicted position differs from the target at the last predicted position, or the similarity between the image area at the current predicted position and the target in the pre-acquired target type library differs from the similarity obtained in the last calculation, so S102 is executed. If they match, the similarity between the image area at the predicted position and the target in the pre-acquired target type library is the same as the similarity obtained in the last calculation, so S104 is executed.
S102: and calculating the similarity between the image area at the predicted position and the target in the pre-acquired target type library.
For example, if the neural network is used to identify station logos, the target type library is an image library containing the station logos of television stations or video websites, and the target is a station logo; if the neural network is used to identify trademarks, the target type library is an image library containing the trademarks of various commodities, and the target is a trademark; if the neural network is used to identify car logos, the target type library is an image library containing the car logos of vehicles of various brands, and the target is a car logo.
In one embodiment, S102 may include: based on the predicted position, intercepting an image area where the predicted target is located in the input image; extracting image features of the image area as second image features; and calculating the similarity between the second image feature and the feature of the target in the pre-acquired target type library.
For example, the neural network may output the predicted position in the form of an identification frame, which may be rectangular, circular, trapezoidal, etc.; the specific shape is not limited. Taking a rectangular identification frame as an example, the predicted position includes four corner coordinate values, namely the upper-left abscissa x1, the upper-left ordinate y1, the lower-right abscissa x2 and the lower-right ordinate y2, and the region of the input image bounded by these coordinates is intercepted (cropped) as the image area where the target is located. The image features of this image area are then extracted, for example with a convolutional network such as ResNeXt50, and taken as the second image features; the image features may be color features, texture features, shape features, spatial relationship features, etc., and the specific convolutional network is not limited. There are various ways of calculating the similarity between the second image features and the features of the targets in the target type library, such as cosine similarity, Euclidean distance, or Manhattan distance between features; the specific way of calculating the similarity is not limited.
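The following sketch illustrates one possible reading of this step, assuming normalized box coordinates, one feature vector per library target, and cosine similarity; the helper names are invented for illustration:

```python
# Hypothetical helpers for S102: crop the region at the predicted box,
# then score the region feature against the target-library features.
import torch
import torch.nn.functional as F

def crop_region(image: torch.Tensor, box) -> torch.Tensor:
    # image: (3, H, W); box: normalized (x1, y1, x2, y2) corner coordinates.
    _, h, w = image.shape
    x1, y1, x2, y2 = box
    return image[:, int(y1 * h):int(y2 * h), int(x1 * w):int(x2 * w)]

def best_similarity(region_feature: torch.Tensor,
                    library_features: torch.Tensor) -> float:
    # region_feature: (D,) "second image feature"; library_features: (K, D),
    # one row per target in the pre-acquired target type library.
    sims = F.cosine_similarity(region_feature.unsqueeze(0), library_features, dim=1)
    return sims.max().item()
```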
In one embodiment, after the predicted position of the target in the input image is obtained, the number of predictions corresponding to the target is determined, and a negative excitation for adjusting the neural network is set based on the number of predictions.
Each time the image area where the target is located is input into the neural network, the neural network predicts the position of the target once; the number of predictions is therefore the number of times the neural network has made a prediction for the same target.
In one case, the negative excitation may be the product of the number of predictions and a preset negative excitation coefficient, namely:
negative excitation = preset negative excitation coefficient x number of predictions
For example, if the number of predictions is 3 and the preset negative excitation coefficient is 0.1, the negative excitation is 0.3 (0.1×3). The number of predictions may be 1, 2, 3, etc., and is not particularly limited; the preset negative excitation coefficient may be 0.1, 0.01, etc., and is not particularly limited.
In another case, the negative excitation may be the product of the number of predictions, a preset negative excitation coefficient, and a preset weight, namely:
negative excitation = preset negative excitation coefficient x prediction number x preset weight
For example, if the number of predictions is 3, the preset negative excitation coefficient is 0.1, and the preset weight is 0.2, the negative excitation is 0.06 (0.1×3×0.2). The number of predictions may be 1, 2, 3, etc.; the preset negative excitation coefficient may be 0.1, 0.01, etc.; and the preset weight may be 0.1, 0.2, etc. None of these is particularly limited.
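Both variants reduce to one small helper; the sketch below simply restates the two formulas above, using the example values from the text:

```python
# The two negative-excitation formulas above, as one helper (values from the text).
from typing import Optional

def negative_excitation(num_predictions: int,
                        coefficient: float = 0.1,
                        weight: Optional[float] = None) -> float:
    value = coefficient * num_predictions  # e.g. 0.1 * 3 = 0.3
    if weight is not None:
        value *= weight                    # e.g. 0.1 * 3 * 0.2 = 0.06
    return value
```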
S103: judging whether the similarity meets a preset convergence condition, and if so, executing S105; if not, S104 is performed.
For example, S103 may include: the convergence condition may be that the similarity is greater than a set similarity threshold, such as 0.8. If the calculated similarity between the second image feature and the feature of the target in the pre-acquired target type library is 0.9, which is greater than the set threshold of 0.8, the convergence condition is met; if the calculated similarity is 0.5, which is not greater than the set threshold of 0.8, the convergence condition is not met.
S104: parameters in the neural network are adjusted based on the similarity, and the image area is taken as an input image.
The parameters of the neural network are adjusted based on the similarity and the negative excitation, specifically by adjusting the score in the neural network loss function. Consistent with the symbols defined below, the neural network loss function can be written in policy-gradient form as:

L(θ) = −R(τ)·log π(s, a, θ)

where s represents the current input image; a is the action output by the neural network, i.e., the predicted position and the judgment of whether training is stopped; θ denotes the parameters of the neural network; R is the score; π(s, a, θ) represents the policy function, i.e., the probability of taking action a under the current state and network parameters; and R(τ) represents the score function. The calculation formula of the score is as follows:
Score = similarity - negative excitation
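Under a REINFORCE-style reading of the loss above, the score weights the log-probability of the Agent's action; the following sketch is one such reading, with an assumed interface:

```python
# Hypothetical REINFORCE-style update: the score (similarity minus negative
# excitation) weights the log-probability of the Agent's chosen action.
import torch

def policy_loss(action_log_prob: torch.Tensor,
                similarity: float,
                neg_excitation: float) -> torch.Tensor:
    score = similarity - neg_excitation  # Score = similarity - negative excitation
    # Ascent on the score is descent on -score * log pi(s, a, theta).
    return -score * action_log_prob
```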
S105: and (5) completing the neural network training to obtain a neural network model.
The neural network model can be used for identifying station logos, trademarks, and the like, without particular limitation. For example, the neural network model can be used to perform station logo detection on video images in a video website, so as to avoid copyright risks.
A specific embodiment is described below, with reference to fig. 2:
The input image is input into a neural network composed of a ResNeXt50 convolutional network and an Agent. The ResNeXt50 convolutional network extracts the image features of the input image as first image features, and the Agent outputs a predicted target position based on the first image features; if multiple predicted target positions are output, the discriminator judges the target class and probability and outputs the predicted target position with the highest probability. The input image is intercepted (cropped) based on the predicted target position, and the obtained cropped image is taken as the new input image; second image features of the new input image are calculated by the ResNeXt50 convolutional network. The discriminator then calculates the similarity between the second image features and the targets in the pre-acquired target type library, and outputs a score (Reward) composed of the similarity and the negative excitation. Whether training of the neural network is complete is judged based on the Reward: if the convergence condition is met, training is complete; if the convergence condition is not met, the Reward is fed back into the Agent, and the parameters of the Agent are adjusted based on the Reward. The adjusted Agent outputs a new predicted target position based on the second image features, and the above process is repeated until the convergence condition is met.
The structural model of the neural network is shown with reference to fig. 2:
the neural network is composed of a convolutional neural network and an Agent, wherein the convolutional neural network is used for extracting characteristics of an input image; the Agent is used for outputting the predicted position of the target and judging whether training is stopped or not; the discriminator is a similarity comparison model and is used for calculating the similarity between the image area at the predicted position output by the Agent and the target in the pre-acquired target type library and feeding the similarity back to the Agent as a score.
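Putting the pieces together, the following condensed sketch mirrors the Fig. 2 loop, reusing the helpers sketched earlier (crop_region, best_similarity, negative_excitation) plus an assumed extract_feature; the model is assumed to return the predicted box together with the log-probability of that action:

```python
# Condensed, hypothetical Fig. 2 loop; reuses helpers sketched earlier and
# an assumed extract_feature(). model is assumed to return (box, log_prob).
def train_on_image(image, model, optimizer, library_features,
                   threshold=0.8, coefficient=0.1):
    num_predictions = 0
    while True:
        box, log_prob = model(image)             # Agent predicts a position
        num_predictions += 1
        region = crop_region(image, box)         # cropped image at the prediction
        feature = extract_feature(region)        # second image feature (ResNeXt-50)
        similarity = best_similarity(feature, library_features)
        if similarity > threshold:               # preset convergence condition met
            return model
        reward = similarity - negative_excitation(num_predictions, coefficient)
        loss = -reward * log_prob                # Reward fed back to the Agent
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        image = region                           # cropped region is the new input
```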
Applying the embodiment of the invention shown in fig. 1, inputting an input image into a neural network to obtain a predicted position of a target in the input image; calculating the similarity between an image area at a predicted position and a target in a target type library acquired in advance, and carrying out iterative adjustment on parameters in the neural network based on the similarity; therefore, in the scheme, on one hand, the training of the neural network is guided through the similarity between the image area at the predicted position and the target in the pre-acquired target type library, and the manual labeling is not needed in the training process, so that the labor consumption is reduced; on the other hand, because the time for labeling data is reduced, the time for training the neural network model is shortened.
Fig. 3 is a second flowchart of a neural network training method according to an embodiment of the present invention, including:
S301: Image features of an input image are extracted as first image features.
The input image is an image containing a station logo, a trademark, a car logo, or the like. The image features may include, for example, any of the following: color features, texture features, shape features, spatial relationship features, and the like, which are not particularly limited. The image features may be extracted using a convolutional network, such as ResNeXt50; the particular convolutional network is not limited.
S302: and inputting the first image characteristic into the neural network to obtain the position of the predicted target output by the neural network in the input image.
For example, the image features may be input to a fully connected layer of the neural network, and the position of the prediction target in the input image is obtained by a Sigmoid function (S-type function) in the fully connected layer.
For example, the neural network may output the predicted position in the form of an identification frame, which may be rectangular, circular, trapezoidal, etc., and the specific shape is not limited. Taking a rectangular recognition frame as an example, the position of the predicted target in the input image includes four corner coordinate values, namely an upper left corner abscissa x1, an upper left corner ordinate y1, a lower right corner abscissa x2 and a lower right corner ordinate y2.
S303: and judging whether the neural network outputs the positions of a plurality of prediction targets in the input image.
In one implementation, determining whether the neural network outputs the positions of the plurality of prediction targets in the input image, if yes, executing S304; if not, S305 is performed.
S304: and determining the type and probability of the predicted target, and obtaining the position of the predicted target with the highest probability in the input image.
For example, suppose the input image contains one target. Image features are extracted from the input image and fed into the neural network, and the neural network outputs two predicted positions containing the target. The target class and probability are then determined for the two predicted positions; if the classes are the same and the probabilities are 0.8 and 0.9 respectively, the predicted position with probability 0.9 is output as the predicted target position. The number of predicted target positions in the input image may be 2, 3, 4, etc., and is not particularly limited. The multiple predicted positions may also be reduced to a single predicted target position by non-maximum suppression, i.e., suppressing elements other than the maximum; the specific processing mode is not limited.
S305: and determining the prediction times corresponding to the target, and setting negative excitation for adjusting the neural network based on the prediction times.
In one embodiment, after the predicted position of the target in the input image is obtained, the number of predictions corresponding to the target is determined, and a negative excitation for adjusting the neural network is set based on the number of predictions.
Each time the image area where the target is located is input into the neural network, the neural network predicts the position of the target once; the number of predictions is therefore the number of times the neural network has made a prediction for the same target.
In one case, the negative excitation may be the product of the number of predictions and a preset negative excitation coefficient, namely:
negative excitation = preset negative excitation coefficient x number of predictions
For example, if the number of predictions is 3 and the preset negative excitation coefficient is 0.1, the negative excitation is 0.3 (0.1×3). The number of predictions may be 1, 2, 3, etc., and is not particularly limited; the preset negative excitation coefficient may be 0.1, 0.01, etc., and is not particularly limited.
In another case, the negative excitation may be the product of the number of predictions, a preset negative excitation coefficient, and a preset weight, namely:
Negative excitation = preset negative excitation coefficient x prediction number x preset weight
For example, if the number of predictions is 3, the preset negative excitation coefficient is 0.1, and the preset weight is 0.2, the negative excitation is 0.06 (0.1×3×0.2). The number of predictions may be 1, 2, 3, etc.; the preset negative excitation coefficient may be 0.1, 0.01, etc.; and the preset weight may be 0.1, 0.2, etc. None of these is particularly limited.
S306: and judging whether the predicted position is matched with the position predicted last time.
In one embodiment, it is determined whether the predicted position matches the position predicted last time. If they do not match, the target at the current predicted position differs from the target at the last predicted position, or the similarity between the image area at the current predicted position and the target in the pre-acquired target type library differs from the similarity obtained in the last calculation, so S308 is executed. If they match, the similarity between the image area at the predicted position and the target in the pre-acquired target type library is the same as the similarity obtained in the last calculation, so S307 is executed.
S307: parameters in the neural network are adjusted based on the similarity and the negative stimulus, and the image area is taken as an input image.
The parameters of the neural network are adjusted based on the similarity and the negative excitation, specifically by adjusting the score in the neural network loss function. Consistent with the symbols defined below, the neural network loss function can be written in policy-gradient form as:

L(θ) = −R(τ)·log π(s, a, θ)

where s represents the current input image; a is the action output by the neural network, i.e., the predicted position and the judgment of whether training is stopped; θ denotes the parameters of the neural network; R is the score; π(s, a, θ) represents the policy function, i.e., the probability of taking action a under the current state and network parameters; and R(τ) represents the score function. The calculation formula of the score is as follows:
Score = similarity - negative excitation
S308: and calculating the similarity between the image area obtained at the predicted position and the target in the pre-acquired target type library.
For example, if the neural network is used to identify station logos, the target type library is an image library containing the station logos of television stations or video websites, and the target is a station logo; if the neural network is used to identify trademarks, the target type library is an image library containing the trademarks of various commodities, and the target is a trademark; if the neural network is used to identify car logos, the target type library is an image library containing the car logos of vehicles of various brands, and the target is a car logo.
In one embodiment, S308 may include: based on the predicted position, intercepting an image area where the predicted target is located in the input image; extracting image features of the image area as second image features; and calculating the similarity between the second image feature and the feature of the target in the pre-acquired target type library.
For example, the neural network may output the predicted position in the form of an identification frame, which may be rectangular, circular, trapezoidal, etc.; the specific shape is not limited. Taking a rectangular identification frame as an example, the predicted position includes four corner coordinate values, namely the upper-left abscissa x1, the upper-left ordinate y1, the lower-right abscissa x2 and the lower-right ordinate y2, and the region of the input image bounded by these coordinates is intercepted (cropped) as the image area where the target is located. The image features of this image area are then extracted, for example with a convolutional network such as ResNeXt50, and taken as the second image features; the image features may be color features, texture features, shape features, spatial relationship features, etc., and the specific convolutional network is not limited. There are various ways of calculating the similarity between the second image features and the features of the targets in the target type library, such as cosine similarity, Euclidean distance, or Manhattan distance between features; the specific way of calculating the similarity is not limited.
S309: and judging whether the similarity meets a preset convergence condition.
Judging whether the similarity meets a preset convergence condition, and executing S310 if the similarity meets the preset convergence condition; if not, S307 is performed.
For example, S309 may include: the convergence condition may be that the similarity is greater than a set similarity threshold, such as 0.8. If the calculated similarity between the second image feature and the feature of the target in the pre-acquired target type library is 0.9, which is greater than the set threshold of 0.8, the convergence condition is met; if the calculated similarity is 0.5, which is not greater than the set threshold of 0.8, the convergence condition is not met.
S310: and (5) completing the neural network training to obtain a neural network model.
The neural network model can be used for identifying station logos, trademarks, and the like, without particular limitation. For example, the neural network model can be used to perform station logo detection on video images in a video website, so as to avoid copyright risks.
A specific embodiment is described below, with reference to fig. 2:
The input image is input into a neural network composed of a ResNeXt50 convolutional network and an Agent. The ResNeXt50 convolutional network extracts the image features of the input image as first image features, and the Agent outputs a predicted target position based on the first image features; if multiple predicted target positions are output, the discriminator judges the target class and probability and outputs the predicted target position with the highest probability. The input image is intercepted (cropped) based on the predicted target position, and the obtained cropped image is taken as the new input image; second image features of the new input image are calculated by the ResNeXt50 convolutional network. The discriminator then calculates the similarity between the second image features and the targets in the pre-acquired target type library, and outputs a score (Reward) composed of the similarity and the negative excitation. Whether training of the neural network is complete is judged based on the Reward: if the convergence condition is met, training is complete; if the convergence condition is not met, the Reward is fed back into the Agent, and the parameters of the Agent are adjusted based on the Reward. The adjusted Agent outputs a new predicted target position based on the second image features, and the above process is repeated until the convergence condition is met.
The structural model of the neural network is shown with reference to fig. 2:
the neural network is composed of a convolutional neural network and an Agent, wherein the convolutional neural network is used for extracting characteristics of an input image; the Agent is used for outputting the predicted position of the target and judging whether training is stopped or not; the discriminator is a similarity comparison model and is used for calculating the similarity between the image area at the predicted position output by the Agent and the target in the pre-acquired target type library and feeding the similarity back to the Agent as a score.
Applying the embodiment of the invention shown in fig. 3, inputting the input image into the neural network to obtain the predicted position of the target in the input image; calculating the similarity between an image area at a predicted position and a target in a target type library acquired in advance, and carrying out iterative adjustment on parameters in the neural network based on the similarity; therefore, in the scheme, on one hand, the training of the neural network is guided through the similarity between the image area at the predicted position and the target in the pre-acquired target type library, and the manual labeling is not needed in the training process, so that the labor consumption is reduced; on the other hand, because the time for labeling data is reduced, the time for training the neural network model is shortened.
The embodiment of the invention also provides a target detection method, as shown in fig. 4, comprising the following steps:
S401: And acquiring an image to be detected.
S402: and inputting the image to be detected into a neural network model to obtain a detection result output by the neural network model.
For the process of training the neural network model, refer to the embodiments shown in fig. 1-3; it is not repeated here.
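As an illustration, a trained model from the sketches above might be applied to an image to be detected as follows; file names and the saved format are assumptions:

```python
# Hypothetical usage of S401-S402: load trained weights and run detection.
import torch
from torchvision.io import read_image
from torchvision.transforms.functional import convert_image_dtype

model = PositionPredictor()                      # class from the earlier sketch
model.load_state_dict(torch.load("logo_detector.pt", map_location="cpu"))
model.eval()

image = convert_image_dtype(read_image("frame.jpg"), torch.float)
with torch.no_grad():
    box = model(image.unsqueeze(0))              # detection result: predicted box
print(box)
```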
Corresponding to the above method embodiment, the embodiment of the present invention further provides a neural network training device, as shown in fig. 5, including:
a prediction module 501, configured to input an input image to a neural network, and obtain a predicted position of a target in the input image;
a calculating module 502, configured to calculate a similarity between an image area at the predicted position and a target in a target type library acquired in advance;
a judging module 503, configured to judge whether the similarity meets a preset convergence condition;
and the updating module 504 is configured to adjust parameters in the neural network based on the similarity, and trigger the prediction module with the image area as an input image until the similarity meets a preset convergence condition, so as to obtain a trained neural network model.
In one embodiment, the prediction module 501 further includes: a first extraction sub-module and a prediction sub-module (not shown in the figure), wherein,
The first extraction submodule is used for extracting image features of the input image to serve as first image features;
and the prediction sub-module is used for inputting the first image characteristic into the neural network to obtain the position of a predicted target output by the neural network in the input image.
In one embodiment, the prediction submodule is specifically configured to:
judging whether the neural network outputs the positions of a plurality of prediction targets in the input image or not;
if yes, judging the type and probability of the predicted target, and outputting the position of the predicted target with the highest probability in the input image;
if not, outputting the position of the prediction target in the input image.
In one embodiment, the computing module 502 further includes: an intercepting sub-module, a second extraction sub-module, and a computing sub-module (not shown in the figure), wherein,
the intercepting sub-module is used for intercepting an image area where the prediction target is located in the input image based on the position;
the second extraction submodule is used for extracting the image characteristics of the image area and taking the image characteristics as second image characteristics;
and the computing sub-module is used for computing the similarity between the second image characteristic and the characteristic of the target in the pre-acquired target type library.
In one embodiment, the apparatus further comprises: a determining module (not shown in the figure), wherein,
the determining module is used for determining the number of predictions corresponding to the target, and setting a negative excitation for adjusting the neural network based on the number of predictions;
the updating module is specifically used for: adjusting parameters in the neural network based on the similarity and the negative excitation.
In one embodiment, the apparatus further comprises: a selection module (not shown), configured to determine whether the predicted position matches the position predicted last time; if they do not match, trigger the computing module; and if they match, trigger the updating module.
Applying the embodiment of the invention shown in fig. 5, inputting the input image into the neural network to obtain the predicted position of the target in the input image; calculating the similarity between an image area at a predicted position and a target in a target type library acquired in advance, and carrying out iterative adjustment on parameters in the neural network based on the similarity; therefore, in the scheme, on one hand, the training of the neural network is guided through the similarity between the image area at the predicted position and the target in the pre-acquired target type library, and the manual labeling is not needed in the training process, so that the labor consumption is reduced; on the other hand, because the time for labeling data is reduced, the time for training the neural network model is shortened.
Corresponding to the above method embodiment, the embodiment of the present invention further provides an object detection device, as shown in fig. 6, including:
an acquisition module 601, configured to acquire an image to be detected;
the detection module 602 is configured to input an image to be detected to a neural network model, and obtain a detection result output by the neural network model.
The embodiment of the invention also provides an electronic device, as shown in fig. 7, comprising a processor 701 and a memory 702,
a memory 702 for storing a computer program;
the processor 701 is configured to implement any of the neural network training and target detection methods described above when executing the program stored in the memory 702.
The Memory mentioned in the electronic device may include a random access Memory (Random Access Memory, RAM) or may include a Non-Volatile Memory (NVM), such as at least one magnetic disk Memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU for short), a network processor (Network Processor, NP for short), etc.; but also digital signal processors (Digital Signal Processor, DSP for short), application specific integrated circuits (Application Specific Integrated Circuit, ASIC for short), field-programmable gate arrays (Field-Programmable Gate Array, FPGA for short) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components.
In yet another embodiment of the present invention, a computer readable storage medium is provided, where a computer program is stored, where the computer program, when executed by a processor, implements the neural network training and target detection method according to any of the foregoing embodiments.
In yet another embodiment of the present invention, a computer program product comprising instructions that, when executed on a computer, cause the computer to perform the neural network training and target detection method of any of the above embodiments is also provided.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, microwave, etc.). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), etc.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In this specification, the embodiments are described in a related manner; for identical or similar parts, the embodiments may refer to one another, and each embodiment mainly describes its differences from the other embodiments. In particular, the apparatus, device, computer-readable storage medium, and computer program product embodiments are described relatively simply since they are substantially similar to the method embodiments; for relevant parts, refer to the description of the method embodiments.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.

Claims (14)

1. A neural network training method, comprising:
inputting an input image into a neural network to obtain a predicted position of a target in the input image;
calculating the similarity between the image area at the predicted position and a target in a pre-acquired target type library;
judging whether the similarity meets a preset convergence condition or not;
if the similarity does not meet the preset convergence condition, adjusting parameters in the neural network based on the similarity, taking the image area as the input image, and returning to the step of inputting the input image into the neural network until the similarity meets the preset convergence condition, so as to obtain a trained neural network model;
wherein after the input image is input into the neural network and the predicted position of the target in the input image is obtained, the method further comprises:
determining the number of predictions corresponding to the target;
setting a negative excitation for adjusting the neural network based on the number of predictions; and
adjusting parameters in the neural network based on the similarity and the negative excitation.
2. The method of claim 1, wherein inputting the input image to a neural network results in a predicted location of an object in the input image, comprising:
extracting image features of the input image as first image features;
and inputting the first image characteristic into the neural network to obtain the position of a predicted target output by the neural network in the input image.
3. The method of claim 2, wherein the deriving the location of the predicted target of the neural network output in the input image comprises:
judging whether the neural network outputs the positions of a plurality of prediction targets in the input image or not;
if yes, judging the type and probability of the predicted target, and outputting the position of the predicted target with the highest probability in the input image;
if not, outputting the position of the prediction target in the input image.
4. The method of claim 1, wherein said calculating the similarity of the image area at the location to the object in the pre-acquired object type library comprises:
Based on the position, intercepting an image area where a prediction target is located in the input image;
extracting image features of the image area as second image features;
and calculating the similarity between the second image feature and the feature of the target in the pre-acquired target type library.
5. The method of claim 1, wherein the negative excitation is the product of the number of predictions and a preset negative excitation coefficient; or the negative excitation is the product of the number of predictions, a preset negative excitation coefficient, and a preset weight;
the adjusting parameters in the neural network based on the similarity and the negative excitation includes:
based on the similarity and the negative excitation, a score of a loss function in the neural network is adjusted, the score being the difference between the similarity and the negative excitation.
6. The method of claim 5, wherein before the calculating the similarity between the image area at the predicted position and a target in the pre-acquired target type library, the method further comprises:
judging whether the predicted position matches the previously predicted position;
if not, executing the step of calculating the similarity between the image area at the predicted position and a target in the pre-acquired target type library;
if so, adjusting parameters in the neural network based on the similarity and the negative excitation.
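The matching criterion in claim 6 is left open; intersection-over-union against the previous box is one plausible reading. A sketch under that assumption:

```python
# Sketch of the match test, using intersection-over-union between the new
# and previous boxes; the IoU criterion and threshold are assumptions.
def iou(box_a, box_b):
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def positions_match(current, previous, threshold=0.9):
    return previous is not None and iou(current, previous) >= threshold

# Two nearly identical boxes match (IoU ~ 0.94).
print(positions_match((10, 10, 30, 30), (11, 10, 30, 30)))  # True
```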
7. A target detection method, comprising:
acquiring an image to be detected;
inputting the image to be detected into a neural network model obtained by the method according to any one of claims 1-6, and obtaining a detection result output by the neural network model.
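At detection time the trained model is simply applied to the image to be detected. A minimal usage sketch; `load_trained_model`, `Detector`, and the result format are hypothetical stand-ins:

```python
# Minimal usage sketch; everything here is a placeholder for the trained
# neural network model obtained by the training method.
class Detector:
    def detect(self, image):
        # A real model would return detected positions, types, and
        # probabilities for the image to be detected.
        return [{"position": (12, 40, 64, 64),
                 "type": "pedestrian",
                 "probability": 0.93}]

def load_trained_model(path):
    return Detector()  # placeholder for deserializing trained weights

model = load_trained_model("model.bin")
result = model.detect(image=[[0] * 128 for _ in range(128)])
print(result)
```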
8. A neural network training device, comprising:
the prediction module is used for inputting an input image into the neural network to obtain a predicted position of a target in the input image;
the calculation module is used for calculating the similarity between the image area at the predicted position and a target in a pre-acquired target type library;
the judging module is used for judging whether the similarity meets a preset convergence condition;
the updating module is used for, if the similarity does not meet the preset convergence condition, adjusting parameters in the neural network based on the similarity, taking the image area as the input image, and triggering the prediction module until the similarity meets the preset convergence condition, so as to obtain a trained neural network model;
the apparatus further comprises:
the determining module is used for determining the prediction times corresponding to the target, and setting a negative excitation for adjusting the neural network based on the prediction times;
the updating module is specifically used for adjusting parameters in the neural network based on the similarity and the negative excitation.
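The apparatus claims group the method steps into named modules. One illustrative way to mirror that decomposition in code (the class and its stub bodies are assumptions, not the patented device):

```python
# Illustrative module layout following claim 8; all bodies are stubs.
class NeuralNetworkTrainer:
    def __init__(self, network, target_library, threshold=0.95):
        self.network = network                  # detector under training
        self.target_library = target_library    # pre-acquired target types
        self.threshold = threshold              # preset convergence condition

    def prediction_module(self, image):
        # Returns the predicted position of a target in `image`.
        return self.network(image)

    def calculation_module(self, image, position):
        # Returns the similarity between the image area at `position`
        # and the targets in the pre-acquired target type library.
        raise NotImplementedError

    def judging_module(self, similarity):
        # Checks the preset convergence condition.
        return similarity >= self.threshold

    def updating_module(self, similarity, negative_excitation):
        # Adjusts network parameters from similarity - negative_excitation.
        raise NotImplementedError
```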
9. The apparatus of claim 8, wherein the prediction module comprises:
the first extraction submodule is used for extracting an image feature of the input image as a first image feature;
the prediction submodule is used for inputting the first image feature into the neural network to obtain the position, in the input image, of a predicted target output by the neural network.
10. The apparatus of claim 9, wherein the prediction submodule is specifically used for:
judging whether the neural network outputs the positions of a plurality of predicted targets in the input image;
if so, determining the type and the probability of each predicted target, and outputting the position of the predicted target with the highest probability in the input image;
if not, outputting the position of the single predicted target in the input image.
11. The apparatus of claim 8, wherein the calculation module comprises:
the cropping submodule is used for cropping, based on the predicted position, the image area where the predicted target is located from the input image;
the second extraction submodule is used for extracting an image feature of the image area as a second image feature;
the calculation submodule is used for calculating the similarity between the second image feature and the feature of the target in the pre-acquired target type library.
12. The apparatus of claim 8, wherein the negative excitation is a product of the prediction times and a preset negative excitation coefficient, or a product of the prediction times, a preset negative excitation coefficient, and a preset weight;
the updating module is further used for:
adjusting a score of a loss function in the neural network based on the similarity and the negative excitation, wherein the score is the difference between the similarity and the negative excitation.
13. The apparatus of claim 12, wherein the apparatus further comprises:
the selection module is used for judging whether the predicted position matches the previously predicted position; if not, triggering the calculation module; and if so, triggering the updating module.
14. A target detection apparatus, comprising:
the acquisition module is used for acquiring the image to be detected;
the detection module is used for inputting the image to be detected into the neural network model obtained by the apparatus according to any one of claims 8-13, and obtaining a detection result output by the neural network model.
CN202011174690.4A 2020-10-28 2020-10-28 Neural network training and target detection method and device Active CN112288003B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011174690.4A CN112288003B (en) 2020-10-28 2020-10-28 Neural network training and target detection method and device

Publications (2)

Publication Number Publication Date
CN112288003A CN112288003A (en) 2021-01-29
CN112288003B true CN112288003B (en) 2023-07-25

Family

ID=74373334

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011174690.4A Active CN112288003B (en) 2020-10-28 2020-10-28 Neural network training and target detection method and device

Country Status (1)

Country Link
CN (1) CN112288003B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229318A (en) * 2017-11-28 2018-06-29 北京市商汤科技开发有限公司 The training method and device of gesture identification and gesture identification network, equipment, medium

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120230583A1 (en) * 2009-11-20 2012-09-13 Nec Corporation Object region extraction device, object region extraction method, and computer-readable medium
CN108491827B (en) * 2018-04-13 2020-04-10 腾讯科技(深圳)有限公司 Vehicle detection method and device and storage medium
CN109977943B (en) * 2019-02-14 2024-05-07 平安科技(深圳)有限公司 Image target recognition method, system and storage medium based on YOLO
JP7192966B2 (en) * 2019-03-26 2022-12-20 日本電信電話株式会社 SEARCH DEVICE, LEARNING DEVICE, SEARCH METHOD, LEARNING METHOD AND PROGRAM
CN110060274A (en) * 2019-04-12 2019-07-26 北京影谱科技股份有限公司 The visual target tracking method and device of neural network based on the dense connection of depth
CN110443366B (en) * 2019-07-30 2022-08-30 上海商汤智能科技有限公司 Neural network optimization method and device, and target detection method and device
CN110675361B (en) * 2019-08-16 2022-03-25 北京百度网讯科技有限公司 Method and device for establishing video detection model and video detection
CN111476306B (en) * 2020-04-10 2023-07-28 腾讯科技(深圳)有限公司 Object detection method, device, equipment and storage medium based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant