CN110751163B - Target positioning method and device, computer readable storage medium and electronic equipment - Google Patents


Info

Publication number: CN110751163B (grant of application publication CN110751163A)
Application number: CN201810821904.9A
Authority: CN (China)
Inventor: 张鹏 (Zhang Peng)
Original and current assignee: Hangzhou Hikvision Digital Technology Co Ltd
Legal status: Active (granted). The legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.
Prior art keywords: neural network, image, target, channel, identified
Other languages: Chinese (zh)

Classifications

    • G06F18/214 Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/24 Pattern recognition: classification techniques
    • G06N3/045 Neural networks: combinations of networks
    • G06T7/70 Image analysis: determining position or orientation of objects or cameras
    • Y02T10/40 Climate change mitigation in transportation: engine management systems


Abstract

The invention discloses a target positioning method and apparatus, a computer-readable storage medium, and an electronic device. The target positioning method includes: extracting features from an image to be identified to obtain a plurality of channel features; for each channel feature, determining a corresponding weighting parameter, where the weighting parameter represents the correlation between the channel feature and the target position to be located in the image to be identified; correcting each channel feature according to its corresponding weighting parameter; and locating the position of the target contained in the image to be identified using the corrected channel features. The target positioning method can improve the accuracy of target positioning.

Description

Target positioning method and device, computer readable storage medium and electronic equipment
Technical Field
The present invention relates to the field of image recognition technology, and in particular, to a target positioning method and apparatus, a computer readable storage medium, and an electronic device.
Background
Image recognition refers to techniques that use a computer to process, analyze, and understand an image so as to detect and recognize targets of various patterns in the image.
An image-recognition-based target positioning method recognizes a specific target in an image and determines the position of that target in the image; currently, neural networks can be used for target positioning.
The accuracy of existing methods that perform target positioning with a neural network still needs to be improved.
Disclosure of Invention
The invention provides a target positioning method and apparatus, a computer-readable storage medium, and an electronic device, which address the deficiencies in the related art.
According to a first aspect of an embodiment of the present invention, there is provided a target positioning method, including:
extracting features of the image to be identified to obtain a plurality of channel features;
for each channel feature, determining a weighting parameter corresponding to the channel feature, wherein the weighting parameter represents the correlation between the channel feature and the target position to be located in the image to be identified;
correcting the channel characteristics according to the weighting parameters corresponding to the channel characteristics;
and positioning the target position of the target contained in the image to be identified by utilizing the corrected channel characteristics.
Optionally, extracting features from the image to be identified to obtain the plurality of channel features includes:
inputting the image to be identified into a trained neural network, and extracting features of the image to be identified by a convolution layer of the neural network to obtain a plurality of channel features;
the neural network is obtained through training the following steps:
building a neural network, wherein the neural network comprises a convolution layer, a pooling layer and a full connection layer;
acquiring a training sample, wherein the training sample comprises a marked image marked with a target type;
inputting the training sample into the neural network so that the neural network outputs a target type recognition result for the marked image, and updating parameters in the neural network according to the difference between the target type recognition result output by the neural network and the target type in the training sample;
and training the neural network through a training sample to obtain a trained neural network.
Optionally, after obtaining the training sample including the marked image annotated with the target type, the method further includes:
performing occlusion preprocessing on a partial region of the marked image.
Optionally, the determining the weighting parameter corresponding to the channel feature includes:
and inputting the multiple channel characteristics output by the convolution layer into the full-connection layer, and determining the weighting parameters corresponding to the channel characteristics by the full-connection layer.
Optionally, the determining the weighting parameter corresponding to the channel feature includes:
computing, for each feature in the channel feature, its derivative, to obtain the derivative of each feature;
and taking the average of the computed derivatives of the features as the weighting parameter corresponding to the channel feature.
Optionally, locating the target position of the target contained in the image to be identified using the corrected channel features includes:
acquiring response values at the positions of the image to be identified according to the corrected channel features, where a response value represents the probability that the target is present at that position;
and determining the positions whose response values are greater than a threshold, and taking the region comprising those positions as the target position.
According to a second aspect of an embodiment of the present invention, there is provided an object positioning apparatus including:
the feature extraction module is used for extracting features of the image to be identified to obtain a plurality of channel features;
the weighting parameter determining module is used for determining a weighting parameter corresponding to the channel characteristic aiming at any channel characteristic, wherein the weighting parameter corresponding to the channel characteristic is used for representing the correlation degree between the channel characteristic and a target position positioned from the image to be identified;
the characteristic correction module is used for correcting the channel characteristic according to the weighting parameter corresponding to the channel characteristic;
and the target position positioning module is used for positioning the target position of the target contained in the image to be identified by utilizing the corrected channel characteristics.
Optionally, the feature extraction module is specifically configured to:
inputting the image to be identified into a trained neural network, and extracting features of the image to be identified by a convolution layer of the neural network to obtain a plurality of channel features;
the device also comprises a training module for:
building a neural network, wherein the neural network comprises a convolution layer, a pooling layer and a full connection layer;
acquiring a training sample, wherein the training sample comprises a marked image marked with a target type;
inputting the training sample into the neural network so that the neural network outputs a target type recognition result for the marked image, and updating parameters in the neural network according to the difference between the target type recognition result output by the neural network and the target type in the training sample;
and training the neural network through a certain number of training samples to obtain a trained neural network.
Optionally, the weighting parameter determining module is specifically configured to:
and inputting the multiple channel characteristics output by the convolution layer into the full connection layer, and determining the weighting parameters corresponding to each channel characteristic by the full connection layer.
Optionally, the weighting parameter determining module is specifically configured to:
computing, for each feature in each channel feature, its derivative, to obtain the derivative of each feature;
and taking the average of the computed derivatives of each channel's features as the weighting parameter corresponding to that channel feature.
Optionally, the target position positioning module is specifically configured to:
acquiring response values at the positions of the image to be identified according to the corrected channel features, where a response value represents the probability that the target is present at that position;
and determining the positions whose response values are greater than a threshold, and taking the region comprising those positions as the position of the target.
According to a third aspect of embodiments of the present invention, there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor implements any of the methods described above.
According to a fourth aspect of embodiments of the present invention, there is provided an electronic device comprising a processor and a machine-readable storage medium storing machine-executable instructions that can be executed by the processor to cause any one of the methods described above to be performed.
According to the technical solutions above, the target positioning method can improve the accuracy of target positioning.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a workflow diagram of a target positioning method provided in accordance with an exemplary embodiment of the present invention;
FIG. 2 is a workflow diagram of a target positioning method provided in accordance with another exemplary embodiment of the invention;
FIG. 3A to FIG. 3C are effect diagrams of target positions located in images to be identified using the target positioning method provided by embodiments of the present invention;
FIG. 4 is a schematic diagram of a target location process according to a target location method provided in an exemplary embodiment of the present invention;
FIG. 5 is a schematic diagram of a visual analysis of a multi-channel feature provided in accordance with an exemplary embodiment of the present invention;
FIG. 6 is a block diagram of an object positioning apparatus provided in accordance with yet another embodiment of the present invention;
FIG. 7 is a hardware configuration diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the invention. Rather, they are merely examples of apparatus and methods consistent with aspects of the invention as detailed in the accompanying claims.
The target positioning method is a positioning method based on image recognition technology: a specific target is recognized in an image, and its position in the image is located.
Several specific examples are given below to describe the technical solutions of the present application in detail. The following embodiments may be combined with each other, and some embodiments may not be repeated for the same or similar concepts or processes.
Fig. 1 is a flowchart of a target positioning method according to an exemplary embodiment of the present invention. Referring to fig. 1, the target positioning method includes:
Step S10, extracting features of an image to be identified to obtain a plurality of channel features;
Step S20, for each channel feature, determining a weighting parameter corresponding to the channel feature, wherein the weighting parameter represents the correlation between the channel feature and the target position to be located in the image to be identified;
step S30, correcting the channel characteristics according to the weighting parameters corresponding to the channel characteristics;
and S40, positioning the target position of the target contained in the image to be identified by utilizing the corrected channel characteristics.
The invention belongs to the field of weakly supervised target positioning/detection in machine vision: the information used to train the positioning/detection algorithm is not the commonly used bounding box annotation but only the category information of the images. Although only image-level category information is used, the method can position targets more accurately through operations such as random occlusion at the data layer and weighting of the feature maps, and it can serve as a front-end module for tasks such as sample annotation, target classification, and recognition, reducing the learning difficulty of those tasks.
In this weakly supervised target positioning method, network parameters are adjusted under the guidance of weak supervision information (image category information) so that representative features of the target are extracted; these features produce specific responses to different parts of the target, and by weighting the feature maps the target type can be obtained and, from it, the position information of the target.
The image to be identified may be an image acquired in real time by an image acquisition device (e.g., a camera or video camera) or an image pre-stored on the device that applies the method.
An image recognition algorithm or a deep-learning-based neural network can be used to process the image to be identified (hereinafter, the image) and extract a plurality of channel features from it. A channel feature is the output of performing feature detection on the image; one channel feature is the output of detecting one particular feature. When features are extracted by a neural network, each channel feature is the output of filtering by one convolution filter and may also be called a feature map; the number of channel features is determined by the number of convolution filters used.
A channel feature may represent a global feature of the image (i.e., an object-level feature, such as a texture, color, or spatial-relationship feature) or a local feature (i.e., a feature of part of the object, such as a head or torso feature, a shape feature of an image region containing the target, or an edge feature of the target).
The target is a specific object to be identified in the image; which type of object is to be identified determines which channel features can be extracted and which algorithm classifies based on them. The target position is the region of the image where the target is located. In image recognition, this region can be delineated in the image, for example by a square frame, a polygonal frame, or a frame of another shape, and the position of that frame is the target position.
The weighting parameter corresponding to a channel feature represents the correlation between that channel feature and the target position located in the image to be identified; that is, it reflects how strongly the channel feature influences the target position recognition result, and the larger the weighting parameter, the greater the influence. For example, when identifying a vehicle in an image, channel features that represent the wheels, windows, and logo strongly influence the vehicle position recognition result, so their weighting parameters are large, whereas channel features that represent the vehicle's color or texture influence the recognition result less, so their weighting parameters are small.
After the channel features are corrected using their weighting parameters, the weight of channel features that strongly influence the target recognition result is enhanced and the weight of channel features that weakly influence it is reduced, so the target type can be identified accurately, recognition of the target position is facilitated, and positioning accuracy is improved.
In an optional embodiment, extracting features from the image to be identified in step S10 to obtain the plurality of channel features includes:
inputting the image to be identified into a trained neural network, and extracting features of the image to be identified by a convolution layer of the neural network to obtain a plurality of channel features.
In this embodiment, feature extraction is performed by a neural network that contains one or more convolution layers. Each convolution layer may contain one or more convolution kernels; a kernel slides over the image with a certain stride and convolves each region, and each kernel produces one channel feature. The number of channel features finally obtained depends on the number of kernels in the last convolution layer.
For example, if the convolution kernel is 4×4 and the image is 16×16, the stride may be 1, 2, 3, 4, or 6; the kernel slides with that stride and convolves each region of the image in turn. One channel feature is obtained once the whole image has been convolved, and a plurality of channel features are obtained through the convolution processing of a plurality of kernels.
A set of channel features forms a three-dimensional matrix of size H×W×C, where H is the height of a channel feature (the number of pixels in the vertical direction), W is its width (the number of pixels in the horizontal direction), and C is the number of channels, determined by the number of kernels in the last convolution layer of the underlying convolutional network; each kernel of the last layer computes the feature map of one channel. When a plurality of kernels is used, each kernel computes the channel feature of one channel, and a single channel feature can be represented as H×W×1.
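As a concrete illustration of the description above (my own toy example, not taken from the patent; the image values and kernels are made up), the following sketch shows how each convolution kernel yields one channel feature and how C kernels yield a C-deep stack of feature maps:

```python
def conv2d_valid(image, kernel, stride=1):
    """Slide a k x k kernel over a 2-D image (list of lists) with the
    given stride and return one channel feature (valid padding)."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    out = []
    for i in range(0, h - k + 1, stride):
        row = []
        for j in range(0, w - k + 1, stride):
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(k) for dj in range(k))
            row.append(s)
        out.append(row)
    return out

# Toy 4x4 image and two 2x2 kernels -> two channel features (C = 2).
image = [[1, 0, 2, 1],
         [0, 1, 1, 0],
         [2, 1, 0, 1],
         [1, 0, 1, 2]]
kernels = [[[1, 0], [0, 1]],    # responds to main-diagonal structure
           [[0, 1], [1, 0]]]    # responds to anti-diagonal structure
channel_features = [conv2d_valid(image, k) for k in kernels]  # C x H x W
```

With a 4×4 image, 2×2 kernels, and stride 1, each channel feature is 3×3, so the stack has shape 2×3×3 (C×H×W), matching the H×W×C description above up to axis ordering.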
The neural network is a deep neural network, such as a convolutional neural network (CNN), a feedforward artificial neural network whose neurons respond to surrounding units within a limited receptive field and which effectively extracts the feature information of an image through weight sharing and feature aggregation.
When the neural network is trained, it is trained in a weakly supervised mode; the training process includes the following steps:
step S01, building a neural network, wherein the neural network comprises a convolution layer;
the neural network may include one or more convolutional layers.
Step S02, acquiring a training sample, wherein the training sample comprises a marked image marked with a target type;
in this step, the marked image is used as a training sample, the marked image is an image marked with the type of the target, only the type of the target in the image is needed to be marked, the position of the target is not needed to be marked, and the marked image is an image marked with a rough mark, for example, the marked image is an image with a 'cow', 'grass' and 'sky' label, and the neural network only knows the objects with the labels in the image, but does not know the specific positions of the objects, so that the objects can be 'cow', 'grass' or 'sky' for each pixel of the image.
Step S03, inputting the training sample into the neural network so as to output a target type identification result of the marked image by the neural network, and updating parameters in the neural network according to the difference between the target type identification result output by the neural network and the target type in the training sample.
And step S04, training the neural network through a training sample to obtain a trained neural network.
Specifically, a training sample is a marked image X. Feature extraction through the neural network yields a plurality of channel features, which effectively preserve the spatial relationships of the target. Let Y = f(X) denote the target type recognition result output by the neural network, where f is a composite description of the network's operations (convolution, pooling, full connection, etc.). If the weak-supervision task is classification, Y represents the probability that the marked image X belongs to the target type; if the task is image annotation, Y represents the probability that the marked image X carries the annotation.
Parameters in the neural network are supervised and updated through the difference between the Y and the target type in the training sample, so that the neural network can be trained end to end.
The parameters include, for example, the trainable weights of the network's layers; they can be modified by gradient backpropagation so as to minimize the difference between the target type recognition result output by the neural network and the target type in the training sample.
A certain number of training samples are input into the neural network to train it; after the network has been trained on these samples, the trained neural network is obtained.
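The training procedure of steps S01 to S04 can be sketched as follows. This is a deliberately minimal stand-in of my own, not the patent's network: a single weight followed by a sigmoid replaces the convolution, pooling, and fully connected layers, and the weight is updated by gradient descent on the difference between the predicted type probability and the image-level label.

```python
import math

def predict(w, x):
    """Toy 'network': one weight plus a sigmoid, standing in for
    conv + pool + fully connected layers producing a type probability."""
    return 1.0 / (1.0 + math.exp(-w * x))

# Each sample: (scalar feature summarizing the image, image-level label 0/1).
samples = [(2.0, 1), (1.5, 1), (-1.0, 0), (-2.5, 0)]

w, lr = 0.0, 0.5
for epoch in range(200):
    for x, y in samples:
        p = predict(w, x)
        # Gradient of the cross-entropy loss wrt w is (p - y) * x;
        # the update shrinks the gap between prediction and label.
        w -= lr * (p - y) * x
```

After training, the toy network's predictions agree with the image-level labels, mirroring how the real network's parameters are supervised only by the type annotation.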
When the neural network is trained in this weakly supervised mode, only the target type needs to be annotated in the training samples, so the required workload is far less than the workload of annotating the specific position of every target.
The weakly supervised mode means training with only image-level labels: the types of targets contained in an image are used to recognize and locate the targets without knowing their specific positions in the image.
In an alternative embodiment, after the training sample including the marked image annotated with the target type is obtained, the method further includes:
performing occlusion processing on a partial region of the marked image.
When a neural network is trained in the weakly supervised mode, the features it learns are mainly those of the target's salient regions; features of non-salient regions are difficult to learn. Randomly occluding the sample images forces the network to attend to non-salient regions and learn their general features, so the network learns both the salient features and the general features of the sample images, which improves positioning accuracy. For a large number of training samples, a partial region of each marked image can be occluded at random: for example, the marked image can be divided into regions of a given size (such as 32×32 or 64×64), and the color of one or more regions is set to black with a certain probability, thereby occluding them.
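A minimal sketch of the random occlusion preprocessing described above (the block size of 2 and probability of 0.5 are illustrative; the patent mentions region sizes such as 32×32 or 64×64):

```python
import random

def random_occlude(image, block=2, prob=0.5, rng=None):
    """Divide a 2-D image (list of lists) into block x block regions and,
    with probability `prob` per region, set the region to 0 ('black')."""
    rng = rng or random.Random()
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]  # leave the input image untouched
    for i in range(0, h, block):
        for j in range(0, w, block):
            if rng.random() < prob:
                for di in range(i, min(i + block, h)):
                    for dj in range(j, min(j + block, w)):
                        out[di][dj] = 0
    return out

img = [[9] * 4 for _ in range(4)]
occluded = random_occlude(img, block=2, prob=0.5, rng=random.Random(0))
```

Each training image would be passed through such a function before being fed to the network, so that different regions are hidden on different epochs.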
In some examples, determining the weighting parameter corresponding to the channel feature in step S20 includes:
Step S21, inputting the plurality of channel features output by the convolution layer into the fully connected layer, and determining, by the fully connected layer, the weighting parameter corresponding to each channel feature.
After the convolution processing of the convolution layer, a plurality of channel features are output; each can represent global characteristics of the target such as its shape and color, or local region characteristics. When the channel features are input into the fully connected layer, the layer screens them according to certain rules to distinguish key regions of the target from non-key regions and determines the weighting parameter of each channel feature accordingly: channel features corresponding to key regions of the target receive larger weighting parameters, and those corresponding to non-key regions receive smaller ones.
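One plausible realization of the fully connected channel weighting described above, in the spirit of squeeze-and-excitation networks (the global average pooling, the layer shapes, and the sigmoid are my assumptions; the patent does not specify them):

```python
import math

def channel_weights_fc(channel_features, fc_weights, fc_bias):
    """Pool each channel feature to one scalar (global average), pass the
    pooled vector through a fully connected layer, and squash with a
    sigmoid to obtain one weighting parameter per channel."""
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
              for ch in channel_features]
    weights = []
    for c, _ in enumerate(channel_features):
        z = sum(fc_weights[c][k] * pooled[k]
                for k in range(len(pooled))) + fc_bias[c]
        weights.append(1.0 / (1.0 + math.exp(-z)))
    return weights

def correct(channel_features, weights):
    """Correct each channel feature by scaling it with its weight."""
    return [[[v * w for v in row] for row in ch]
            for ch, w in zip(channel_features, weights)]

feats = [[[1.0, 1.0], [1.0, 1.0]],   # channel 0
         [[2.0, 2.0], [2.0, 2.0]]]   # channel 1
wts = channel_weights_fc(feats, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
corrected = correct(feats, wts)
```

In a trained network, `fc_weights` and `fc_bias` would be learned so that channels describing key regions of the target end up with weights near 1 and the rest near 0.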
In some examples, the weighting parameter corresponding to each channel feature may also be determined by a method comprising:
Step S22, computing the derivative of each feature in the channel feature to obtain the derivative of each feature;
Step S23, taking the average of the computed derivatives of the features as the weighting parameter corresponding to the channel feature.
The above embodiment is another method of determining the weighting parameter corresponding to each channel feature through the fully connected layer of the neural network. Specifically, each channel feature contains features at multiple positions; for each channel feature, the derivative with respect to the feature at each position is computed, the average of these derivatives is calculated, and that average is used as the weighting parameter corresponding to the channel feature.
Each channel feature may be represented by a function; computing the derivative at each point of the function yields the derivative of each feature, the derivative of the function at a point being the slope of the tangent to the curve represented by the function at that point.
In this embodiment, the weighted parameters corresponding to the channel features are determined by deriving the channel features output by the convolution layer, which is favorable for obtaining the contour features and texture features of the target through the derivation operation, and the influence of the image illumination on the target identification can be weakened, so that the accuracy of target positioning is favorable to be improved.
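The derivative-based weighting of steps S22 and S23 can be sketched as follows. This is an illustrative NumPy reconstruction, not the patent's implementation: it assumes the derivatives of the features are already available in an H×W×C array, uses a uniform mean as the simplest case of the weighted average, and the function name is hypothetical.

```python
import numpy as np

def channel_weights_from_derivatives(grads: np.ndarray) -> np.ndarray:
    """Step S22/S23 sketch: average the derivative at every spatial
    position of each channel to get that channel's weighting parameter.

    grads: (H, W, C) array holding the derivative of each feature.
    Returns a (C,) vector of weighting parameters, one per channel.
    """
    return grads.mean(axis=(0, 1))

# toy check: channel 0 has uniformly larger derivatives everywhere,
# so its weighting parameter comes out larger than channel 1's
grads = np.stack([np.full((4, 4), 0.8), np.full((4, 4), 0.1)], axis=-1)
w = channel_weights_from_derivatives(grads)
```

A non-uniform weighted average (e.g. emphasizing central positions) would replace `mean` with `np.average` and a weight array of the same spatial shape.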
In an alternative embodiment, as shown in fig. 2, locating the target position of the target included in the image to be identified using the corrected channel features in step S40 includes:
step S41, acquiring a response value for each position of the image to be identified according to the corrected channel features, wherein the response value represents the probability that the target is present at that position;
step S42, determining the positions whose response values are larger than a threshold, and taking the region comprising those positions as the target position.
The plurality of channel features output by the convolution layers are multi-dimensional data; for example, the plurality of channel features F(x_o) form a matrix of shape H×W×C, where H is the height of the matrix, W is its width, and C is the number of channels, and the values at each position in the matrix each correspond to one channel feature.
After the channel features are weighted according to the weighting parameters, each corrected channel feature can represent a response value for each position of the image, and the response value represents the probability that the target is present at that position: the larger that probability, the larger the response value. By correcting each channel feature with its corresponding weighting parameter, the weight of channel features that strongly influence the target recognition result is strengthened and the weight of channel features that weakly influence the result is weakened, so the target position is located more accurately.
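The correction step and the resulting response map can be sketched as below. This is a minimal illustration under the assumption that the channel features sit in an H×W×C array and the weighting parameters in a length-C vector; the function name and the combination by summation are illustrative, not the patent's exact formulation.

```python
import numpy as np

def correct_and_respond(features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Weight each channel feature by its weighting parameter and collapse
    the corrected channels into a per-position response map.

    features: (H, W, C) channel features from the convolution layers.
    weights:  (C,) weighting parameters, one per channel.
    Returns an (H, W) map whose value at each position reflects the
    probability that the target is present there.
    """
    corrected = features * weights      # broadcast over H and W
    return corrected.sum(axis=-1)       # combine the corrected channels

# with all-ones features every response equals the sum of the weights
features = np.ones((8, 8, 3))
weights = np.array([0.7, 0.2, 0.1])
response = correct_and_respond(features, weights)
```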
The response values of the positions are generally not the same, i.e. the probability that the target is present differs from position to position in the image. To further locate the target, a threshold is set and only the positions whose response values are larger than the threshold are retained; these are the positions where the target is most likely present, so the positions of the target can be screened out of the image and the background filtered away. For example, if the target is a person, the positions whose response values are larger than the threshold may include the positions of parts such as the head, body, feet and arms, which represent the positions of the parts of the target. The region containing these positions is taken as the target position: for example, the circumscribed rectangle containing these positions is drawn as a labeling frame, and the region enclosed by the labeling frame is the target position, thereby realizing the positioning of the target.
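The threshold screening and labeling-frame construction just described can be sketched with NumPy; `locate_target` and the toy response map are illustrative names and values, not the patent's implementation.

```python
import numpy as np

def locate_target(response: np.ndarray, threshold: float):
    """Keep the positions whose response exceeds the threshold and return
    the circumscribed rectangle (labeling frame) that encloses them.

    Returns (top, left, bottom, right) as pixel indices, or None when no
    position responds above the threshold.
    """
    ys, xs = np.nonzero(response > threshold)
    if ys.size == 0:
        return None
    return int(ys.min()), int(xs.min()), int(ys.max()), int(xs.max())

# toy response map: a high-response block stands in for the head/body/
# limb positions of a detected person; the rest is background
response = np.zeros((10, 10))
response[3:7, 2:6] = 0.9
box = locate_target(response, threshold=0.5)  # → (3, 2, 6, 5)
```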
Figs. 3A-3C show the effect of locating the target position in each image to be identified using the above target positioning method: Fig. 3A shows the position of a located vehicle, Fig. 3B the position of a located airplane, and Fig. 3C the position of a located bird.
The above target positioning method is described below taking as an example an image to be identified that contains a person and a dog. Referring to Fig. 4, the image containing the person and the dog is the image to be identified and the dog is the target; the specific identification process is as follows:
inputting the image into a neural network, wherein the neural network comprises a plurality of convolution layers, and the convolution layers are used for carrying out convolution processing on the image to obtain a plurality of channel characteristics;
determining weighting parameters corresponding to the characteristics of each channel;
in the training process of the neural network, the plurality of channel features obtained by the convolution layers are processed by the pooling layer and then input into the fully connected layer, and the weighting parameter corresponding to each channel feature is determined by the fully connected layer; for example, the weighting parameters of the channel features in Fig. 4 are w_1, w_2, …, w_n respectively;
correcting each channel feature according to the weighting parameter corresponding to that channel feature;
acquiring response values of all positions of the image to be identified according to the corrected channel characteristics;
referring to Fig. 5, which shows a schematic visual analysis of the channel features, the response value of each position of the image to be identified may be obtained from each channel feature. For example, each graph to the left of the equals sign in Fig. 5 shows the response map of one channel feature to the target. As can be seen from Fig. 5, each channel feature indicates the probability that the target is present at each position in the image: the brighter regions in a graph are the regions where that channel feature indicates a high probability of the target, and a position with a higher response value is more likely to contain the target.
Finally, the target position is obtained by threshold screening of the positions: the positions whose response values are larger than the threshold are determined, and the region comprising those positions is taken as the target position. Each position whose response value is larger than the threshold may reveal a local part of the target, and the screened region containing these positions can be identified with a labeling frame; this region is the position of the target. For example, the labeling frame identifies the target position in the graph to the right of the equals sign in Fig. 5.
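The walk-through above (Figs. 4-5) can be chained into one illustrative sketch. The hand-made features and weights below merely stand in for a trained network's outputs; all names, shapes and values are assumptions for demonstration.

```python
import numpy as np

def localize(features: np.ndarray, weights: np.ndarray, threshold: float):
    """Reweight the channels, build the response map, threshold it, and
    return the labeling frame (top, left, bottom, right), or None.

    features: (H, W, C) channel features from the convolution layers.
    weights:  (C,) weighting parameters from the fully connected layer
              (w_1 ... w_n in Fig. 4).
    """
    response = (features * weights).sum(axis=-1)   # corrected response map
    ys, xs = np.nonzero(response > threshold)
    if ys.size == 0:
        return None
    return int(ys.min()), int(xs.min()), int(ys.max()), int(xs.max())

# two channels: channel 0 responds on the "dog" region, channel 1 is a
# diffuse background response that the weighting suppresses
features = np.zeros((6, 6, 2))
features[2:5, 1:4, 0] = 1.0      # "dog" region in channel 0
features[:, :, 1] = 0.3          # background response in channel 1
weights = np.array([0.9, 0.05])  # fully connected layer favors channel 0
box = localize(features, weights, threshold=0.5)  # → (2, 1, 4, 3)
```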
The embodiment of the present invention also provides a target positioning device. As shown in fig. 6, the target positioning device 06 includes:
the feature extraction module 61 is configured to perform feature extraction on an image to be identified to obtain a plurality of channel features;
the weighted parameter determining module 62 is configured to determine, for any channel feature, a weighted parameter corresponding to the channel feature, where the weighted parameter corresponding to the channel feature is used to characterize a correlation between the channel feature and a target position located in the image to be identified;
the feature correction module 63 is configured to perform correction processing on the channel feature according to the weighting parameter corresponding to the channel feature;
the target position locating module 64 is configured to locate the target position of the target contained in the image to be identified using the corrected channel features.
In some examples, the feature extraction module is specifically configured to:
inputting the image to be identified into a trained neural network, and extracting features of the image to be identified by a convolution layer of the neural network to obtain a plurality of channel features;
the device also comprises a training module for:
building a neural network, wherein the neural network comprises a convolution layer, a pooling layer and a full connection layer;
acquiring a training sample, wherein the training sample comprises a marked image marked with a target type;
inputting the training sample into the neural network so that the neural network outputs a target type identification result for the marked image, and updating the parameters in the neural network according to the difference between the target type identification result output by the neural network and the target type in the training sample;
and training the neural network through a certain number of training samples to obtain a trained neural network.
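The training procedure above can be sketched with a toy stand-in for the network. This is a hedged illustration: a single linear layer trained on synthetic feature vectors replaces the convolution/pooling/fully-connected network and the marked images, showing only the loop of forward pass, comparison with the marked target type, and parameter update.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "network": one linear layer with softmax over two target
# types; the real network (convolution + pooling + fully connected
# layers) is reduced to this for brevity.
W = rng.normal(scale=0.1, size=(4, 2))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# training samples: feature vectors stand in for marked images, with
# the marked target type as the label
X = rng.normal(size=(64, 4))
y = (X[:, 0] > 0).astype(int)               # synthetic target-type labels

for _ in range(200):                         # "a certain number" of passes
    probs = softmax(X @ W)                   # type-recognition result
    onehot = np.eye(2)[y]
    grad = X.T @ (probs - onehot) / len(X)   # difference drives the update
    W -= 0.5 * grad                          # update the network parameters

accuracy = float((softmax(X @ W).argmax(axis=1) == y).mean())
```

The same loop structure (forward, compare against the marked label, update) carries over unchanged when the linear layer is replaced by a convolutional network in a deep-learning framework.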
In an alternative embodiment, after acquiring the training sample comprising the marked image marked with the target type, the method further includes:
performing occlusion preprocessing on a partial region of the marked image.
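This shielding (occlusion) preprocessing can be sketched as masking a rectangular partial region of the marked image before it is used for training; the patch location, size and fill value below are illustrative assumptions.

```python
import numpy as np

def occlude(image: np.ndarray, top: int, left: int, h: int, w: int,
            fill: float = 0.0) -> np.ndarray:
    """Mask a rectangular partial region of a marked image.

    Filling the patch (here with zeros) forces the network to rely on
    the remaining regions of the target during training.
    """
    out = image.copy()                    # leave the original image intact
    out[top:top + h, left:left + w] = fill
    return out

img = np.ones((8, 8))
aug = occlude(img, top=2, left=3, h=3, w=2)  # 3×2 patch zeroed out
```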
In an alternative embodiment, the weighting parameter determination module is specifically configured to:
and inputting the multiple channel characteristics output by the convolution layer into the full connection layer, and determining the weighting parameters corresponding to each channel characteristic by the full connection layer.
For example, the weighting parameter determining module is specifically configured to:
differentiating each feature in the channel feature to obtain the derivative of each feature;
and taking the calculated average of the derivatives of the features as the weighting parameter corresponding to the channel feature.
In some examples, the target position location module is specifically configured to:
acquiring a response value for each position of the image to be identified according to the corrected channel features, wherein the response value represents the probability that the target is present at that position;
and determining the positions whose response values are larger than a threshold, and taking the region comprising those positions as the target position.
Corresponding to the above embodiments of the target positioning method, the target positioning device provided by the invention can likewise improve the accuracy of target positioning.
For the device embodiments, the implementation of the functions and roles of each unit is detailed in the implementation of the corresponding steps of the above method and is not repeated here.
Since the device embodiments essentially correspond to the method embodiments, reference may be made to the description of the method embodiments for the relevant points. The device embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purposes of the present application. Those of ordinary skill in the art can understand and implement the invention without creative effort.
From the description of the above embodiments, it can be seen that the device of this embodiment may be implemented by software, by software plus the necessary general-purpose hardware, or of course by hardware. Based on such understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product; in a logical sense, the device is formed by the processor of the apparatus in which it is applied reading the corresponding computer program instructions from a non-volatile memory into memory and executing them.
The present invention also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method of any of the embodiments described above.
Referring to fig. 7, the present invention also provides a hardware architecture diagram of an electronic device, the electronic device including: a communication interface 101, a processor 102, a machine-readable storage medium 103, a non-volatile storage medium 104, and a bus 105; wherein the communication interface 101, the processor 102, the machine-readable storage medium 103, and the non-volatile storage medium 104 communicate with each other via a bus 105. The processor 102 may perform the object localization method described above by reading and executing machine-executable instructions in the machine-readable storage medium 103 corresponding to the control logic of the object localization method.
The machine-readable storage medium 103 referred to herein may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions and data. For example, the machine-readable storage medium may be RAM (Random Access Memory), volatile memory, non-volatile memory, flash memory, a storage drive (e.g., a hard drive), any type of storage disc (e.g., an optical disc or DVD), a similar storage medium, or a combination thereof.
In addition, the electronic device may be various terminal devices or backend devices, such as a video camera, a server, a mobile phone, a Personal Digital Assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a Universal Serial Bus (USB) flash drive, to name a few.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

Claims (8)

1. A method of locating a target, comprising:
inputting an image to be identified into a trained neural network, and extracting features of the image to be identified by a convolution layer of the neural network to obtain a plurality of channel features; when training the neural network, taking a marked image marked with a target type as training input and taking a target type identification result of the marked image as training output;
inputting the channel feature, for any channel feature output by the convolution layer, to a fully connected layer of the neural network, and determining, by the fully connected layer, a weighting parameter corresponding to the channel feature, wherein the weighting parameter corresponding to the channel feature is used for characterizing the correlation between the channel feature and a target position located from the image to be identified;
correcting the channel characteristics according to the weighting parameters corresponding to the channel characteristics;
positioning a target position containing a target in the image to be identified by utilizing each corrected channel characteristic, wherein the method comprises the following steps: acquiring response values of all positions of the image to be identified according to the corrected channel characteristics, wherein the response values represent the probability of targets in the positions; and determining the position corresponding to the response value larger than the threshold value, and taking the area comprising the position corresponding to the response value larger than the threshold value as the target position.
2. The method according to claim 1, wherein the neural network is trained by:
building a neural network, wherein the neural network comprises a convolution layer, a pooling layer and a full connection layer;
acquiring a training sample, wherein the training sample comprises a marked image marked with a target type;
inputting the training sample into the neural network so that the neural network outputs a target type identification result for the marked image, and updating parameters in the neural network according to the difference between the target type identification result output by the neural network and the target type in the training sample;
and training the neural network through a training sample to obtain a trained neural network.
3. The method of claim 1, further comprising, after acquiring the training sample comprising the marked image marked with the target type:
performing occlusion preprocessing on a partial region of the marked image.
4. A target positioning device, comprising:
the feature extraction module is used for inputting the image to be identified into a trained neural network, and carrying out feature extraction on the image to be identified by a convolution layer of the neural network to obtain a plurality of channel features; when training the neural network, taking a marked image marked with a target type as training input and taking a target type identification result of the marked image as training output;
the weighting parameter determining module is used for inputting the channel feature, for any channel feature output by the convolution layer, to the fully connected layer of the neural network, and determining, by the fully connected layer, a weighting parameter corresponding to the channel feature, wherein the weighting parameter corresponding to the channel feature is used for characterizing the correlation between the channel feature and a target position located from the image to be identified;
the characteristic correction module is used for correcting the channel characteristic according to the weighting parameter corresponding to the channel characteristic;
the target position locating module is used for locating the target position of the target contained in the image to be identified by utilizing the corrected channel characteristics, and comprises the following steps: acquiring response values of all positions of the image to be identified according to the corrected channel characteristics, wherein the response values represent the probability of targets in the positions; and determining the position corresponding to the response value larger than the threshold value, and taking the area comprising the position corresponding to the response value larger than the threshold value as the target position.
5. The apparatus of claim 4, further comprising a training module to:
building a neural network, wherein the neural network comprises a convolution layer, a pooling layer and a full connection layer;
acquiring a training sample, wherein the training sample comprises a marked image marked with a target type;
inputting the training sample into the neural network so that the neural network outputs a target type identification result for the marked image, and updating parameters in the neural network according to the difference between the target type identification result output by the neural network and the target type in the training sample;
and training the neural network through a certain number of training samples to obtain a trained neural network.
6. The apparatus of claim 4, wherein the weighting parameter determination module is specifically configured to:
and inputting the multiple channel characteristics output by the convolution layer into the full connection layer, and determining the weighting parameters corresponding to each channel characteristic by the full connection layer.
7. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the steps of the method according to any of claims 1-3.
8. An electronic device comprising a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor to cause the method of any one of claims 1 to 3 to be performed.
CN201810821904.9A 2018-07-24 2018-07-24 Target positioning method and device, computer readable storage medium and electronic equipment Active CN110751163B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810821904.9A CN110751163B (en) 2018-07-24 2018-07-24 Target positioning method and device, computer readable storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810821904.9A CN110751163B (en) 2018-07-24 2018-07-24 Target positioning method and device, computer readable storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN110751163A CN110751163A (en) 2020-02-04
CN110751163B true CN110751163B (en) 2023-05-26

Family

ID=69275586

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810821904.9A Active CN110751163B (en) 2018-07-24 2018-07-24 Target positioning method and device, computer readable storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN110751163B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113763305B (en) * 2020-05-29 2023-08-04 杭州海康威视数字技术股份有限公司 Method and device for calibrating defect of article and electronic equipment
CN116310806B (en) * 2023-02-28 2023-08-29 北京理工大学珠海学院 Intelligent agriculture integrated management system and method based on image recognition

Citations (3)

Publication number Priority date Publication date Assignee Title
CN106845357A (en) * 2016-12-26 2017-06-13 银江股份有限公司 A kind of video human face detection and recognition methods based on multichannel network
CN107515895A (en) * 2017-07-14 2017-12-26 中国科学院计算技术研究所 A kind of sensation target search method and system based on target detection
CN108010060A (en) * 2017-12-06 2018-05-08 北京小米移动软件有限公司 Object detection method and device

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
EP3136290A1 (en) * 2015-08-28 2017-03-01 Thomson Licensing Method and device for determining the shape of an object represented in an image, corresponding computer program product and computer readable medium
GB2545661A (en) * 2015-12-21 2017-06-28 Nokia Technologies Oy A method for analysing media content
US10133955B2 (en) * 2015-12-31 2018-11-20 Adaptive Computation, Llc Systems and methods for object recognition based on human visual pathway
CN107038448B (en) * 2017-03-01 2020-02-28 中科视语(北京)科技有限公司 Target detection model construction method
CN108133489A (en) * 2017-12-21 2018-06-08 燕山大学 A kind of multilayer convolution visual tracking method of enhancing
CN108229379A (en) * 2017-12-29 2018-06-29 广东欧珀移动通信有限公司 Image-recognizing method, device, computer equipment and storage medium

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
CN106845357A (en) * 2016-12-26 2017-06-13 银江股份有限公司 A kind of video human face detection and recognition methods based on multichannel network
CN107515895A (en) * 2017-07-14 2017-12-26 中国科学院计算技术研究所 A kind of sensation target search method and system based on target detection
CN108010060A (en) * 2017-12-06 2018-05-08 北京小米移动软件有限公司 Object detection method and device

Non-Patent Citations (3)

Title
Squeeze-and-Excitation Networks; Jie Hu, et al.; arXiv:1709.01507; 2017-09-05; pp. 1-7 *
A Review of Convolutional Neural Networks (卷积神经网络研究综述); Zhou Feiyan et al.; Chinese Journal of Computers (计算机学报); 2017-01-22; Vol. 40, No. 6, pp. 1229-1251 *
Image Recognition Method for Missing High-Strength Bolts of Railway Bridges Based on Convolutional Neural Network (基于卷积神经网络的铁路桥梁高强螺栓缺失图像识别方法); Zhao Xinxin et al.; China Railway Science (中国铁道科学); 2018-07-15; Vol. 39, No. 4, pp. 56-62 *

Also Published As

Publication number Publication date
CN110751163A (en) 2020-02-04

Similar Documents

Publication Publication Date Title
CN108229509B (en) Method and device for identifying object class and electronic equipment
CN108229489B (en) Key point prediction method, network training method, image processing method, device and electronic equipment
CN110569878B (en) Photograph background similarity clustering method based on convolutional neural network and computer
CN109960742B (en) Local information searching method and device
CN110135318B (en) Method, device, equipment and storage medium for determining passing record
CN107766864B (en) Method and device for extracting features and method and device for object recognition
CN112884782B (en) Biological object segmentation method, apparatus, computer device, and storage medium
CN110222572A (en) Tracking, device, electronic equipment and storage medium
WO2018100668A1 (en) Image processing device, image processing method, and image processing program
JP2021503139A (en) Image processing equipment, image processing method and image processing program
CN111928857B (en) Method and related device for realizing SLAM positioning in dynamic environment
CN108875500B (en) Pedestrian re-identification method, device and system and storage medium
CN111445496B (en) Underwater image recognition tracking system and method
CN110751163B (en) Target positioning method and device, computer readable storage medium and electronic equipment
CN115512238A (en) Method and device for determining damaged area, storage medium and electronic device
CN111985537A (en) Target image identification method, terminal, system and storage medium
WO2014205787A1 (en) Vehicle detecting method based on hybrid image template
CN114581709A (en) Model training, method, apparatus, and medium for recognizing target in medical image
CN114119970B (en) Target tracking method and device
CN116246161A (en) Method and device for identifying target fine type of remote sensing image under guidance of domain knowledge
CN115100469A (en) Target attribute identification method, training method and device based on segmentation algorithm
CN114332814A (en) Parking frame identification method and device, electronic equipment and storage medium
CN114842235A (en) Infrared dim and small target identification method based on shape prior segmentation and multi-scale feature aggregation
CN113240611A (en) Foreign matter detection method based on picture sequence
CN111524161A (en) Method and device for extracting track

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant