CN110751163B - Target positioning method and device, computer readable storage medium and electronic equipment - Google Patents


Info

Publication number: CN110751163B (grant of application publication CN110751163A)
Application number: CN201810821904.9A
Authority: CN (China)
Inventor: 张鹏 (Zhang Peng)
Original and current assignee: Hangzhou Hikvision Digital Technology Co Ltd
Legal status: Active (granted). The legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.
Prior art keywords: neural network, image, target, channel, identified
Other languages: Chinese (zh)

Classifications

    • G06F18/214 Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/24 Pattern recognition: classification techniques
    • G06N3/045 Neural networks: combinations of networks
    • G06T7/70 Image analysis: determining position or orientation of objects or cameras
    • Y02T10/40 Climate change mitigation in transportation: engine management systems


Abstract

The invention discloses a target positioning method and apparatus, a computer-readable storage medium, and an electronic device. The target positioning method includes: extracting features from an image to be identified to obtain a plurality of channel features; for each channel feature, determining a corresponding weighting parameter, where the weighting parameter represents the correlation between the channel feature and the target position to be located in the image to be identified; correcting each channel feature according to its corresponding weighting parameter; and locating the position of the target contained in the image to be identified using the corrected channel features. The target positioning method can improve the accuracy of target positioning.

Description

Target positioning method and device, computer readable storage medium and electronic equipment
Technical Field
The present invention relates to the field of image recognition technology, and in particular, to a target positioning method and apparatus, a computer readable storage medium, and an electronic device.
Background
Image recognition refers to techniques that use a computer to process, analyze, and understand an image so as to detect and recognize targets of various patterns in the image.
An image-recognition-based target positioning method recognizes a specific target in an image and determines the position of that target in the image; currently, neural networks can be used for target positioning.
The accuracy of existing methods that perform target positioning with a neural network still needs to be improved.
Disclosure of Invention
The invention provides a target positioning method and apparatus, a computer-readable storage medium, and an electronic device, which address the deficiencies in the related art.
According to a first aspect of an embodiment of the present invention, there is provided a target positioning method, including:
extracting features of the image to be identified to obtain a plurality of channel features;
for each channel feature, determining a weighting parameter corresponding to the channel feature, wherein the weighting parameter represents the correlation between the channel feature and the target position to be located in the image to be identified;
correcting the channel characteristics according to the weighting parameters corresponding to the channel characteristics;
and positioning the target position of the target contained in the image to be identified by utilizing the corrected channel characteristics.
Optionally, extracting features from the image to be identified to obtain the plurality of channel features includes:
inputting the image to be identified into a trained neural network, and extracting features of the image to be identified by a convolution layer of the neural network to obtain a plurality of channel features;
the neural network is obtained through training the following steps:
building a neural network, wherein the neural network comprises a convolution layer, a pooling layer and a full connection layer;
acquiring a training sample, wherein the training sample comprises a marked image marked with a target type;
inputting the training sample into the neural network so that the neural network outputs a target type recognition result for the marked image, and updating parameters in the neural network according to the difference between the target type recognition result output by the neural network and the target type in the training sample;
and training the neural network through a training sample to obtain a trained neural network.
Optionally, after obtaining the training sample including the marked image annotated with the target type, the method further includes:
performing occlusion preprocessing on a partial region of the marked image.
Optionally, the determining the weighting parameter corresponding to the channel feature includes:
and inputting the multiple channel characteristics output by the convolution layer into the full-connection layer, and determining the weighting parameters corresponding to the channel characteristics by the full-connection layer.
Optionally, the determining the weighting parameter corresponding to the channel feature includes:
computing, for each feature in the channel feature, its derivative, to obtain the derivative of each feature;
and taking the average of the computed derivatives of the features as the weighting parameter corresponding to the channel feature.
Optionally, locating the target position of the target contained in the image to be identified using the corrected channel features includes:
acquiring response values at the positions of the image to be identified according to the corrected channel features, where a response value represents the probability that the target is present at that position;
and determining the positions whose response values are greater than a threshold, and taking the region comprising those positions as the target position.
According to a second aspect of an embodiment of the present invention, there is provided an object positioning apparatus including:
the feature extraction module is used for extracting features of the image to be identified to obtain a plurality of channel features;
the weighting parameter determining module is used for determining a weighting parameter corresponding to the channel characteristic aiming at any channel characteristic, wherein the weighting parameter corresponding to the channel characteristic is used for representing the correlation degree between the channel characteristic and a target position positioned from the image to be identified;
the characteristic correction module is used for correcting the channel characteristic according to the weighting parameter corresponding to the channel characteristic;
and the target position positioning module is used for positioning the target position of the target contained in the image to be identified by utilizing the corrected channel characteristics.
Optionally, the feature extraction module is specifically configured to:
inputting the image to be identified into a trained neural network, and extracting features of the image to be identified by a convolution layer of the neural network to obtain a plurality of channel features;
the device also comprises a training module for:
building a neural network, wherein the neural network comprises a convolution layer, a pooling layer and a full connection layer;
acquiring a training sample, wherein the training sample comprises a marked image marked with a target type;
inputting the training sample into the neural network so that the neural network outputs a target type recognition result for the marked image, and updating parameters in the neural network according to the difference between the target type recognition result output by the neural network and the target type in the training sample;
and training the neural network through a certain number of training samples to obtain a trained neural network.
Optionally, the weighting parameter determining module is specifically configured to:
and inputting the multiple channel characteristics output by the convolution layer into the full connection layer, and determining the weighting parameters corresponding to each channel characteristic by the full connection layer.
Optionally, the weighting parameter determining module is specifically configured to:
computing, for each feature in each channel feature, its derivative, to obtain the derivative of each feature;
and taking the average of the computed derivatives of each channel's features as the weighting parameter corresponding to that channel feature.
Optionally, the target position positioning module is specifically configured to:
acquiring response values at the positions of the image to be identified according to the corrected channel features, where a response value represents the probability that the target is present at that position;
and determining the positions whose response values are greater than a threshold, and taking the region comprising those positions as the position of the target.
According to a third aspect of embodiments of the present invention, there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor implements any of the methods described above.
According to a fourth aspect of embodiments of the present invention, there is provided an electronic device comprising a processor and a machine-readable storage medium storing machine-executable instructions that can be executed by the processor to cause any one of the methods described above to be performed.
According to the technical solutions above, the target positioning method can improve the accuracy of target positioning.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a workflow diagram of a target positioning method provided in accordance with an exemplary embodiment of the present invention;
FIG. 2 is a workflow diagram of a target positioning method provided in accordance with another exemplary embodiment of the invention;
FIG. 3A to FIG. 3C are effect diagrams of target positions located in images to be identified using the target positioning method provided by embodiments of the present invention;
FIG. 4 is a schematic diagram of a target location process according to a target location method provided in an exemplary embodiment of the present invention;
FIG. 5 is a schematic diagram of a visual analysis of a multi-channel feature provided in accordance with an exemplary embodiment of the present invention;
FIG. 6 is a block diagram of an object positioning apparatus provided in accordance with yet another embodiment of the present invention;
FIG. 7 is a hardware configuration diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the invention. Rather, they are merely examples of apparatus and methods consistent with aspects of the invention as detailed in the accompanying claims.
The target positioning method is a positioning method based on image recognition technology: a specific target is recognized in an image, and its position in the image is located.
Several specific examples are given below to describe the technical solutions of the present application in detail. The following embodiments may be combined with each other, and some embodiments may not be repeated for the same or similar concepts or processes.
Fig. 1 is a flowchart of a target positioning method according to an exemplary embodiment of the present invention. Referring to fig. 1, the target positioning method includes:
Step S10, extracting features of an image to be identified to obtain a plurality of channel features;
Step S20, for each channel feature, determining a weighting parameter corresponding to the channel feature, wherein the weighting parameter represents the correlation between the channel feature and the target position to be located in the image to be identified;
step S30, correcting the channel characteristics according to the weighting parameters corresponding to the channel characteristics;
and S40, positioning the target position of the target contained in the image to be identified by utilizing the corrected channel characteristics.
The invention belongs to the field of weakly supervised target positioning/detection in machine vision: the information used to train the positioning/detection algorithm is not the commonly used bounding box annotation but only the category information of the images. Although only image-level category information is used, the method can position targets more accurately through operations such as random occlusion at the data layer and weighting of the feature maps, and it can serve as a front-end module for tasks such as sample annotation, target classification, and recognition, reducing the learning difficulty of those tasks.
In this weakly supervised target positioning method, network parameters are adjusted under the guidance of weak supervision information (image category information) so that representative features of the target are extracted; these features produce specific responses to different parts of the target, and by weighting the feature maps the target type can be obtained and, from it, the position information of the target.
The image to be identified may be an image acquired in real time by an image acquisition device (e.g., a camera or video camera) or an image pre-stored on the device that applies the method.
An image recognition algorithm or a deep-learning-based neural network can be used to process the image to be identified (hereinafter, the image) and extract a plurality of channel features from it. A channel feature is the output of performing feature detection on the image; one channel feature is the output of detecting one particular feature. When features are extracted by a neural network, each channel feature is the output of filtering by one convolution filter and may also be called a feature map; the number of channel features is determined by the number of convolution filters used.
A channel feature may represent a global feature of the image (i.e., an object-level feature, such as a texture, color, or spatial-relationship feature) or a local feature (i.e., a feature of part of the object, such as a head or torso feature, a shape feature of an image region containing the target, or an edge feature of the target).
The target is a specific object to be identified in the image; which type of object is to be identified determines which channel features can be extracted and which algorithm classifies based on them. The target position is the region of the image where the target is located. In image recognition, this region can be delineated in the image, for example by a square frame, a polygonal frame, or a frame of another shape, and the position of that frame is the target position.
The weighting parameter corresponding to a channel feature represents the correlation between that channel feature and the target position located in the image to be identified; that is, it reflects how strongly the channel feature influences the target position recognition result, and the larger the weighting parameter, the greater the influence. For example, when identifying a vehicle in an image, channel features that represent the wheels, windows, and logo strongly influence the vehicle position recognition result, so their weighting parameters are large, whereas channel features that represent the vehicle's color or texture influence the recognition result less, so their weighting parameters are small.
After the channel features are corrected using their weighting parameters, the weight of channel features that strongly influence the target recognition result is enhanced and the weight of channel features that weakly influence it is reduced, so the target type can be identified accurately, recognition of the target position is facilitated, and positioning accuracy is improved.
In an optional embodiment, extracting features from the image to be identified in step S10 to obtain the plurality of channel features includes:
inputting the image to be identified into a trained neural network, and extracting features of the image to be identified by a convolution layer of the neural network to obtain a plurality of channel features.
In this embodiment, feature extraction is performed by a neural network that contains one or more convolution layers. Each convolution layer may contain one or more convolution kernels; a kernel slides over the image with a certain stride and convolves each region, and each kernel produces one channel feature. The number of channel features finally obtained depends on the number of kernels in the last convolution layer.
For example, if the convolution kernel is 4×4 and the image is 16×16, the stride may be 1, 2, 3, 4, or 6; the kernel slides with that stride and convolves each region of the image in turn. One channel feature is obtained once the whole image has been convolved, and a plurality of channel features are obtained through the convolution processing of a plurality of kernels.
A set of channel features forms a three-dimensional matrix of size H×W×C, where H is the height of a channel feature (the number of pixels in the vertical direction), W is its width (the number of pixels in the horizontal direction), and C is the number of channels, determined by the number of kernels in the last convolution layer of the underlying convolutional network; each kernel of the last layer computes the feature map of one channel. When a plurality of kernels is used, each kernel computes the channel feature of one channel, and a single channel feature can be represented as H×W×1.
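As a concrete illustration of the description above (my own toy example, not taken from the patent; the image values and kernels are made up), the following sketch shows how each convolution kernel yields one channel feature and how C kernels yield a C-deep stack of feature maps:

```python
def conv2d_valid(image, kernel, stride=1):
    """Slide a k x k kernel over a 2-D image (list of lists) with the
    given stride and return one channel feature (valid padding)."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    out = []
    for i in range(0, h - k + 1, stride):
        row = []
        for j in range(0, w - k + 1, stride):
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(k) for dj in range(k))
            row.append(s)
        out.append(row)
    return out

# Toy 4x4 image and two 2x2 kernels -> two channel features (C = 2).
image = [[1, 0, 2, 1],
         [0, 1, 1, 0],
         [2, 1, 0, 1],
         [1, 0, 1, 2]]
kernels = [[[1, 0], [0, 1]],    # responds to main-diagonal structure
           [[0, 1], [1, 0]]]    # responds to anti-diagonal structure
channel_features = [conv2d_valid(image, k) for k in kernels]  # C x H x W
```

With a 4×4 image, 2×2 kernels, and stride 1, each channel feature is 3×3, so the stack has shape 2×3×3 (C×H×W), matching the H×W×C description above up to axis ordering.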
The neural network is a deep neural network, such as a convolutional neural network (CNN), a feedforward artificial neural network whose neurons respond to surrounding units within a limited receptive field and which effectively extracts the feature information of an image through weight sharing and feature aggregation.
When the neural network is trained, it is trained in a weakly supervised mode; the training process includes the following steps:
step S01, building a neural network, wherein the neural network comprises a convolution layer;
the neural network may include one or more convolutional layers.
Step S02, acquiring a training sample, wherein the training sample comprises a marked image marked with a target type;
in this step, the marked image is used as a training sample, the marked image is an image marked with the type of the target, only the type of the target in the image is needed to be marked, the position of the target is not needed to be marked, and the marked image is an image marked with a rough mark, for example, the marked image is an image with a 'cow', 'grass' and 'sky' label, and the neural network only knows the objects with the labels in the image, but does not know the specific positions of the objects, so that the objects can be 'cow', 'grass' or 'sky' for each pixel of the image.
Step S03, inputting the training sample into the neural network so as to output a target type identification result of the marked image by the neural network, and updating parameters in the neural network according to the difference between the target type identification result output by the neural network and the target type in the training sample.
And step S04, training the neural network through a training sample to obtain a trained neural network.
Specifically, a training sample is a marked image X. Feature extraction through the neural network yields a plurality of channel features, which effectively preserve the spatial relationships of the target. Let Y = f(X) denote the target type recognition result output by the neural network, where f is a composite description of the network's operations (convolution, pooling, full connection, etc.). If the weak-supervision task is classification, Y represents the probability that the marked image X belongs to the target type; if the task is image annotation, Y represents the probability that the marked image X carries the annotation.
Parameters in the neural network are supervised and updated through the difference between the Y and the target type in the training sample, so that the neural network can be trained end to end.
The parameters include, for example, the trainable weights of the network's layers; they can be modified by gradient backpropagation so as to minimize the difference between the target type recognition result output by the neural network and the target type in the training sample.
A certain number of training samples are input into the neural network to train it; after the network has been trained on these samples, the trained neural network is obtained.
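The training procedure of steps S01 to S04 can be sketched as follows. This is a deliberately minimal stand-in of my own, not the patent's network: a single weight followed by a sigmoid replaces the convolution, pooling, and fully connected layers, and the weight is updated by gradient descent on the difference between the predicted type probability and the image-level label.

```python
import math

def predict(w, x):
    """Toy 'network': one weight plus a sigmoid, standing in for
    conv + pool + fully connected layers producing a type probability."""
    return 1.0 / (1.0 + math.exp(-w * x))

# Each sample: (scalar feature summarizing the image, image-level label 0/1).
samples = [(2.0, 1), (1.5, 1), (-1.0, 0), (-2.5, 0)]

w, lr = 0.0, 0.5
for epoch in range(200):
    for x, y in samples:
        p = predict(w, x)
        # Gradient of the cross-entropy loss wrt w is (p - y) * x;
        # the update shrinks the gap between prediction and label.
        w -= lr * (p - y) * x
```

After training, the toy network's predictions agree with the image-level labels, mirroring how the real network's parameters are supervised only by the type annotation.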
When the neural network is trained in this weakly supervised mode, only the target type needs to be annotated in the training samples, so the required workload is far less than the workload of annotating the specific position of every target.
The weakly supervised mode means training with only image-level labels: the types of targets contained in an image are used to recognize and locate the targets without knowing their specific positions in the image.
In an alternative embodiment, after the training sample including the marked image annotated with the target type is obtained, the method further includes:
performing occlusion processing on a partial region of the marked image.
When a neural network is trained in the weakly supervised mode, the features it learns are mainly those of the target's salient regions; features of non-salient regions are difficult to learn. Randomly occluding the sample images forces the network to attend to non-salient regions and learn their general features, so the network learns both the salient features and the general features of the sample images, which improves positioning accuracy. For a large number of training samples, a partial region of each marked image can be occluded at random: for example, the marked image can be divided into regions of a given size (such as 32×32 or 64×64), and the color of one or more regions is set to black with a certain probability, thereby occluding them.
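A minimal sketch of the random occlusion preprocessing described above (the block size of 2 and probability of 0.5 are illustrative; the patent mentions region sizes such as 32×32 or 64×64):

```python
import random

def random_occlude(image, block=2, prob=0.5, rng=None):
    """Divide a 2-D image (list of lists) into block x block regions and,
    with probability `prob` per region, set the region to 0 ('black')."""
    rng = rng or random.Random()
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]  # leave the input image untouched
    for i in range(0, h, block):
        for j in range(0, w, block):
            if rng.random() < prob:
                for di in range(i, min(i + block, h)):
                    for dj in range(j, min(j + block, w)):
                        out[di][dj] = 0
    return out

img = [[9] * 4 for _ in range(4)]
occluded = random_occlude(img, block=2, prob=0.5, rng=random.Random(0))
```

Each training image would be passed through such a function before being fed to the network, so that different regions are hidden on different epochs.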
In some examples, determining the weighting parameter corresponding to the channel feature in step S20 includes:
Step S21, inputting the plurality of channel features output by the convolution layer into the fully connected layer, and determining, by the fully connected layer, the weighting parameter corresponding to each channel feature.
After the convolution processing of the convolution layer, a plurality of channel features are output; each can represent global characteristics of the target such as its shape and color, or local region characteristics. When the channel features are input into the fully connected layer, the layer screens them according to certain rules to distinguish key regions of the target from non-key regions and determines the weighting parameter of each channel feature accordingly: channel features corresponding to key regions of the target receive larger weighting parameters, and those corresponding to non-key regions receive smaller ones.
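One plausible realization of the fully connected channel weighting described above, in the spirit of squeeze-and-excitation networks (the global average pooling, the layer shapes, and the sigmoid are my assumptions; the patent does not specify them):

```python
import math

def channel_weights_fc(channel_features, fc_weights, fc_bias):
    """Pool each channel feature to one scalar (global average), pass the
    pooled vector through a fully connected layer, and squash with a
    sigmoid to obtain one weighting parameter per channel."""
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
              for ch in channel_features]
    weights = []
    for c, _ in enumerate(channel_features):
        z = sum(fc_weights[c][k] * pooled[k]
                for k in range(len(pooled))) + fc_bias[c]
        weights.append(1.0 / (1.0 + math.exp(-z)))
    return weights

def correct(channel_features, weights):
    """Correct each channel feature by scaling it with its weight."""
    return [[[v * w for v in row] for row in ch]
            for ch, w in zip(channel_features, weights)]

feats = [[[1.0, 1.0], [1.0, 1.0]],   # channel 0
         [[2.0, 2.0], [2.0, 2.0]]]   # channel 1
wts = channel_weights_fc(feats, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
corrected = correct(feats, wts)
```

In a trained network, `fc_weights` and `fc_bias` would be learned so that channels describing key regions of the target end up with weights near 1 and the rest near 0.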
In some examples, the weighting parameter corresponding to each channel feature may also be determined by a method comprising:
Step S22, computing the derivative of each feature in the channel feature to obtain the derivative of each feature;
Step S23, taking the average of the computed derivatives of the features as the weighting parameter corresponding to the channel feature.
The above embodiment is another method of determining the weighting parameter corresponding to each channel feature through the fully connected layer of the neural network. Specifically, each channel feature contains features at multiple positions; for each channel feature, the derivative with respect to the feature at each position is computed, the average of these derivatives is calculated, and that average is used as the weighting parameter corresponding to the channel feature.
Each channel feature may be represented by a function; computing the derivative at each point of the function yields the derivative of each feature, the derivative of the function at a point being the slope of the tangent to the curve represented by the function at that point.
In this embodiment, the weighted parameters corresponding to the channel features are determined by deriving the channel features output by the convolution layer, which is favorable for obtaining the contour features and texture features of the target through the derivation operation, and the influence of the image illumination on the target identification can be weakened, so that the accuracy of target positioning is favorable to be improved.
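The derivative-based weighting of steps S22 and S23 can be sketched as follows. This is an illustrative NumPy reconstruction, not the patent's implementation: it assumes the derivatives of the features are already available in an H×W×C array, uses a uniform mean as the simplest case of the weighted average, and the function name is hypothetical.

```python
import numpy as np

def channel_weights_from_derivatives(grads: np.ndarray) -> np.ndarray:
    """Step S22/S23 sketch: average the derivative at every spatial
    position of each channel to get that channel's weighting parameter.

    grads: (H, W, C) array holding the derivative of each feature.
    Returns a (C,) vector of weighting parameters, one per channel.
    """
    return grads.mean(axis=(0, 1))

# toy check: channel 0 has uniformly larger derivatives everywhere,
# so its weighting parameter comes out larger than channel 1's
grads = np.stack([np.full((4, 4), 0.8), np.full((4, 4), 0.1)], axis=-1)
w = channel_weights_from_derivatives(grads)
```

A non-uniform weighted average (e.g. emphasizing central positions) would replace `mean` with `np.average` and a weight array of the same spatial shape.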
In an alternative embodiment, as shown in fig. 2, locating the target position of the target included in the image to be identified using the corrected channel features in step S40 includes:
step S41, acquiring a response value for each position of the image to be identified according to the corrected channel features, wherein the response value represents the probability that the target is present at that position;
step S42, determining the positions whose response values are larger than a threshold, and taking the region comprising those positions as the target position.
The plurality of channel features output by the convolution layers are multi-dimensional data; for example, the plurality of channel features F(x_o) form a matrix of shape H×W×C, where H is the height of the matrix, W is its width, and C is the number of channels, and the values at each position in the matrix each correspond to one channel feature.
After the channel features are weighted according to the weighting parameters, each corrected channel feature can represent a response value for each position of the image, and the response value represents the probability that the target is present at that position: the larger that probability, the larger the response value. By correcting each channel feature with its corresponding weighting parameter, the weight of channel features that strongly influence the target recognition result is strengthened and the weight of channel features that weakly influence the result is weakened, so the target position is located more accurately.
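The correction step and the resulting response map can be sketched as below. This is a minimal illustration under the assumption that the channel features sit in an H×W×C array and the weighting parameters in a length-C vector; the function name and the combination by summation are illustrative, not the patent's exact formulation.

```python
import numpy as np

def correct_and_respond(features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Weight each channel feature by its weighting parameter and collapse
    the corrected channels into a per-position response map.

    features: (H, W, C) channel features from the convolution layers.
    weights:  (C,) weighting parameters, one per channel.
    Returns an (H, W) map whose value at each position reflects the
    probability that the target is present there.
    """
    corrected = features * weights      # broadcast over H and W
    return corrected.sum(axis=-1)       # combine the corrected channels

# with all-ones features every response equals the sum of the weights
features = np.ones((8, 8, 3))
weights = np.array([0.7, 0.2, 0.1])
response = correct_and_respond(features, weights)
```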
The response values of the positions are generally not the same, i.e. the probability that the target is present differs from position to position in the image. To further locate the target, a threshold is set and only the positions whose response values are larger than the threshold are retained; these are the positions where the target is most likely present, so the positions of the target can be screened out of the image and the background filtered away. For example, if the target is a person, the positions whose response values are larger than the threshold may include the positions of parts such as the head, body, feet and arms, which represent the positions of the parts of the target. The region containing these positions is taken as the target position: for example, the circumscribed rectangle containing these positions is drawn as a labeling frame, and the region enclosed by the labeling frame is the target position, thereby realizing the positioning of the target.
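The threshold screening and labeling-frame construction just described can be sketched with NumPy; `locate_target` and the toy response map are illustrative names and values, not the patent's implementation.

```python
import numpy as np

def locate_target(response: np.ndarray, threshold: float):
    """Keep the positions whose response exceeds the threshold and return
    the circumscribed rectangle (labeling frame) that encloses them.

    Returns (top, left, bottom, right) as pixel indices, or None when no
    position responds above the threshold.
    """
    ys, xs = np.nonzero(response > threshold)
    if ys.size == 0:
        return None
    return int(ys.min()), int(xs.min()), int(ys.max()), int(xs.max())

# toy response map: a high-response block stands in for the head/body/
# limb positions of a detected person; the rest is background
response = np.zeros((10, 10))
response[3:7, 2:6] = 0.9
box = locate_target(response, threshold=0.5)  # → (3, 2, 6, 5)
```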
Figs. 3A-3C show the effect of locating the target position in each image to be identified using the above target positioning method: Fig. 3A shows the position of a located vehicle, Fig. 3B the position of a located airplane, and Fig. 3C the position of a located bird.
The above target positioning method is described below taking as an example an image to be identified that contains a person and a dog. Referring to Fig. 4, the image containing the person and the dog is the image to be identified and the dog is the target; the specific identification process is as follows:
inputting the image into a neural network, wherein the neural network comprises a plurality of convolution layers, and the convolution layers are used for carrying out convolution processing on the image to obtain a plurality of channel characteristics;
determining weighting parameters corresponding to the characteristics of each channel;
in the training process of the neural network, the plurality of channel features obtained by the convolution layers are processed by the pooling layer and then input into the fully connected layer, and the weighting parameter corresponding to each channel feature is determined by the fully connected layer; for example, the weighting parameters of the channel features in Fig. 4 are w_1, w_2, …, w_n respectively;
correcting each channel feature according to the weighting parameter corresponding to that channel feature;
acquiring response values of all positions of the image to be identified according to the corrected channel characteristics;
referring to Fig. 5, which shows a schematic visual analysis of the channel features, the response value of each position of the image to be identified may be obtained from each channel feature. For example, each graph to the left of the equals sign in Fig. 5 shows the response map of one channel feature to the target. As can be seen from Fig. 5, each channel feature indicates the probability that the target is present at each position in the image: the brighter regions in a graph are the regions where that channel feature indicates a high probability of the target, and a position with a higher response value is more likely to contain the target.
Finally, the target position is obtained by threshold screening of the positions: the positions whose response values are larger than the threshold are determined, and the region comprising those positions is taken as the target position. Each position whose response value is larger than the threshold may reveal a local part of the target, and the screened region containing these positions can be identified with a labeling frame; this region is the position of the target. For example, the labeling frame identifies the target position in the graph to the right of the equals sign in Fig. 5.
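The walk-through above (Figs. 4-5) can be chained into one illustrative sketch. The hand-made features and weights below merely stand in for a trained network's outputs; all names, shapes and values are assumptions for demonstration.

```python
import numpy as np

def localize(features: np.ndarray, weights: np.ndarray, threshold: float):
    """Reweight the channels, build the response map, threshold it, and
    return the labeling frame (top, left, bottom, right), or None.

    features: (H, W, C) channel features from the convolution layers.
    weights:  (C,) weighting parameters from the fully connected layer
              (w_1 ... w_n in Fig. 4).
    """
    response = (features * weights).sum(axis=-1)   # corrected response map
    ys, xs = np.nonzero(response > threshold)
    if ys.size == 0:
        return None
    return int(ys.min()), int(xs.min()), int(ys.max()), int(xs.max())

# two channels: channel 0 responds on the "dog" region, channel 1 is a
# diffuse background response that the weighting suppresses
features = np.zeros((6, 6, 2))
features[2:5, 1:4, 0] = 1.0      # "dog" region in channel 0
features[:, :, 1] = 0.3          # background response in channel 1
weights = np.array([0.9, 0.05])  # fully connected layer favors channel 0
box = localize(features, weights, threshold=0.5)  # → (2, 1, 4, 3)
```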
The embodiment of the present invention also provides a target positioning device. As shown in fig. 6, the target positioning device 06 includes:
the feature extraction module 61 is configured to perform feature extraction on an image to be identified to obtain a plurality of channel features;
the weighted parameter determining module 62 is configured to determine, for any channel feature, a weighted parameter corresponding to the channel feature, where the weighted parameter corresponding to the channel feature is used to characterize a correlation between the channel feature and a target position located in the image to be identified;
the feature correction module 63 is configured to perform correction processing on the channel feature according to the weighting parameter corresponding to the channel feature;
the target position locating module 64 is configured to locate the target position of the target contained in the image to be identified using the corrected channel features.
In some examples, the feature extraction module is specifically configured to:
inputting the image to be identified into a trained neural network, and extracting features of the image to be identified by a convolution layer of the neural network to obtain a plurality of channel features;
the device also comprises a training module for:
building a neural network, wherein the neural network comprises a convolution layer, a pooling layer and a full connection layer;
acquiring a training sample, wherein the training sample comprises a marked image marked with a target type;
inputting the training sample into the neural network so that the neural network outputs a target type identification result for the marked image, and updating the parameters in the neural network according to the difference between the target type identification result output by the neural network and the target type in the training sample;
and training the neural network through a certain number of training samples to obtain a trained neural network.
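The training procedure above can be sketched with a toy stand-in for the network. This is a hedged illustration: a single linear layer trained on synthetic feature vectors replaces the convolution/pooling/fully-connected network and the marked images, showing only the loop of forward pass, comparison with the marked target type, and parameter update.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "network": one linear layer with softmax over two target
# types; the real network (convolution + pooling + fully connected
# layers) is reduced to this for brevity.
W = rng.normal(scale=0.1, size=(4, 2))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# training samples: feature vectors stand in for marked images, with
# the marked target type as the label
X = rng.normal(size=(64, 4))
y = (X[:, 0] > 0).astype(int)               # synthetic target-type labels

for _ in range(200):                         # "a certain number" of passes
    probs = softmax(X @ W)                   # type-recognition result
    onehot = np.eye(2)[y]
    grad = X.T @ (probs - onehot) / len(X)   # difference drives the update
    W -= 0.5 * grad                          # update the network parameters

accuracy = float((softmax(X @ W).argmax(axis=1) == y).mean())
```

The same loop structure (forward, compare against the marked label, update) carries over unchanged when the linear layer is replaced by a convolutional network in a deep-learning framework.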
In an alternative embodiment, after acquiring the training sample comprising the marked image marked with the target type, the method further includes:
performing occlusion preprocessing on a partial region of the marked image.
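This shielding (occlusion) preprocessing can be sketched as masking a rectangular partial region of the marked image before it is used for training; the patch location, size and fill value below are illustrative assumptions.

```python
import numpy as np

def occlude(image: np.ndarray, top: int, left: int, h: int, w: int,
            fill: float = 0.0) -> np.ndarray:
    """Mask a rectangular partial region of a marked image.

    Filling the patch (here with zeros) forces the network to rely on
    the remaining regions of the target during training.
    """
    out = image.copy()                    # leave the original image intact
    out[top:top + h, left:left + w] = fill
    return out

img = np.ones((8, 8))
aug = occlude(img, top=2, left=3, h=3, w=2)  # 3×2 patch zeroed out
```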
In an alternative embodiment, the weighting parameter determination module is specifically configured to:
and inputting the multiple channel characteristics output by the convolution layer into the full connection layer, and determining the weighting parameters corresponding to each channel characteristic by the full connection layer.
For example, the weighting parameter determining module is specifically configured to:
differentiating each feature in the channel feature to obtain the derivative of each feature;
and taking the calculated average of the derivatives of the features as the weighting parameter corresponding to the channel feature.
In some examples, the target position location module is specifically configured to:
acquiring a response value for each position of the image to be identified according to the corrected channel features, wherein the response value represents the probability that the target is present at that position;
and determining the positions whose response values are larger than a threshold, and taking the region comprising those positions as the target position.
Corresponding to the above embodiments of the target positioning method, the target positioning device provided by the invention can likewise improve the accuracy of target positioning.
For the device embodiments, the implementation of the functions and roles of each unit is detailed in the implementation of the corresponding steps of the above method and is not repeated here.
Since the device embodiments essentially correspond to the method embodiments, reference may be made to the description of the method embodiments for the relevant points. The device embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purposes of the present application. Those of ordinary skill in the art can understand and implement the invention without creative effort.
From the description of the above embodiments, it can be seen that the device of this embodiment may be implemented by software, by software plus the necessary general-purpose hardware, or of course by hardware. Based on such understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product; in a logical sense, the device is formed by the processor of the apparatus in which it is applied reading the corresponding computer program instructions from a non-volatile memory into memory and executing them.
The present invention also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method of any of the embodiments described above.
Referring to fig. 7, the present invention also provides a hardware architecture diagram of an electronic device, the electronic device including: a communication interface 101, a processor 102, a machine-readable storage medium 103, a non-volatile storage medium 104, and a bus 105; wherein the communication interface 101, the processor 102, the machine-readable storage medium 103, and the non-volatile storage medium 104 communicate with each other via a bus 105. The processor 102 may perform the object localization method described above by reading and executing machine-executable instructions in the machine-readable storage medium 103 corresponding to the control logic of the object localization method.
The machine-readable storage medium 103 referred to herein may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions and data. For example, the machine-readable storage medium may be RAM (Random Access Memory), volatile memory, non-volatile memory, flash memory, a storage drive (e.g., a hard drive), any type of storage disc (e.g., an optical disc or DVD), a similar storage medium, or a combination thereof.
In addition, the electronic device may be various terminal devices or backend devices, such as a video camera, a server, a mobile phone, a Personal Digital Assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a Universal Serial Bus (USB) flash drive, to name a few.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

Claims (8)

1. A method of locating a target, comprising:
inputting an image to be identified into a trained neural network, and extracting features of the image to be identified by a convolution layer of the neural network to obtain a plurality of channel features; when training the neural network, taking a marked image marked with a target type as training input and taking a target type identification result of the marked image as training output;
inputting the channel feature, for any channel feature output by the convolution layer, to a fully connected layer of the neural network, and determining, by the fully connected layer, a weighting parameter corresponding to the channel feature, wherein the weighting parameter corresponding to the channel feature is used for characterizing the correlation between the channel feature and a target position located from the image to be identified;
correcting the channel characteristics according to the weighting parameters corresponding to the channel characteristics;
positioning a target position containing a target in the image to be identified by utilizing each corrected channel characteristic, wherein the method comprises the following steps: acquiring response values of all positions of the image to be identified according to the corrected channel characteristics, wherein the response values represent the probability of targets in the positions; and determining the position corresponding to the response value larger than the threshold value, and taking the area comprising the position corresponding to the response value larger than the threshold value as the target position.
2. The method according to claim 1, wherein the neural network is trained by:
building a neural network, wherein the neural network comprises a convolution layer, a pooling layer and a full connection layer;
acquiring a training sample, wherein the training sample comprises a marked image marked with a target type;
inputting the training sample into the neural network so that the neural network outputs a target type identification result for the marked image, and updating parameters in the neural network according to the difference between the target type identification result output by the neural network and the target type in the training sample;
and training the neural network through a training sample to obtain a trained neural network.
3. The method of claim 1, further comprising, after acquiring the training sample comprising the marked image marked with the target type:
performing occlusion preprocessing on a partial region of the marked image.
4. A target positioning device, comprising:
the feature extraction module is used for inputting the image to be identified into a trained neural network, and carrying out feature extraction on the image to be identified by a convolution layer of the neural network to obtain a plurality of channel features; when training the neural network, taking a marked image marked with a target type as training input and taking a target type identification result of the marked image as training output;
the weighting parameter determining module is used for inputting the channel feature, for any channel feature output by the convolution layer, to the fully connected layer of the neural network, and determining, by the fully connected layer, a weighting parameter corresponding to the channel feature, wherein the weighting parameter corresponding to the channel feature is used for characterizing the correlation between the channel feature and a target position located from the image to be identified;
the characteristic correction module is used for correcting the channel characteristic according to the weighting parameter corresponding to the channel characteristic;
the target position locating module is used for locating the target position of the target contained in the image to be identified by utilizing the corrected channel characteristics, and comprises the following steps: acquiring response values of all positions of the image to be identified according to the corrected channel characteristics, wherein the response values represent the probability of targets in the positions; and determining the position corresponding to the response value larger than the threshold value, and taking the area comprising the position corresponding to the response value larger than the threshold value as the target position.
5. The apparatus of claim 4, further comprising a training module to:
building a neural network, wherein the neural network comprises a convolution layer, a pooling layer and a full connection layer;
acquiring a training sample, wherein the training sample comprises a marked image marked with a target type;
inputting the training sample into the neural network so that the neural network outputs a target type identification result for the marked image, and updating parameters in the neural network according to the difference between the target type identification result output by the neural network and the target type in the training sample;
and training the neural network through a certain number of training samples to obtain a trained neural network.
6. The apparatus of claim 4, wherein the weighting parameter determination module is specifically configured to:
and inputting the multiple channel characteristics output by the convolution layer into the full connection layer, and determining the weighting parameters corresponding to each channel characteristic by the full connection layer.
7. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the steps of the method according to any of claims 1-3.
8. An electronic device comprising a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor to cause the method of any one of claims 1 to 3 to be performed.
CN201810821904.9A 2018-07-24 2018-07-24 Target positioning method and device, computer readable storage medium and electronic equipment Active CN110751163B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810821904.9A CN110751163B (en) 2018-07-24 2018-07-24 Target positioning method and device, computer readable storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810821904.9A CN110751163B (en) 2018-07-24 2018-07-24 Target positioning method and device, computer readable storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN110751163A CN110751163A (en) 2020-02-04
CN110751163B true CN110751163B (en) 2023-05-26

Family

ID=69275586

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810821904.9A Active CN110751163B (en) 2018-07-24 2018-07-24 Target positioning method and device, computer readable storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN110751163B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113763305B (en) * 2020-05-29 2023-08-04 杭州海康威视数字技术股份有限公司 Method and device for calibrating defect of article and electronic equipment
CN116310806B (en) * 2023-02-28 2023-08-29 北京理工大学珠海学院 Intelligent agriculture integrated management system and method based on image recognition

Citations (3)

Publication number Priority date Publication date Assignee Title
CN106845357A (en) * 2016-12-26 2017-06-13 银江股份有限公司 A kind of video human face detection and recognition methods based on multichannel network
CN107515895A (en) * 2017-07-14 2017-12-26 中国科学院计算技术研究所 A kind of sensation target search method and system based on target detection
CN108010060A (en) * 2017-12-06 2018-05-08 北京小米移动软件有限公司 Object detection method and device

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
EP3136290A1 (en) * 2015-08-28 2017-03-01 Thomson Licensing Method and device for determining the shape of an object represented in an image, corresponding computer program product and computer readable medium
GB2545661A (en) * 2015-12-21 2017-06-28 Nokia Technologies Oy A method for analysing media content
US10133955B2 (en) * 2015-12-31 2018-11-20 Adaptive Computation, Llc Systems and methods for object recognition based on human visual pathway
CN107038448B (en) * 2017-03-01 2020-02-28 中科视语(北京)科技有限公司 Target detection model construction method
CN108133489A (en) * 2017-12-21 2018-06-08 燕山大学 A kind of multilayer convolution visual tracking method of enhancing
CN108229379A (en) * 2017-12-29 2018-06-29 广东欧珀移动通信有限公司 Image-recognizing method, device, computer equipment and storage medium

Patent Citations (3)

Publication number Priority date Publication date Assignee Title
CN106845357A (en) * 2016-12-26 2017-06-13 银江股份有限公司 A kind of video human face detection and recognition methods based on multichannel network
CN107515895A (en) * 2017-07-14 2017-12-26 中国科学院计算技术研究所 A kind of sensation target search method and system based on target detection
CN108010060A (en) * 2017-12-06 2018-05-08 北京小米移动软件有限公司 Object detection method and device

Non-Patent Citations (3)

Title
Squeeze-and-Excitation Networks; Jie Hu, et al.; arXiv:1709.01507; 2017-09-05; pp. 1-7 *
A Review of Convolutional Neural Networks (卷积神经网络研究综述); Zhou Feiyan et al.; Chinese Journal of Computers (计算机学报); 2017-01-22; Vol. 40, No. 6, pp. 1229-1251 *
Image Recognition Method for Missing High-Strength Bolts of Railway Bridges Based on Convolutional Neural Network (基于卷积神经网络的铁路桥梁高强螺栓缺失图像识别方法); Zhao Xinxin et al.; China Railway Science (中国铁道科学); 2018-07-15; Vol. 39, No. 4, pp. 56-62 *

Also Published As

Publication number Publication date
CN110751163A (en) 2020-02-04

Similar Documents

Publication Publication Date Title
CN108229509B (en) Method and device for identifying object class and electronic equipment
CN108229489B (en) Key point prediction method, network training method, image processing method, device and electronic equipment
CN110569878B (en) Photograph background similarity clustering method based on convolutional neural network and computer
CN109960742B (en) Local information searching method and device
CN110135318B (en) Method, device, equipment and storage medium for determining passing record
CN107766864B (en) Method and device for extracting features and method and device for object recognition
CN112884782B (en) Biological object segmentation method, apparatus, computer device, and storage medium
CN110222572A (en) Tracking, device, electronic equipment and storage medium
WO2018100668A1 (en) Image processing device, image processing method, and image processing program
JP2021503139A (en) Image processing equipment, image processing method and image processing program
CN111928857B (en) Method and related device for realizing SLAM positioning in dynamic environment
CN108875500B (en) Pedestrian re-identification method, device and system and storage medium
CN111445496B (en) Underwater image recognition tracking system and method
CN110751163B (en) Target positioning method and device, computer readable storage medium and electronic equipment
CN115512238A (en) Method and device for determining damaged area, storage medium and electronic device
CN111985537A (en) Target image identification method, terminal, system and storage medium
WO2014205787A1 (en) Vehicle detecting method based on hybrid image template
CN114581709A (en) Model training, method, apparatus, and medium for recognizing target in medical image
CN114119970B (en) Target tracking method and device
CN116246161A (en) Method and device for identifying target fine type of remote sensing image under guidance of domain knowledge
CN115100469A (en) Target attribute identification method, training method and device based on segmentation algorithm
CN114332814A (en) Parking frame identification method and device, electronic equipment and storage medium
CN114842235A (en) Infrared dim and small target identification method based on shape prior segmentation and multi-scale feature aggregation
CN113240611A (en) Foreign matter detection method based on picture sequence
CN111524161A (en) Method and device for extracting track

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant