CN111275040A - Positioning method and device, electronic equipment and computer readable storage medium - Google Patents

Positioning method and device, electronic equipment and computer readable storage medium

Info

Publication number
CN111275040A
CN111275040A
Authority
CN
China
Prior art keywords
pixel point
target
distance
information
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010058788.7A
Other languages
Chinese (zh)
Other versions
CN111275040B (en)
Inventor
战赓
欧阳万里
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd filed Critical Beijing Sensetime Technology Development Co Ltd
Priority to CN202010058788.7A priority Critical patent/CN111275040B/en
Publication of CN111275040A publication Critical patent/CN111275040A/en
Priority to JP2022500616A priority patent/JP2022540101A/en
Priority to KR1020227018711A priority patent/KR20220093187A/en
Priority to PCT/CN2021/072210 priority patent/WO2021143865A1/en
Application granted granted Critical
Publication of CN111275040B publication Critical patent/CN111275040B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2210/00Indexing scheme for image generation or computer graphics
    • G06T2210/12Bounding box

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a positioning method and device, electronic equipment and a computer-readable storage medium. Based on an image feature map of a target image, the method determines, for each pixel point in the target image, a single object anchor frame, namely the object frame corresponding to the object frame information, which reduces the number of object anchor frames used in the object positioning process and reduces the amount of calculation. Meanwhile, based on the image feature map of the target image, the object type information of the object to which each pixel point in the target image belongs, the confidence corresponding to the object frame information and the confidence corresponding to the object type information can be determined, and the final confidence corresponding to the object frame information is then determined based on the two determined confidences, so that the information expression capability of the object frame is effectively enhanced and the accuracy of object positioning based on the object frame is improved.

Description

Positioning method and device, electronic equipment and computer readable storage medium
Technical Field
The present disclosure relates to the field of computer technology and image processing, and in particular, to a positioning method and apparatus, an electronic device, and a computer-readable storage medium.
Background
Object detection or object positioning is an important basic technology in computer vision, and is applied in particular to scenes such as instance segmentation, object tracking, pedestrian recognition and face recognition.
Object detection or object positioning is mostly realized by using object anchor frames. However, this approach suffers from defects such as a large amount of calculation and inaccurate positioning, caused by the large number of object anchor frames used in object positioning and the weak expression capability of the object anchor frames.
Disclosure of Invention
In view of the above, the present disclosure provides at least a positioning method and apparatus.
In a first aspect, the present disclosure provides a positioning method, including:
acquiring a target image;
determining object type information of an object to which each pixel point belongs, object frame information of the object to which each pixel point belongs, a first confidence degree corresponding to the object type information and a second confidence degree corresponding to the object frame information in the target image based on the image feature map of the target image;
respectively determining the target confidence of the object frame information of the object to which each pixel point belongs based on the first confidence and the second confidence;
and determining the positioning information of the object in the target image based on the object frame information of the object to which each pixel point belongs and the target confidence of the object frame information.
In the above embodiment, only one object anchor frame, that is, the object frame corresponding to the object frame information, needs to be determined for each pixel point in the target image based on the image feature map of the target image, so that the number of object anchor frames used in the object positioning process is reduced, the amount of calculation is reduced, and the efficiency of object positioning is improved. Meanwhile, based on the image feature map of the target image, the object type information of the object to which each pixel point in the target image belongs, the confidence degree corresponding to the object frame information and the confidence degree corresponding to the object type information can be determined, and the final confidence degree corresponding to the object frame information is then determined based on these two confidence degrees. The information expression capability of the object frame or the object frame information is thereby effectively enhanced: it can express the positioning information and the object type information of the object frame corresponding to the object frame information, as well as the confidence degree information of the object frame information, which improves the accuracy of object positioning based on the object frame.
In a possible implementation manner, the image feature map includes a classification feature map for classifying objects to which pixel points in the target image belong and a positioning feature map for positioning the objects to which pixel points in the target image belong;
the determining, based on the image feature map of the target image, object type information of an object to which each pixel point belongs, object border information of the object to which each pixel point belongs, a first confidence degree corresponding to the object type information, and a second confidence degree corresponding to the object border information in the target image includes:
determining object type information of an object to which each pixel point in the target image belongs and a first confidence corresponding to the object type information based on the classification feature map;
and determining the object frame information of the object to which each pixel point belongs in the target image and a second confidence corresponding to the object frame information based on the positioning feature map.
In the above embodiment, based on the classification feature map and the positioning feature map of the target image, not only the object frame information of the object to which each pixel point belongs in the target image is determined, but also the object type information of the object to which each pixel point belongs in the target image is determined, and the confidence degrees corresponding to the object type information and the object frame information, respectively, are determined, so that the information expression capability of the object frame is improved, and the accuracy of object positioning based on the object frame is improved.
In a possible implementation manner, the determining, based on the location feature map, object bounding box information of an object to which each pixel point in the target image belongs includes:
respectively determining a target distance range in which the distance between a pixel point and each frame in the object frames of the object to which the pixel point belongs is located based on the positioning feature map aiming at the pixel point in the target image;
respectively determining the target distance between the pixel point and each frame in the object frames of the object to which the pixel point belongs based on the target distance range and the positioning feature map;
and determining the object frame information of the object to which the pixel point belongs based on the position information of the pixel point in the target image and the target distance between the pixel point and each frame.
In the above embodiment, the target distance range in which the distance between the pixel point and each frame in the object frames of the object to which the pixel point belongs is located is determined first, and the target distance between the pixel point and each frame is then determined based on the determined target distance range; the accuracy of the determined target distance can be improved through these two steps of processing. Based on the accurate target distance, an object frame with an accurate position can then be determined for the pixel point, improving the accuracy of the determined object frame.
In a possible implementation manner, determining a target distance range in which the distance between a pixel point and each of the object borders of the object to which the pixel point belongs is located includes:
aiming at one border in the object borders of the object to which one pixel point in the target image belongs, determining the maximum distance between the pixel point and the border based on the positioning feature map;
carrying out segmentation processing on the maximum distance to obtain a plurality of distance ranges;
determining a first probability value that the distance between the pixel point and the bounding box is within each distance range based on the positioning feature map;
based on the determined first probability value, selecting a target distance range in which the distance between the pixel point and the frame is located from the plurality of distance ranges.
In the above embodiment, the distance range corresponding to the maximum probability value may be selected as the target distance range in which the distance between the pixel point and a certain frame is located, so that the accuracy of the determined target distance range is improved, and the accuracy of the distance between the pixel point and the certain frame determined based on the target distance range is improved.
In a possible embodiment, the selecting, based on the determined first probability value, a target distance range in which a distance between the pixel point and the bounding box is located from the plurality of distance ranges includes:
determining a distance uncertainty parameter value between the pixel point and the bounding box based on the positioning feature map;
determining a target probability value that the distance between the pixel point and the bounding box is within each distance range based on the distance uncertainty parameter value and each first probability value;
and taking the distance range corresponding to the maximum target probability value as the target distance range in which the distance between the pixel point and the bounding box is located.
In the above embodiment, while the first probability value that the distance between the pixel point and the bounding box is within each distance range is determined, an uncertainty parameter value is also determined. The first probability values can be corrected based on this uncertainty parameter value to obtain the target probability value that the distance between the pixel point and the bounding box is within each distance range, which improves the accuracy of the probability values that the distance between the pixel point and the bounding box is within each distance range and is thereby beneficial to improving the accuracy of the target distance range determined based on these probability values.
In a possible implementation manner, determining a second confidence degree corresponding to the object bounding box information includes:
and determining a second confidence corresponding to the object frame information of the object to which the pixel point belongs based on a first probability value corresponding to a target distance range in which the distance between one pixel point in the target image and each frame in the object frame of the object to which the pixel point belongs is located.
In the above embodiment, the confidence of the object frame information of the object to which the pixel point belongs can be determined by using the maximum first probability value corresponding to the distance between the pixel point and each frame, so that the information expression capability of the object frame is enhanced.
In a possible implementation manner, the determining, based on the classification feature map, object type information of an object to which each pixel point in the target image belongs includes:
determining a second probability value of each preset object type of an object to which each pixel point in the target image belongs based on the classification feature map;
and determining the object type information of the object to which the pixel point belongs based on the preset object type corresponding to the maximum second probability value.
In the embodiment, the preset object type corresponding to the maximum probability value is selected as the object type information of the object to which the pixel point belongs, so that the accuracy of the determined object type information is improved.
In a possible implementation manner, the determining, based on the object border information of the object to which each pixel belongs and the target confidence of the object border information, the location information of the object in the target image includes:
screening a plurality of target pixel points from the target image; the distance between different target pixel points in the target image is smaller than a preset threshold value, and the object type information of the objects to which the different target pixel points belong is the same;
selecting, from the object frame information of the objects to which the target pixel points belong, the object frame information corresponding to the highest target confidence, to obtain target frame information;
and determining the positioning information of the object in the target image based on the selected target frame information and the target confidence corresponding to the target frame information.
According to the embodiment, the object frame information with the highest target confidence coefficient is selected from the pixel points which are relatively close and have the same object type information, so that the object is positioned, the number of the object frame information for positioning the object can be effectively reduced, and the timeliness of the object positioning is improved.
In a second aspect, the present disclosure provides a positioning device comprising:
the image acquisition module is used for acquiring a target image;
the image processing module is used for determining object type information of an object to which each pixel point belongs, object frame information of the object to which each pixel point belongs, a first confidence coefficient corresponding to the object type information and a second confidence coefficient corresponding to the object frame information in the target image based on the image feature map of the target image;
the confidence processing module is used for respectively determining the target confidence of the object frame information of the object to which each pixel point belongs based on the first confidence and the second confidence;
and the positioning module is used for determining the positioning information of the object in the target image based on the object frame information of the object to which each pixel point belongs and the target confidence of the object frame information.
In a possible implementation manner, the image feature map includes a classification feature map for classifying objects to which pixel points in the target image belong and a positioning feature map for positioning the objects to which pixel points in the target image belong;
the image processing module is configured to:
determining object type information of an object to which each pixel point in the target image belongs and a first confidence corresponding to the object type information based on the classification feature map;
and determining the object frame information of the object to which each pixel point belongs in the target image and a second confidence corresponding to the object frame information based on the positioning feature map.
In a possible implementation manner, when determining, based on the positioning feature map, object border information of an object to which each pixel point in the target image belongs, the image processing module is configured to:
respectively determining a target distance range in which the distance between a pixel point and each frame in the object frames of the object to which the pixel point belongs is located based on the positioning feature map aiming at the pixel point in the target image;
respectively determining the target distance between the pixel point and each frame in the object frames of the object to which the pixel point belongs based on the target distance range and the positioning feature map;
and determining the object frame information of the object to which the pixel point belongs based on the position information of the pixel point in the target image and the target distance between the pixel point and each frame.
In a possible implementation manner, when determining a target distance range in which the distance between a pixel point and each of the object borders of the object to which the pixel point belongs is located, the image processing module is configured to:
aiming at one border in the object borders of the object to which one pixel point in the target image belongs, determining the maximum distance between the pixel point and the border based on the positioning feature map;
carrying out segmentation processing on the maximum distance to obtain a plurality of distance ranges;
determining a first probability value that the distance between the pixel point and the bounding box is within each distance range based on the positioning feature map;
based on the determined first probability value, selecting a target distance range in which the distance between the pixel point and the frame is located from the plurality of distance ranges.
In a possible embodiment, the image processing module, when selecting, from the plurality of distance ranges, a target distance range in which the distance between the pixel point and the bounding box is located, based on the determined first probability value, is configured to:
determining a distance uncertainty parameter value between the pixel point and the bounding box based on the positioning feature map;
determining a target probability value that the distance between the pixel point and the bounding box is within each distance range based on the distance uncertainty parameter value and each first probability value;
and taking the distance range corresponding to the maximum target probability value as the target distance range in which the distance between the pixel point and the bounding box is located.
In a possible implementation manner, when determining the second confidence degree corresponding to the object bounding box information, the image processing module is configured to:
and determining a second confidence corresponding to the object frame information of the object to which the pixel point belongs based on a first probability value corresponding to a target distance range in which the distance between one pixel point in the target image and each frame in the object frame of the object to which the pixel point belongs is located.
In a possible implementation manner, when determining the object type information of the object to which each pixel point in the target image belongs based on the classification feature map, the image processing module is configured to:
determining a second probability value of each preset object type of an object to which each pixel point in the target image belongs based on the classification feature map;
and determining the object type information of the object to which the pixel point belongs based on the preset object type corresponding to the maximum second probability value.
In one possible embodiment, the positioning module is configured to:
screening a plurality of target pixel points from the target image; the distance between different target pixel points in the target image is smaller than a preset threshold value, and the object type information of the objects to which the different target pixel points belong is the same;
selecting, from the object frame information of the objects to which the target pixel points belong, the object frame information corresponding to the highest target confidence, to obtain target frame information;
and determining the positioning information of the object in the target image based on the selected target frame information and the target confidence corresponding to the target frame information.
In a third aspect, the present disclosure provides an electronic device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the electronic device is running, the machine-readable instructions when executed by the processor performing the steps of the positioning method as described above.
In a fourth aspect, the present disclosure also provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, performs the steps of the positioning method as described above.
The above-mentioned apparatus, electronic device, and computer-readable storage medium of the present disclosure at least include technical features substantially the same as or similar to technical features of any aspect or any implementation manner of any aspect of the above-mentioned method of the present disclosure, and therefore, for the description of the effects of the above-mentioned apparatus, electronic device, and computer-readable storage medium, reference may be made to the description of the effects of the above-mentioned method contents, which is not repeated herein.
Drawings
To more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present disclosure and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings may be obtained from the drawings without inventive effort.
Fig. 1 shows a flowchart of a positioning method provided by an embodiment of the present disclosure;
fig. 2 is a flow chart illustrating another positioning method provided by the embodiment of the present disclosure;
fig. 3 is a flowchart illustrating determining object border information of an object to which each pixel point in a target image belongs based on a positioning feature map in yet another positioning method according to an embodiment of the present disclosure;
FIG. 4 is a flowchart illustrating a further positioning method provided by the embodiment of the disclosure, wherein a target distance range in which a distance between a pixel point and a bounding box is located is selected from a plurality of distance ranges based on a determined first probability value;
fig. 5 is a flowchart illustrating a method for determining location information of an object in a target image according to object border information of an object to which each pixel belongs and a target confidence of the object border information in another location method provided by the embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of a positioning device provided by an embodiment of the present disclosure;
fig. 7 shows a schematic structural diagram of an electronic device provided in an embodiment of the present disclosure.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it should be understood that the drawings in the present disclosure are for illustrative and descriptive purposes only and are not used to limit the scope of the present disclosure. Additionally, it should be understood that the schematic drawings are not necessarily drawn to scale. The flowcharts used in this disclosure illustrate operations implemented according to some embodiments of the present disclosure. It should be understood that the operations of the flow diagrams may be performed out of order, and steps without logical context may be performed in reverse order or simultaneously. In addition, one skilled in the art, under the direction of the present disclosure, may add one or more other operations to the flowchart, and may remove one or more operations from the flowchart.
In addition, the described embodiments are only a few embodiments of the present disclosure, not all embodiments. The components of the embodiments of the present disclosure, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure, presented in the figures, is not intended to limit the scope of the claimed disclosure, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the disclosure without making creative efforts, shall fall within the protection scope of the disclosure.
It is to be noted that the term "comprising" will be used in the disclosed embodiments to indicate the presence of the features stated hereinafter, but does not exclude the addition of further features.
The disclosure provides a positioning method and device, an electronic device, and a computer-readable storage medium, in order to reduce the number of object anchor frames used for positioning and improve the information expression capability of the object anchor frames in the process of positioning an object by using the object anchor frames, so as to improve the accuracy of object positioning. According to the method and the device, only one object anchor frame, namely the object frame corresponding to the object frame information, is determined for each pixel point in the target image based on the image characteristic diagram of the target image, so that the number of the object anchor frames used in the object positioning process is reduced, and the calculation amount is reduced. Meanwhile, based on the image feature map of the target image, the object type information of the object to which each pixel point in the target image belongs, the confidence corresponding to the object frame information and the confidence corresponding to the object type information can be determined, and then the final confidence corresponding to the object frame information is determined based on the two determined confidences, so that the information expression capability of the object frame is effectively enhanced, and the accuracy of object positioning based on the object frame is improved.
The following describes the positioning method and apparatus, electronic device, and computer-readable storage medium according to the present disclosure with specific embodiments.
The embodiment of the disclosure provides a positioning method, which is applied to a terminal device for positioning an object in an image. Specifically, as shown in fig. 1, the positioning method provided by the embodiment of the present disclosure includes the following steps:
and S110, acquiring a target image.
Here, the target image may be an image including a target object captured in an object tracking process, or may be an image including a human face captured in human face detection, and the present disclosure does not limit the use of the target image.
The target image comprises at least one object to be positioned. The object may be an object, a human, an animal, or the like.
The target image may be captured by a terminal device executing the positioning method of the present embodiment, or may be captured by another device and then transmitted to the terminal device executing the positioning method of the present embodiment.
S120, determining object type information of an object to which each pixel point belongs, object frame information of the object to which each pixel point belongs, a first confidence degree corresponding to the object type information and a second confidence degree corresponding to the object frame information in the target image based on the image feature map of the target image.
Before this step is executed, the target image needs to be processed to obtain an image feature map corresponding to the target image. In specific implementation, the convolutional neural network can be used for extracting image features of the target image to obtain an image feature map.
After the image feature map of the target image is determined, the image feature map is processed, and the object type information of the object to which each pixel point belongs, the object frame information of the object to which each pixel point belongs, the first confidence degree corresponding to the object type information and the second confidence degree corresponding to the object frame information in the target image can be determined. In specific implementation, the convolutional neural network may be used to further extract image features from the image feature map to obtain the object type information, the object border information, the first confidence level, and the second confidence level.
The object type information includes an object type of an object to which the pixel point belongs. The object frame information includes a distance between the pixel point and each frame in the object frame corresponding to the object frame information. The object frame may be referred to as an object anchor frame.
The first confidence degree is used for representing the accuracy degree or the credibility degree of the object type information determined based on the image feature map. The second confidence level is used for representing the accuracy or credibility of the object frame information determined based on the image feature map.
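For illustration, the following is a minimal PyTorch-style sketch of per-pixel prediction heads that output class scores, the four border distances making up the object frame information, and a border-information confidence; all layer names, channel sizes and activation choices are assumptions and are not taken from the disclosure.
```python
import torch.nn as nn

class PixelwiseHeads(nn.Module):
    """Illustrative per-pixel prediction heads; names and sizes are assumptions."""
    def __init__(self, in_channels=256, num_classes=80):
        super().__init__()
        # per-pixel object type scores (the first confidence is read off these scores)
        self.cls_head = nn.Conv2d(in_channels, num_classes, kernel_size=3, padding=1)
        # per-pixel distances to the four borders (left, top, right, bottom)
        self.box_head = nn.Conv2d(in_channels, 4, kernel_size=3, padding=1)
        # per-pixel confidence of the predicted border information (the second confidence)
        self.box_conf_head = nn.Conv2d(in_channels, 1, kernel_size=3, padding=1)

    def forward(self, feat):
        cls_scores = self.cls_head(feat)               # [B, C, H, W]
        box_dists = self.box_head(feat).relu()         # [B, 4, H, W], non-negative distances
        box_conf = self.box_conf_head(feat).sigmoid()  # [B, 1, H, W]
        return cls_scores, box_dists, box_conf
```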
S130, respectively determining the target confidence of the object frame information of the object to which each pixel point belongs based on the first confidence and the second confidence.
Here, specifically, a product of the first confidence level and the second confidence level may be used as the target confidence level corresponding to the object border information. The target confidence is used for comprehensively representing the positioning accuracy and the classification accuracy of the object frame corresponding to the object frame information.
Of course, other methods may also be utilized to determine the target confidence, for example, the target confidence may be determined by combining the preset weight of the first confidence, the preset weight of the second confidence, the first confidence and the second confidence, and the present disclosure does not limit the specific implementation scheme for determining the target confidence based on the first confidence and the second confidence.
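As a minimal sketch of the two combination schemes mentioned above, assuming first_conf and second_conf already hold the per-pixel confidences; the weights w1 and w2 are illustrative preset values, not values given in the disclosure.
```python
# Product form: target confidence = first confidence * second confidence
target_conf = first_conf * second_conf

# Weighted form combining preset weights with the two confidences (weights are illustrative)
w1, w2 = 0.6, 0.4
target_conf_weighted = w1 * first_conf + w2 * second_conf
```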
S140, determining the positioning information of the object in the target image based on the object frame information of the object to which each pixel point belongs and the target confidence of the object frame information.
Here, the object frame information of the object to which the pixel point belongs and the target confidence of the object frame information may be used as the positioning information of the object to which the pixel point belongs in the target image, and then the positioning information of each object in the target image is determined based on the positioning information of the object to which each pixel point belongs in the target image.
The method and the device have the advantages that the object frame information of the object to which the pixel point belongs is determined, the target confidence coefficient of the object frame information is also determined, the information expression capacity of the object frame or the object frame information is effectively enhanced, the positioning information and the object type information of the object frame corresponding to the object frame information can be expressed, the confidence coefficient information of the object frame information can be expressed, and accordingly the accuracy of object positioning based on the object frame is improved.
In addition, the above embodiment can determine an object anchor frame, that is, an object frame corresponding to the object frame information, for each pixel point in the target image based on the image feature map of the target image, thereby reducing the number of object anchor frames used in the object positioning process, reducing the amount of calculation, and improving the efficiency of object positioning.
In some examples, as shown in fig. 2, the image feature map includes a classification feature map for classifying objects to which pixel points in the target image belong and a localization feature map for localizing the objects to which pixel points in the target image belong.
In a specific implementation, as shown in fig. 2, the classification feature map and the localization feature map may be obtained by first performing image feature extraction on the target image with a convolutional neural network to obtain an initial feature map, and then processing the initial feature map, for each of the two maps respectively, with four 3×3 convolutional layers having 256 input and output channels.
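A minimal sketch of the two branches described above, assuming PyTorch and a 256-channel initial feature map; the layer count and channel width follow the description, everything else is illustrative.
```python
import torch.nn as nn

def make_branch(channels=256, num_convs=4):
    """Four 3x3 conv layers with 256 input/output channels, as described above."""
    layers = []
    for _ in range(num_convs):
        layers += [nn.Conv2d(channels, channels, kernel_size=3, padding=1),
                   nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

cls_branch = make_branch()  # produces the classification feature map
loc_branch = make_branch()  # produces the localization (positioning) feature map
```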
After the classification feature map and the positioning feature map are obtained, the object type information of the object to which each pixel point belongs, the object border information of the object to which each pixel point belongs, the first confidence degree corresponding to the object type information, and the second confidence degree corresponding to the object border information in the target image are determined based on the image feature map of the target image, which can be specifically realized by the following steps:
determining object type information of an object to which each pixel point in the target image belongs and a first confidence corresponding to the object type information based on the classification feature map; and determining the object frame information of the object to which each pixel point belongs in the target image and a second confidence corresponding to the object frame information based on the positioning feature map.
In specific implementation, the classification feature map may be subjected to image feature extraction by using a convolutional neural network or a convolutional layer, so as to obtain object type information of an object to which each pixel point belongs and a first confidence corresponding to the object type information. And performing image feature extraction on the positioning feature map by using a convolutional neural network or a convolutional layer to obtain object frame information of an object to which each pixel point belongs and a second confidence corresponding to the object frame information.
In the embodiment, based on the classification feature map and the positioning feature map of the target image, not only the object frame information of the object to which each pixel point belongs in the target image is determined, but also the object type information of the object to which each pixel point belongs in the target image is determined, and the confidence degrees corresponding to the object type information and the object frame information respectively are determined, so that the information expression capability of the object frame is improved, and the accuracy of object positioning based on the object frame is improved.
In some embodiments, as shown in fig. 3, the determining, based on the positioning feature map, object border information of an object to which each pixel point in the target image belongs may specifically be implemented by using the following steps:
s310, aiming at one pixel point in the target image, respectively determining a target distance range in which the distance between the pixel point and each frame in the object frame of the object to which the pixel point belongs is located based on the positioning feature map.
Here, the positioning feature map may be subjected to image feature extraction by using a convolutional neural network or a convolutional layer, so as to determine a target distance range in which a distance between a pixel point and each of object frames of an object to which the pixel point belongs is located.
In specific implementation, the maximum distance between a pixel point and a certain border can be determined based on the positioning feature map; then, carrying out segmentation processing on the maximum distance to obtain a plurality of distance ranges; extracting image features of the positioning feature map by using a convolutional neural network or a convolutional layer to determine a first probability value that the distance between the pixel point and the frame of the edge is within each distance range; finally, based on the determined first probability value, a target distance range in which the distance between the pixel point and the frame is located is selected from the plurality of distance ranges. Specifically, the distance range corresponding to the maximum first probability value may be set as the target distance range.
As shown in fig. 2, the object frame may include an upper frame, a lower frame, a left frame and a right frame, five first probability values a, b, c, d, e corresponding to the left frame and five distance ranges are determined based on the above method, and the distance range corresponding to the largest first probability value b is selected as the target distance range.
In the above, the distance range corresponding to the maximum probability value is selected as the target distance range in which the distance between the pixel point and the frame is located, so that the accuracy of the determined target distance range is improved, and the accuracy of the distance between the pixel point determined based on the target distance range and a certain frame is improved.
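A minimal sketch of this bucketing step, assuming equal-width segmentation of the maximum distance and per-range scores predicted from the positioning feature map; the function and variable names are illustrative.
```python
import torch

def pick_target_range(max_dist, range_logits, num_ranges=5):
    """Select the target distance range for one border of one pixel point.

    max_dist: maximum distance between the pixel point and this border.
    range_logits: [num_ranges] scores predicted from the positioning feature map
    (converting them to probabilities with a softmax is an assumption).
    """
    edges = torch.linspace(0.0, float(max_dist), num_ranges + 1)  # segment the maximum distance
    probs = range_logits.softmax(dim=0)                           # first probability per range
    idx = int(probs.argmax())                                     # range with the largest probability
    return (edges[idx].item(), edges[idx + 1].item()), probs[idx].item()
```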
S320, respectively determining the target distance between the pixel point and each frame in the object frames of the object to which the pixel point belongs based on the target distance range and the positioning feature map.
After the target distance range is determined, a regression network, such as a convolutional neural network, matched with the target distance range is selected, and image feature extraction is performed on the positioning feature map to obtain the target distance between the pixel point and each frame in the object frame of the object to which the pixel point belongs.
On the basis of determining the target distance range, the convolutional neural network is further utilized to determine an accurate distance, and the accuracy of the determined distance can be effectively improved.
In addition, as shown in fig. 2, after the target distance is determined, the determined target distance may be corrected by using a preset or trained parameter or weight N to obtain a final target distance.
As shown in fig. 2, the exact target distance between the pixel point and the left frame is determined by this step, and the target distance is labeled in fig. 2 and denoted by f. As shown in fig. 2, the determined target distance is within the determined target distance range.
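A sketch of the refinement step, under the assumption that the range-specific regression head predicts a normalized offset inside the selected range; the 'scale' argument stands in for the preset or trained correction weight mentioned above.
```python
def refine_distance(range_lo, range_hi, offset_pred, scale=1.0):
    """Regress the exact target distance inside the selected target distance range.

    offset_pred: assumed to lie in [0, 1], predicted by a regression network
    matched to the target distance range.
    """
    dist = range_lo + offset_pred * (range_hi - range_lo)
    return scale * dist  # optional correction with a preset or trained weight
```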
S330, determining the object frame information of the object to which the pixel point belongs based on the position information of the pixel point in the target image and the target distance between the pixel point and each frame.
The position information of each frame in the object frame corresponding to the object frame information in the target image can be determined by using the position information of the pixel point in the target image and the target distance between the pixel point and each frame. And finally, the position information of each frame in the target image can be used as the object frame information of the object to which the pixel point belongs.
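A minimal sketch of turning the pixel position and the four target distances into border positions (the coordinate convention is an assumption):
```python
def decode_box(px, py, d_left, d_top, d_right, d_bottom):
    """Convert per-pixel border distances into the positions of the four borders."""
    x1, y1 = px - d_left, py - d_top        # left and upper borders
    x2, y2 = px + d_right, py + d_bottom    # right and lower borders
    return x1, y1, x2, y2
```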
In the above embodiment, the target distance range where the distance between the pixel point and each frame in the object frame is located is first determined, and then the target distance between the pixel point and each frame is determined based on the determined target distance range. Then, based on the determined accurate target distance, an object frame with an accurate position can be determined for the pixel point, and the accuracy of the determined object frame is improved.
In some embodiments, as shown in fig. 4, the selecting a target distance range from the plurality of distance ranges, in which the distance between the pixel point and a bounding box is located, based on the determined first probability value may be further implemented by:
s410, determining the uncertain distance parameter value between the pixel point and a certain frame based on the positioning feature map.
Here, a convolutional neural network determining a first probability value may be utilized to determine a distance uncertainty parameter value of a pixel point from a bounding box while determining the first probability value that the distance of the pixel point from the bounding box lies within each distance range. The distance uncertainty parameter values here can be used to characterize the confidence level of the determined respective first probability.
And S420, determining a target probability value that the distance between the pixel point and the bounding box is within each distance range based on the distance uncertainty parameter value and each first probability value.
Here, each first probability value is corrected using the distance uncertainty parameter value to obtain a corresponding target probability value.
In particular implementation, the target probability value may be determined using the following formula:
p_{x,n} = exp(s_{x,n} / σ_x) / Σ_{m=1}^{N} exp(s_{x,m} / σ_x)
In the formula, p_{x,n} represents the target probability value that the distance between the pixel point and the border x is within the n-th distance range, N represents the number of distance ranges, σ_x represents the distance uncertainty parameter value corresponding to the border x, s_{x,n} represents the first probability value that the distance between the pixel point and the border x is within the n-th distance range, and s_{x,m} represents the first probability value that the distance between the pixel point and the border x is within the m-th distance range.
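A sketch of this correction, assuming the temperature-like rescaling of the first probability values by the distance uncertainty parameter shown above; the exact functional form is inferred from the symbol definitions rather than stated verbatim in the disclosure.
```python
import torch

def target_probs(first_probs, sigma_x):
    """first_probs: [N] first probability values s_{x,n} for one border x;
    sigma_x: distance uncertainty parameter value for that border."""
    return torch.softmax(first_probs / sigma_x, dim=-1)  # target probability values p_{x,n}
```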
And S430, based on the determined target probability value, selecting a target distance range in which the distance between the pixel point and the frame is located from the distance ranges.
Here, the distance range corresponding to the maximum target probability value may be specifically selected as the target distance range.
In the above embodiment, while the first probability value that the distance between the pixel point and the bounding box is within each distance range is determined, an uncertainty parameter value is also determined. The first probability values can be corrected based on this parameter value to obtain the target probability value that the distance between the pixel point and the bounding box is within each distance range, which improves the accuracy of the probability values that the distance between the pixel point and the bounding box is within each distance range, and is thereby beneficial to improving the accuracy of the target distance range determined based on these probability values.
After determining the target distance between the pixel point and each frame in the corresponding object frame, the confidence level of the corresponding object frame information, i.e. the second confidence level, may be determined by using the following steps:
and determining a second confidence corresponding to the object frame information of the object to which the pixel point belongs based on a first probability value corresponding to a target distance range in which the distance between one pixel point in the target image and each frame in the object frame of the object to which the pixel point belongs is located.
In a specific implementation, an average of first probability values corresponding to target distance ranges corresponding to all borders in an object border of an object to which the pixel point belongs may be used as the second confidence.
Of course, other methods may be used to determine the second confidence level, and the disclosure is not limited to the method of determining the second confidence level based on the first probability value corresponding to the target distance range.
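A minimal sketch of the averaging scheme described above, assuming the selected first probability value for each of the four borders is already available:
```python
def second_confidence(border_probs):
    """border_probs: the first probability values of the selected target distance
    range for the four borders (assumed to be a list of four floats)."""
    return sum(border_probs) / len(border_probs)
```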
In the above embodiment, the confidence of the object frame information of the object to which the pixel point belongs, that is, the second confidence, can be determined by using the maximum first probability value corresponding to the distance between the pixel point and each frame, so that the information expression capability of the object frame is enhanced.
In some embodiments, the determining the object type information of the object to which each pixel point in the target image belongs based on the classification feature map may specifically be implemented by using the following steps:
determining a second probability value of each preset object type of an object to which each pixel point in the target image belongs based on the classification feature map; and determining the object type information of the object to which the pixel point belongs based on the preset object type corresponding to the maximum second probability value.
In specific implementation, the convolutional neural network or convolutional layer can be used to extract the image features of the classification feature map, so as to obtain a second probability value that the object to which the pixel point belongs is of each preset object type. And then, selecting a preset object type corresponding to the maximum second probability value to determine the object type information of the object to which the pixel point belongs. As shown in fig. 2, the second probability value corresponding to the cat determined by the present embodiment is the largest, and therefore it is determined that the object type information corresponds to the cat.
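A minimal sketch of this per-pixel classification step; converting the class scores to probabilities with a softmax, and reading the first confidence off the maximum probability, are assumptions.
```python
import torch

def classify_pixel(class_logits):
    """class_logits: per-pixel scores over the preset object types (shape [C])."""
    probs = class_logits.softmax(dim=0)  # second probability value for each preset type
    conf, cls_id = probs.max(dim=0)      # the maximum gives the chosen type and the first confidence
    return int(cls_id), float(conf)
```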
In the embodiment, the preset object type corresponding to the maximum probability value is selected as the object type information of the object to which the pixel point belongs, so that the accuracy of the determined object type information is improved.
In some embodiments, as shown in fig. 5, the determining the positioning information of the object in the target image based on the object border information of the object to which each pixel belongs and the target confidence of the object border information may specifically be implemented by using the following steps:
s510, screening a plurality of target pixel points from the target image; and the distance between different target pixel points in the target image is smaller than a preset threshold, and the object type information of the objects to which the different target pixel points belong is the same.
Here, the plurality of target pixel points obtained by screening are pixel points on the same object.
S520, selecting, from the object frame information of the objects to which the target pixel points belong, the object frame information corresponding to the highest target confidence, to obtain the target frame information.
For the pixel points on the same object, the object frame information corresponding to the highest target confidence can be selected to position the object, and the other object frame information with lower target confidence can be removed, which reduces the amount of calculation in the object positioning process.
S530, determining the positioning information of the object in the target image based on the selected target frame information and the target confidence corresponding to the target frame information.
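A simplified, illustrative sketch of this selection: pixel points whose positions are closer than a preset threshold and that share the same object type are grouped, and only the box with the highest target confidence in each group is kept; the data layout and the grouping rule are assumptions.
```python
import torch

def select_boxes(boxes, target_confs, cls_ids, centers, dist_thresh=10.0):
    """boxes: list of (x1, y1, x2, y2); target_confs: [N] tensor; cls_ids: list of ints;
    centers: [N, 2] tensor of pixel positions."""
    order = torch.argsort(target_confs, descending=True)
    keep = []
    for i in order.tolist():
        suppressed = False
        for j in keep:
            same_cls = cls_ids[i] == cls_ids[j]
            close = torch.dist(centers[i], centers[j]).item() < dist_thresh
            if same_cls and close:
                suppressed = True  # a nearby, same-type box with higher target confidence exists
                break
        if not suppressed:
            keep.append(i)
    return [(boxes[i], float(target_confs[i])) for i in keep]
```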
According to the embodiment, the object frame information with the highest target confidence coefficient is selected from the pixel points which are relatively close and have the same object type information, so that the object is positioned, the number of the object frame information for positioning the object can be effectively reduced, and the timeliness of the object positioning is improved.
Corresponding to the above positioning method, the embodiment of the present disclosure further provides a positioning apparatus, where the apparatus is used on a terminal device for positioning an object in an image, and the apparatus and each module thereof can perform the same method steps as the above positioning method and can achieve the same or similar beneficial effects, so repeated parts are not described again.
As shown in fig. 6, the present disclosure provides a positioning device including:
and an image obtaining module 610, configured to obtain a target image.
An image processing module 620, configured to determine, based on the image feature map of the target image, object type information of an object to which each pixel point belongs, object border information of the object to which each pixel point belongs, a first confidence corresponding to the object type information, and a second confidence corresponding to the object border information in the target image.
A confidence processing module 630, configured to determine, based on the first confidence and the second confidence, a target confidence of the object border information of the object to which each pixel belongs, respectively.
The positioning module 640 is configured to determine positioning information of an object in the target image based on object border information of an object to which each pixel belongs and a target confidence of the object border information.
In some embodiments, the image feature map includes a classification feature map for classifying objects to which pixel points in the target image belong and a localization feature map for localizing objects to which pixel points in the target image belong;
the image processing module 620 is configured to:
determining object type information of an object to which each pixel point in the target image belongs and a first confidence corresponding to the object type information based on the classification feature map;
and determining the object frame information of the object to which each pixel point belongs in the target image and a second confidence corresponding to the object frame information based on the positioning feature map.
In some embodiments, the image processing module 620, when determining the object border information of the object to which each pixel point in the target image belongs based on the positioning feature map, is configured to:
respectively determining a target distance range in which the distance between a pixel point and each frame in the object frames of the object to which the pixel point belongs is located based on the positioning feature map aiming at the pixel point in the target image;
respectively determining the target distance between the pixel point and each frame in the object frames of the object to which the pixel point belongs based on the target distance range and the positioning feature map;
and determining the object frame information of the object to which the pixel point belongs based on the position information of the pixel point in the target image and the target distance between the pixel point and each frame.
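As an illustrative sketch (not taken from the disclosure), the conversion from a pixel point's position and its per-border target distances to object frame information could look like this, assuming the four borders are parameterized as left/top/right/bottom distances from the pixel point:

```python
def distances_to_box(px, py, left, top, right, bottom):
    """Combine a pixel point's position with its target distance to each border
    to obtain object frame information as (x1, y1, x2, y2).
    The left/top/right/bottom parameterization is an assumption."""
    return (px - left, py - top, px + right, py + bottom)

# A pixel point at (120, 80) that lies 30 px from the left border, 20 px from the
# top, 50 px from the right and 40 px from the bottom of its object.
box = distances_to_box(120, 80, 30, 20, 50, 40)   # -> (90, 60, 170, 120)
```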
In some embodiments, the image processing module 620, when determining a target distance range in which the distance between a pixel point and each border of the object to which the pixel point belongs is located, is configured to:
for one border among the object borders of the object to which a pixel point in the target image belongs, determining the maximum distance between the pixel point and the border based on the positioning feature map;
segmenting the maximum distance into a plurality of distance ranges;
determining a first probability value that the distance between the pixel point and the bounding box is within each distance range based on the positioning feature map;
based on the determined first probability value, selecting a target distance range in which the distance between the pixel point and the frame is located from the plurality of distance ranges.
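A small Python sketch of these steps might look as follows; equal-width segmentation of the maximum distance and a softmax over per-range scores are assumptions, since the disclosure fixes neither choice.

```python
import numpy as np

def pick_target_range(max_dist, range_scores):
    """Segment [0, max_dist] into equal-width distance ranges, turn the scores
    read from the positioning feature map into first probability values, and
    select the range with the highest probability as the target distance range."""
    num_ranges = len(range_scores)
    edges = np.linspace(0.0, max_dist, num_ranges + 1)
    first_prob = np.exp(range_scores - np.max(range_scores))
    first_prob /= first_prob.sum()                 # first probability values per range
    k = int(np.argmax(first_prob))
    return (edges[k], edges[k + 1]), first_prob

target_range, probs = pick_target_range(
    64.0, np.array([0.1, 2.0, 0.3, 1.2, -0.5, 0.0, 0.4, -1.0]))  # -> range (8.0, 16.0)
```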
In some embodiments, the image processing module 620, when selecting the target distance range from the plurality of distance ranges in which the distance between the pixel point and the bounding box is located, based on the determined first probability value, is configured to:
determining a distance uncertainty parameter value of the distance between the pixel point and the border based on the positioning feature map;
determining a target probability value that the distance between the pixel point and the bounding box is within each distance range based on the distance uncertainty parameter value and each first probability value;
and taking the distance range corresponding to the maximum target probability value as the target distance range in which the distance between the pixel point and the border is located.
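One way to read this combination, sketched below under stated assumptions, is to treat the distance uncertainty parameter value as the width of a smoothing kernel over neighbouring distance ranges; the Gaussian form is purely an assumption, as the disclosure does not specify how the uncertainty value and the first probability values are combined.

```python
import numpy as np

def target_probability_values(first_prob, sigma):
    """Combine the first probability values with a distance uncertainty parameter
    value sigma by spreading probability mass across neighbouring distance ranges
    (assumed Gaussian smoothing), then renormalizing."""
    idx = np.arange(len(first_prob), dtype=float)
    kernel = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / max(sigma, 1e-6)) ** 2)
    kernel /= kernel.sum(axis=1, keepdims=True)
    target_prob = kernel @ first_prob
    return target_prob / target_prob.sum()

first_prob = np.array([0.05, 0.40, 0.35, 0.20])
target_prob = target_probability_values(first_prob, sigma=1.0)
best_range = int(target_prob.argmax())   # index of the selected target distance range
```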
In some embodiments, the image processing module 620, when determining the second confidence level corresponding to the object bounding box information, is configured to:
and determining, for a pixel point in the target image, a second confidence corresponding to the object frame information of the object to which the pixel point belongs, based on the first probability values corresponding to the target distance ranges in which the distances between the pixel point and the respective borders of the object are located.
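A hedged sketch of this derivation is shown below; averaging the four per-border maxima is only one plausible aggregation, since the disclosure states which probabilities are used but not how they are combined.

```python
import numpy as np

def second_confidence(per_border_first_prob):
    """Derive the second confidence of a pixel point's object frame information
    from the first probability values over the distance ranges of its
    left/top/right/bottom borders (assumed aggregation: mean of per-border maxima)."""
    return float(per_border_first_prob.max(axis=1).mean())

probs = np.array([[0.7, 0.2, 0.1],
                  [0.5, 0.4, 0.1],
                  [0.6, 0.3, 0.1],
                  [0.8, 0.1, 0.1]])
conf2 = second_confidence(probs)   # 0.65
```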
In some embodiments, the image processing module 620, when determining the object type information of the object to which each pixel point in the target image belongs based on the classification feature map, is configured to:
determining a second probability value of each preset object type of an object to which each pixel point in the target image belongs based on the classification feature map;
and determining the object type information of the object to which the pixel point belongs based on the preset object type corresponding to the maximum second probability value.
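The classification branch can be sketched as follows; using a softmax to obtain the second probability values and taking the maximum as the first confidence are assumptions made for illustration.

```python
import numpy as np

def classify_pixel(class_scores):
    """Turn the classification feature map's scores at one pixel point into
    object type information and its first confidence."""
    probs = np.exp(class_scores - np.max(class_scores))
    probs /= probs.sum()                      # second probability values per preset object type
    object_type = int(np.argmax(probs))       # preset type with the maximum second probability value
    first_conf = float(probs[object_type])    # assumed first confidence for that type
    return object_type, first_conf

object_type, first_conf = classify_pixel(np.array([0.2, 2.3, -1.0, 0.7]))
```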
In some embodiments, the positioning module 640 is configured to:
screening a plurality of target pixel points from the target image; the distance between different target pixel points in the target image is smaller than a preset threshold value, and the object type information of the objects to which the different target pixel points belong is the same;
selecting the object frame information corresponding to the highest target confidence from the object frame information of the objects to which the target pixel points belong, so as to obtain target frame information;
and determining the positioning information of the object in the target image based on the selected target frame information and the target confidence corresponding to the target frame information.
An embodiment of the present disclosure discloses an electronic device, as shown in fig. 7, including: a processor 701, a memory 702, and a bus 703, the memory 702 storing machine-readable instructions executable by the processor 701, the processor 701 and the memory 702 communicating via the bus 703 when the electronic device is operating.
The machine readable instructions, when executed by the processor 701, perform the steps of the following positioning method:
acquiring a target image;
determining object type information of an object to which each pixel point belongs, object frame information of the object to which each pixel point belongs, a first confidence degree corresponding to the object type information and a second confidence degree corresponding to the object frame information in the target image based on the image feature map of the target image;
respectively determining the target confidence of the object frame information of the object to which each pixel point belongs based on the first confidence and the second confidence;
and determining the positioning information of the object in the target image based on the object frame information of the object to which each pixel point belongs and the target confidence of the object frame information.
In addition, when the processor 701 executes the machine readable instructions, the method contents in any embodiment described in the above method part may also be executed, which is not described herein again.
A computer program product corresponding to the method and the apparatus provided in the embodiments of the present disclosure includes a computer readable storage medium storing a program code, where instructions included in the program code may be used to execute the method in the foregoing method embodiments, and specific implementation may refer to the method embodiments, which is not described herein again.
The foregoing description of the various embodiments is intended to highlight various differences between the embodiments, and the same or similar parts may be referred to one another, which are not repeated herein for brevity.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the method embodiments, and are not described in detail in this disclosure.

In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The above-described apparatus embodiments are merely illustrative; for example, the division of the modules is merely a logical division, and there may be other divisions in actual implementation; for example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted or not executed.

In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or modules through some communication interfaces, and may be in an electrical, mechanical or other form.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. And the aforementioned storage medium includes: various media capable of storing program codes, such as a U disk, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
The above are only specific embodiments of the present disclosure, but the scope of the present disclosure is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the present disclosure, and shall be covered by the scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (18)

1. A method of positioning, comprising:
acquiring a target image;
determining object type information of an object to which each pixel point belongs, object frame information of the object to which each pixel point belongs, a first confidence degree corresponding to the object type information and a second confidence degree corresponding to the object frame information in the target image based on the image feature map of the target image;
respectively determining the target confidence of the object frame information of the object to which each pixel point belongs based on the first confidence and the second confidence;
and determining the positioning information of the object in the target image based on the object frame information of the object to which each pixel point belongs and the target confidence of the object frame information.
2. The method according to claim 1, wherein the image feature map comprises a classification feature map for classifying the objects to which the pixels belong in the target image and a localization feature map for localizing the objects to which the pixels belong in the target image;
the determining, based on the image feature map of the target image, object type information of an object to which each pixel point belongs, object border information of the object to which each pixel point belongs, a first confidence degree corresponding to the object type information, and a second confidence degree corresponding to the object border information in the target image includes:
determining object type information of an object to which each pixel point in the target image belongs and a first confidence corresponding to the object type information based on the classification feature map;
and determining the object frame information of the object to which each pixel point belongs in the target image and a second confidence corresponding to the object frame information based on the positioning feature map.
3. The method according to claim 2, wherein the determining, based on the positioning feature map, the object frame information of the object to which each pixel point in the target image belongs includes:
for a pixel point in the target image, respectively determining, based on the positioning feature map, a target distance range in which the distance between the pixel point and each border of the object to which the pixel point belongs is located;
respectively determining the target distance between the pixel point and each frame in the object frames of the object to which the pixel point belongs based on the target distance range and the positioning feature map;
and determining the object frame information of the object to which the pixel point belongs based on the position information of the pixel point in the target image and the target distance between the pixel point and each frame.
4. The method according to claim 3, wherein the determining a target distance range in which the distance between a pixel point and each border of the object to which the pixel point belongs is located includes:
for one border among the object borders of the object to which a pixel point in the target image belongs, determining the maximum distance between the pixel point and the border based on the positioning feature map;
segmenting the maximum distance into a plurality of distance ranges;
determining a first probability value that the distance between the pixel point and the bounding box is within each distance range based on the positioning feature map;
based on the determined first probability value, selecting a target distance range in which the distance between the pixel point and the frame is located from the plurality of distance ranges.
5. The method according to claim 4, wherein selecting a target distance range from the plurality of distance ranges in which the distance between the pixel point and the bounding box is located based on the determined first probability value comprises:
determining a distance uncertainty parameter value of the distance between the pixel point and the border based on the positioning feature map;
determining a target probability value that the distance between the pixel point and the bounding box is within each distance range based on the distance uncertainty parameter value and each first probability value;
and taking the distance range corresponding to the maximum target probability value as the target distance range in which the distance between the pixel point and the border is located.
6. The method according to claim 4, wherein determining the second confidence level corresponding to the object bounding box information comprises:
and determining, for a pixel point in the target image, a second confidence corresponding to the object frame information of the object to which the pixel point belongs, based on the first probability values corresponding to the target distance ranges in which the distances between the pixel point and the respective borders of the object are located.
7. The method according to any one of claims 2 to 6, wherein the determining object type information of the object to which each pixel point in the target image belongs based on the classification feature map comprises:
determining a second probability value of each preset object type of an object to which each pixel point in the target image belongs based on the classification feature map;
and determining the object type information of the object to which the pixel point belongs based on the preset object type corresponding to the maximum second probability value.
8. The method according to any one of claims 1 to 7, wherein the determining the positioning information of the object in the target image based on the object border information of the object to which each pixel point belongs and the target confidence of the object border information includes:
screening a plurality of target pixel points from the target image; the distance between different target pixel points in the target image is smaller than a preset threshold value, and the object type information of the objects to which the different target pixel points belong is the same;
selecting the object frame information corresponding to the highest target confidence from the object frame information of the objects to which the target pixel points belong, so as to obtain target frame information;
and determining the positioning information of the object in the target image based on the selected target frame information and the target confidence corresponding to the target frame information.
9. A positioning device, comprising:
the image acquisition module is used for acquiring a target image;
the image processing module is used for determining object type information of an object to which each pixel point belongs, object frame information of the object to which each pixel point belongs, a first confidence coefficient corresponding to the object type information and a second confidence coefficient corresponding to the object frame information in the target image based on the image feature map of the target image;
the confidence processing module is used for respectively determining the target confidence of the object frame information of the object to which each pixel point belongs based on the first confidence and the second confidence;
and the positioning module is used for determining the positioning information of the object in the target image based on the object frame information of the object to which each pixel point belongs and the target confidence of the object frame information.
10. The positioning apparatus according to claim 9, wherein the image feature map includes a classification feature map for classifying the objects to which the pixels belong in the target image and a positioning feature map for positioning the objects to which the pixels belong in the target image;
the image processing module is configured to:
determining object type information of an object to which each pixel point in the target image belongs and a first confidence corresponding to the object type information based on the classification feature map;
and determining the object frame information of the object to which each pixel point belongs in the target image and a second confidence corresponding to the object frame information based on the positioning feature map.
11. The positioning apparatus according to claim 10, wherein the image processing module, when determining the object border information of the object to which each pixel point in the target image belongs based on the positioning feature map, is configured to:
for a pixel point in the target image, respectively determining, based on the positioning feature map, a target distance range in which the distance between the pixel point and each border of the object to which the pixel point belongs is located;
respectively determining the target distance between the pixel point and each frame in the object frames of the object to which the pixel point belongs based on the target distance range and the positioning feature map;
and determining the object frame information of the object to which the pixel point belongs based on the position information of the pixel point in the target image and the target distance between the pixel point and each frame.
12. The positioning apparatus according to claim 11, wherein the image processing module, when determining a target distance range in which the distance between a pixel point and each border of the object to which the pixel point belongs is located, is configured to:
for one border among the object borders of the object to which a pixel point in the target image belongs, determining the maximum distance between the pixel point and the border based on the positioning feature map;
segmenting the maximum distance into a plurality of distance ranges;
determining a first probability value that the distance between the pixel point and the bounding box is within each distance range based on the positioning feature map;
based on the determined first probability value, selecting a target distance range in which the distance between the pixel point and the frame is located from the plurality of distance ranges.
13. The positioning apparatus according to claim 12, wherein the image processing module, when selecting the target distance range from the plurality of distance ranges in which the distance between the pixel point and the bounding box is located based on the determined first probability value, is configured to:
determining a distance uncertainty parameter value of the distance between the pixel point and the border based on the positioning feature map;
determining a target probability value that the distance between the pixel point and the bounding box is within each distance range based on the distance uncertainty parameter value and each first probability value;
and taking the distance range corresponding to the maximum target probability value as the target distance range in which the distance between the pixel point and the border is located.
14. The positioning apparatus according to claim 12, wherein the image processing module, when determining the second confidence level corresponding to the object border information, is configured to:
and determining, for a pixel point in the target image, a second confidence corresponding to the object frame information of the object to which the pixel point belongs, based on the first probability values corresponding to the target distance ranges in which the distances between the pixel point and the respective borders of the object are located.
15. The positioning apparatus according to any one of claims 10 to 14, wherein the image processing module, when determining the object type information of the object to which each pixel point in the target image belongs based on the classification feature map, is configured to:
determining a second probability value of each preset object type of an object to which each pixel point in the target image belongs based on the classification feature map;
and determining the object type information of the object to which the pixel point belongs based on the preset object type corresponding to the maximum second probability value.
16. The positioning device according to any one of claims 9 to 15, wherein the positioning module is configured to:
screening a plurality of target pixel points from the target image; the distance between different target pixel points in the target image is smaller than a preset threshold value, and the object type information of the objects to which the different target pixel points belong is the same;
selecting the object frame information corresponding to the highest target confidence from the object frame information of the objects to which the target pixel points belong, so as to obtain target frame information;
and determining the positioning information of the object in the target image based on the selected target frame information and the target confidence corresponding to the target frame information.
17. An electronic device, comprising: the positioning device comprises a processor, a storage medium and a bus, wherein the storage medium stores machine-readable instructions executable by the processor, when an electronic device runs, the processor and the storage medium communicate through the bus, and the processor executes the machine-readable instructions to execute the positioning method according to any one of claims 1 to 8.
18. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, performs the positioning method according to any one of claims 1 to 8.
CN202010058788.7A 2020-01-18 2020-01-18 Positioning method and device, electronic equipment and computer readable storage medium Active CN111275040B (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN202010058788.7A CN111275040B (en) 2020-01-18 2020-01-18 Positioning method and device, electronic equipment and computer readable storage medium
JP2022500616A JP2022540101A (en) 2020-01-18 2021-01-15 POSITIONING METHOD AND APPARATUS, ELECTRONIC DEVICE, COMPUTER-READABLE STORAGE MEDIUM
KR1020227018711A KR20220093187A (en) 2020-01-18 2021-01-15 Positioning method and apparatus, electronic device, computer readable storage medium
PCT/CN2021/072210 WO2021143865A1 (en) 2020-01-18 2021-01-15 Positioning method and apparatus, electronic device, and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010058788.7A CN111275040B (en) 2020-01-18 2020-01-18 Positioning method and device, electronic equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN111275040A true CN111275040A (en) 2020-06-12
CN111275040B CN111275040B (en) 2023-07-25

Family

ID=70998770

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010058788.7A Active CN111275040B (en) 2020-01-18 2020-01-18 Positioning method and device, electronic equipment and computer readable storage medium

Country Status (4)

Country Link
JP (1) JP2022540101A (en)
KR (1) KR20220093187A (en)
CN (1) CN111275040B (en)
WO (1) WO2021143865A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111931723A (en) * 2020-09-23 2020-11-13 北京易真学思教育科技有限公司 Target detection and image recognition method and device, and computer readable medium
CN112819003A (en) * 2021-04-19 2021-05-18 北京妙医佳健康科技集团有限公司 Method and device for improving OCR recognition accuracy of physical examination report
WO2021143865A1 (en) * 2020-01-18 2021-07-22 北京市商汤科技开发有限公司 Positioning method and apparatus, electronic device, and computer readable storage medium
CN114613147A (en) * 2020-11-25 2022-06-10 浙江宇视科技有限公司 Vehicle violation identification method and device, medium and electronic equipment

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113762109B (en) * 2021-08-23 2023-11-07 北京百度网讯科技有限公司 Training method of character positioning model and character positioning method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108764292A (en) * 2018-04-27 2018-11-06 北京大学 Deep learning image object mapping based on Weakly supervised information and localization method
US20190035101A1 (en) * 2017-07-27 2019-01-31 Here Global B.V. Method, apparatus, and system for real-time object detection using a cursor recurrent neural network
CN109426803A (en) * 2017-09-04 2019-03-05 三星电子株式会社 The method and apparatus of object for identification
CN109522938A (en) * 2018-10-26 2019-03-26 华南理工大学 The recognition methods of target in a kind of image based on deep learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111275040B (en) * 2020-01-18 2023-07-25 北京市商汤科技开发有限公司 Positioning method and device, electronic equipment and computer readable storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190035101A1 (en) * 2017-07-27 2019-01-31 Here Global B.V. Method, apparatus, and system for real-time object detection using a cursor recurrent neural network
CN109426803A (en) * 2017-09-04 2019-03-05 三星电子株式会社 The method and apparatus of object for identification
CN108764292A (en) * 2018-04-27 2018-11-06 北京大学 Deep learning image object mapping based on Weakly supervised information and localization method
CN109522938A (en) * 2018-10-26 2019-03-26 华南理工大学 The recognition methods of target in a kind of image based on deep learning

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021143865A1 (en) * 2020-01-18 2021-07-22 北京市商汤科技开发有限公司 Positioning method and apparatus, electronic device, and computer readable storage medium
CN111931723A (en) * 2020-09-23 2020-11-13 北京易真学思教育科技有限公司 Target detection and image recognition method and device, and computer readable medium
CN111931723B (en) * 2020-09-23 2021-01-05 北京易真学思教育科技有限公司 Target detection and image recognition method and device, and computer readable medium
CN114613147A (en) * 2020-11-25 2022-06-10 浙江宇视科技有限公司 Vehicle violation identification method and device, medium and electronic equipment
CN114613147B (en) * 2020-11-25 2023-08-04 浙江宇视科技有限公司 Vehicle violation identification method and device, medium and electronic equipment
CN112819003A (en) * 2021-04-19 2021-05-18 北京妙医佳健康科技集团有限公司 Method and device for improving OCR recognition accuracy of physical examination report

Also Published As

Publication number Publication date
JP2022540101A (en) 2022-09-14
WO2021143865A1 (en) 2021-07-22
KR20220093187A (en) 2022-07-05
CN111275040B (en) 2023-07-25

Similar Documents

Publication Publication Date Title
CN108229322B (en) Video-based face recognition method and device, electronic equipment and storage medium
CN111275040A (en) Positioning method and device, electronic equipment and computer readable storage medium
CN110414507B (en) License plate recognition method and device, computer equipment and storage medium
CN109325964B (en) Face tracking method and device and terminal
CN108960211B (en) Multi-target human body posture detection method and system
US8395676B2 (en) Information processing device and method estimating a posture of a subject in an image
CN108009466B (en) Pedestrian detection method and device
CN114119676B (en) Target detection tracking identification method and system based on multi-feature information fusion
CN105678213B (en) Dual-mode mask person event automatic detection method based on video feature statistics
CN106203539B (en) Method and device for identifying container number
CN111814690B (en) Target re-identification method, device and computer readable storage medium
CN112200056A (en) Face living body detection method and device, electronic equipment and storage medium
CN112464797A (en) Smoking behavior detection method and device, storage medium and electronic equipment
CN113378675A (en) Face recognition method for simultaneous detection and feature extraction
CN113989604A (en) Tire DOT information identification method based on end-to-end deep learning
CN112101134B (en) Object detection method and device, electronic equipment and storage medium
CN113657370A (en) Character recognition method and related equipment thereof
CN113378837A (en) License plate shielding identification method and device, electronic equipment and storage medium
CN116363655A (en) Financial bill identification method and system
CN108985216B (en) Pedestrian head detection method based on multivariate logistic regression feature fusion
CN116091781A (en) Data processing method and device for image recognition
US20220405527A1 (en) Target Detection Methods, Apparatuses, Electronic Devices and Computer-Readable Storage Media
CN115019152A (en) Image shooting integrity judgment method and device
CN114494355A (en) Trajectory analysis method and device based on artificial intelligence, terminal equipment and medium
CN116433939B (en) Sample image generation method, training method, recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant