CN111275040B - Positioning method and device, electronic equipment and computer readable storage medium
- Publication number
- CN111275040B CN111275040B CN202010058788.7A CN202010058788A CN111275040B CN 111275040 B CN111275040 B CN 111275040B CN 202010058788 A CN202010058788 A CN 202010058788A CN 111275040 B CN111275040 B CN 111275040B
- Authority
- CN
- China
- Legal status
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/761—Proximity, similarity or dissimilarity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2210/00—Indexing scheme for image generation or computer graphics
- G06T2210/12—Bounding box
Abstract
The disclosure provides a positioning method and device, an electronic device, and a computer-readable storage medium. Based on an image feature map of a target image, the method determines only one object anchor frame, namely the object frame corresponding to the object frame information, for each pixel point in the target image, so that the number of object anchor frames used in the object positioning process is reduced and the amount of calculation is reduced. Meanwhile, the object type information of the object to which each pixel point belongs, the confidence corresponding to the object frame information, and the confidence corresponding to the object type information can be determined based on the image feature map of the target image, and the final confidence corresponding to the object frame information is then determined based on the two determined confidences, so that the information expression capability of the object frame is effectively enhanced and the accuracy of object positioning based on the object frame is improved.
Description
Technical Field
The present disclosure relates to the field of computer technology and image processing, and in particular, to a positioning method and apparatus, an electronic device, and a computer readable storage medium.
Background
Object detection, or object positioning, is an important basic technology in computer vision, and is applied in particular to scenarios such as instance segmentation, object tracking, pedestrian recognition, and face recognition.
Object detection or object positioning is often accomplished with object anchor frames. However, existing approaches have drawbacks: the number of object anchor frames used for positioning an object is large, and positioning is inaccurate owing to the large number of anchor frames, the weak information expression ability of the anchor frames, and the like.
Disclosure of Invention
Accordingly, the present disclosure provides at least one positioning method and apparatus.
In a first aspect, the present disclosure provides a positioning method, including:
acquiring a target image;
determining object type information of an object to which each pixel point belongs in the target image, object frame information of the object to which each pixel point belongs, first confidence corresponding to the object type information and second confidence corresponding to the object frame information based on an image feature map of the target image;
based on the first confidence coefficient and the second confidence coefficient, respectively determining the target confidence coefficient of the object frame information of the object to which each pixel point belongs;
and determining the positioning information of the object in the target image based on the object frame information of the object to which each pixel point belongs and the target confidence of the object frame information.
In the above embodiment, only one object anchor frame, namely the object frame corresponding to the object frame information, is determined for each pixel point in the target image based on the image feature map of the target image, so that the number of object anchor frames used in the object positioning process is reduced, the amount of calculation is reduced, and object positioning efficiency is improved. Meanwhile, the object type information of the object to which each pixel point belongs, the confidence corresponding to the object frame information, and the confidence corresponding to the object type information can be determined based on the image feature map, and the final confidence corresponding to the object frame information is then determined based on the two determined confidences. This effectively enhances the information expression capability of the object frame and the object frame information: not only the positioning information and object type information of the object frame corresponding to the object frame information but also the confidence information of the object frame information can be expressed, thereby improving the accuracy of object positioning based on the object frame.
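By way of a non-limiting illustration, the following Python sketch shows how the four steps above might fit together; the backbone network and the two prediction heads are assumed callables, all names are illustrative rather than taken from this disclosure, and the product fusion is only one of the options described below.

```python
def locate_objects(image, backbone, cls_head, box_head):
    """Illustrative sketch of the four steps of the first aspect."""
    feature_map = backbone(image)                    # image feature map of the target image
    type_info, first_conf = cls_head(feature_map)    # per-pixel object type + first confidence
    frame_info, second_conf = box_head(feature_map)  # per-pixel object frame + second confidence
    target_conf = first_conf * second_conf           # one possible fusion: the product
    # positioning information: each pixel's object frame plus its target confidence
    return frame_info, type_info, target_conf
```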
In a possible implementation manner, the image feature map includes a classification feature map for classifying objects to which the pixel points in the target image belong and a positioning feature map for positioning objects to which the pixel points in the target image belong;
The determining, based on the image feature map of the target image, object type information of an object to which each pixel point belongs, object frame information of the object to which each pixel point belongs, a first confidence coefficient corresponding to the object type information, and a second confidence coefficient corresponding to the object frame information in the target image includes:
determining object type information of an object to which each pixel point in the target image belongs and a first confidence corresponding to the object type information based on the classification feature map;
and determining object frame information of an object to which each pixel point in the target image belongs and a second confidence corresponding to the object frame information based on the positioning feature map.
According to the embodiment, based on the classification feature map and the positioning feature map of the target image, not only the object frame information of the object to which each pixel point in the target image belongs is determined, but also the object type information of the object to which each pixel point in the target image belongs and the confidence level respectively corresponding to the object type information and the object frame information are determined, so that the information expression capability of the object frame is improved, and the accuracy of object positioning based on the object frame is improved.
In a possible implementation manner, the determining, based on the positioning feature map, object frame information of an object to which each pixel point in the target image belongs includes:
for a pixel point in the target image, respectively determining a target distance range in which the distance between the pixel point and each frame in object frames of an object to which the pixel point belongs is positioned based on the positioning feature map;
based on the target distance range and the positioning feature map, determining the target distance between the pixel point and each frame in object frames of the object to which the pixel point belongs respectively;
and determining object frame information of an object to which the pixel point belongs based on the position information of the pixel point in the target image and the target distance between the pixel point and each frame.
According to the embodiment, the target distance range of the pixel point and each frame in the object frames of the object to which the pixel point belongs is determined, and then the target distance of the pixel point and each frame is determined based on the determined target distance range, so that the accuracy of the determined target distance can be improved through the two steps of processing. Then, based on the determined accurate target distance, an accurate object frame can be determined for the pixel point, and the accuracy of the determined object frame is improved.
In one possible implementation manner, determining a target distance range in which the distance between a pixel point and each frame in the object frames of the object to which the pixel point belongs is located includes:
determining the maximum distance between a pixel point and one frame in the object frames of the object to which the pixel point belongs in the target image based on the positioning feature map;
carrying out segmentation processing on the maximum distance to obtain a plurality of distance ranges;
determining a first probability value that the distance between the pixel point and the frame is in each distance range based on the positioning feature map;
and selecting a target distance range in which the distance between the pixel point and the frame is positioned from the plurality of distance ranges based on the determined first probability value.
According to this embodiment, the distance range corresponding to the maximum probability value can be selected as the target distance range in which the distance between the pixel point and a certain frame is located. This improves the accuracy of the determined target distance range and, in turn, the accuracy of the distance between the pixel point and the frame determined based on that range.
In one possible implementation manner, the selecting, based on the determined first probability value, a target distance range in which the distance between the pixel point and the frame is located from the plurality of distance ranges includes:
determining a distance uncertainty parameter value between the pixel point and the frame based on the positioning feature map;
determining a target probability value of the distance between the pixel point and the frame in each distance range based on the distance uncertainty parameter value and each first probability value;
and taking the distance range corresponding to the maximum target probability value as the target distance range where the distance between the pixel point and the frame is located.
According to this embodiment, while the first probability value that the distance between the pixel point and a certain frame falls within each distance range is determined, an uncertainty parameter value is also determined. The first probability values can be corrected based on the uncertainty parameter value to obtain the target probability value that the distance between the pixel point and the frame falls within each distance range, improving the accuracy of these probability values and thus the accuracy of the target distance range determined from them.
In a possible implementation manner, determining the second confidence corresponding to the object border information includes:
and determining a second confidence coefficient corresponding to the object frame information of the object to which the pixel point belongs based on a first probability value corresponding to a target distance range in which the distance between the pixel point in the target image and each frame in the object frames of the object to which the pixel point belongs is located.
According to this embodiment, the confidence of the object frame information of the object to which the pixel point belongs can be determined by using, for each frame, the largest first probability value associated with the pixel point's distance to that frame, thereby enhancing the information expression capability of the object frame.
In a possible implementation manner, the determining, based on the classification feature map, object type information of an object to which each pixel point in the target image belongs includes:
determining a second probability value of each preset object type of an object to which each pixel point in the target image belongs based on the classification feature map;
and determining object type information of the object to which the pixel point belongs based on the preset object type corresponding to the maximum second probability value.
According to the embodiment, the preset object type corresponding to the maximum probability value is selected as the object type information of the object to which the pixel point belongs, so that the accuracy of the determined object type information is improved.
In a possible implementation manner, the determining positioning information of the object in the target image based on the object border information of the object to which each pixel point belongs and the target confidence of the object border information includes:
screening a plurality of target pixel points from the target image; the distance between different target pixel points in the target image is smaller than a preset threshold value, and the object type information of the objects to which the different target pixel points belong is the same;
selecting object frame information corresponding to the highest target confidence coefficient from object frame information of objects to which each target pixel point belongs, and obtaining target frame information;
and determining the positioning information of the object in the target image based on the selected target frame information and the target confidence corresponding to the target frame information.
According to the embodiment, the object frame information with the highest target confidence is selected from the pixel points which are relatively close and have the same object type information, so that the object is positioned, the number of the object frame information for positioning the object can be effectively reduced, and the timeliness of the object positioning is improved.
In a second aspect, the present disclosure provides a positioning device comprising:
the image acquisition module is used for acquiring a target image;
the image processing module is used for determining object type information of an object to which each pixel point belongs, object frame information of the object to which each pixel point belongs, first confidence corresponding to the object type information and second confidence corresponding to the object frame information in the target image based on the image feature map of the target image;
the confidence coefficient processing module is used for respectively determining the target confidence coefficient of the object frame information of the object to which each pixel point belongs based on the first confidence coefficient and the second confidence coefficient;
and the positioning module is used for determining the positioning information of the object in the target image based on the object frame information of the object to which each pixel point belongs and the target confidence of the object frame information.
In a possible implementation manner, the image feature map includes a classification feature map for classifying objects to which the pixel points in the target image belong and a positioning feature map for positioning objects to which the pixel points in the target image belong;
the image processing module is used for:
determining object type information of an object to which each pixel point in the target image belongs and a first confidence corresponding to the object type information based on the classification feature map;
and determining object frame information of an object to which each pixel point in the target image belongs and a second confidence corresponding to the object frame information based on the positioning feature map.
In a possible implementation manner, the image processing module is configured to, when determining, based on the positioning feature map, object frame information of an object to which each pixel point in the target image belongs:
for a pixel point in the target image, respectively determining a target distance range in which the distance between the pixel point and each frame in object frames of an object to which the pixel point belongs is positioned based on the positioning feature map;
based on the target distance range and the positioning feature map, determining the target distance between the pixel point and each frame in object frames of the object to which the pixel point belongs respectively;
and determining object frame information of an object to which the pixel point belongs based on the position information of the pixel point in the target image and the target distance between the pixel point and each frame.
In one possible implementation manner, the image processing module is configured to, when determining a target distance range in which the distance between a pixel point and each frame in the object frames of the object to which the pixel point belongs is located:
determining the maximum distance between a pixel point and one frame in the object frames of the object to which the pixel point belongs in the target image based on the positioning feature map;
carrying out segmentation processing on the maximum distance to obtain a plurality of distance ranges;
determining a first probability value that the distance between the pixel point and the frame is in each distance range based on the positioning feature map;
and selecting a target distance range in which the distance between the pixel point and the frame is positioned from the plurality of distance ranges based on the determined first probability value.
In one possible implementation manner, the image processing module is configured to, when selecting, from the plurality of distance ranges, a target distance range in which a distance between the pixel point and the frame is located based on the determined first probability value:
determining a distance uncertainty parameter value between the pixel point and the frame based on the positioning feature map;
determining a target probability value of the distance between the pixel point and the frame in each distance range based on the distance uncertainty parameter value and each first probability value;
and taking the distance range corresponding to the maximum target probability value as the target distance range where the distance between the pixel point and the frame is located.
In one possible implementation manner, the image processing module is configured to, when determining the second confidence level corresponding to the object border information:
and determining a second confidence coefficient corresponding to the object frame information of the object to which the pixel point belongs based on a first probability value corresponding to a target distance range in which the distance between the pixel point in the target image and each frame in the object frames of the object to which the pixel point belongs is located.
In a possible implementation manner, the image processing module is configured to, when determining, based on the classification feature map, object type information of an object to which each pixel point in the target image belongs:
determining a second probability value of each preset object type of an object to which each pixel point in the target image belongs based on the classification feature map;
and determining object type information of the object to which the pixel point belongs based on the preset object type corresponding to the maximum second probability value.
In one possible embodiment, the positioning module is configured to:
screening a plurality of target pixel points from the target image; the distance between different target pixel points in the target image is smaller than a preset threshold value, and the object type information of the objects to which the different target pixel points belong is the same;
selecting object frame information corresponding to the highest target confidence coefficient from object frame information of objects to which each target pixel point belongs, and obtaining target frame information;
and determining the positioning information of the object in the target image based on the selected target frame information and the target confidence corresponding to the target frame information.
In a third aspect, the present disclosure provides an electronic device comprising: a processor, a memory and a bus, said memory storing machine readable instructions executable by said processor, said processor and said memory communicating over the bus when the electronic device is running, said machine readable instructions when executed by said processor performing the steps of the positioning method as described above.
In a fourth aspect, the present disclosure also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the positioning method as described above.
The apparatus, the electronic device, and the computer-readable storage medium of the present disclosure at least include technical features substantially the same as or similar to technical features of any aspect of the method or any implementation of any aspect of the present disclosure, so for an effect description of the apparatus, the electronic device, and the computer-readable storage medium, reference may be made to an effect description of the content of the method, which is not repeated herein.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present disclosure and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person of ordinary skill in the art.
FIG. 1 illustrates a flow chart of a positioning method provided by an embodiment of the present disclosure;
FIG. 2 illustrates a flow chart of another positioning method provided by an embodiment of the present disclosure;
fig. 3 is a flowchart illustrating determining object frame information of an object to which each pixel point in a target image belongs based on a positioning feature map in another positioning method according to an embodiment of the present disclosure;
fig. 4 is a flowchart illustrating a method for selecting a target distance range in which a distance between a pixel point and a certain frame is located from a plurality of distance ranges according to a determined first probability value in a positioning method according to an embodiment of the present disclosure;
fig. 5 shows a flowchart of determining positioning information of an object in a target image based on object frame information of an object to which each pixel point belongs and a target confidence of the object frame information in another positioning method according to an embodiment of the present disclosure;
FIG. 6 shows a schematic structural diagram of a positioning device according to an embodiment of the present disclosure;
fig. 7 shows a schematic structural diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present disclosure clearer, the technical solutions of the embodiments of the present disclosure will be clearly and completely described with reference to the accompanying drawings in the embodiments of the present disclosure, it should be understood that the drawings in the present disclosure are for the purpose of illustration and description only, and are not intended to limit the scope of protection of the present disclosure. In addition, it should be understood that the schematic drawings are not drawn to scale. A flowchart, as used in this disclosure, illustrates operations implemented according to some embodiments of the present disclosure. It should be understood that the operations of the flow diagrams may be implemented out of order and that steps without logical context may be performed in reverse order or concurrently. Moreover, one or more other operations may be added to or removed from the flow diagrams by those skilled in the art in light of the present disclosure.
In addition, the described embodiments are only some, but not all, of the embodiments of the present disclosure. The components of the embodiments of the present disclosure, which are generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure provided in the accompanying drawings is not intended to limit the scope of the disclosure, as claimed, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be made by those skilled in the art based on the embodiments of this disclosure without making any inventive effort, are intended to be within the scope of this disclosure.
It should be noted that the term "comprising" will be used in embodiments of the present disclosure to indicate the presence of the stated features hereinafter, but not to preclude the addition of further features.
Aiming at reducing the number of object anchor frames used in the process of positioning objects with object anchor frames, and at improving the information expression capability of the object anchor frames so as to improve the accuracy of object positioning, the present disclosure provides a positioning method and device, an electronic device, and a computer-readable storage medium. In the present disclosure, only one object anchor frame, namely the object frame corresponding to the object frame information, is determined for each pixel point in the target image based on the image feature map of the target image, so that the number of object anchor frames used in the object positioning process is reduced and the amount of calculation is reduced. Meanwhile, the object type information of the object to which each pixel point belongs, the confidence corresponding to the object frame information, and the confidence corresponding to the object type information can be determined based on the image feature map, and the final confidence corresponding to the object frame information is then determined based on the two determined confidences, effectively enhancing the information expression capability of the object frame and improving the accuracy of object positioning based on the object frame.
The positioning method and apparatus, the electronic device, and the computer-readable storage medium of the present disclosure are described below with reference to specific embodiments.
The embodiment of the disclosure provides a positioning method which is applied to terminal equipment for positioning an object in an image. Specifically, as shown in fig. 1, the positioning method provided in the embodiment of the present disclosure includes the following steps:
S110, acquiring a target image.
Here, the target image may be an image including a target object captured during object tracking, or an image including a face captured during face detection, and the use of the target image is not limited in the present disclosure.
The target image comprises at least one object to be positioned. The object may be an inanimate object, a person, an animal, or the like.
The target image may be captured by the terminal device that performs the positioning method of the present embodiment, or may be transmitted to the terminal device that performs the positioning method of the present embodiment after being captured by another device, and the capturing device of the target image is not limited in this disclosure.
S120, determining object type information of an object to which each pixel point belongs, object frame information of the object to which each pixel point belongs, first confidence corresponding to the object type information and second confidence corresponding to the object frame information in the target image based on the image feature map of the target image.
Before executing this step, the target image needs to be processed first to obtain an image feature map corresponding to the target image. In the implementation, a convolutional neural network can be utilized to perform image feature extraction on the target image to obtain the image feature map.
After the image feature map of the target image is determined, the image feature map is processed, and object type information of an object to which each pixel point belongs, object frame information of the object to which each pixel point belongs, first confidence corresponding to the object type information and second confidence corresponding to the object frame information in the target image can be determined. In the implementation, the convolutional neural network may be used to further extract image features from the image feature map to obtain the object type information, the object frame information, the first confidence coefficient and the second confidence coefficient.
The object type information includes an object type of an object to which the pixel belongs. The object frame information includes a distance between a pixel point and each frame in the object frame corresponding to the object frame information. The object frame may also be referred to as an object anchor frame.
The first confidence level is used for representing the accuracy degree or the credibility degree of the object type information determined based on the image feature map. The second confidence is used for representing the accuracy degree or the credibility degree of the object frame information determined based on the image feature map.
S130, based on the first confidence coefficient and the second confidence coefficient, respectively determining the target confidence coefficient of the object frame information of the object to which each pixel point belongs.
Here, specifically, a product of the first confidence and the second confidence may be used as the target confidence corresponding to the object frame information. The target confidence is used for comprehensively representing the positioning accuracy and the classification accuracy of the object frames corresponding to the object frame information.
Of course, other methods may be used to determine the target confidence, for example, the target confidence may be determined in combination with the preset weight of the first confidence, the preset weight of the second confidence, the first confidence, and the second confidence, and the specific implementation of determining the target confidence based on the first confidence and the second confidence is not limited in this disclosure.
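As a minimal sketch of the weighted alternative mentioned above, the following might be used; the weights w1 and w2 are hypothetical preset values, not values given in this disclosure.

```python
def target_confidence(first_conf, second_conf, w1=0.5, w2=0.5):
    # weighted combination of the two confidences; w1 and w2 are
    # hypothetical preset weights
    return w1 * first_conf + w2 * second_conf
```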
S140, determining positioning information of the object in the target image based on the object frame information of the object to which each pixel point belongs and the target confidence of the object frame information.
Here, the object frame information of the object to which the pixel point belongs and the target confidence of the object frame information may be used as the positioning information of the object to which the pixel point belongs in the target image, and then the positioning information of each object in the target image is determined based on the positioning information of the object to which each pixel point belongs in the target image.
Here, not only the object frame information of the object to which the pixel point belongs but also the target confidence of that object frame information is determined, which effectively enhances the information expression capability of the object frame and the object frame information: not only the positioning information and the object type information of the object frame corresponding to the object frame information but also the confidence information of the object frame information can be expressed, thereby improving the accuracy of object positioning based on the object frame.
In addition, the above embodiment can determine an object anchor frame, that is, an object frame corresponding to the object frame information, for each pixel point in the target image based on the image feature map of the target image, so that the number of the object anchor frames used in the object positioning process is reduced, the calculation amount is reduced, and the object positioning efficiency is improved.
In some examples, as shown in fig. 2, the image feature map includes a classification feature map for classifying an object to which a pixel in the target image belongs and a positioning feature map for positioning the object to which the pixel in the target image belongs.
In the implementation, as shown in fig. 2, image feature extraction may be performed on the target image by using a convolutional neural network to obtain an initial feature map, and the initial feature map is then processed, for each branch, by four 3×3 convolutional layers with 256 input and output channels, to obtain the classification feature map and the positioning feature map respectively.
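A sketch of the branch structure described above, assuming PyTorch; the four 3×3 convolutional layers with 256 channels follow the text, while the ReLU activations and everything else are assumptions for illustration.

```python
import torch.nn as nn

def make_branch(channels=256, num_convs=4):
    """Four 3x3 convolutional layers with 256 input/output channels, per the
    text; the ReLU activation between layers is an assumption."""
    layers = []
    for _ in range(num_convs):
        layers.append(nn.Conv2d(channels, channels, kernel_size=3, padding=1))
        layers.append(nn.ReLU(inplace=True))
    return nn.Sequential(*layers)

classification_branch = make_branch()  # produces the classification feature map
localization_branch = make_branch()    # produces the positioning feature map
```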
After obtaining the classification feature map and the positioning feature map, determining object type information of an object to which each pixel point belongs, object frame information of the object to which each pixel point belongs, a first confidence coefficient corresponding to the object type information, and a second confidence coefficient corresponding to the object frame information in the target image based on the image feature map of the target image may be specifically implemented by using the following steps:
determining object type information of an object to which each pixel point in the target image belongs and a first confidence corresponding to the object type information based on the classification feature map; and determining object frame information of an object to which each pixel point in the target image belongs and a second confidence corresponding to the object frame information based on the positioning feature map.
In the implementation, image feature extraction can be performed on the classification feature map by using a convolutional neural network or a convolutional layer to obtain object type information of an object to which each pixel point belongs and a first confidence corresponding to the object type information. And extracting image features of the positioning feature map by using a convolutional neural network or a convolutional layer to obtain object frame information of an object to which each pixel point belongs and a second confidence coefficient corresponding to the object frame information.
According to the embodiment, based on the classification feature map and the positioning feature map of the target image, not only the object frame information of the object to which each pixel point in the target image belongs is determined, but also the object type information of the object to which each pixel point in the target image belongs and the confidence level respectively corresponding to the object type information and the object frame information are determined, so that the information expression capability of the object frame is improved, and the accuracy of object positioning based on the object frame is improved.
In some embodiments, as shown in fig. 3, the determining, based on the positioning feature map, object frame information of an object to which each pixel point in the target image belongs may be implemented specifically by using the following steps:
S310, for a pixel point in the target image, respectively determining, based on the positioning feature map, a target distance range in which the distance between the pixel point and each frame in the object frames of the object to which the pixel point belongs is located.
Here, the image feature extraction may be performed on the positioning feature map by using a convolutional neural network or a convolutional layer, so as to determine a target distance range in which a distance between a pixel point and each of object borders of an object to which the pixel point belongs is located.
In the implementation, the maximum distance between the pixel point and a certain frame can first be determined based on the positioning feature map. The maximum distance is then segmented to obtain a plurality of distance ranges. Image feature extraction is performed on the positioning feature map by using a convolutional neural network or a convolutional layer to determine a first probability value that the distance between the pixel point and the frame falls within each distance range. Finally, a target distance range in which the distance between the pixel point and the frame is located is selected from the plurality of distance ranges based on the determined first probability values; specifically, the distance range corresponding to the maximum first probability value may be taken as the target distance range.
As shown in fig. 2, the object frame may include an upper frame, a lower frame, a left frame, and a right frame. Five first probability values a, b, c, d, and e corresponding to five distance ranges for the left frame are determined based on the above method, and the distance range corresponding to the largest first probability value, b, is selected as the target distance range.
Selecting the distance range corresponding to the maximum probability value as the target distance range in which the distance between the pixel point and the frame is located improves the accuracy of the determined target distance range and, in turn, the accuracy of the distance between the pixel point and the frame determined based on that range.
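The bucketing step might be sketched as follows, assuming NumPy, a uniform segmentation of the maximum distance, and five distance ranges as in the fig. 2 example; both assumptions are illustrative.

```python
import numpy as np

def pick_target_range(max_dist, first_probs):
    """Select the target distance range for one border.

    max_dist: predicted maximum distance between the pixel and this border.
    first_probs: first probability value for each distance range.
    """
    n = len(first_probs)
    edges = np.linspace(0.0, max_dist, n + 1)  # segment the maximum distance
    k = int(np.argmax(first_probs))            # range with the largest first probability
    return edges[k], edges[k + 1]

low, high = pick_target_range(80.0, [0.05, 0.55, 0.20, 0.15, 0.05])  # -> (16.0, 32.0)
```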
S320, based on the target distance range and the positioning feature map, determining the target distance between the pixel point and each frame in object frames of the object to which the pixel point belongs.
After the target distance range is determined, a regression network matched with the target distance range, such as a convolutional neural network, is selected to perform image feature extraction on the positioning feature map, obtaining the target distance between the pixel point and each frame in the object frames of the object to which the pixel point belongs.
Determining an accurate distance with a further convolutional neural network on the basis of the determined target distance range can effectively improve the accuracy of the determined distance.
In addition, as shown in fig. 2, after the target distance is determined, a preset or trained parameter or weight N may be used to correct the determined target distance to obtain a final target distance.
As shown in fig. 2, the accurate target distance between the pixel point and the left frame is determined by this step, and is denoted by f in fig. 2; the determined target distance falls within the determined target distance range.
S330, determining object frame information of an object to which the pixel point belongs based on the position information of the pixel point in the target image and the target distance between the pixel point and each frame.
The position information, in the target image, of each frame in the object frames corresponding to the object frame information can be determined by using the position information of the pixel point in the target image and the target distance between the pixel point and each frame. Finally, the position information of each frame in the target image can be used as the object frame information of the object to which the pixel point belongs.
In the above embodiment, the target distance range where the distance between the pixel point and each frame in the object frame is located is first determined, and then, based on the determined target distance range, the target distance between the pixel point and each frame is determined, and the accuracy of the determined target distance can be improved through the two steps of processing. Then, based on the determined accurate target distance, an accurate object frame can be determined for the pixel point, and the accuracy of the determined object frame is improved.
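A minimal sketch of the decoding in S330; the (x1, y1, x2, y2) coordinate convention is assumed for illustration.

```python
def decode_frame(px, py, left, top, right, bottom):
    """Object frame (x1, y1, x2, y2) from the pixel position (px, py) and the
    four target distances to the left, upper, right, and lower borders."""
    return (px - left, py - top, px + right, py + bottom)

frame = decode_frame(100, 120, left=30, top=40, right=50, bottom=20)  # (70, 80, 150, 140)
```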
In some embodiments, as shown in fig. 4, the selecting, based on the determined first probability value, a target distance range in which a distance between a pixel point and a certain frame is located from the plurality of distance ranges may be further implemented by the following steps:
S410, determining the uncertainty parameter value of the distance between the pixel point and a certain frame based on the positioning feature map.
Here, the convolutional neural network used for determining the first probability values may, while determining the first probability value that the distance between the pixel point and a certain frame falls within each distance range, also determine the uncertainty parameter value of that distance. The distance uncertainty parameter value may be used to characterize the reliability of the determined first probability values.
S420, determining a target probability value of the distance between the pixel point and the frame in each distance range based on the distance uncertainty parameter value and each first probability value.
Here, each first probability value is corrected with a distance uncertainty parameter value to obtain a corresponding target probability value.
In particular implementations, the target probability value may be determined by normalizing the first probability values with the uncertainty parameter value acting as a temperature:

p_{x,n} = exp(s_{x,n} / σ_x) / Σ_{m=1}^{N} exp(s_{x,m} / σ_x)

wherein p_{x,n} denotes the target probability value that the distance between the pixel point and the frame x is within the n-th distance range, N denotes the number of distance ranges, σ_x denotes the distance uncertainty parameter value corresponding to the frame x, s_{x,n} denotes the first probability value that the distance between the pixel point and the frame x is in the n-th distance range, and s_{x,m} denotes the first probability value that the distance between the pixel point and the frame x is in the m-th distance range.
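A numerical sketch of the formula above, assuming NumPy; σ_x acts as a temperature that sharpens or softens the first probability values.

```python
import numpy as np

def target_probs(s_x, sigma_x):
    """Target probability values for one border x: a normalization of the
    first probability values s_x with the uncertainty value sigma_x as
    temperature."""
    z = np.asarray(s_x, dtype=float) / sigma_x
    z -= z.max()            # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

p = target_probs([0.05, 0.55, 0.20, 0.15, 0.05], sigma_x=0.5)
target_range = int(np.argmax(p))  # S430: range with the maximum target probability
```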
S430, selecting a target distance range in which the distance between the pixel point and the frame is located from the plurality of distance ranges based on the determined target probability value.
Here, specifically, a distance range corresponding to the maximum target probability value may be selected as the target distance range.
According to this embodiment, while the first probability value that the distance between the pixel point and a certain frame falls within each distance range is determined, the uncertainty parameter value is also determined. The first probability values can be corrected based on this parameter value to obtain the target probability value that the distance between the pixel point and the frame falls within each distance range, improving the accuracy of these probability values and thus the accuracy of the target distance range determined from them.
After the target distance between the pixel point and each frame in the corresponding object frame is determined, the confidence of the corresponding object frame information, namely the second confidence, may be determined by using the following steps:
and determining a second confidence coefficient corresponding to the object frame information of the object to which the pixel point belongs based on a first probability value corresponding to a target distance range in which the distance between the pixel point in the target image and each frame in the object frames of the object to which the pixel point belongs is located.
In the implementation, the average of the first probability values corresponding to the target distance ranges of all frames in the object frames of the object to which the pixel point belongs may be used as the second confidence.
Of course, other methods may be used to determine the second confidence, and the method of determining the second confidence based on the first probability value corresponding to the target distance range is not limited in this disclosure.
According to this embodiment, the confidence of the object frame information of the object to which the pixel point belongs, namely the second confidence, can be determined by using, for each frame, the largest first probability value associated with the pixel point's distance to that frame, thereby enhancing the information expression capability of the object frame.
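A minimal sketch of the averaging option described above; the four example probability values are illustrative.

```python
def second_confidence(selected_probs):
    """Average of the first probability values of the selected target
    distance range for each of the four borders."""
    return sum(selected_probs) / len(selected_probs)

conf = second_confidence([0.72, 0.65, 0.80, 0.59])  # -> 0.69
```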
In some embodiments, the determining, based on the classification feature map, object type information of an object to which each pixel point in the target image belongs may be specifically implemented by using the following steps:
determining a second probability value of each preset object type of an object to which each pixel point in the target image belongs based on the classification feature map; and determining object type information of the object to which the pixel point belongs based on the preset object type corresponding to the maximum second probability value.
In the implementation, a convolutional neural network or a convolutional layer can be utilized to perform image feature extraction on the classification feature map to obtain a second probability value for each preset object type of the object to which the pixel point belongs. Then, the preset object type corresponding to the maximum second probability value is selected to determine the object type information of the object to which the pixel point belongs. As shown in fig. 2, the second probability value corresponding to the cat determined using the present embodiment is the largest, and the object type information is therefore determined to correspond to the cat.
According to the embodiment, the preset object type corresponding to the maximum probability value is selected as the object type information of the object to which the pixel point belongs, so that the accuracy of the determined object type information is improved.
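A minimal sketch of this classification step; the preset object types listed here are hypothetical.

```python
import numpy as np

PRESET_TYPES = ["cat", "dog", "person"]  # hypothetical preset object types

def classify_pixel(second_probs):
    """Preset object type with the maximum second probability value."""
    return PRESET_TYPES[int(np.argmax(second_probs))]

label = classify_pixel([0.83, 0.10, 0.07])  # -> "cat", as in the fig. 2 example
```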
In some embodiments, as shown in fig. 5, the determining the positioning information of the object in the target image based on the object frame information of the object to which each pixel point belongs and the target confidence of the object frame information may specifically be implemented by the following steps:
S510, screening a plurality of target pixel points from the target image; the distance between different target pixel points in the target image is smaller than a preset threshold value, and the object type information of the objects to which the different target pixel points belong is the same.
Here, the plurality of target pixel points obtained by the screening are pixel points on the same object.
S520, selecting object frame information corresponding to the highest target confidence coefficient from object frame information of objects to which each target pixel point belongs, and obtaining target frame information.
For pixel points on the same object, object frame information corresponding to the highest target confidence coefficient can be selected to position the object, and other object frame information with low target confidence coefficient can be removed, so that the calculated amount in the object positioning process is reduced.
S530, determining positioning information of the object in the target image based on the selected target frame information and the target confidence corresponding to the target frame information.
According to the embodiment, the object frame information with the highest target confidence is selected from the pixel points which are relatively close and have the same object type information, so that the object is positioned, the number of the object frame information for positioning the object can be effectively reduced, and the timeliness of the object positioning is improved.
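A minimal sketch of the selection in S510 to S530, assuming NumPy; the distance threshold is illustrative. Pixels that are close together and share an object type are grouped, and only the frame with the highest target confidence per group is kept.

```python
import numpy as np

def select_frames(points, types, frames, confs, dist_thresh=10.0):
    """Keep, among nearby pixels of the same object type, only the object
    frame with the highest target confidence."""
    order = np.argsort(confs)[::-1]  # highest target confidence first
    kept, results = [], []
    for i in order:
        near_duplicate = any(
            types[i] == types[j] and
            np.hypot(points[i][0] - points[j][0],
                     points[i][1] - points[j][1]) < dist_thresh
            for j in kept)
        if not near_duplicate:
            kept.append(i)
            results.append((frames[i], confs[i]))  # positioning information
    return results
```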
Corresponding to the above positioning method, an embodiment of the present disclosure further provides a positioning device. The device performs, on the terminal device that positions an object in an image, the same method steps as the above positioning method, and its modules achieve the same or similar beneficial effects, so repeated descriptions are omitted.
As shown in fig. 6, the positioning device provided by the present disclosure includes:
an image acquisition module 610 is configured to acquire a target image.
The image processing module 620 is configured to determine, based on an image feature map of the target image, object type information of an object to which each pixel point belongs, object frame information of an object to which each pixel point belongs, a first confidence corresponding to the object type information, and a second confidence corresponding to the object frame information in the target image.
The confidence processing module 630 is configured to determine, based on the first confidence and the second confidence, a target confidence of object frame information of an object to which each pixel point belongs, respectively.
And the positioning module 640 is configured to determine positioning information of an object in the target image based on object frame information of an object to which each pixel point belongs and a target confidence of the object frame information.
In some embodiments, the image feature map includes a classification feature map for classifying objects to which pixels in the target image belong and a positioning feature map for positioning objects to which pixels in the target image belong;
the image processing module 620 is configured to:
determining object type information of an object to which each pixel point in the target image belongs and a first confidence corresponding to the object type information based on the classification feature map;
and determining object frame information of an object to which each pixel point in the target image belongs and a second confidence corresponding to the object frame information based on the positioning feature map.
In some embodiments, the image processing module 620 is configured to, when determining, based on the positioning feature map, object frame information of an object to which each pixel point in the target image belongs:
for a pixel point in the target image, respectively determining a target distance range in which the distance between the pixel point and each frame in object frames of an object to which the pixel point belongs is positioned based on the positioning feature map;
based on the target distance range and the positioning feature map, determining the target distance between the pixel point and each frame in object frames of the object to which the pixel point belongs respectively;
and determining object frame information of an object to which the pixel point belongs based on the position information of the pixel point in the target image and the target distance between the pixel point and each frame.
In some embodiments, the image processing module 620 is configured to, when determining a target distance range in which the distance between a pixel point and each frame in the object frames of the object to which the pixel point belongs is located:
determining the maximum distance between a pixel point and one frame in the object frames of the object to which the pixel point belongs in the target image based on the positioning feature map;
carrying out segmentation processing on the maximum distance to obtain a plurality of distance ranges;
determining a first probability value that the distance between the pixel point and the frame is in each distance range based on the positioning feature map;
and selecting a target distance range in which the distance between the pixel point and the frame is positioned from the plurality of distance ranges based on the determined first probability value.
In some embodiments, the image processing module 620 is configured to, when selecting, from the plurality of distance ranges, a target distance range in which the distance between the pixel point and the frame is located based on the determined first probability value:
determining a distance uncertainty parameter value between the pixel point and the frame based on the positioning feature map;
determining a target probability value of the distance between the pixel point and the frame in each distance range based on the distance uncertainty parameter value and each first probability value;
and taking the distance range corresponding to the maximum target probability value as the target distance range where the distance between the pixel point and the frame is located.
In some embodiments, when determining the second confidence corresponding to the object frame information, the image processing module 620 is configured to:
determine the second confidence corresponding to the object frame information of the object to which a pixel point in the target image belongs based on the first probability value corresponding to the target distance range in which the distance between the pixel point and each border of the object frame is located.
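A sketch of one way the second confidence could be aggregated from the per-border first probability values; the averaging is an assumption, since the text fixes the inputs rather than the aggregation function.

```python
import numpy as np

def second_confidence(per_border_first_probs):
    """Assumed aggregation: average, over the four borders, the first
    probability value of each border's selected target distance range."""
    return float(np.mean([p.max() for p in per_border_first_probs]))

probs = [np.array([0.1, 0.7, 0.2])] * 4  # left/top/right/bottom (toy values)
print(second_confidence(probs))          # ~0.7
```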
In some embodiments, when determining the object type information of the object to which each pixel point in the target image belongs based on the classification feature map, the image processing module 620 is configured to:
determine, based on the classification feature map, a second probability value that the object to which each pixel point in the target image belongs is of each preset object type; and
determine the object type information of the object to which the pixel point belongs based on the preset object type corresponding to the maximum second probability value.
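A minimal sketch of this classification step, assuming the second probability values come from a softmax over the classification feature map at the pixel (the number of preset object types is illustrative):

```python
import numpy as np

def classify_pixel(cls_logits):
    """Softmax over the preset object types gives the second probability
    values; the type with the maximum value is taken as the pixel's object
    type info, and that maximum can serve as the first confidence."""
    e = np.exp(cls_logits - cls_logits.max())
    p_second = e / e.sum()
    return int(p_second.argmax()), float(p_second.max())

obj_type, first_conf = classify_pixel(np.random.randn(80))  # 80 preset types assumed
```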
In some embodiments, the positioning module 640 is configured to:
screen a plurality of target pixel points from the target image, where the distance between different target pixel points in the target image is smaller than a preset threshold and the object type information of the objects to which the different target pixel points belong is the same;
select, from the object frame information of the objects to which the target pixel points belong, the object frame information corresponding to the highest target confidence to obtain target frame information; and
determine the positioning information of the object in the target image based on the selected target frame information and the target confidence corresponding to the target frame information.
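A greedy sketch of this screening-and-selection step, assuming Euclidean distance between pixel positions and a keep-the-best rule per group; the data layout and threshold are illustrative. This pass plays the role of non-maximum suppression by center distance, though the disclosure does not name the grouping rule here.

```python
import numpy as np

def select_frames(pixels, max_gap):
    """Pixels of the same object type that lie within `max_gap` of an
    already-kept pixel are treated as one object, and only the frame with
    the highest target confidence survives."""
    kept = []
    for p in sorted(pixels, key=lambda q: -q["conf"]):  # highest confidence first
        duplicate = any(
            k["type"] == p["type"]
            and np.hypot(k["x"] - p["x"], k["y"] - p["y"]) < max_gap
            for k in kept
        )
        if not duplicate:
            kept.append(p)  # p["frame"] becomes the target frame info
    return kept

pixels = [
    {"x": 10, "y": 12, "type": 3, "conf": 0.9, "frame": (2, 4, 20, 22)},
    {"x": 11, "y": 13, "type": 3, "conf": 0.7, "frame": (1, 3, 21, 23)},
]
print(select_frames(pixels, max_gap=5.0))  # keeps only the 0.9-confidence frame
```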
An embodiment of the present disclosure further provides an electronic device. As shown in FIG. 7, the electronic device includes a processor 701, a memory 702 and a bus 703. The memory 702 stores machine-readable instructions executable by the processor 701, and the processor 701 and the memory 702 communicate via the bus 703 when the electronic device is running.
When executed by the processor 701, the machine-readable instructions perform the following steps of the positioning method:
acquiring a target image;
determining, based on an image feature map of the target image, object type information of the object to which each pixel point in the target image belongs, object frame information of the object to which each pixel point belongs, a first confidence corresponding to the object type information, and a second confidence corresponding to the object frame information;
determining, based on the first confidence and the second confidence, a target confidence of the object frame information of the object to which each pixel point belongs; and
determining positioning information of an object in the target image based on the object frame information of the object to which each pixel point belongs and the target confidence of the object frame information.
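As for the target confidence combined in the third step above, the operator is not specified in this passage; a simple product of the two confidences is one common choice, shown here purely as an assumption.

```python
def target_confidence(first_conf, second_conf):
    """Assumed combination: a simple product of the classification (first)
    and localization (second) confidences; the disclosure does not fix the
    operator in this passage."""
    return first_conf * second_conf

print(target_confidence(0.85, 0.72))  # ~0.612
```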
In addition, when executed by the processor 701, the machine-readable instructions may perform the method described in any of the method embodiments above, which is not repeated here.
An embodiment of the present disclosure further provides a computer program product corresponding to the above method and apparatus, including a computer-readable storage medium storing program code. The instructions included in the program code may be used to execute the method in the foregoing method embodiments; for specific implementation, refer to the method embodiments, which are not repeated here.
The foregoing description of the various embodiments tends to emphasize the differences between them; for what is the same or similar, the embodiments may be referred to one another, and the description is not repeated here for brevity.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and apparatus described above may refer to the corresponding processes in the method embodiments and are not detailed in this disclosure. In the several embodiments provided in the present disclosure, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. The apparatus embodiments described above are merely illustrative: the division into modules is merely a division by logical function, and there may be other divisions in actual implementation; for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the couplings, direct couplings, or communication connections shown or discussed may be implemented through some communication interfaces, and the indirect couplings or communication connections between apparatuses or modules may be electrical, mechanical, or in other forms.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit.
If implemented in the form of software functional units and sold or used as a stand-alone product, the functions may be stored in a processor-executable non-volatile computer-readable storage medium. Based on such an understanding, the technical solution of the present disclosure, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present disclosure. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
The foregoing is merely a specific embodiment of the present disclosure, but the protection scope of the present disclosure is not limited thereto. Any change or substitution that can readily occur to a person skilled in the art within the technical scope disclosed herein shall fall within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.
Claims (18)
1. A positioning method, comprising:
acquiring a target image;
determining, based on an image feature map of the target image, object type information of the object to which each pixel point in the target image belongs, object frame information of the object to which each pixel point belongs, a first confidence corresponding to the object type information, and a second confidence corresponding to the object frame information;
based on the first confidence and the second confidence, respectively determining a target confidence of the object frame information of the object to which each pixel point belongs; and
determining positioning information of an object in the target image based on the object frame information of the object to which each pixel point belongs and the target confidence of the object frame information.
2. The positioning method according to claim 1, wherein the image feature map includes a classification feature map for classifying an object to which a pixel in the target image belongs and a positioning feature map for positioning an object to which a pixel in the target image belongs;
the determining, based on the image feature map of the target image, object type information of the object to which each pixel point in the target image belongs, object frame information of the object to which each pixel point belongs, a first confidence corresponding to the object type information, and a second confidence corresponding to the object frame information includes:
determining object type information of an object to which each pixel point in the target image belongs and a first confidence corresponding to the object type information based on the classification feature map;
and determining object frame information of an object to which each pixel point in the target image belongs and a second confidence corresponding to the object frame information based on the positioning feature map.
3. The positioning method according to claim 2, wherein the determining, based on the positioning feature map, object frame information of an object to which each pixel point in the target image belongs includes:
for a pixel point in the target image, respectively determining, based on the positioning feature map, a target distance range in which the distance between the pixel point and each border of an object frame of an object to which the pixel point belongs is located;
determining, based on the target distance range and the positioning feature map, the target distance between the pixel point and each border of the object frame of the object to which the pixel point belongs; and
determining object frame information of the object to which the pixel point belongs based on position information of the pixel point in the target image and the target distance between the pixel point and each border.
4. The positioning method according to claim 3, wherein the determining a target distance range in which the distance between the pixel point and each border of the object frame of the object to which the pixel point belongs is located comprises:
determining, based on the positioning feature map, the maximum distance between a pixel point in the target image and one border of the object frame of the object to which the pixel point belongs;
segmenting the maximum distance to obtain a plurality of distance ranges;
determining, based on the positioning feature map, a first probability value that the distance between the pixel point and the border falls in each distance range; and
selecting, based on the determined first probability values, the target distance range in which the distance between the pixel point and the border is located from the plurality of distance ranges.
5. The positioning method according to claim 4, wherein the selecting, based on the determined first probability values, the target distance range in which the distance between the pixel point and the border is located from the plurality of distance ranges comprises:
determining a distance uncertainty parameter value between the pixel point and the border based on the positioning feature map;
determining, based on the distance uncertainty parameter value and each first probability value, a target probability value that the distance between the pixel point and the border falls in each distance range; and
taking the distance range corresponding to the maximum target probability value as the target distance range in which the distance between the pixel point and the border is located.
6. The positioning method according to claim 4, wherein the determining the second confidence corresponding to the object frame information comprises:
determining the second confidence corresponding to the object frame information of the object to which a pixel point belongs based on the first probability value corresponding to the target distance range in which the distance between the pixel point in the target image and each border of the object frame of the object to which the pixel point belongs is located.
7. The positioning method according to any one of claims 2 to 6, wherein the determining, based on the classification feature map, object type information of an object to which each pixel point in the target image belongs comprises:
determining, based on the classification feature map, a second probability value that the object to which each pixel point in the target image belongs is of each preset object type; and
determining the object type information of the object to which the pixel point belongs based on the preset object type corresponding to the maximum second probability value.
8. The positioning method according to any one of claims 1 to 7, wherein the determining positioning information of the object in the target image based on object frame information of the object to which each pixel belongs and a target confidence of the object frame information includes:
screening a plurality of target pixel points from the target image, wherein the distance between different target pixel points in the target image is smaller than a preset threshold and the object type information of the objects to which the different target pixel points belong is the same;
selecting, from the object frame information of the objects to which the target pixel points belong, the object frame information corresponding to the highest target confidence to obtain target frame information; and
determining the positioning information of the object in the target image based on the selected target frame information and the target confidence corresponding to the target frame information.
9. A positioning device, comprising:
the image acquisition module is used for acquiring a target image;
the image processing module is used for determining object type information of an object to which each pixel point belongs, object frame information of the object to which each pixel point belongs, first confidence corresponding to the object type information and second confidence corresponding to the object frame information in the target image based on the image feature map of the target image;
the confidence processing module is used for respectively determining the target confidence of the object frame information of the object to which each pixel point belongs based on the first confidence and the second confidence;
and the positioning module is used for determining the positioning information of the object in the target image based on the object frame information of the object to which each pixel point belongs and the target confidence of the object frame information.
10. The positioning device according to claim 9, wherein the image feature map includes a classification feature map for classifying an object to which a pixel in the target image belongs and a positioning feature map for positioning an object to which a pixel in the target image belongs;
the image processing module is used for:
determining object type information of an object to which each pixel point in the target image belongs and a first confidence corresponding to the object type information based on the classification feature map;
and determining object frame information of an object to which each pixel point in the target image belongs and a second confidence corresponding to the object frame information based on the positioning feature map.
11. The positioning device according to claim 10, wherein the image processing module, when determining object frame information of an object to which each pixel point in the target image belongs based on the positioning feature map, is configured to:
for a pixel point in the target image, respectively determine, based on the positioning feature map, a target distance range in which the distance between the pixel point and each border of an object frame of an object to which the pixel point belongs is located;
determine, based on the target distance range and the positioning feature map, the target distance between the pixel point and each border of the object frame of the object to which the pixel point belongs; and
determine object frame information of the object to which the pixel point belongs based on position information of the pixel point in the target image and the target distance between the pixel point and each border.
12. The positioning device according to claim 11, wherein, when determining the target distance range in which the distance between the pixel point and each border of the object frame of the object to which the pixel point belongs is located, the image processing module is configured to:
determine, based on the positioning feature map, the maximum distance between a pixel point in the target image and one border of the object frame of the object to which the pixel point belongs;
segment the maximum distance to obtain a plurality of distance ranges;
determine, based on the positioning feature map, a first probability value that the distance between the pixel point and the border falls in each distance range; and
select, based on the determined first probability values, the target distance range in which the distance between the pixel point and the border is located from the plurality of distance ranges.
13. The positioning device according to claim 12, wherein, when selecting, based on the determined first probability values, the target distance range in which the distance between the pixel point and the border is located from the plurality of distance ranges, the image processing module is configured to:
determine a distance uncertainty parameter value between the pixel point and the border based on the positioning feature map;
determine, based on the distance uncertainty parameter value and each first probability value, a target probability value that the distance between the pixel point and the border falls in each distance range; and
take the distance range corresponding to the maximum target probability value as the target distance range in which the distance between the pixel point and the border is located.
14. The positioning device according to claim 12, wherein, when determining the second confidence corresponding to the object frame information, the image processing module is configured to:
determine the second confidence corresponding to the object frame information of the object to which a pixel point belongs based on the first probability value corresponding to the target distance range in which the distance between the pixel point in the target image and each border of the object frame of the object to which the pixel point belongs is located.
15. The positioning device according to any one of claims 10 to 14, wherein the image processing module, when determining object type information of an object to which each pixel point in the target image belongs based on the classification feature map, is configured to:
determine, based on the classification feature map, a second probability value that the object to which each pixel point in the target image belongs is of each preset object type; and
determine the object type information of the object to which the pixel point belongs based on the preset object type corresponding to the maximum second probability value.
16. The positioning device of any one of claims 9 to 15, wherein the positioning module is configured to:
screen a plurality of target pixel points from the target image, wherein the distance between different target pixel points in the target image is smaller than a preset threshold and the object type information of the objects to which the different target pixel points belong is the same;
select, from the object frame information of the objects to which the target pixel points belong, the object frame information corresponding to the highest target confidence to obtain target frame information; and
determine the positioning information of the object in the target image based on the selected target frame information and the target confidence corresponding to the target frame information.
17. An electronic device, comprising: a processor, a storage medium and a bus, the storage medium storing machine-readable instructions executable by the processor, the processor and the storage medium communicating over the bus when the electronic device is running, the processor executing the machine-readable instructions to perform the positioning method of any of claims 1-8.
18. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the positioning method according to any of claims 1-8.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010058788.7A CN111275040B (en) | 2020-01-18 | 2020-01-18 | Positioning method and device, electronic equipment and computer readable storage medium |
JP2022500616A JP2022540101A (en) | 2020-01-18 | 2021-01-15 | POSITIONING METHOD AND APPARATUS, ELECTRONIC DEVICE, COMPUTER-READABLE STORAGE MEDIUM |
KR1020227018711A KR20220093187A (en) | 2020-01-18 | 2021-01-15 | Positioning method and apparatus, electronic device, computer readable storage medium |
PCT/CN2021/072210 WO2021143865A1 (en) | 2020-01-18 | 2021-01-15 | Positioning method and apparatus, electronic device, and computer readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010058788.7A CN111275040B (en) | 2020-01-18 | 2020-01-18 | Positioning method and device, electronic equipment and computer readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111275040A CN111275040A (en) | 2020-06-12 |
CN111275040B true CN111275040B (en) | 2023-07-25 |
Family
ID=70998770
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010058788.7A Active CN111275040B (en) | 2020-01-18 | 2020-01-18 | Positioning method and device, electronic equipment and computer readable storage medium |
Country Status (4)
Country | Link |
---|---|
JP (1) | JP2022540101A (en) |
KR (1) | KR20220093187A (en) |
CN (1) | CN111275040B (en) |
WO (1) | WO2021143865A1 (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111275040B (en) * | 2020-01-18 | 2023-07-25 | 北京市商汤科技开发有限公司 | Positioning method and device, electronic equipment and computer readable storage medium |
CN111931723B (en) * | 2020-09-23 | 2021-01-05 | 北京易真学思教育科技有限公司 | Target detection and image recognition method and device, and computer readable medium |
CN114613147B (en) * | 2020-11-25 | 2023-08-04 | 浙江宇视科技有限公司 | Vehicle violation identification method and device, medium and electronic equipment |
CN112819003B (en) * | 2021-04-19 | 2021-08-27 | 北京妙医佳健康科技集团有限公司 | Method and device for improving OCR recognition accuracy of physical examination report |
CN113762109B (en) * | 2021-08-23 | 2023-11-07 | 北京百度网讯科技有限公司 | Training method of character positioning model and character positioning method |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108764292A (en) * | 2018-04-27 | 2018-11-06 | 北京大学 | Deep learning image object mapping based on Weakly supervised information and localization method |
CN109426803A (en) * | 2017-09-04 | 2019-03-05 | 三星电子株式会社 | The method and apparatus of object for identification |
CN109522938A (en) * | 2018-10-26 | 2019-03-26 | 华南理工大学 | The recognition methods of target in a kind of image based on deep learning |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10402995B2 (en) * | 2017-07-27 | 2019-09-03 | Here Global B.V. | Method, apparatus, and system for real-time object detection using a cursor recurrent neural network |
CN111275040B (en) * | 2020-01-18 | 2023-07-25 | 北京市商汤科技开发有限公司 | Positioning method and device, electronic equipment and computer readable storage medium |
2020
- 2020-01-18: CN application CN202010058788.7A granted as CN111275040B (Active)
2021
- 2021-01-15: WO application PCT/CN2021/072210 published as WO2021143865A1 (Application Filing)
- 2021-01-15: JP application JP2022500616A published as JP2022540101A (Withdrawn)
- 2021-01-15: KR application KR1020227018711A published as KR20220093187A (Application Discontinuation)
Also Published As
Publication number | Publication date |
---|---|
KR20220093187A (en) | 2022-07-05 |
WO2021143865A1 (en) | 2021-07-22 |
CN111275040A (en) | 2020-06-12 |
JP2022540101A (en) | 2022-09-14 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |