WO2021147563A1 - Object detection method and apparatus, electronic device, and computer readable storage medium - Google Patents
- Publication number
- WO2021147563A1 (PCT/CN2020/135967)
- Authority
- WO
- WIPO (PCT)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/20—Scenes; Scene-specific elements in augmented reality scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Definitions
- the embodiments of the present disclosure relate to the field of image recognition technology, and in particular, to a target detection method, device, electronic equipment, and computer-readable storage medium.
- Target detection is an important basic problem of computer vision. Many computer vision applications rely on target detection, such as autonomous driving, video surveillance, and mobile entertainment.
- The main task of target detection is to mark the location of each object in the image with a detection frame.
- A target detection algorithm based on object key points determines the location of objects in the image by first detecting all object key points in the image and then matching the key points belonging to the same object to obtain each object's detection frame. However, the key points of similar objects match each other closely, which easily causes wrong detection results, for example a single detection frame that contains multiple objects. The detection accuracy of current target detection methods is therefore low.
- the embodiments of the present disclosure provide at least one target detection solution.
- embodiments of the present disclosure provide a target detection method, including:
- the target object in the image to be detected is determined based on the corner position information of each corner point in the image to be detected and the centripetal offset tensor corresponding to each corner point.
- the corner points in the image to be detected can characterize the position of each target object in the image to be detected.
- The corner points can include the upper left corner point and the lower right corner point, where the upper left corner point refers to the intersection of the straight line corresponding to the upper contour of the target object and the straight line corresponding to the left contour of the target object, and the lower right corner point refers to the intersection of the straight line corresponding to the lower contour of the target object and the straight line corresponding to the right contour of the target object.
- For corner points belonging to the same target object, the positions pointed to by their centripetal offset tensors should be relatively close. Therefore, the target detection method proposed in the embodiments of the present disclosure can determine the corner points belonging to the same target object based on the corner position information of each corner point in the image to be detected and the centripetal offset tensor corresponding to each corner point, and can then detect that target object based on the determined corner points.
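This grouping idea can be sketched as follows. The function, the array layout, and the distance threshold are assumptions made for the sketch, not details given in the disclosure: corners whose centripetal offsets point to nearby centres are paired.

```python
import numpy as np

def match_corners(tl_corners, br_corners, tl_offsets, br_offsets, max_dist=5.0):
    """Pair upper-left and lower-right corners whose centripetal offsets
    point to (approximately) the same object centre.
    Inputs are (N, 2) / (M, 2) arrays of (x, y) values.
    Hypothetical helper; `max_dist` is an illustrative threshold."""
    pairs = []
    for i, (tl, tl_off) in enumerate(zip(tl_corners, tl_offsets)):
        tl_center = tl + tl_off          # centre pointed to by the upper-left corner
        for j, (br, br_off) in enumerate(zip(br_corners, br_offsets)):
            br_center = br + br_off      # centre pointed to by the lower-right corner
            # corners of the same object should point to nearby centres
            if np.linalg.norm(tl_center - br_center) < max_dist:
                pairs.append((i, j))
    return pairs
```

Only the first upper-left corner below points to the same centre as the lower-right corner, so only that pair survives.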
- the determining, based on the to-be-detected image, the corner position information of each corner point in the to-be-detected image and the centripetal offset tensor corresponding to each corner point includes:
- the corner point position information of each corner point in the image to be detected and the centripetal offset tensor corresponding to each corner point are determined.
- The method provided by the embodiments of the present disclosure obtains an initial feature map by extracting features from the image to be detected, and performs corner pooling processing on the initial feature map to obtain a feature map (that is, the feature map after corner pooling) from which the corner points and the centripetal offsets corresponding to the corner points can be conveniently extracted.
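The corner pooling operation named above can be sketched for the upper-left case as follows. This is a simplified single-channel NumPy version (inside the network the operation runs per channel on learned features), shown only to illustrate how corner evidence accumulates at corner locations:

```python
import numpy as np

def top_left_corner_pool(feat):
    """Upper-left corner pooling on an (H, W) feature map: each position keeps
    the maximum response found to its right plus the maximum found below it,
    so evidence for an upper-left corner accumulates at the corner location."""
    # max over everything to the right of (and including) each column
    right_max = np.maximum.accumulate(feat[:, ::-1], axis=1)[:, ::-1]
    # max over everything below (and including) each row
    down_max = np.maximum.accumulate(feat[::-1, :], axis=0)[::-1, :]
    return right_max + down_max
```

For example, a response in the top row and one in the left column both contribute to the position at their intersection, which is where the upper-left corner of the object lies.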
- the determining the corner position information of each corner point in the image to be detected based on the feature map after the corner point pooling includes:
- the local offset information is used to indicate the position offset, in the corner heat map, of the real physical point represented by the corresponding corner point;
- based on the local offset information corresponding to each corner point and the size ratio between the corner heat map and the image to be detected, the position information of each corner point in the image to be detected is determined.
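The mapping from heat-map coordinates back to image coordinates described above can be sketched as follows; the function name and the exact formula are illustrative assumptions consistent with the described inputs (heat-map position, local offset, and the size ratio between the image and the heat map):

```python
def corner_position_in_image(heatmap_xy, local_offset, stride):
    """Map a corner from heat-map coordinates back to image coordinates.
    `heatmap_xy` is the integer (x, y) location of the corner in the heat map,
    `local_offset` compensates the quantisation introduced by down-sampling,
    and `stride` is the size ratio between the image and the heat map."""
    x, y = heatmap_xy
    dx, dy = local_offset
    return ((x + dx) * stride, (y + dy) * stride)
```

With a stride of 4, a corner at heat-map cell (10, 5) with local offset (0.5, 0.25) maps back to image position (42.0, 21.0).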
- the embodiment of the present disclosure provides a method for determining the position information of each corner point in the image to be detected.
- This process introduces a corner heat map and uses the probability value of each feature point being a corner point to select the feature points that can serve as corner points. The position of each selected corner point in the corner heat map is then corrected to determine its corner position information in the image to be detected. This method obtains corner position information of higher accuracy, which facilitates subsequently detecting the position of the target object in the image to be detected based on the corner points.
- the determining the centripetal offset tensor corresponding to each corner point based on the feature map after the corner point pooling includes:
- the steering offset tensor corresponding to each feature point in the corner-pooled feature map is determined; the steering offset tensor of a feature point represents the offset tensor from that feature point to the center point of the target object in the image to be detected;
- the offset domain information includes the offset tensors by which the multiple initial feature points associated with a feature point each point to their corresponding shifted feature points; based on this information, the feature data of the feature points in the corner-pooled feature map are adjusted to obtain the adjusted feature map;
- the centripetal offset tensor corresponding to each corner point is then determined.
- In the method provided by the embodiments of the present disclosure, the process of determining the centripetal offset tensor takes target object information into account, for example by introducing the steering offset tensor corresponding to each corner point and the offset domain information of each feature point. The feature data of the feature points in the feature map are adjusted so that the adjusted feature map contains richer target object information, allowing a more accurate centripetal offset tensor to be determined for each corner point. With an accurate centripetal offset tensor, the position of the center point pointed to by each corner point can be accurately obtained, so that the position of the target object in the image to be detected can be accurately detected.
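The feature-adjustment step can be sketched as follows. This is a much-simplified, nearest-neighbour illustration of sampling features at the locations an offset field points to; real implementations of this kind of adjustment typically use deformable convolution with learned weights, so treat everything here as an assumption about data flow only:

```python
import numpy as np

def adjust_features(feat, offset_field):
    """Adjust each point of an (H, W) corner-pooled feature map by sampling
    the feature at the location its (dx, dy) offset points to, so adjusted
    features carry information from inside the target object.
    `offset_field` has shape (H, W, 2)."""
    h, w = feat.shape
    adjusted = np.empty_like(feat)
    for y in range(h):
        for x in range(w):
            dx, dy = offset_field[y, x]
            # clamp the pointed-to location to the map and sample there
            sx = int(np.clip(round(x + dx), 0, w - 1))
            sy = int(np.clip(round(y + dy), 0, h - 1))
            adjusted[y, x] = feat[sy, sx]
    return adjusted
```

If every offset points at the same object-interior location, every adjusted feature takes on that location's value, which is the intuition behind enriching corner features with object information.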
- the corner heat map corresponding to the image to be detected includes a corner heat map corresponding to multiple channels, and each channel of the multiple channels corresponds to a preset object category; After determining the probability value of each feature point in the corner heat map as a corner point based on the corner heat map, the detection method further includes:
- if a corner point exists in the corner heat map corresponding to a channel, it is determined that the image to be detected contains a target object of the preset object category corresponding to that channel.
- In this way, a corner heat map containing a preset number of channels is obtained, and by checking whether a corner point exists in the heat map of each channel, it can be determined whether a target object of the category corresponding to that channel exists in the image to be detected.
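The per-channel category check can be sketched as follows; the probability threshold is an illustrative assumption, not a value given in the disclosure:

```python
import numpy as np

def categories_present(corner_heatmaps, threshold=0.5):
    """Given per-category corner heat maps of shape (C, H, W), report which
    preset object categories have at least one corner response above the
    threshold, i.e. which categories appear in the image to be detected."""
    present = []
    for channel, heatmap in enumerate(corner_heatmaps):
        if (heatmap > threshold).any():      # a corner exists on this channel
            present.append(channel)
    return present
```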
- the target object in the image to be detected is determined based on the position information of each corner point in the image to be detected and the centripetal offset tensor corresponding to each corner point, which includes:
- the detection frame of the target object in the image to be detected is determined.
- the method provided by the embodiments of the present disclosure can determine the detection frame of each target object based on the corner position information of each corner point in the image to be detected and the centripetal offset tensor corresponding to each corner point.
- the position information of the target object in the image to be detected can be determined.
- Determining, based on the position information of each corner point in the image to be detected and the centripetal offset tensor corresponding to each corner point, the detection box of the target object in the image to be detected includes:
- the detection frame of the target object is determined from among the candidate detection frames based on the position information of the center point pointed to by each corner point in each candidate corner point pair and the center area information corresponding to the candidate corner point pair.
- The corner position information of the corner points is used to first determine the candidate corner point pairs that can constitute candidate detection frames; the centripetal offset tensor corresponding to each corner point in a candidate pair is then used to determine whether the corners of the candidate detection frame belong to the same target object, so that the detection frames of all target objects in the image to be detected can be detected more accurately.
- the determining the center region information corresponding to the candidate corner point pair based on the corner point position information of each corner point in the candidate corner point pair in the image to be detected includes:
- the coordinate range of the central area frame corresponding to the candidate corner point pair is determined.
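One common way to realise such a central area frame is to shrink the candidate box around its centre; the scaling factor below is an illustrative assumption, not a value specified in the disclosure:

```python
def central_region(tl, br, mu=0.5):
    """Coordinate range (x1, y1, x2, y2) of the central area frame for a
    candidate corner pair with upper-left corner `tl` and lower-right
    corner `br`. `mu` scales the box down around its centre; 0.5 keeps
    the middle half of each side."""
    (tlx, tly), (brx, bry) = tl, br
    cx, cy = (tlx + brx) / 2, (tly + bry) / 2          # box centre
    half_w, half_h = (brx - tlx) * mu / 2, (bry - tly) * mu / 2
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)
```

A candidate pair can then be kept when the centre positions pointed to by both of its corners fall inside this range.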
- Determining the detection frame of the target object includes:
- based on the central area information corresponding to each valid candidate corner point pair and the probability value corresponding to each corner point in the pair, the score of the candidate detection frame corresponding to each valid candidate corner point pair is determined; the probability value corresponding to a corner point indicates the probability that the corner point's corresponding feature point in the corner heat map is a corner point;
- the detection frame of the target object is determined from among the candidate detection frames.
- The method provided by the embodiment of the present disclosure effectively screens the candidate corner point pairs that constitute the candidate detection frames, so that candidate detection frames each representing only one target object can be screened out; soft non-maximum suppression is then performed on these candidate detection frames so as to obtain accurate detection frames that characterize the target objects.
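Soft non-maximum suppression can be sketched as below. Instead of discarding boxes that overlap a higher-scoring box, their scores are decayed by overlap; the Gaussian decay and the parameter values are standard illustrative choices, not details taken from the disclosure:

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian soft-NMS: repeatedly keep the highest-scoring box and decay
    the scores of the remaining boxes by their overlap with it. Returns the
    indices of kept boxes in selection order."""
    boxes = np.asarray(boxes, float)
    scores = np.asarray(scores, float).copy()
    keep, idx = [], list(range(len(boxes)))
    while idx:
        best = max(idx, key=lambda i: scores[i])
        keep.append(best)
        idx.remove(best)
        for i in idx:
            scores[i] *= np.exp(-iou(boxes[best], boxes[i]) ** 2 / sigma)
        idx = [i for i in idx if scores[i] > score_thresh]
    return keep
```

In the example below, two identical boxes both survive (the duplicate is only down-weighted, which is the point of the "soft" variant), with the highest-scoring box selected first.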
- the target detection method further includes:
- the instance information of the target object in the image to be detected is determined based on the detection frame of the target object and the initial feature map obtained by feature extraction of the image to be detected.
- the method provided by the embodiments of the present disclosure can determine the instance information of the target object.
- Here, instance segmentation labels the pixels of each target object at the pixel level and is accurate to the individual object, so that more accurate position information of the target object in the image to be detected can be obtained.
- the determining the instance information of the target object in the image to be detected based on the detection frame of the target object and the initial feature map obtained by feature extraction of the image to be detected includes:
- the instance information of the target object in the image to be detected is determined.
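As a very rough sketch of the data flow (detection frame plus initial feature map in, pixel-level instance information out): restrict attention to the frame and threshold the response inside it. The real method uses a learned segmentation head, so both the thresholding and the single-channel feature map are assumptions made only to illustrate the inputs and output:

```python
import numpy as np

def instance_mask(initial_feat, box, threshold=0.5):
    """Derive a boolean pixel mask for one object from its detection frame
    `box` = (x1, y1, x2, y2) and a single-channel (H, W) feature map:
    pixels outside the frame are excluded, pixels inside are kept when
    their feature response exceeds the threshold."""
    x1, y1, x2, y2 = (int(v) for v in box)
    mask = np.zeros(initial_feat.shape, dtype=bool)
    region = initial_feat[y1:y2, x1:x2]
    mask[y1:y2, x1:x2] = region > threshold
    return mask
```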
- the target detection method is implemented by a neural network, and the neural network is obtained by training using sample pictures containing labeled target sample objects.
- the neural network is obtained by training using the following steps:
- the network parameter value of the neural network is adjusted based on the predicted target sample object in the sample image and the labeled target sample object in the sample image.
- The neural network training method obtains a sample image and, based on the sample image, determines the corner position information of each sample corner point in the sample image and the centripetal offset tensor corresponding to each sample corner point. Based on this information, the target sample object is detected in the sample image. Because sample corner points are the main feature points, they may include upper left sample corner points and lower right sample corner points, where the upper left sample corner point refers to the intersection of the line corresponding to the upper contour of the target sample object and the line corresponding to the left contour of the target sample object, and the lower right sample corner point refers to the intersection of the line corresponding to the lower contour of the target sample object and the line corresponding to the right contour of the target sample object. When an upper left sample corner and a lower right sample corner belong to the detection frame of the same target sample object, the positions pointed to by their centripetal offset tensors should be relatively close. The neural network training method proposed in the embodiment of the present disclosure therefore determines the sample corner points belonging to the same target sample object based on the corner position information of the sample corner points characterizing the positions of target sample objects in the sample image and the centripetal offset tensor corresponding to each sample corner point, detects the same target sample object based on the determined sample corner points, and then continuously adjusts the neural network parameters based on the target objects in the sample images, so as to obtain a neural network with higher accuracy that can perform accurate detection of target objects.
- a target detection device including:
- the obtaining part is configured to obtain the image to be detected
- the determining part is configured to determine, based on the image to be detected, the corner position information of each corner point in the image to be detected and the centripetal offset tensor corresponding to each corner point, where the corner points represent the position of the target object in the image to be detected.
- the detection part is configured to determine the target object in the image to be detected based on the position information of each corner point in the image to be detected and the centripetal offset tensor corresponding to each corner point.
- the determining part is configured to:
- the corner point position information of each corner point in the image to be detected and the centripetal offset tensor corresponding to each corner point are determined.
- when the determining part is configured to determine the corner position information of each corner point in the image to be detected based on the feature map after corner pooling, the method includes:
- the local offset information is used to indicate the position offset, in the corner heat map, of the real physical point represented by the corresponding corner point;
- based on the local offset information corresponding to each corner point and the size ratio between the corner heat map and the image to be detected, the position information of each corner point in the image to be detected is determined.
- the method includes:
- the steering offset tensor corresponding to each feature point in the corner-pooled feature map is determined; the steering offset tensor of a feature point represents the offset tensor from that feature point to the center point of the target object in the image to be detected;
- the offset domain information includes the offset tensors by which the multiple initial feature points associated with a feature point each point to their corresponding shifted feature points; based on this information, the feature data of the feature points in the corner-pooled feature map are adjusted to obtain the adjusted feature map;
- the centripetal offset tensor corresponding to each corner point is then determined.
- the corner heat map corresponding to the image to be detected includes a corner heat map corresponding to multiple channels, and each channel of the multiple channels corresponds to a preset object category;
- after the determining part determines the probability value of each feature point in the corner heat map as a corner point based on the corner heat map, it is further configured to:
- if a corner point exists in the corner heat map corresponding to a channel, it is determined that the image to be detected contains a target object of the preset object category corresponding to that channel.
- the detection part is configured to:
- the detection frame of the target object in the image to be detected is determined.
- When the detection part is configured to determine, based on the position information of each corner point in the image to be detected and the centripetal offset tensor corresponding to each corner point, the detection frame of the target object in the image to be detected, it includes:
- the detection frame of the target object is determined from among the candidate detection frames based on the position information of the center point pointed to by each corner point in each candidate corner point pair and the center area information corresponding to the candidate corner point pair.
- When the detection part is configured to determine the central area information corresponding to a candidate corner point pair based on the corner position information of each corner point of the candidate corner point pair in the image to be detected, it includes:
- the coordinate range of the central area frame corresponding to the candidate corner point pair is determined.
- When the detection part is configured to determine the detection frame of the target object based on the position information of the center point pointed to by each corner point in each candidate corner point pair and the center area information corresponding to the candidate corner point pair, the method includes:
- based on the central area information corresponding to each valid candidate corner point pair and the probability value corresponding to each corner point in the pair, the score of the candidate detection frame corresponding to each valid candidate corner point pair is determined; the probability value corresponding to a corner point indicates the probability that the corner point's corresponding feature point in the corner heat map is a corner point;
- the detection frame of the target object is determined from among the candidate detection frames.
- the detection part is further configured to:
- the instance information of the target object in the image to be detected is determined based on the detection frame of the target object and the initial feature map obtained by feature extraction of the image to be detected.
- When the detection part is configured to determine the instance information of the target object in the image to be detected based on the detection frame of the target object and the initial feature map obtained by feature extraction of the image to be detected, it includes:
- the instance information of the target object in the image to be detected is determined.
- the target detection device further includes a neural network training part, and the neural network training part is configured to:
- train a neural network for target detection, where the neural network is obtained by training using sample pictures containing labeled target sample objects.
- the neural network training part is configured to train the neural network according to the following steps:
- the network parameter value of the neural network is adjusted based on the predicted target sample object in the sample image and the labeled target sample object in the sample image.
- an embodiment of the present disclosure provides an electronic device, including a processor, a memory, and a bus.
- the memory stores machine-readable instructions executable by the processor.
- When the electronic device is running, the processor and the memory communicate through the bus, and when the machine-readable instructions are executed by the processor, the steps of the target detection method described in the first aspect are executed.
- Embodiments of the present disclosure provide a computer-readable storage medium having a computer program stored thereon; when the computer program is run by a processor, it executes the steps of the target detection method described in the first aspect.
- The embodiments of the present disclosure provide a computer program including computer-readable code. When the computer-readable code runs in an electronic device, the processor in the electronic device executes the steps of the target detection method described in the first aspect.
- Figure 1 shows a schematic diagram of a result obtained when detecting an image to be detected
- Fig. 2 shows a flow chart of an exemplary target detection method provided by an embodiment of the present disclosure
- FIG. 3 shows a flowchart of a process for determining the position information of corner points and the centripetal offset tensor corresponding to the corner points provided by an embodiment of the present disclosure
- FIG. 4 shows a flowchart for determining the position information of a corner point and the centripetal offset tensor corresponding to the corner point provided by an embodiment of the present disclosure
- FIG. 5 shows a flow chart of determining the centripetal offset tensor corresponding to a corner point provided by an embodiment of the present disclosure
- FIG. 6 shows a schematic flow chart of an exemplary feature adjustment network provided by an embodiment of the present disclosure for adjusting a feature map after corner pooling
- FIG. 7 shows a schematic diagram of a process for determining the category of a target object provided by an embodiment of the present disclosure
- FIG. 8 shows a schematic flow chart of determining a detection frame of a target object provided by an embodiment of the present disclosure
- FIG. 9 shows a schematic flowchart of determining a detection frame of a target object based on each candidate corner point pair provided by an embodiment of the present disclosure
- FIG. 10 shows a schematic flowchart corresponding to an exemplary target detection method provided by an embodiment of the present disclosure
- FIG. 11 shows a schematic flowchart of a neural network training method provided by an embodiment of the present disclosure
- FIG. 12 shows a schematic structural diagram of a target detection device provided by an embodiment of the present disclosure
- FIG. 13 shows a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
- the embodiment of the present disclosure provides a target detection method, which can improve the accuracy of the detection result.
- the embodiments of the present disclosure provide a target detection method. After acquiring the image to be detected, first determine the corner position information of each corner point in the image to be detected and the centripetal offset tensor corresponding to each corner point. Because the corner point refers to the main feature point in the image, the position information of the corner point in the image to be detected can characterize the position of each target object in the image to be detected. For example, the corner point can include the upper left corner and the lower right corner.
- The target detection method proposed in the embodiment can determine the corner points belonging to the same target object based on the corner position information of each corner point in the image to be detected and the centripetal offset tensor corresponding to each corner point, and can then detect that target object based on the determined corner points.
- the execution subject of the target detection method provided in the embodiment of the present disclosure is generally a computer device with a certain computing capability.
- the equipment includes, for example, terminal equipment or servers or other processing equipment.
- the target detection method can be implemented by a processor invoking a computer-readable instruction stored in a memory.
- the method includes steps S201 to S203, and the steps are as follows:
- the image to be detected here can be an image to be detected in a specific environment.
- For example, to detect vehicles at a traffic intersection, a camera can be installed at the intersection to collect the video stream of the intersection over a certain period of time, and the video stream is then framed to obtain the images to be detected; likewise, to detect animals in a zoo, a camera can be installed in the zoo to collect the video stream of the zoo over a certain period of time, and the video stream undergoes framing processing to obtain the images to be detected.
- The image to be detected may contain a target object, which here refers to the object to be detected in a specific environment, such as the vehicle at a traffic intersection or the animal in a zoo mentioned above, or it may contain no target object. If no target object is included, the detection result is empty; the implementation of the present disclosure describes the case where the image to be detected contains a target object.
- S202 Based on the image to be detected, determine the corner position information of each corner point in the image to be detected and the centripetal offset tensor corresponding to each corner point, where the corner point represents the position of the target object in the image to be detected.
- the position of the target object in the image to be detected can be represented by a detection frame.
- The embodiment of the present disclosure uses corner points to characterize the position of the target object in the image to be detected; that is, the corner points here may be the corner points of the detection frame. For example, the position of the target object in the image to be detected is characterized by the upper left corner point and the lower right corner point, where the upper left corner point is the upper left corner of the detection frame and the lower right corner point is the lower right corner of the detection frame. The upper left corner point refers to the intersection of the line corresponding to the upper contour of the target object and the line corresponding to the left contour of the target object, and the lower right corner point refers to the intersection of the line corresponding to the lower contour of the target object and the line corresponding to the right contour of the target object.
- the position of the target object is not limited to the upper left corner point and the lower right corner point.
- the position of the target object can also be characterized by the upper right corner point and the lower left corner point.
- the embodiment of the present disclosure uses the upper left corner point and the lower right corner point. Take an example for illustration.
- the centripetal offset tensor here refers to the offset tensor from a corner point to the center position of the target object. Because the image to be detected is a two-dimensional image, the centripetal offset tensor includes offsets in two directions; when the two directions are the X-axis direction and the Y-axis direction, the centripetal offset tensor includes an offset value in the X-axis direction and an offset value in the Y-axis direction. From a corner point and its corresponding centripetal offset tensor, the center position pointed to by that corner point can be determined.
- for corner points belonging to the same target object, the center positions pointed to should be the same, or relatively close, so the corner points belonging to the same target object can be determined based on the centripetal offset tensor corresponding to each corner point, and the detection frame of the target object can then be determined based on the determined corner points.
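The grouping rule just described can be sketched in a few lines. This is a hedged illustration, not the disclosure's implementation: the function names and the closeness tolerance are hypothetical, and coordinates are treated as simple (x, y) tuples.

```python
def center_from_corner(corner, centripetal_offset):
    # The centripetal offset tensor holds offsets in the X-axis and Y-axis
    # directions; adding them to the corner gives the pointed-to center.
    return (corner[0] + centripetal_offset[0], corner[1] + centripetal_offset[1])

def point_to_same_center(tl, tl_offset, br, br_offset, tol=2.0):
    # Two corners are taken to belong to the same target object when the
    # centers they point to coincide or are relatively close (within `tol`).
    cx1, cy1 = center_from_corner(tl, tl_offset)
    cx2, cy2 = center_from_corner(br, br_offset)
    return abs(cx1 - cx2) <= tol and abs(cy1 - cy2) <= tol
```

For example, an upper left corner at (10, 10) with offset (20, 15) and a lower right corner at (50, 40) with offset (-20, -15) both point to (30, 25), so they would be grouped into one detection frame.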
- the embodiment of the present disclosure uses a neural network to determine the corner point and the centripetal offset tensor corresponding to the corner point, which will be described in conjunction with the following embodiments.
- S203 Determine a target object in the image to be detected based on the corner position information of each corner point in the image to be detected and the centripetal offset tensor corresponding to each corner point.
- the corner position information of each corner point in the image to be detected refers to the corner position information of each of the multiple corner points in the image to be detected, and the centripetal offset tensor corresponding to each corner point refers to the centripetal offset tensor corresponding to each of the multiple corner points.
- detecting the target object in the image to be detected can include determining the location of the target object, for example determining the detection frame of the target object in the image to be detected, or determining the instance information of the target object in the image to be detected, or determining both the detection frame and the instance information of the target object in the image to be detected.
- how to determine the target object in the image to be detected will be explained in detail later.
- in the target detection method proposed in the above steps S201 to S203, after acquiring the image to be detected, the corner position information of each corner point in the image to be detected and the centripetal offset tensor corresponding to each corner point are determined first. Because corner points are the main feature points in the image, the position information of the corner points in the image to be detected can characterize the position of each target object in the image to be detected.
- the corner points can include an upper left corner point and a lower right corner point, where the upper left corner point refers to the intersection of the line corresponding to the upper contour of the target object and the line corresponding to the left contour of the target object, and the lower right corner point refers to the intersection of the line corresponding to the lower contour of the target object and the line corresponding to the right contour of the target object.
- for an upper left corner point and a lower right corner point belonging to the same target object, the center positions pointed to by their corresponding centripetal offset tensors should be relatively close. Therefore, the target detection method proposed in the embodiments of the present disclosure can determine the corner points belonging to the same target object based on the corner position information of each corner point in the image to be detected and the centripetal offset tensor corresponding to each corner point, and then detect that target object based on the determined corner points.
- determining the corner position information of each corner point in the image to be detected and the centripetal offset tensor corresponding to each corner point, as shown in FIG. 3, may include the following steps S301 to S303:
- S301 Perform feature extraction on the image to be detected to obtain an initial feature map corresponding to the image to be detected;
- S302 Perform corner pooling processing on the initial feature map to obtain a feature map after corner pooling;
- S303 Based on the feature map after corner pooling, determine the corner position information of each corner in the image to be detected, and the centripetal offset tensor corresponding to each corner.
- the size of the image to be detected is fixed, for example H*W, where H and W represent the pixel counts in the height and width directions of the image to be detected respectively. The image to be detected is then input into a pre-trained hourglass convolutional neural network for feature extraction, such as texture feature extraction, color feature extraction, and edge feature extraction, and the initial feature map corresponding to the image to be detected can be obtained.
- the input end of the hourglass convolutional neural network has requirements on the received image size, that is, it receives images of a set size. If the size of the image to be detected does not meet the set size, the image to be detected needs to be resized first, and the adjusted image is then input into the hourglass convolutional neural network for feature extraction and size compression, yielding an initial feature map of size h*w*c, where c indicates the number of channels of the initial feature map, and h and w represent the size of the initial feature map on each channel.
- the initial feature map contains multiple feature points, and each feature point has feature data; these feature data can represent the global information of the image to be detected.
- the embodiments of the present disclosure propose to perform corner pooling processing on the initial feature map to obtain a feature map after corner pooling. Compared with the initial feature map, the feature map after corner pooling enhances the semantic information of the target object contained at the corner points, so based on the feature map after corner pooling, the corner position information of each corner point in the image to be detected and the centripetal offset tensor corresponding to each corner point can be determined more accurately.
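As a hedged illustration of the corner pooling step, the sketch below implements top-left corner pooling on a single-channel map in the way corner-based detectors commonly do; the disclosure does not fix an exact implementation, and the function name is hypothetical.

```python
import numpy as np

def top_left_corner_pool(feat):
    # For each position, take the max over all positions to its right in the
    # same row plus the max over all positions below it in the same column,
    # so responses from an object's top and left boundaries accumulate at
    # its top-left corner.
    right_max = np.maximum.accumulate(feat[:, ::-1], axis=1)[:, ::-1]
    down_max = np.maximum.accumulate(feat[::-1, :], axis=0)[::-1, :]
    return right_max + down_max
```

On the 2x2 map [[1, 2], [3, 4]] this produces [[5, 6], [7, 8]]: each position now aggregates evidence from the row to its right and the column below it, which is what strengthens the corner semantics relative to the initial feature map.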
- in this way, an initial feature map is obtained, and corner pooling is performed on the initial feature map to obtain a feature map that facilitates the extraction of the corner points and the centripetal offsets corresponding to the corner points, that is, the feature map after corner pooling.
- the feature map after corner pooling and a pre-trained neural network can then be used to determine whether corner points exist and, if so, the position information of each corner point in the image to be detected.
- when the position of the target object in the image to be detected is characterized by an upper left corner point and a lower right corner point, determining the corner position information of each corner point in the image to be detected consists of determining the corner position information of the upper left corner point in the image to be detected and determining the corner position information of the lower right corner point in the image to be detected, where the corner position information of the upper left corner point in the image to be detected can be detected through an upper left corner point detection network, and the corner position information of the lower right corner point in the image to be detected through a lower right corner point detection network.
- because the method for determining the corner position information of the upper left corner point in the image to be detected is similar to that for the lower right corner point, the embodiments of the present disclosure take determining the corner position information of the upper left corner point in the image to be detected as an example for detailed description.
- the upper left corner point detection network may include an upper left corner point heat map prediction network and an upper left corner point local offset prediction network. Determining the corner position information of each corner point in the image to be detected based on the feature map after corner pooling, as shown in FIG. 4, may include the following steps S401 to S404:
- the corner point heat map here can be obtained by the upper left corner point heat map prediction network in the upper left corner point detection network; the feature map after corner pooling is input into the upper left corner point heat map prediction network, and the upper left corner point heat map corresponding to the image to be detected can be obtained.
- the upper left corner point heat map contains multiple feature points, and from the feature data corresponding to each feature point, the probability value of that feature point being an upper left corner point can be determined.
- the upper left corner point heat map can be used to determine the upper left corner points in the image to be detected, and can also be used to determine the category of the target object that an upper left corner point represents in the image to be detected; the process of determining the category of the target object will be explained in detail later.
- based on the probability that each feature point is an upper left corner point, the feature points with probability values greater than a set threshold are taken as upper left corner points.
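The thresholding just described can be sketched as below; the function name and the default threshold value are hypothetical, and the heat map is a single-channel probability map.

```python
import numpy as np

def select_corner_points(heatmap, threshold=0.5):
    # Feature points whose corner probability exceeds the set threshold are
    # taken as (upper left) corner points; returns their (row, col) positions.
    ys, xs = np.where(heatmap > threshold)
    return list(zip(ys.tolist(), xs.tolist()))
```

For the map [[0.9, 0.1], [0.2, 0.7]] with the default threshold, positions (0, 0) and (1, 1) are selected as corner points.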
- S403 Obtain the position information of the selected corner points in the corner point heat map and the local offset information corresponding to each corner point.
- the local offset information is used to indicate the position offset of the real physical point represented by the corresponding corner point in the corner heat map.
- the local offset information corresponding to each upper left corner point is used to correct the position of that upper left corner point.
- the local offset information here can be represented by a local offset tensor, which likewise represents offset values in two directions of the upper left corner point heat map: the coordinate system of the upper left corner point heat map includes the x-axis direction and the y-axis direction, and the local offset tensor includes an offset value in the x-axis direction and an offset value in the y-axis direction.
- the position information of each upper left corner point in the upper left corner point heat map can be obtained from the heat map; however, there may be errors between the obtained position of an upper left corner point and the position of the real physical point it represents.
- that is, the position of a certain upper left corner point detected in the upper left corner point heat map may deviate from the position of the real physical point represented by that corner point, and the local offset information is used to indicate this deviation.
- the acquired position information of each upper left corner point in the upper left corner point heat map may include the coordinate value x in the x-axis direction and the coordinate value y in the y-axis direction of the heat map; the corner position information of each upper left corner point in the image to be detected may include the coordinate value X in the X-axis direction and the coordinate value Y in the Y-axis direction.
- the corner position information of the i-th upper left corner point in the image to be detected can be determined according to the following formula (1) and formula (2):
- tl_x(i) = n*(x_l(i) + δ_lx(i)) (1)
- tl_y(i) = n*(y_l(i) + δ_ly(i)) (2)
- where tl_x(i) represents the coordinate value of the i-th upper left corner point in the X-axis direction of the image to be detected; tl_y(i) represents the coordinate value of the i-th upper left corner point in the Y-axis direction of the image to be detected; n represents the size ratio between the upper left corner point heat map and the image to be detected; x_l(i) represents the coordinate value of the i-th upper left corner point in the x-axis direction of the upper left corner point heat map; y_l(i) represents the coordinate value of the i-th upper left corner point in the y-axis direction of the heat map; δ_lx(i) represents the offset value of the real physical point represented by the i-th upper left corner point in the x-axis direction of the heat map; and δ_ly(i) represents the offset value of that real physical point in the y-axis direction.
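The coordinate mapping of formulas (1) and (2) can be checked with a short sketch; the function name is hypothetical, and the same mapping applies symmetrically to the lower right corner via formulas (3) and (4).

```python
def corner_in_image(x_heat, y_heat, delta_x, delta_y, n):
    # Formulas (1)-(2): add the local offset to the heat-map coordinate,
    # then scale by the size ratio n between the image and the heat map.
    return n * (x_heat + delta_x), n * (y_heat + delta_y)
```

For example, a corner detected at (10, 20) in the heat map with local offsets (0.5, 0.25) and a size ratio n = 4 maps to (42.0, 81.0) in the image to be detected.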
- the above process determines the corner position information of the upper left corner point in the image to be detected; the process of determining the corner position information of the lower right corner point in the image to be detected is the same. That is, the feature map after corner pooling is input to the lower right corner point heat map prediction network in the lower right corner point detection network to obtain the lower right corner point heat map, the probability value of each feature point in the lower right corner point heat map being a lower right corner point is determined, and the lower right corner points are selected from them.
- combined with the local offset information corresponding to each lower right corner point, determined by the lower right corner point local offset prediction network in the lower right corner point detection network, the corner position information of each lower right corner point in the image to be detected is determined, which will not be repeated here.
- the corner position information of the j-th lower right corner point in the image to be detected can be determined according to the following formula (3) and formula (4):
- br_x(j) = n*(x_r(j) + δ_rx(j)) (3)
- br_y(j) = n*(y_r(j) + δ_ry(j)) (4)
- where br_x(j) represents the coordinate value of the j-th lower right corner point in the X-axis direction of the image to be detected; br_y(j) represents the coordinate value of the j-th lower right corner point in the Y-axis direction of the image to be detected; n represents the size ratio between the lower right corner point heat map and the image to be detected; x_r(j) represents the coordinate value of the j-th lower right corner point in the x-axis direction of the lower right corner point heat map; y_r(j) represents the coordinate value of the j-th lower right corner point in the y-axis direction of the heat map; δ_rx(j) represents the offset value of the real physical point represented by the j-th lower right corner point in the x-axis direction of the heat map; and δ_ry(j) represents the offset value of that real physical point in the y-axis direction.
- the above steps S401 to S404 are one way, according to the embodiments of the present disclosure, to determine the corner position information of each corner point in the image to be detected.
- this process introduces a corner heat map, determines the feature points that can serve as corner points from the probability value of each feature point being a corner point, and then corrects the position information of the corner points in the corner heat map to determine their corner position information in the image to be detected.
- this method can obtain corner position information with higher accuracy, thereby facilitating the subsequent detection of the position of the target object in the image to be detected based on the corner points.
- for the centripetal offset tensor corresponding to each corner point, determining the centripetal offset tensor corresponding to the upper left corner point is taken as an example for detailed description; the method for determining the centripetal offset tensor corresponding to the lower right corner point is similar and will not be repeated in the embodiments of the present disclosure.
- a feature adjustment process is introduced to adjust the feature map after corner pooling before the centripetal offset tensor is determined.
- determining the centripetal offset tensor corresponding to each corner point based on the feature map after corner pooling, as shown in FIG. 5, may include the following steps S501 to S504:
- S501 Determine a steering offset tensor corresponding to each feature point in the feature map after corner point pooling based on the feature map after corner point pooling.
- the steering offset tensor corresponding to each feature point represents the offset tensor from the feature point to the center point of the target object in the image to be detected.
- the position of a corner point of the target object in the image to be detected is related to the target object information; that is, it is hoped that the feature data at the corner points of the feature map after corner pooling can contain richer target object information.
- therefore, based on the steering offset tensor corresponding to each feature point, the feature map after corner pooling can be adjusted so that each feature point in the adjusted feature map, especially the corner points, can contain richer target object information.
- a convolution operation can be performed on the feature map after corner pooling to obtain the steering offset tensor corresponding to each feature point in that feature map; the steering offset tensor includes an offset value along the x-axis direction and an offset value along the y-axis direction.
- when the convolution operation is performed on the feature map after corner pooling, what is mainly obtained is the steering offset tensor corresponding to each feature point when taken as an upper left corner point.
- S502 Determine the offset domain information of each feature point based on the steering offset tensor corresponding to each feature point.
- the offset domain information includes the offset tensors by which a plurality of initial feature points associated with the feature point respectively point to their corresponding offset feature points.
- a convolution operation is performed on the steering offset tensor corresponding to each feature point to obtain the offset domain information of that feature point.
- taking the centripetal offset tensor corresponding to the upper left corner point as an example, after the steering offset tensor of each feature point taken as an upper left corner point is obtained, a convolution operation is performed on these steering offset tensors to obtain the offset domain information of each feature point when taken as an upper left corner point.
- S503 Based on the feature map after corner pooling and the offset domain information of the feature points in the feature map after corner pooling, adjust the feature data of the feature points in the feature map after corner pooling to obtain the adjusted feature map.
- after the offset domain information of each feature point of the feature map after corner pooling, taken as an upper left corner point, is obtained, a deformable convolution operation is performed on the feature map after corner pooling together with this offset domain information to obtain the adjusted feature map corresponding to the upper left corner point.
- the above steps S501 to S503 can be performed through the feature adjustment network shown in FIG. 6: a convolution operation is performed on the steering offset tensors to obtain the offset domain information.
- the offset domain information here is explained as follows: taking feature point A in the feature map after corner pooling as an example, without offset domain information the feature data of the 9 initial feature points represented by the solid line frame would be used in the convolution operation.
- after considering the offset domain information, it is hoped that feature point A can be adjusted by feature data containing richer target object information; for example, the feature points used for the feature adjustment of feature point A can be offset based on the steering offset tensor corresponding to each feature point.
- the offset feature points are represented in the feature map after corner pooling by the 9 dashed boxes shown in FIG. 6, so that the feature data of the 9 offset feature points can be used to perform the convolution operation and adjust the feature data of feature point A.
- the offset domain information can be represented by the offset tensors in FIG. 6; each offset tensor is the offset tensor by which an initial feature point points to its corresponding offset feature point, representing that after the initial feature point is offset in the x-axis direction and the y-axis direction, the offset feature point corresponding to that initial feature point is obtained.
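The sampling behaviour described for FIG. 6 can be illustrated as below. This is a hedged sketch, not the disclosure's network: offsets are hand-picked integers for clarity, whereas a deformable convolution uses learned, fractional offsets with bilinear interpolation; the function name is hypothetical.

```python
import numpy as np

def gather_adjusted_samples(feat, center, offsets):
    # For the 3x3 neighbourhood around `center`, shift each sampling position
    # by its offset tensor (dy, dx) before reading the feature data, clamping
    # to the map borders; these shifted reads are what a deformable
    # convolution would combine to adjust the feature at `center`.
    h, w = feat.shape
    cy, cx = center
    grid = [(i, j) for i in (-1, 0, 1) for j in (-1, 0, 1)]
    samples = []
    for (gy, gx), (dy, dx) in zip(grid, offsets):
        y = min(max(cy + gy + dy, 0), h - 1)
        x = min(max(cx + gx + dx, 0), w - 1)
        samples.append(float(feat[y, x]))
    return samples
```

With all offsets set to (0, 0) this reads the plain solid-frame neighbourhood; non-zero offsets move the reads to the dashed-box positions, so feature point A is adjusted with feature data drawn from more informative locations.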
- in this way, a more accurate centripetal offset tensor corresponding to each upper left corner point can be obtained, because the feature points after feature adjustment contain richer target object information; the centripetal offset tensor corresponding to each lower right corner point is subsequently determined based on the adjusted feature map in the same way.
- S504 A convolution operation is performed on the feature data corresponding to the corner points in the adjusted feature map, and the centripetal offset tensor corresponding to each corner point is determined.
- the adjusted feature map may include the adjusted feature map corresponding to the upper left corner point and the adjusted feature map corresponding to the lower right corner point.
- the centripetal offset tensor corresponding to each upper left corner point can be determined from the adjusted feature map corresponding to the upper left corner point through the centripetal offset prediction network corresponding to the upper left corner point, and the centripetal offset tensor corresponding to each lower right corner point can be determined from the adjusted feature map corresponding to the lower right corner point through the centripetal offset prediction network corresponding to the lower right corner point.
- the above process of S501 to S504 is the process of determining the centripetal offset tensor provided by the embodiments of the present disclosure. By considering the target object information, that is, by introducing the steering offset tensor corresponding to each corner point and the offset domain information of the feature points, the feature data of the feature points in the feature map after corner pooling is adjusted so that the feature data of the feature points in the adjusted feature map can contain richer target object information, and a more accurate centripetal offset tensor corresponding to each corner point can thus be obtained.
- in this way, the position information of the center point pointed to by each corner point can be accurately obtained, so that the position of the target object in the image to be detected can be accurately detected.
- the category of the target object contained in the image to be detected can be determined through the corner heat map; the following describes how to determine the category of the target object based on the corner heat map.
- from the above, the corner heat map of the image to be detected includes the corner heat maps corresponding to multiple channels, and each channel corresponds to a preset object category; in the case where the probability value of each feature point in the corner heat map being a corner point is determined, the detection method provided by the embodiments of the present disclosure further includes the following steps S701 to S702:
- S701 For each channel of the multiple channels, determine whether there is a corner point in the corner heat map corresponding to the channel based on the probability value of each feature point as the corner point in the corner heat map corresponding to the channel.
- whether there is a corner point in the corner heat map of a channel can be determined from the probability value of each feature point in that channel's corner heat map being a corner point.
- for example, if the corner heat map of a channel contains multiple feature points whose probability values are greater than the set threshold, the corner heat map of that channel contains corner points with high probability; since corner points are used to represent the position of the target object in the image to be detected, it can be concluded that the image to be detected contains a target object of the preset object category corresponding to that channel.
- for example, taking the number of channels as 100, the obtained corner heat map is h*w*100, and each channel corresponds to a preset object category. For a certain image to be detected, if among the 100 channels of its corner heat map only the corner heat maps in the first and second channels contain corner points, and the preset object category corresponding to the first channel is 01 while that corresponding to the second channel is 02, it can be concluded that the image to be detected contains target objects of categories 01 and 02.
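The per-channel category check above can be sketched as follows; the function name, the channel-to-category mapping, and the threshold are hypothetical illustrations of the described logic.

```python
import numpy as np

def categories_present(heatmaps, category_of_channel, threshold=0.5):
    # heatmaps has shape (channels, h, w); a channel whose corner heat map
    # contains any feature point above the threshold contributes its preset
    # object category to the result.
    return [category_of_channel[c]
            for c in range(heatmaps.shape[0])
            if (heatmaps[c] > threshold).any()]
```

With three channels mapped to categories "01", "02", "03", and corner responses only in the first and third channels, the image is concluded to contain targets of categories "01" and "03".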
- the embodiments of the present disclosure propose that by inputting the feature map after corner pooling into the corner heat map prediction network, a corner heat map containing the preset number of channels can be obtained; whether the corner heat map corresponding to each channel contains corner points can then be determined, and thus whether the image to be detected contains a target object of the category corresponding to that channel.
- further, the centripetal offset tensor corresponding to each corner point can be determined, so as to determine the position of the target object corresponding to each channel in the image to be detected, and the category of each target object in the image to be detected can be determined in combination with the category of the target object corresponding to the channel.
- the detection frame of the target object in the image to be detected is determined.
- the embodiment of the present disclosure uses an upper left corner point and a lower right corner point to determine the detection frame as an example for description.
- it can first be judged whether an upper left corner point and a lower right corner point belong to the same target object category; in the case where it is determined that an upper left corner point and a lower right corner point belong to the same target object category, it is further determined whether the corner position information of this upper left corner point and lower right corner point in the image to be detected can constitute the same candidate detection frame.
- in the image to be detected, the upper left corner point should be located above and to the left of the lower right corner point; if, based on the corner position information of the two corner points, such as the position coordinates of the upper left corner point in the image to be detected and the position coordinates of the lower right corner point in the image to be detected, the upper left corner point is not located above and to the left of the lower right corner point, the two corner points cannot constitute a candidate corner point pair.
- a coordinate system can be established in the image to be detected, including an X axis and a Y axis; the corner position information of each corner point in the coordinate system includes the abscissa value in the X-axis direction and the ordinate value in the Y-axis direction. In the coordinate system, according to the coordinate values of each corner point, the upper left corner points and lower right corner points that can constitute candidate detection frames are screened.
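The screening of candidate corner point pairs can be sketched as below; the function name is hypothetical, and the sketch assumes X grows rightward and Y grows downward as is usual for image coordinates.

```python
def is_candidate_pair(tl, br, tl_category, br_category):
    # A pair is a candidate only if both corners share the same target object
    # category and the upper left corner lies above and to the left of the
    # lower right corner.
    tl_x, tl_y = tl
    br_x, br_y = br
    return tl_category == br_category and tl_x < br_x and tl_y < br_y
```

For example, corners at (10, 10) and (50, 40) of the same category form a candidate pair, while the same corners with different categories, or with the upper left corner to the right of the lower right corner, do not.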
- S802 Determine the position information of the center point to which the corner point points based on the corner position information of each corner point in the image to be detected in each candidate corner point pair and the centripetal offset tensor corresponding to the corner point.
- the position information of the center point pointed to by the upper left corner point in each candidate corner point pair can be determined according to the following formula (5), and the position information of the center point pointed to by the lower right corner point in each candidate corner point pair according to the following formula (6):
- the central area information here can be preset; it is defined as the coordinate range of a central area frame whose center coincides with the center of the detection frame of the target object. Through the coordinate range of the central area frame, it is possible to detect whether the candidate detection frame contains a unique target object.
- when both the position information of the center point pointed to by the upper left corner point and the position information of the center point pointed to by the lower right corner point are located within the coordinate range of the central area frame, and the coordinate range of the central area frame is small, the center point pointed to by the upper left corner point can be considered relatively close to the center point pointed to by the lower right corner point, so it can be determined that the candidate detection frame formed by the candidate corner point pair contains a unique target object.
- Determining the central area information corresponding to a candidate corner point pair based on the corner position information of each corner point of that pair in the image to be detected may include the following.
- Assuming the m-th candidate corner point pair is composed of the i-th upper left corner point and the j-th lower right corner point, the corner position information of the corresponding central area frame can be determined according to the following formulas (7) to (10):
- The coordinate range of the central area frame can be determined according to the following formula (11):
- R_central(m) represents the coordinate range of the central area frame corresponding to the m-th candidate corner point pair. This coordinate range is expressed by the x(m) values in the X-axis direction and the y(m) values in the Y-axis direction, where the range of x(m) and the range of y(m) are each bounded by the corner coordinates of the central area frame determined by formulas (7) to (10).
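A sketch of how a central area frame of this kind can be computed from a candidate corner pair. The scaling ratio `mu` and the exact form of formulas (7) to (11) are assumptions; only the idea that the central area frame shares the candidate frame's center and has a reduced extent is taken from the text.

```python
def central_region(tl, br, mu=0.5):
    """Central area frame for a candidate pair (tl, br): a box sharing the
    candidate frame's center, with width/height scaled by ratio mu (0 < mu <= 1).
    Returns (x1, y1, x2, y2) — the coordinate range R_central(m)."""
    tlx, tly = tl
    brx, bry = br
    cx, cy = (tlx + brx) / 2.0, (tly + bry) / 2.0
    half_w = mu * (brx - tlx) / 2.0
    half_h = mu * (bry - tly) / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)
```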
- S804: Determine the detection frame of the target object among the candidate detection frames, based on the position information of the center point pointed to by each corner point in each candidate corner point pair and the central area information corresponding to that pair.
- The central area information corresponding to each candidate corner point pair is used to constrain the proximity between the center point positions pointed to by the corner points of the pair. When the position information of the center point pointed to by each corner point in a candidate corner point pair falls within the central area frame corresponding to that pair, the pointed center points are relatively close, and the target object contained in the candidate detection frame formed by that candidate corner point pair is a unique target object.
- In this way, the corner position information of the corner points is first used to determine the candidate corner point pairs that can constitute candidate detection frames, and then the centripetal offset tensor of each corner point in a candidate corner point pair is used to determine whether the candidate detection frame encloses a single target object, so that the detection frames of all target objects in the image to be detected can be detected more accurately.
- Determining the detection frame of the target object among the candidate detection frames based on the position information of the center point pointed to by each corner point in each candidate corner point pair and the central area information corresponding to that pair may, as shown in Figure 9, include the following steps S901 to S903:
- S901: Determine valid candidate corner point pairs based on the position information of the center point pointed to by each corner point in each candidate corner point pair and the central area information corresponding to the candidate corner point pair.
- When the center points pointed to by both corner points of a candidate corner point pair fall within the coordinate range of the corresponding central area frame, the candidate corner point pair is regarded as a valid candidate corner point pair.
- Specifically, the following formula (12) can be used to determine whether the candidate corner point pair formed by the i-th upper left corner point and the j-th lower right corner point is valid; that is, it is judged whether the coordinate range of the m-th central area frame corresponding to the candidate detection frame formed by these two corner points, together with the center point position information pointed to by each of them, satisfies formula (12):
- When formula (12) is satisfied, the candidate corner point pair formed by the i-th upper left corner point and the j-th lower right corner point is a valid candidate corner point pair, and S902 is then performed on it. Otherwise, the pair is an invalid candidate corner point pair, and it is further determined whether the i-th upper left corner point can form a valid candidate corner point pair with other lower right corner points; the subsequent steps are executed once a valid candidate corner point pair is obtained.
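The validity test of formula (12) can be sketched as a containment check: both pointed center points must fall inside the coordinate range of the central area frame. The tuple layout `(x1, y1, x2, y2)` is an assumption.

```python
def is_valid_pair(center_tl, center_br, region):
    """A candidate corner pair is valid when both the center pointed to by the
    upper left corner and the center pointed to by the lower right corner fall
    inside the central area frame region = (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = region

    def inside(point):
        return x1 <= point[0] <= x2 and y1 <= point[1] <= y2

    return inside(center_tl) and inside(center_br)
```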
- The probability value corresponding to each corner point indicates the probability that the feature point corresponding to that corner point in the corner heat map is a corner point.
- For the candidate detection frame corresponding to each valid candidate corner point pair, a score value can be computed from the area relationship between the region formed by the center points pointed to by the corner points of the pair and the central area frame corresponding to the pair, together with the probability value corresponding to each corner point in the pair. A candidate detection frame with a higher score is more likely to be the detection frame of the target object, and the candidate detection frames are screened on this basis.
- the score of the candidate detection frame corresponding to the valid candidate corner point pair can be determined according to the following formula (13):
- s represents the score of the candidate detection frame corresponding to the valid candidate corner point pair formed by the i-th upper left corner point and the j-th lower right corner point;
- s_tl(i) represents the probability value that the feature point corresponding to the i-th upper left corner point in the upper left corner heat map is an upper left corner point;
- s_br(j) represents the probability value that the feature point corresponding to the j-th lower right corner point in the lower right corner heat map is a lower right corner point.
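Since formula (13) itself is not reproduced in this text, the following is a simplified stand-in that combines the two corner probability values and penalizes pairs whose pointed centers lie far apart relative to the central area frame; the exact weighting is an assumption.

```python
import math

def pair_score(s_tl, s_br, center_tl, center_br, region):
    """Simplified stand-in for formula (13): geometric mean of the two corner
    probabilities, scaled down when the two pointed centers are far apart
    relative to the diagonal of the central area frame (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = region
    diag = math.hypot(x2 - x1, y2 - y1) or 1.0
    dist = math.hypot(center_tl[0] - center_br[0], center_tl[1] - center_br[1])
    closeness = max(0.0, 1.0 - dist / diag)  # 1.0 when centers coincide
    return math.sqrt(s_tl * s_br) * closeness
```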
- S903: Determine the detection frame of the target object among the candidate detection frames based on the score of the candidate detection frame corresponding to each valid candidate corner point pair and the size of the overlapping area between adjacent candidate detection frames.
- The overlap between adjacent candidate detection frames can be measured by the size of their overlapping area in the image to be detected. The following describes how to screen out the detection frame of the target object based on the score of the candidate detection frame corresponding to each valid candidate corner point pair and the overlapping area between adjacent candidate detection frames.
- In the specific implementation, the detection frame of the target object can be screened out from multiple candidate detection frames by soft non-maximum suppression (soft-NMS). Among candidate detection frames whose overlapping area is large, the candidate detection frame with the highest score can be taken as the detection frame of the target object, and the other candidate detection frames can be deleted, so that the detection frames of the target objects in the image to be detected are obtained.
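Soft non-maximum suppression can be sketched as follows. The Gaussian score decay and the `sigma` value are conventional choices for soft-NMS, not taken from the patent.

```python
import math

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def soft_nms(boxes, scores, sigma=0.5, keep_thresh=0.001):
    """Gaussian soft-NMS: instead of deleting overlapping boxes outright,
    decay their scores by their overlap with each kept box."""
    boxes, scores = list(boxes), list(scores)
    kept = []
    while boxes:
        m = max(range(len(scores)), key=scores.__getitem__)
        box, score = boxes.pop(m), scores.pop(m)
        if score < keep_thresh:
            break
        kept.append((box, score))
        scores = [s * math.exp(-iou(box, b) ** 2 / sigma)
                  for b, s in zip(boxes, scores)]
    return kept
```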
- After the detection frame of the target object is determined, the instance information of the target object in the detection frame can be determined. Specifically, the instance information of the target object in the image to be detected can be determined based on the detection frame of the target object and the initial feature map obtained by feature extraction of the image to be detected.
- the instance information here can be represented by a mask.
- The mask here means that, after instance segmentation of the target object in the image, the pixels of each target object are assigned at the pixel level, so the mask can be accurate to the edges of the object. In this way, a more precise position of the target object in the image to be detected can be obtained. In addition, the shape of the target object can also be represented by the mask, so that the determined category of the target object can be verified based on its shape, and subsequent behavior analysis of the target object can be performed based on the shape represented by the mask; this is not further described in the embodiments of the present disclosure.
- When determining the instance information of the target object in the image to be detected based on the detection frame of the target object and the initial feature map obtained by feature extraction of the image to be detected, the following may be included.
- the detection frame of the target object and the initial feature map corresponding to the image to be detected are input to the region of interest extraction network.
- The region of interest extraction network first extracts a region of interest matching the size of the initial feature map, and then, through region-of-interest alignment pooling, obtains the feature data of the feature points of the initial feature map inside the detection frame (that is, the region of interest). This feature data is input into a mask prediction network to generate the instance information of the target object. The instance information can be expressed in the form of a mask; the mask of the target object is then expanded to the same size as the target object in the image to be detected, yielding the instance information of the target object in the image to be detected.
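The final expansion of the predicted mask to the required size can be sketched with nearest-neighbor resizing. This is an illustrative stand-in; the patent does not specify the interpolation method.

```python
def resize_mask_nearest(mask, out_h, out_w):
    """Nearest-neighbor upscaling of a small predicted binary mask (nested
    lists of 0/1) to the target size, as a stand-in for the expansion step."""
    in_h, in_w = len(mask), len(mask[0])
    return [[mask[r * in_h // out_h][c * in_w // out_w]
             for c in range(out_w)]
            for r in range(out_h)]
```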
- As shown in Figure 10, corner pooling can be performed on the initial feature map f to obtain the corner-pooled feature map p. Upper left corner point detection and feature adjustment are then performed on the corner-pooled feature map p, yielding the upper left corner points and the centripetal offset tensors corresponding to the upper left corner points. The upper left corner points are determined by the upper left corner point detection network.
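The corner pooling mentioned above can be sketched for the upper left (top-left) case: each output value aggregates the maximum response to the right of and below the current position, which pushes strong responses toward upper left corners. Implemented here on plain nested lists for illustration; real implementations operate on feature-map tensors.

```python
def top_left_corner_pool(fmap):
    """Top-left corner pooling on a 2-D grid: each output cell is the maximum
    over cells to its right (row-wise) plus the maximum over cells below it
    (column-wise)."""
    h, w = len(fmap), len(fmap[0])
    right = [[max(row[c:]) for c in range(w)] for row in fmap]
    down = [[max(fmap[r2][c] for r2 in range(r, h)) for c in range(w)]
            for r in range(h)]
    return [[right[r][c] + down[r][c] for c in range(w)] for r in range(h)]
```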
- The upper left corner point detection network includes an upper left corner point heat map prediction network and an upper left corner point local offset prediction network (not shown in Figure 10).
- The feature adjustment network is first used to adjust the corner-pooled feature map p. This process includes determining the steering offset tensor corresponding to the upper left corner points; then, based on a deformable convolution operation, the corner-pooled feature map p is adjusted to obtain the adjusted feature map g; and then, through a convolution operation, the centripetal offset tensor corresponding to the upper left corner points is determined.
- The lower right corner points are determined by a lower right corner point detection network, and the centripetal offset tensors corresponding to the lower right corner points are likewise obtained by feature adjustment and convolution operations; this process is similar to the determination of the upper left corner points and their corresponding centripetal offset tensors. The detection frame of the target object is then determined based on the upper left corner points and their centripetal offset tensors, together with the lower right corner points and their centripetal offset tensors.
- The region of interest is extracted based on the detection frame of the target object and the initial feature map f, and the region of interest is then aligned and pooled to obtain its features (that is, the feature data of the feature points of the initial feature map inside the detection frame). From these features, the mask of the target object can be obtained through the mask prediction network; the mask is then enlarged to obtain a mask image of the same size as the image to be detected (that is, the instance information of the target object).
- Finally, the detection frame of the target object, the mask of the target object, and the category of the target object can be output, and the required results can be obtained according to preset requirements: for example, outputting the detection frame of the target object, or outputting the mask image of the target object, or outputting the detection frame and the mask image of the target object together with the category of the target object at the same time. This is not limited in the embodiments of the present disclosure.
- The target detection method in the embodiments of the present disclosure may be implemented by a neural network, which is obtained by training using sample images containing labeled target sample objects.
- The neural network used by the target detection method proposed in the embodiments of the present disclosure can be trained through the following steps S1101 to S1104:
- The sample images here may include positive samples annotated with target sample objects and negative samples that do not contain target sample objects, and the target sample objects contained in the positive samples may belong to multiple categories.
- The positive samples can be divided into those in which the target sample objects are labeled with detection frames and those in which the target sample objects are labeled with masks.
- The process of determining the corner position information of each sample corner point in the sample image and the centripetal offset tensor corresponding to each sample corner point is similar to the process, described above, of determining the corner position information of each corner point in the image to be detected and the centripetal offset tensor corresponding to each corner point, and is not repeated here.
- S1103 Predict the target sample object in the sample image based on the corner position information of each sample corner point in the sample image and the centripetal offset tensor corresponding to each sample corner point.
- the process of predicting the target sample object in the sample image is the same as the method of determining the target object in the image to be detected as mentioned above, and will not be repeated here.
- S1104 Adjust network parameter values of the neural network based on the predicted target sample object in the sample image and the labeled target sample object in the sample image.
- In the specific implementation, a loss function can be introduced to determine the loss value corresponding to the prediction of the target sample objects, and the network parameter values of the neural network can be adjusted based on this loss value. For example, when the loss value falls below a set threshold, training can be stopped to obtain the final network parameter values of the neural network.
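Steps S1101 to S1104 with threshold-based stopping can be sketched as a generic loop. `loss_fn` and `update_fn` are placeholders, since the patent does not specify the loss function or the optimizer.

```python
def train(params, samples, loss_fn, update_fn, threshold, max_steps=1000):
    """Generic training loop: compute the average prediction loss over the
    labeled samples, stop once it drops below the set threshold, otherwise
    update the network parameters and continue."""
    for step in range(max_steps):
        loss = sum(loss_fn(params, s) for s in samples) / len(samples)
        if loss < threshold:
            return params, step
        params = update_fn(params, samples)
    return params, max_steps
```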
- The processes of determining the detection frame of the target sample object, the mask of the target sample object, and the category of the target sample object are similar to the processes of determining the detection frame, mask, and category of the target object described above, and are not repeated here.
- The neural network training method obtains a sample image and, based on the sample image, determines the corner position information of each sample corner point in the sample image and the centripetal offset tensor corresponding to each sample corner point, and then detects the target sample objects in the sample image based on this information. The sample corner points are the main feature points representing the target sample objects; for example, they can include upper left sample corner points and lower right sample corner points, where an upper left sample corner point is the intersection of the straight line corresponding to the upper contour of the target sample object and the straight line corresponding to its left contour, and a lower right sample corner point is the intersection of the straight line corresponding to the lower contour of the target sample object and the straight line corresponding to its right contour. When an upper left sample corner point and a lower right sample corner point belong to the detection frame of the same target sample object, the positions pointed to by their corresponding centripetal offset tensors should be relatively close. Therefore, the neural network training method proposed in the embodiments of the present disclosure determines the sample corner points belonging to the same target sample object based on the corner position information representing the position of the target sample object in the sample image and the centripetal offset tensor corresponding to each sample corner point, and the target sample object can then be detected based on the determined sample corner points.
- The embodiments of the present disclosure also provide a target detection device corresponding to the target detection method. Since the technical principle of the device is similar to that of the target detection method described above, the implementation of the device can refer to the implementation of the method, and repeated descriptions are omitted.
- FIG. 12 is a schematic diagram of a target detection device 1200 provided by an embodiment of the present disclosure.
- the device includes: an acquisition part 1201, a determination part 1202, and a detection part 1203.
- the acquiring part 1201 is configured to acquire the image to be detected
- The determining part 1202 is configured to determine, based on the image to be detected, the corner position information of each corner point in the image to be detected and the centripetal offset tensor corresponding to each corner point, where the corner points represent the position of the target object in the image to be detected;
- the detection part 1203 is configured to determine the target object in the image to be detected based on the position information of each corner point in the image to be detected and the centripetal offset tensor corresponding to each corner point.
- In a possible implementation, the determining part 1202 is configured to: perform feature extraction on the image to be detected to obtain an initial feature map corresponding to the image to be detected; perform corner pooling on the initial feature map to obtain a corner-pooled feature map; and determine, based on the corner-pooled feature map, the corner position information of each corner point in the image to be detected and the centripetal offset tensor corresponding to each corner point.
- In a possible implementation, when determining the corner position information of each corner point in the image to be detected based on the corner-pooled feature map, the determining part 1202 is configured to: generate a corner heat map corresponding to the image to be detected; determine the probability value of each feature point in the corner heat map being a corner point, and screen out the corner points based on these probability values; obtain the position information of each screened corner point in the corner heat map and the local offset information corresponding to each corner point, where the local offset information indicates the position offset, in the corner heat map, of the real physical point represented by the corresponding corner point; and determine the corner position information of each corner point in the image to be detected based on the obtained position information, the local offset information, and the size ratio between the corner heat map and the image to be detected.
- In a possible implementation, when determining the centripetal offset tensor corresponding to each corner point based on the corner-pooled feature map, the determining part 1202 is configured to: determine the steering offset tensor corresponding to each feature point in the corner-pooled feature map, where the steering offset tensor of a feature point represents the offset tensor from that feature point to the center point of a target object in the image to be detected; determine the offset domain information of each feature point based on its steering offset tensor, where the offset domain information contains the offset tensors by which multiple initial feature points associated with the feature point point to their corresponding offset feature points; adjust the feature data of the feature points in the corner-pooled feature map based on the corner-pooled feature map and the offset domain information of its feature points, to obtain an adjusted feature map; and determine the centripetal offset tensor corresponding to each corner point based on the adjusted feature map.
- In a possible implementation, the corner heat map corresponding to the image to be detected includes corner heat maps corresponding to multiple channels, and each of the multiple channels corresponds to one preset object category. After being configured to determine, based on the corner heat map, the probability value of each feature point in the corner heat map being a corner point, the determining part 1202 is further configured to: for each of the multiple channels, determine, based on the probability values of the feature points in the corner heat map corresponding to that channel, whether a corner point exists in that corner heat map; and when a corner point exists in the corner heat map corresponding to that channel, determine that the image to be detected contains a target object of the preset object category corresponding to that channel.
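The per-channel category check can be sketched as follows; the probability threshold is an assumption, since the text only says the existence of a corner point is determined from the probability values.

```python
def categories_present(channel_heatmaps, categories, prob_thresh=0.5):
    """One heat map per channel, one preset category per channel: report a
    category when any feature point's corner probability in that channel's
    heat map reaches the threshold."""
    found = []
    for cat, heat in zip(categories, channel_heatmaps):
        if any(p >= prob_thresh for row in heat for p in row):
            found.append(cat)
    return found
```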
- In a possible implementation, when determining the target object in the image to be detected based on the corner position information of each corner point in the image to be detected and the centripetal offset tensor corresponding to each corner point, the detection part 1203 is configured to determine the detection frame of the target object in the image to be detected; this includes screening candidate corner point pairs that can constitute candidate detection frames and determining the detection frame of the target object among the candidate detection frames.
- In a possible implementation, when determining the central area information corresponding to each candidate corner point pair based on the corner position information of each corner point of the candidate corner point pair in the image to be detected, the detection part 1203 is configured to determine the coordinate range of the central area frame corresponding to the candidate corner point pair.
- In a possible implementation, when determining the detection frame of the target object among the candidate detection frames based on the position information of the center point pointed to by each corner point in each candidate corner point pair and the central area information corresponding to the candidate corner point pair, the detection part 1203 is configured to: determine valid candidate corner point pairs; determine the score of the candidate detection frame corresponding to each valid candidate corner point pair; and determine the detection frame of the target object among the candidate detection frames.
- the detection part 1203 is further configured to:
- the instance information of the target object in the image to be detected is determined based on the detection frame of the target object and the initial feature map obtained by feature extraction of the image to be detected.
- When determining the instance information of the target object in the image to be detected based on the detection frame of the target object and the initial feature map obtained by feature extraction of the image to be detected, the detection part 1203 is configured to input the detection frame of the target object and the initial feature map into a region of interest extraction network, and to determine, based on its output, the instance information of the target object in the image to be detected.
- the target detection device 1200 further includes a neural network training part 1204, and the neural network training part 1204 is configured to:
- the neural network is trained using sample images that contain labeled target sample objects.
- The neural network training part 1204 is configured to train the neural network according to the following steps: obtain a sample image containing a labeled target sample object; determine the corner position information of each sample corner point in the sample image and the centripetal offset tensor corresponding to each sample corner point; predict the target sample object in the sample image; and adjust the network parameter values of the neural network based on the predicted target sample object and the labeled target sample object in the sample image.
- In the embodiments of the present disclosure, a part may be a part of a circuit, a part of a processor, a part of a program or software, and so on; it may also be a unit, a module, or non-modular.
- an embodiment of the present disclosure further provides an electronic device 1300.
- FIG. 13 is a schematic structural diagram of the electronic device 1300 provided by an embodiment of the present disclosure; when the electronic device executes the steps of the target detection method described above, the target object in the image to be detected is determined.
- The embodiments of the present disclosure also provide a computer-readable storage medium having a computer program stored thereon; when the computer program is run by a processor, the steps of the target detection method in the foregoing method embodiments are executed.
- the storage medium may be a volatile or non-volatile computer readable storage medium.
- The embodiments of the present disclosure also provide a computer program including computer-readable code; when the computer-readable code runs in an electronic device, the processor in the electronic device executes the target detection method described in the first aspect.
- the computer program product of the target detection method provided by the embodiment of the present disclosure includes a computer-readable storage medium storing program code, and the program code includes instructions that can be used to execute the steps of the target detection method described in the above method embodiment
- the embodiments of the present disclosure also provide a computer program, which, when executed by a processor, implements any one of the methods in the foregoing embodiments.
- the computer program product can be specifically implemented by hardware, software, or a combination thereof.
- the computer program product is specifically embodied as a computer storage medium.
- the computer program product is specifically embodied as a software product, such as a software development kit (SDK) and so on.
- the working process of the system and device described above can refer to the corresponding process in the foregoing method embodiment, which will not be repeated here.
- the disclosed system, device, and method may be implemented in other ways.
- the device embodiments described above are merely illustrative.
- the division of the units is only a logical function division, and there may be other divisions in actual implementation.
- Multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
- the displayed or discussed mutual coupling or direct coupling or communication connection may be through some communication interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.
- the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
- the functional units in the various embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
- If the functions are implemented in the form of a software functional unit and sold or used as an independent product, they can be stored in a non-volatile computer-readable storage medium executable by a processor.
- Based on this understanding, the technical solution of the present disclosure, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present disclosure.
- The aforementioned storage media include: a USB flash drive, a mobile hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disk, or other media that can store program code.
- the embodiments of the present disclosure provide a target detection method, device, electronic equipment, and computer-readable storage medium.
- The target detection method includes: acquiring an image to be detected; determining, based on the image to be detected, the corner position information of each corner point in the image to be detected and the centripetal offset tensor corresponding to each corner point, where the corner points represent the position of the target object in the image to be detected; and determining the target object in the image to be detected based on the corner position information of each corner point in the image to be detected and the centripetal offset tensor corresponding to each corner point.
- The target detection method proposed in the embodiments of the present disclosure can determine the corner points belonging to the same target object based on the corner position information of each corner point in the image to be detected and the centripetal offset tensor corresponding to each corner point, and the target object can then be detected based on the determined corner points.
Claims (17)
- 一种目标检测方法,包括:A target detection method includes:获取待检测图像;Obtain the image to be detected;基于所述待检测图像,确定各个角点在所述待检测图像中的角点位置信息以及各个角点对应的向心偏移张量,角点表征所述待检测图像中的目标对象的位置;Based on the image to be detected, determine the position information of each corner point in the image to be detected and the centripetal offset tensor corresponding to each corner point, and the corner point represents the position of the target object in the image to be detected ;基于各个角点在所述待检测图像中的角点位置信息及各个角点对应的向心偏移张量,确定所述待检测图像中的目标对象。Based on the corner position information of each corner point in the image to be detected and the centripetal offset tensor corresponding to each corner point, the target object in the image to be detected is determined.
- 根据权利要求1所述的目标检测方法,其中,所述基于所述待检测图像,确定各个角点在待检测图像中的角点位置信息以及各个角点对应的向心偏移张量,包括:The target detection method according to claim 1, wherein the determining, based on the image to be detected, the position information of each corner point in the image to be detected and the centripetal offset tensor corresponding to each corner point comprises :对所述待检测图像进行特征提取,得到所述待检测图像对应的初始特征图;Performing feature extraction on the image to be detected to obtain an initial feature map corresponding to the image to be detected;对所述初始特征图进行角点池化处理,得到角点池化后的特征图;Performing corner pooling processing on the initial feature map to obtain a feature map after corner pooling;基于所述角点池化后的特征图,确定各个角点在所述待检测图像中的角点位置信息,以及各个角点对应的向心偏移张量。Based on the feature map after the corner point pooling, the corner point position information of each corner point in the image to be detected and the centripetal offset tensor corresponding to each corner point are determined.
- 根据权利要求2所述的目标检测方法,其中,所述基于所述角点池化后的特征图,确定各个角点在所述待检测图像中的角点位置信息,包括:The target detection method according to claim 2, wherein the determining the corner position information of each corner point in the image to be detected based on the feature map after the corner point pooling comprises:基于所述角点池化后的特征图,生成所述待检测图像对应的角点热力图;Generating a corner heat map corresponding to the image to be detected based on the feature map after corner pooling;基于所述角点热力图,确定所述角点热力图中每个特征点作为角点的概率值,并基于每个特征点作为角点的概率值,从所述角点热力图的特征点中筛选出所述角点;Based on the corner point heat map, determine the probability value of each feature point in the corner point heat map as a corner point, and based on the probability value of each feature point as a corner point, from the feature point of the corner point heat map To filter out the corner points;获取筛选出的各个角点在所述角点热力图中的位置信息、以及各个角点对应的局部偏移信息,所述局部偏移信息用于表示对应的角点所表征的真实物理点在所述角点热力图中的位置偏移信息;Obtain the position information of the selected corner points in the corner point heat map and the local offset information corresponding to each corner point. The local offset information is used to indicate that the real physical point represented by the corresponding corner point is at Position offset information in the corner point heat map;基于获取到的各个角点在所述角点热力图中的位置信息、各个角点对应的局部偏移信息、以及所述角点热力图和所述待检测图像之间的尺寸比例,确定各个角点在所述待检测图像中的角点位置信息。Based on the acquired position information of each corner point in the corner point heat map, the local offset information corresponding to each corner point, and the size ratio between the corner point heat map and the image to be detected, determine each The position information of the corner point in the image to be detected.
- 根据权利要求2或3所述的目标检测方法,其中,所述基于所述角点池化后的特征图,确定各个角点对应的向心偏移张量,包括:The target detection method according to claim 2 or 3, wherein the determining the centripetal offset tensor corresponding to each corner point based on the feature map after the corner point pooling comprises:基于所述角点池化后的特征图,确定所述角点池化后的特征图中的每个特征点对应的导向偏移张量,每个特征点对应的导向偏移张量表征由该特征点指向所述待检测图像中的目标对象中心点的偏移张量;Based on the feature map after corner point pooling, the steering offset tensor corresponding to each feature point in the corner point pooling feature map is determined, and the steering offset tensor corresponding to each feature point is represented by The offset tensor of the feature point pointing to the center point of the target object in the image to be detected;基于每个特征点对应的所述导向偏移张量,确定该特征点的偏移域信息;所述偏移域信息中包含与该特征点关联的多个初始特征点分别指向各自对应的偏移后特征点的偏移张量;Based on the steering offset tensor corresponding to each feature point, determine the offset domain information of the feature point; the offset domain information includes multiple initial feature points associated with the feature point respectively pointing to their corresponding offsets The offset tensor of the feature point after the shift;基于所述角点池化后的特征图,以及该角点池化后的特征图中的特征点的偏移域信息,对所述角点池化后的特征图中的特征点的特征数据进行调整,得到调整后的特征图;Based on the corner point pooled feature map and the offset domain information of the feature points in the corner point pooled feature map, the feature data of the feature points in the corner point pooled feature map Make adjustments to obtain the adjusted feature map;基于所述调整后的特征图,确定各个角点对应的向心偏移张量。Based on the adjusted feature map, the centripetal offset tensor corresponding to each corner point is determined.
- The object detection method according to claim 3, wherein the corner heatmap corresponding to the image to be detected comprises corner heatmaps respectively corresponding to a plurality of channels, each of the plurality of channels corresponding to one preset object category; and after the determining, based on the corner heatmap, the probability value of each feature point in the corner heatmap being a corner point, the detection method further comprises: for each of the plurality of channels, determining, based on the probability value of each feature point in the corner heatmap corresponding to the channel being a corner point, whether a corner point exists in the corner heatmap corresponding to the channel; and in a case where a corner point exists in the corner heatmap corresponding to the channel, determining that the image to be detected contains a target object of the preset object category corresponding to the channel.
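The per-category decision above can be sketched minimally: the heatmap has one channel per preset object category, and a category is reported as present when any feature point in its channel passes the corner-probability threshold. The category names and threshold here are assumptions for illustration.

```python
# Minimal sketch (assumed categories and threshold): one heatmap channel
# per preset object category; a category is present if its channel
# contains at least one corner.

def detect_categories(heatmaps_by_channel, categories, threshold=0.5):
    present = []
    for channel, name in zip(heatmaps_by_channel, categories):
        if any(p > threshold for row in channel for p in row):
            present.append(name)
    return present

channels = [
    [[0.1, 0.9], [0.2, 0.3]],   # "vehicle" channel: contains a corner
    [[0.1, 0.2], [0.2, 0.3]],   # "pedestrian" channel: no corner
]
found = detect_categories(channels, ["vehicle", "pedestrian"])
```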
- The object detection method according to claim 1, wherein the determining the target object in the image to be detected based on the corner position information of each corner point in the image to be detected and the centripetal offset tensor corresponding to each corner point comprises: determining a detection box of the target object in the image to be detected based on the corner position information of each corner point in the image to be detected and the centripetal offset tensor corresponding to each corner point.
- The object detection method according to claim 6, wherein the determining the detection box of the target object in the image to be detected based on the corner position information of each corner point in the image to be detected and the centripetal offset tensor corresponding to each corner point comprises: screening, based on the corner position information of each corner point in the image to be detected, candidate corner point pairs that constitute candidate detection boxes; determining, for each corner point in each candidate corner point pair, position information of the center point pointed to by the corner point, based on the corner position information of the corner point in the image to be detected and the centripetal offset tensor corresponding to the corner point; determining central region information corresponding to each candidate corner point pair based on the corner position information of each corner point in the candidate corner point pair in the image to be detected; and determining the detection box of the target object from the candidate detection boxes based on the position information of the center point pointed to by each corner point in each candidate corner point pair and the central region information corresponding to the candidate corner point pair.
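A brief sketch of the pairing-and-projection step described above: a top-left/bottom-right corner pair forms a candidate box only if it spans a proper rectangle, and each corner is projected toward the object center by adding its centripetal offset. The decoding convention and the numbers are assumptions, not the claimed formulas.

```python
# Sketch (assumed convention): build a candidate box from a corner pair
# and project each corner toward the center via its centripetal offset.

def candidate_box(tl, br):
    """Return the candidate detection box if the pair spans a valid rectangle."""
    (x1, y1), (x2, y2) = tl, br
    if x1 < x2 and y1 < y2:
        return (x1, y1, x2, y2)
    return None

def pointed_center(corner, centripetal_offset):
    """Center point a corner points to: corner position plus its offset."""
    x, y = corner
    dx, dy = centripetal_offset
    return (x + dx, y + dy)

tl, br = (10, 10), (50, 40)
box = candidate_box(tl, br)
center_from_tl = pointed_center(tl, (20, 15))    # points down-right
center_from_br = pointed_center(br, (-20, -15))  # points up-left
```

For a true pair, both corners point to (approximately) the same center, here (30, 25).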
- The object detection method according to claim 7, wherein the determining the central region information corresponding to each candidate corner point pair based on the corner position information of each corner point in the candidate corner point pair in the image to be detected comprises: determining, based on the corner position information of each corner point of the candidate corner point pair, corner position information characterizing a central region box corresponding to the candidate corner point pair; and determining a coordinate range of the central region box corresponding to the candidate corner point pair based on the corner position information of the central region box.
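One common way to realize such a central region, sketched here as an assumption rather than the claimed formula, is a box with the same center as the candidate box but shrunk by a factor mu; mu = 0.5 is chosen purely for illustration.

```python
# Sketch (assumed shrink factor mu): the central region of a candidate
# box is a concentric box scaled down by mu.

def central_region(x1, y1, x2, y2, mu=0.5):
    """Coordinate range of the central region of the box (x1, y1, x2, y2)."""
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    half_w = (x2 - x1) * mu / 2
    half_h = (y2 - y1) * mu / 2
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)

region = central_region(10, 10, 50, 40)
```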
- The object detection method according to claim 7 or 8, wherein the determining the detection box of the target object from the candidate detection boxes based on the position information of the center point pointed to by each corner point in each candidate corner point pair and the central region information corresponding to the candidate corner point pair comprises: determining valid candidate corner point pairs based on the position information of the center point pointed to by each corner point in each candidate corner point pair and the central region information corresponding to the candidate corner point pair; determining a score of the candidate detection box corresponding to each valid candidate corner point pair based on the position information of the center point pointed to by each corner point in the valid candidate corner point pair, the central region information corresponding to the valid candidate corner point pair, and the probability value corresponding to each corner point in the valid candidate corner point pair, where the probability value corresponding to each corner point represents the probability value of the feature point corresponding to that corner point in the corner heatmap being a corner point; and determining the detection box of the target object from the candidate detection boxes based on the score of the candidate detection box corresponding to each valid candidate corner point pair and the size of the overlapping region between adjacent candidate detection boxes.
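A hedged end-to-end sketch of this selection step: a candidate pair is treated as valid when both pointed centers fall inside its central region; valid boxes get a score from their corner probabilities and overlapping boxes are pruned with a simple IoU-based non-maximum suppression. The scoring formula (a plain average) and the IoU threshold are assumptions, not the claimed computation.

```python
# Sketch (assumed scoring and IoU threshold): validity check, scoring,
# and greedy overlap suppression over candidate detection boxes.

def inside(pt, region):
    x, y = pt
    x1, y1, x2, y2 = region
    return x1 <= x <= x2 and y1 <= y <= y2

def iou(a, b):
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0

def select_boxes(candidates, iou_thresh=0.5):
    """candidates: list of (box, central_region, pointed_centers, corner_probs)."""
    valid = [(box, sum(probs) / len(probs))
             for box, region, centers, probs in candidates
             if all(inside(c, region) for c in centers)]
    valid.sort(key=lambda t: t[1], reverse=True)
    kept = []
    for box, score in valid:  # greedy NMS: drop boxes overlapping a kept one
        if all(iou(box, k) < iou_thresh for k, _ in kept):
            kept.append((box, score))
    return kept

cands = [
    ((10, 10, 50, 40), (20, 17.5, 40, 32.5), [(30, 25), (30, 25)], [0.9, 0.8]),
    ((12, 11, 52, 41), (22, 18.5, 42, 33.5), [(32, 26), (32, 26)], [0.6, 0.5]),
    ((10, 10, 50, 40), (20, 17.5, 40, 32.5), [(5, 5), (30, 25)], [0.9, 0.9]),
]
final = select_boxes(cands)
```

The third candidate is rejected because one pointed center misses its central region; the second survives validation but is suppressed by its overlap with the higher-scoring first box.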
- The object detection method according to claim 6, wherein after the determining the detection box of the target object in the image to be detected, the object detection method further comprises: determining instance information of the target object in the image to be detected based on the detection box of the target object and an initial feature map obtained by performing feature extraction on the image to be detected.
- The object detection method according to claim 10, wherein the determining the instance information of the target object in the image to be detected based on the detection box of the target object and the initial feature map obtained by performing feature extraction on the image to be detected comprises: extracting, based on the detection box of the target object and the initial feature map, feature data of feature points of the initial feature map within the detection box; and determining the instance information of the target object in the image to be detected based on the feature data of the feature points of the initial feature map within the detection box.
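The extraction step can be sketched as a plain crop of the initial feature map by the detection box; a real system would typically use a sub-pixel pooling operator such as RoIAlign instead, so the integer crop below is an illustrative simplification.

```python
# Sketch (simplified integer crop, not RoIAlign): extract the initial
# feature map's data inside a detection box as input for predicting
# instance information such as a mask.

def crop_features(feature_map, box):
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in feature_map[y1:y2]]

# Toy 5x6 single-channel feature map whose value encodes its position.
feature_map = [[float(x + 10 * y) for x in range(6)] for y in range(5)]
roi = crop_features(feature_map, (1, 1, 4, 3))
```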
- The object detection method according to any one of claims 1 to 11, wherein the object detection method is implemented by a neural network, and the neural network is trained using sample images containing annotated target sample objects.
- The object detection method according to claim 12, wherein the neural network is trained by the following steps: obtaining a sample image; determining, based on the sample image, corner position information of each sample corner point in the sample image and a centripetal offset tensor corresponding to each sample corner point, the sample corner points representing positions of a target sample object in the sample image; predicting the target sample object in the sample image based on the corner position information of each sample corner point in the sample image and the centripetal offset tensor corresponding to each sample corner point; and adjusting network parameter values of the neural network based on the predicted target sample object in the sample image and the annotated target sample object in the sample image.
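The training loop of this claim, reduced to its skeleton, is predict-compare-adjust: predict on a sample, compare against the annotation, and update parameters to reduce the error. The one-parameter "network" and L1-style update below are stand-ins for the real architecture and loss, purely to make the loop concrete.

```python
# Toy sketch (stand-in model and loss, not the claimed network): one
# training step predicts from a sample, compares against the annotated
# target, and adjusts the parameter against the error gradient.

def train_step(param, sample, target, lr=0.1):
    pred = param * sample                           # forward pass of the toy "network"
    grad = (1 if pred > target else -1) * sample    # subgradient of |pred - target|
    return param - lr * grad                        # parameter adjustment

param = 0.0
for _ in range(30):  # repeated predict/compare/adjust on the sample
    param = train_step(param, sample=2.0, target=4.0)
```

After a few dozen steps the prediction `param * 2.0` settles near the annotated target 4.0.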
- An object detection apparatus, comprising: an obtaining part, configured to obtain an image to be detected; a determining part, configured to determine, based on the image to be detected, corner position information of each corner point in the image to be detected and a centripetal offset tensor corresponding to each corner point, the corner points representing positions of a target object in the image to be detected; and a detection part, configured to determine the target object in the image to be detected based on the corner position information of each corner point in the image to be detected and the centripetal offset tensor corresponding to each corner point.
- An electronic device, comprising a processor, a memory, and a bus, wherein the memory stores machine-readable instructions executable by the processor; when the electronic device is running, the processor communicates with the memory via the bus; and the machine-readable instructions, when executed by the processor, perform the steps of the object detection method according to any one of claims 1 to 13.
- A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when run by a processor, performs the steps of the object detection method according to any one of claims 1 to 13.
- A computer program, comprising computer-readable code, wherein when the computer-readable code runs in an electronic device, a processor in the electronic device implements the method according to any one of claims 1 to 13.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2021557733A JP2022526548A (en) | 2020-01-22 | 2020-12-11 | Target detection methods, devices, electronic devices and computer readable storage media |
KR1020217030884A KR20210129189A (en) | 2020-01-22 | 2020-12-11 | Target detection method, apparatus, electronic device and computer readable storage medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010073142.6A CN111242088B (en) | 2020-01-22 | 2020-01-22 | Target detection method and device, electronic equipment and storage medium |
CN202010073142.6 | 2020-01-22 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021147563A1 true WO2021147563A1 (en) | 2021-07-29 |
Family
ID=70870017
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/135967 WO2021147563A1 (en) | 2020-01-22 | 2020-12-11 | Object detection method and apparatus, electronic device, and computer readable storage medium |
Country Status (4)
Country | Link |
---|---|
JP (1) | JP2022526548A (en) |
KR (1) | KR20210129189A (en) |
CN (1) | CN111242088B (en) |
WO (1) | WO2021147563A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113936458A (en) * | 2021-10-12 | 2022-01-14 | 中国联合网络通信集团有限公司 | Method, device, equipment and medium for judging congestion of expressway |
CN115644933A (en) * | 2022-11-17 | 2023-01-31 | 深圳微创踪影医疗装备有限公司 | Catheter flushing control method and device, computer equipment and storage medium |
Families Citing this family (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111242088B (en) * | 2020-01-22 | 2023-11-28 | 上海商汤临港智能科技有限公司 | Target detection method and device, electronic equipment and storage medium |
CN111681284A (en) * | 2020-06-09 | 2020-09-18 | 商汤集团有限公司 | Corner point detection method and device, electronic equipment and storage medium |
CN112215840A (en) * | 2020-10-30 | 2021-01-12 | 上海商汤临港智能科技有限公司 | Image detection method, image detection device, driving control method, driving control device, electronic equipment and storage medium |
CN112270278A (en) * | 2020-11-02 | 2021-01-26 | 重庆邮电大学 | Key point-based blue top house detection method |
CN112348894B (en) * | 2020-11-03 | 2022-07-29 | 中冶赛迪重庆信息技术有限公司 | Method, system, equipment and medium for identifying position and state of scrap steel truck |
CN112733653A (en) * | 2020-12-30 | 2021-04-30 | 智车优行科技(北京)有限公司 | Target detection method and device, computer readable storage medium and electronic equipment |
CN113822841B (en) * | 2021-01-29 | 2022-05-20 | 深圳信息职业技术学院 | Sewage impurity caking detection method and device and related equipment |
CN112699856A (en) * | 2021-03-24 | 2021-04-23 | 成都新希望金融信息有限公司 | Face ornament identification method and device, electronic equipment and storage medium |
CN113033539B (en) * | 2021-03-30 | 2022-12-06 | 北京有竹居网络技术有限公司 | Calligraphy practicing lattice detection method and device, readable medium and electronic equipment |
CN113095228B (en) * | 2021-04-13 | 2024-04-30 | 地平线(上海)人工智能技术有限公司 | Method and device for detecting target in image and computer readable storage medium |
CN113569911A (en) * | 2021-06-28 | 2021-10-29 | 北京百度网讯科技有限公司 | Vehicle identification method and device, electronic equipment and storage medium |
CN113920538B (en) * | 2021-10-20 | 2023-04-14 | 北京多维视通技术有限公司 | Object detection method, device, equipment, storage medium and computer program product |
CN113850238B (en) * | 2021-11-29 | 2022-03-04 | 北京世纪好未来教育科技有限公司 | Document detection method and device, electronic equipment and storage medium |
CN116309587A (en) * | 2023-05-22 | 2023-06-23 | 杭州百子尖科技股份有限公司 | Cloth flaw detection method and device, electronic equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180018503A1 (en) * | 2015-12-11 | 2018-01-18 | Tencent Technology (Shenzhen) Company Limited | Method, terminal, and storage medium for tracking facial critical area |
CN109670503A (en) * | 2018-12-19 | 2019-04-23 | 北京旷视科技有限公司 | Label detection method, apparatus and electronic system |
CN110490256A (en) * | 2019-08-20 | 2019-11-22 | 中国计量大学 | A kind of vehicle checking method based on key point thermal map |
CN111242088A (en) * | 2020-01-22 | 2020-06-05 | 上海商汤临港智能科技有限公司 | Target detection method and device, electronic equipment and storage medium |
Family Cites Families (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040259667A1 (en) * | 2003-06-02 | 2004-12-23 | Simon Berdugo | Motorized image rotating target apparatus for all sports |
US20100317466A1 (en) * | 2009-05-24 | 2010-12-16 | Semple Kerry J | Miniature Kick Bag Game and Apparatus Kit |
WO2014002554A1 (en) * | 2012-06-29 | 2014-01-03 | 日本電気株式会社 | Image processing device, image processing method, and program |
CN106557940B (en) * | 2015-09-25 | 2019-09-17 | 杭州海康威视数字技术股份有限公司 | Information release terminal and method |
CN106683091B (en) * | 2017-01-06 | 2019-09-24 | 北京理工大学 | A kind of target classification and attitude detecting method based on depth convolutional neural networks |
CN108229307B (en) * | 2017-11-22 | 2022-01-04 | 北京市商汤科技开发有限公司 | Method, device and equipment for object detection |
CN108446707B (en) * | 2018-03-06 | 2020-11-24 | 北方工业大学 | Remote sensing image airplane detection method based on key point screening and DPM confirmation |
US10872406B2 (en) * | 2018-04-13 | 2020-12-22 | Taiwan Semiconductor Manufacturing Company, Ltd. | Hot spot defect detecting method and hot spot defect detecting system |
CN109801335A (en) * | 2019-01-08 | 2019-05-24 | 北京旷视科技有限公司 | Image processing method, device, electronic equipment and computer storage medium |
CN110378891A (en) * | 2019-07-24 | 2019-10-25 | 广东工业大学 | A kind of hazardous material detection method, device and equipment based on terahertz image |
CN110532894B (en) * | 2019-08-05 | 2021-09-03 | 西安电子科技大学 | Remote sensing target detection method based on boundary constraint CenterNet |
CN110543838A (en) * | 2019-08-19 | 2019-12-06 | 上海光是信息科技有限公司 | Vehicle information detection method and device |
CN110647931A (en) * | 2019-09-20 | 2020-01-03 | 深圳市网心科技有限公司 | Object detection method, electronic device, system, and medium |
- 2020
- 2020-01-22 CN CN202010073142.6A patent/CN111242088B/en active Active
- 2020-12-11 JP JP2021557733A patent/JP2022526548A/en active Pending
- 2020-12-11 KR KR1020217030884A patent/KR20210129189A/en unknown
- 2020-12-11 WO PCT/CN2020/135967 patent/WO2021147563A1/en active Application Filing
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113936458A (en) * | 2021-10-12 | 2022-01-14 | 中国联合网络通信集团有限公司 | Method, device, equipment and medium for judging congestion of expressway |
CN115644933A (en) * | 2022-11-17 | 2023-01-31 | 深圳微创踪影医疗装备有限公司 | Catheter flushing control method and device, computer equipment and storage medium |
CN115644933B (en) * | 2022-11-17 | 2023-08-22 | 深圳微创踪影医疗装备有限公司 | Catheter flushing control method, catheter flushing control device, computer equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN111242088A (en) | 2020-06-05 |
JP2022526548A (en) | 2022-05-25 |
CN111242088B (en) | 2023-11-28 |
KR20210129189A (en) | 2021-10-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021147563A1 (en) | Object detection method and apparatus, electronic device, and computer readable storage medium | |
US10885365B2 (en) | Method and apparatus for detecting object keypoint, and electronic device | |
US11443445B2 (en) | Method and apparatus for depth estimation of monocular image, and storage medium | |
JP5952001B2 (en) | Camera motion estimation method and apparatus using depth information, augmented reality system | |
US20180225527A1 (en) | Method, apparatus, storage medium and device for modeling lane line identification, and method, apparatus, storage medium and device for identifying lane line | |
US9600898B2 (en) | Method and apparatus for separating foreground image, and computer-readable recording medium | |
JP7051267B2 (en) | Image detection methods, equipment, electronic equipment, storage media, and programs | |
WO2018025831A1 (en) | People flow estimation device, display control device, people flow estimation method, and recording medium | |
WO2018054329A1 (en) | Object detection method and device, electronic apparatus, computer program and storage medium | |
JP6345147B2 (en) | Method for detecting an object in a pair of stereo images | |
CN106845338B (en) | Pedestrian detection method and system in video stream | |
CN108229494B (en) | Network training method, processing method, device, storage medium and electronic equipment | |
US20240037898A1 (en) | Method for predicting reconstructabilit, computer device and storage medium | |
CN114511661A (en) | Image rendering method and device, electronic equipment and storage medium | |
CN115797735A (en) | Target detection method, device, equipment and storage medium | |
CN113343981A (en) | Visual feature enhanced character recognition method, device and equipment | |
CN108229281B (en) | Neural network generation method, face detection device and electronic equipment | |
JP7014005B2 (en) | Image processing equipment and methods, electronic devices | |
CN111027551B (en) | Image processing method, apparatus and medium | |
CN113255700B (en) | Image feature map processing method and device, storage medium and terminal | |
CN110135474A (en) | A kind of oblique aerial image matching method and system based on deep learning | |
CN115375742A (en) | Method and system for generating depth image | |
JP2015219756A (en) | Image comparison method, image comparison device, and program | |
Shiratori et al. | Detection of pointing position by omnidirectional camera | |
CN113379838B (en) | Method for generating roaming path of virtual reality scene and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20915404; Country of ref document: EP; Kind code of ref document: A1 |
ENP | Entry into the national phase | Ref document number: 20217030884; Country of ref document: KR; Kind code of ref document: A |
ENP | Entry into the national phase | Ref document number: 2021557733; Country of ref document: JP; Kind code of ref document: A |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: pct application non-entry in european phase | Ref document number: 20915404; Country of ref document: EP; Kind code of ref document: A1 |
32PN | Ep: public notification in the ep bulletin as the address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 19/05/2023) |