WO2020134528A1 - Target detection method and related product


Publication number: WO2020134528A1
Authority: WIPO (PCT)
Application number: PCT/CN2019/114330
Other languages: French (fr), Chinese (zh)
Inventors: 陈乐, 刘海军, 顾鹏
Original Assignee: 深圳云天励飞技术有限公司
Application filed by 深圳云天励飞技术有限公司
Publication of WO2020134528A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/26: Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V 10/40: Extraction of image or video features

Definitions

  • This application relates to the field of target detection technology, and in particular to a target detection method and related products.
  • NMS: non-maximum suppression.
  • the embodiments of the present application provide a target detection method and related products, which can reduce the number of iterations and reduce the calculation complexity.
  • the first aspect of the embodiments of the present application provides a target detection method, which is applied to an electronic device and includes:
  • Input the image to be processed into a preset convolutional neural network to obtain M first-type frames, each first-type frame corresponds to a score, and M is an integer greater than 1;
  • the i-th frame is any frame whose mask is 1;
  • the mask of the i-th frame is set to 0.
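The first-aspect steps above can be sketched as follows. This is purely an illustrative sketch, not the claimed implementation: the box format, helper names, and the comparison of the raw overlap area against the preset threshold follow the text, but everything else is an assumption.

```python
def box_area(box):
    # box = (x0, y0, x1, y1): the two vertices of a diagonal,
    # upper-left and lower-right corners
    x0, y0, x1, y1 = box
    return max(0, x1 - x0) * max(0, y1 - y0)

def overlap_area(a, b):
    # Overlapping area of two axis-aligned boxes (in pixels)
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0, w) * max(0, h)

def mask_nms(boxes, scores, thres):
    # Sort the M first-type boxes by score, high to low,
    # and set the mask of every box to 1.
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    mask = [1] * len(boxes)
    kept = []
    for t in order:                      # next target box
        if mask[t] == 0:
            continue
        mask[t] = 0                      # target box: mask set to 0
        kept.append(t)                   # saved in the result
        for i in order:                  # i-th box: any box whose mask is 1
            if mask[i] == 1 and overlap_area(boxes[t], boxes[i]) > thres:
                mask[i] = 0              # suppress: overlap exceeds threshold
    return kept
```

Here the preset threshold is compared directly against the overlap area, as in the first aspect; the later embodiments instead scale the threshold by the box areas.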
  • a second aspect of an embodiment of the present application provides a target detection device, including:
  • An input unit configured to input the image to be processed into a preset convolutional neural network to obtain M first-type frames, each first-type frame corresponds to a score, and M is an integer greater than 1;
  • a sorting unit configured to sort the M first-type frames according to the order of the scores of each frame in the M first-type frames from high to low;
  • the selection unit is used to set the masks of all frames to 1, select one frame from the M first-class frames after sorting as the target frame, and set the mask of the target frame to 0;
  • a determining unit configured to determine an overlapping area between the i-th frame and the target frame, and the i-th frame is any frame whose mask is 1;
  • the setting unit is configured to set the mask of the i-th frame to 0 when the overlapping area is greater than a preset threshold.
  • an embodiment of the present application provides an electronic device, including a processor, a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor,
  • the above program includes instructions for performing the steps in the first aspect of the embodiments of the present application.
  • an embodiment of the present application provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to execute part or all of the steps described in the first aspect of the embodiments of the present application.
  • an embodiment of the present application provides a computer program product, wherein the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute some or all of the steps described in the first aspect of the embodiments of the present application.
  • the computer program product may be a software installation package.
  • in the embodiments of the present application, the image to be processed is obtained and input into a preset convolutional neural network to obtain M first-type boxes, where each first-type box corresponds to a score and M is an integer greater than 1. The M first-type boxes are sorted by score from high to low, and the mask of every box is set to 1. One box is selected from the sorted M first-type boxes as the target box, and the mask of the target box is set to 0. The overlap area between the i-th box and the target box is then determined, where the i-th box is any box whose mask is 1. When the overlap area is greater than the preset threshold, the mask of the i-th box is set to 0.
  • in this way, some boxes can be filtered out, which reduces the number of iterations and the computational complexity.
  • FIG. 1A is a schematic flowchart of an embodiment of a target detection method provided by an embodiment of the present application
  • FIG. 1B is a schematic diagram of a block provided by an embodiment of the present application.
  • FIG. 1C is a schematic diagram illustrating the overlapping area of the frame provided by the embodiment of the present application.
  • FIG. 2 is a schematic flowchart of another embodiment of a target detection method provided by an embodiment of the present application.
  • FIG. 3A is a schematic structural diagram of an embodiment of a target detection device provided by an embodiment of the present application.
  • FIG. 3B is a schematic structural diagram of another embodiment of a target detection device provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of an embodiment of an electronic device provided by an embodiment of the present application.
  • the electronic device in the embodiment of the present application can be connected to multiple cameras. Each camera can be used to capture video images, and each camera can have a corresponding position mark or a corresponding number.
  • cameras can be installed in public places, such as schools, museums, intersections, pedestrian streets, office buildings, garages, airports, hospitals, subway stations, stations, bus platforms, supermarkets, hotels, entertainment venues, and so on. After the camera captures the video image, the video image can be saved to the memory of the system where the electronic device is located. Multiple image libraries can be stored in the memory, and each image library can contain different video images of the same person. Of course, each image library can also be used to store video images of an area or video images taken by a specified camera.
  • each frame of video image captured by the camera corresponds to one piece of attribute information
  • the attribute information is at least one of the following: the shooting time of the video image, the position of the video image, the attribute parameters of the video image (format, size, resolution, etc.), the number of the video image, and the character attributes in the video image.
  • the character characteristic attributes in the video image may include, but are not limited to: the number of characters in the video image, the position of the character, the angle value of the character, the age, the image quality, and so on.
  • the angle value information of the face image may be specified.
  • the above angle value information may include, but is not limited to: horizontal rotation angle, pitch angle, and tilt angle.
  • the dynamic face image data requires that the distance between the eyes is not less than 30 pixels, and more than 60 pixels is recommended.
  • the horizontal rotation angle value does not exceed ⁇ 30°, the pitch angle does not exceed ⁇ 20°, and the tilt angle does not exceed ⁇ 45°. It is recommended that the horizontal rotation angle value should not exceed ⁇ 15°, the pitch angle should not exceed ⁇ 10°, and the tilt angle should not exceed ⁇ 15°.
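As an illustrative sketch only, the quoted angle and eye-distance constraints could be checked as below; the function name and the mandatory/recommended switch are assumptions, not part of the application.

```python
def face_pose_acceptable(yaw, pitch, tilt, eye_distance_px, strict=False):
    # Mandatory limits from the text: |yaw| <= 30, |pitch| <= 20,
    # |tilt| <= 45 degrees; eye distance not less than 30 pixels.
    # Recommended limits: |yaw| <= 15, |pitch| <= 10, |tilt| <= 15,
    # eye distance above 60 pixels.
    if strict:
        return (abs(yaw) <= 15 and abs(pitch) <= 10
                and abs(tilt) <= 15 and eye_distance_px > 60)
    return (abs(yaw) <= 30 and abs(pitch) <= 20
            and abs(tilt) <= 45 and eye_distance_px >= 30)
```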
  • the image formats of the video images in the embodiments of the present application may include, but are not limited to: BMP, JPEG, JPEG2000, PNG, etc. The size may be between 10 and 30 KB. Each video image may also correspond to a shooting time, the unified number of the camera that shot the video image, a link to the panoramic large image corresponding to the face image, and other information (a characteristic correspondence file is established between the face image and the global image).
  • the requirements on the device are very low: only a single camera capable of capturing RGB images or video is needed to complete data collection and point cloud generation, after which the point cloud data and the original RGB images are sent to the subsequent stage, where three-dimensional reconstruction of the scene can be achieved.
  • scene 3D reconstruction based on single-camera depth-of-field prediction can be divided into six modules: video stream acquisition, image preprocessing, depth feature extraction and scene depth map generation, depth-map-based point cloud data generation, matching and fusion of RGB images with point cloud data, and 3D object surface generation. Among these, video stream acquisition, image preprocessing, matching of RGB images with point cloud data, and 3D object surface generation are relatively mature technologies. This application can optimize the method of generating point cloud data from the scene, greatly reducing its equipment and computing requirements.
  • FIG. 1A is a schematic flowchart of an embodiment of a target detection method provided by an embodiment of the present application.
  • the target detection method described in this embodiment includes the following steps:
  • the embodiment of the present application is applied to electronic equipment, and specifically can be applied to target detection. The image to be processed may be an image including a target, and the target may be at least one of the following: a person, an animal, a license plate, a vehicle, a building, etc., which is not limited here.
  • the image to be processed may be designated by a user or captured by a camera.
  • acquiring the target face image may include the following steps:
  • the environmental parameters may include at least one of the following: temperature, humidity, location, magnetic field interference intensity, weather, ambient light brightness, number of ambient light sources, etc., which are not limited herein.
  • the above environmental parameters can be collected by environmental sensors, which can be integrated into electronic devices.
  • the environmental sensor may be at least one of the following: a temperature sensor, a humidity sensor, a positioning device, a magnetic field detection sensor, a processor, an ambient light sensor, a color sensor, etc., which are not limited herein. For example, a temperature sensor may be used to detect temperature, a humidity sensor to detect humidity, the global positioning system (GPS) to detect position, a magnetic field detection sensor to detect magnetic field strength, a processor to obtain the weather (for example, a weather APP is installed in the electronic device and the weather is obtained through it), an ambient light sensor to detect ambient brightness, and a color sensor to detect the number of ambient light sources, and so on.
  • the shooting parameters may be at least one of the following: exposure duration, shooting mode (such as seascape mode, desert mode, night scene mode, panorama mode, etc.), sensitivity (ISO), focal length, object distance, aperture size, etc., which are not limited here.
  • mapping relationship between the preset environmental parameters and the shooting parameters can also be pre-stored in the electronic device.
  • the following provides a mapping relationship between the environmental parameters and the shooting parameters, as follows:
  • the electronic device can obtain the target environmental parameters, determine the target shooting parameters corresponding to the target environmental parameters according to the mapping relationship between the preset environmental parameters and the shooting parameters, and shoot according to the target shooting parameters to obtain the image to be processed. In this way, an image suitable for the environment can be obtained, improving the monitoring efficiency.
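The environment-to-shooting-parameter mapping could be held as a simple lookup table. The sketch below is hypothetical: the brightness thresholds, parameter names, and values are invented for illustration and are not disclosed by the application.

```python
# Hypothetical mapping: ambient brightness level -> shooting parameters
SHOOTING_PARAMS = {
    "low_light": {"iso": 1600, "exposure_ms": 50, "mode": "night"},
    "normal":    {"iso": 200,  "exposure_ms": 10, "mode": "auto"},
    "bright":    {"iso": 100,  "exposure_ms": 2,  "mode": "auto"},
}

def target_shooting_params(brightness_lux):
    # Determine the target shooting parameters from the target
    # environmental parameter (here: ambient light brightness).
    if brightness_lux < 50:
        return SHOOTING_PARAMS["low_light"]
    if brightness_lux < 10000:
        return SHOOTING_PARAMS["normal"]
    return SHOOTING_PARAMS["bright"]
```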
  • the above-mentioned preset convolutional neural network may be preset.
  • the electronic device can input the image to be processed into the preset convolutional neural network to obtain M first-type frames, and each first-type frame corresponds to a score.
  • the score can be understood as the probability that the corresponding box contains a target: the higher the score, the more likely the area where the box is located is a target.
  • the above M is an integer greater than 1.
  • two coordinates corresponding to the diagonal of each box in the M first-type boxes may be taken, and the two coordinates are used to mark the box.
  • FIG. 1B shows a box; the dotted line represents the diagonal of the box, and (x0a, y0a), (x1a, y1a) represent the two vertices corresponding to the diagonal.
  • the electronic device obtains the score of each box in the M first-type boxes and sorts the M first-type boxes according to those scores; specifically, the M first-type boxes can be sorted in order from high to low.
  • the mask is a per-box flag: a mask of 1 means the box is still a candidate, and a mask of 0 means the box has been taken as a target box or suppressed.
  • selecting one frame from the M first-type frames after sorting as the target frame may be implemented as follows:
  • a frame with the highest score is selected from the M first-class frames after sorting as the target frame.
  • the electronic device may select the frame with the highest score from the sorted M first-class frames as the target frame.
  • the electronic device can calculate the overlapping area between the i-th frame and the target frame, specifically, the number of overlapping pixels between the two.
  • the i-th frame is any frame whose mask is 1,
  • the i-th box may also be any box, other than the target box, whose mask is 1 and whose score ranks after the target box among the M first-type boxes.
  • the above-mentioned preset threshold can be set by the user or the system default.
  • the electronic device includes a vector register. After the above step 106, the electronic device may further perform the following steps:
  • A1. Calculate the area value of the target box using a scalar register;
  • A2. Use the preset vector register to take a second-type box of a preset dimension, where the second-type box is the vector box corresponding to the i-th box;
  • A3. Calculate the target overlap area between the second-type box and the target box using a vector operation method, where the target overlap area is a vector;
  • A4. Calculate the vector area of the second-type box using a vector operation method;
  • A5. Determine a preset comparison formula according to the target overlap area, the vector area, and the preset threshold, and set the corresponding mask of the second-type box to 0 according to the preset comparison formula.
  • the above-mentioned preset dimensions can be set by the user or the system default.
  • the electronic device may take 64, 32, or 16 second-type boxes at a time through the vector register (the number is related to the capabilities of the vector processor); that is, the preset dimension may be 64, 32, or 16, and the electronic device uses the preset vector register to take a second-type box of the preset dimension.
  • the second-type box is a vector box, specifically the vector box corresponding to the i-th box; it is obtained by expanding (copying) the parameters of the i-th box (such as its area) to the preset dimension.
  • in FIG. 1C, the black area represents the overlapping area between the two boxes; 1 and 2 are the coordinates of the two vertices of one box's diagonal, and 3 and 4 are the coordinates of the two vertices of the other box's diagonal. Based on the four vertices 1, 2, 3, and 4, the overlapping area between the two boxes can be calculated.
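Following FIG. 1C, the overlap of two boxes given their diagonal vertices can be computed as in this sketch (the function and argument names are illustrative):

```python
def overlap_area(x0a, y0a, x1a, y1a, x0b, y0b, x1b, y1b):
    # Vertices 1, 2 = diagonal of box a; vertices 3, 4 = diagonal of box b.
    # The overlap rectangle is bounded by the inner pair of edges.
    w = min(x1a, x1b) - max(x0a, x0b)
    h = min(y1a, y1b) - max(y0a, y0b)
    return max(0, w) * max(0, h)  # zero when the boxes do not intersect
```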
  • the electronic device can determine the target overlap area between the second-type box and the target box, where the target overlap area is a vector. Similarly, based on this principle, the vector area of the second-type box can be calculated, specifically according to the following formula:
  • S_B = (X_1B - X_0B) * (Y_1B - Y_0B)
  • where S_B represents the vector area of the second-type box, and (X_0B, Y_0B) and (X_1B, Y_1B) are the coordinates of the two vertices of a diagonal of the second-type box.
  • the electronic device determines a preset comparison formula according to the target overlap area, vector area, and preset threshold, and sets the mask of the i-th box to 0 according to the preset comparison formula.
  • calculating the area value of the target frame may be implemented as follows: s_a = (x_1a - x_0a) * (y_1a - y_0a), where (x_0a, y_0a) and (x_1a, y_1a) are the two diagonal vertices of the target frame;
  • calculating the target overlapping area between the second-type frame and the target frame using a vector operation method may be implemented as an elementwise computation of the overlap rectangle from the diagonal vertices of the two frames.
  • a preset comparison formula is determined according to the target overlap area, the vector area, and the preset threshold, and the mask of the second-type frame is set to 0 according to the preset comparison formula; this may be implemented as follows:
  • the preset comparison formula is constructed as: S_overlap is compared with (s_a + S_B - S_overlap) * thres, where s_a is a vector obtained by vectorizing the scalar s_a (specifically, the area is expanded (copied) to the preset dimension so that s_a has the same number of elements as S_overlap), thres is the preset threshold, and S_B represents the vector area of the second-type frame;
  • alternatively, a preset comparison formula is determined according to the target overlap area, the vector area, and the preset threshold, and the mask of the second-type frame is set to 0 according to the preset comparison formula; this may be implemented as follows:
  • the preset comparison formula is constructed as: S_overlap is compared with min(s_a, S_B) * thres, where s_a is a vector obtained by vectorizing the scalar s_a (the area is expanded (copied) to the preset dimension so that s_a has the same number of elements as S_overlap), thres is the preset threshold, and S_B represents the vector area of the second-type frame;
  • specifically, the k-th element of S_overlap is compared with the corresponding k-th element of min(s_a, S_B) * thres; if it is greater, the mask of the k-th element of the second-type frame is set to 0, otherwise the mask of the k-th element of the second-type frame is kept at 1, where k is any element position in S_overlap.
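A sketch of this vectorized comparison, emulating the vector lanes with a plain Python list; on the actual hardware each lane would be one element of the 64/32/16-wide vector register, and the helper names here are assumptions.

```python
def broadcast(value, dim):
    # Vectorize a scalar: copy it into every lane of the preset dimension
    return [value] * dim

def update_masks(s_a, S_B, S_overlap, masks, thres):
    # Compare S_overlap with min(s_a, S_B) * thres, element by element.
    # If the k-th overlap is greater, the mask of the k-th second-type
    # box is set to 0; otherwise it is kept at 1.
    s_a_vec = broadcast(s_a, len(S_B))
    for k in range(len(S_overlap)):
        if S_overlap[k] > min(s_a_vec[k], S_B[k]) * thres:
            masks[k] = 0
    return masks
```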
  • a frame can be recorded as (x0a, y0a, x1a, y1a), corresponding respectively to the coordinates of its upper-left corner and its lower-right corner (the coordinates of the upper-left corner of the image can default to (0, 0)); each frame corresponds to a mask.
  • among the frames whose mask is 1, the frame a = (x0a, y0a, x1a, y1a) with the highest score is obtained. If it cannot be obtained (all masks are 0), the NMS is complete; if it can be obtained, its mask is set to 0 after it is taken, the frame is saved in the result as one that satisfies the condition, and the area s_a of frame a is calculated.
  • the min(s_a, S_B) term is taken elementwise: each lane holds the smaller of the target frame area and the corresponding second-type frame area.
  • the remaining frames are used in the non-maximum suppression operation to obtain at least one frame, and the area corresponding to the at least one frame is used as the target image.
  • NMS operation efficiency can also be improved.
  • the following steps may also be included after the area corresponding to the at least one frame is used as the target image:
  • the mapping relationship between the distribution density of the preset feature points and the matching threshold may be pre-stored in the electronic device, and the preset database may also be established in advance, and the preset database includes at least one face image.
  • the electronic device can extract feature points from the target image to obtain a target feature point set. Based on the target feature point set, the target feature point distribution density of the target image can be determined.
  • target feature point distribution density = (number of feature points in the target feature point set) / (area of the target image)
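As a sketch, the density and the threshold lookup might look as follows; the density cut-offs and threshold values are invented for illustration, since the application does not disclose concrete values.

```python
def feature_point_density(num_points, image_area):
    # target feature point distribution density =
    #   number of points in the target feature point set / image area
    return num_points / image_area

def target_matching_threshold(density):
    # Hypothetical preset mapping from distribution density
    # to the target matching threshold.
    if density < 0.01:
        return 0.60
    if density < 0.05:
        return 0.75
    return 0.85
```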
  • the target matching threshold corresponding to the target feature point distribution density can be determined according to the above mapping relationship, and a search can be performed in a preset database according to the target matching threshold and the target image to obtain a target object that successfully matches the target image; that is, when the matching value between the target image and the face image of the target object is greater than the target matching threshold, the two can be considered a successful match.
  • the matching threshold can be dynamically adjusted to improve retrieval efficiency.
  • searching in a preset database according to the target matching threshold and the target image to obtain a target object successfully matched with the target image may include the following steps:
  • the electronic device can extract the contour of the target image to obtain the contour of the target periphery, and can match the target feature point set with the feature point set of the face image x to obtain the first matching value.
  • the following steps may also be included between the above steps 102 and 103:
  • sorting the M first-type frames according to the score of each frame in the M first-type frames may be implemented as follows:
  • the N first-type frames are sorted according to the score of each of the N first-type frames.
  • the above-mentioned preset area value can be set by the user or the system default.
  • the image to be processed may first be segmented to obtain at least one target area, that is, an area where a target may initially be identified. The overlap area of each of the M frames with the at least one target area is then determined to obtain multiple overlap areas; the overlap areas greater than the preset area value are selected from them to obtain N overlap areas, and the N frames corresponding to the N overlap areas are obtained, where N is an integer less than or equal to M. This reduces the number of frames entering the subsequent NMS operation, improves the speed of the operation, and also improves the recognition accuracy.
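A sketch of this pre-filtering between steps 102 and 103; segmentation itself is omitted, the target areas are assumed to be given as boxes, and the names are illustrative.

```python
def prefilter_boxes(boxes, target_areas, preset_area_value):
    # Keep only the N boxes whose overlap with at least one
    # segmented target area exceeds the preset area value; the
    # surviving N <= M boxes then enter the subsequent NMS.
    def overlap(a, b):
        w = min(a[2], b[2]) - max(a[0], b[0])
        h = min(a[3], b[3]) - max(a[1], b[1])
        return max(0, w) * max(0, h)

    kept = []
    for box in boxes:
        if any(overlap(box, area) > preset_area_value
               for area in target_areas):
            kept.append(box)
    return kept
```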
  • in summary, the image to be processed is obtained and input into a preset convolutional neural network to obtain M first-type boxes, each of which corresponds to a score, with M an integer greater than 1. The M first-type boxes are sorted by score from high to low and the masks of all boxes are set to 1. One box is selected from the sorted M first-type boxes as the target box and its mask is set to 0; the overlap area between the i-th box and the target box is determined, where the i-th box is any box whose mask is 1; and when the overlap area is greater than the preset threshold, the mask of the i-th box is set to 0.
  • in this way some boxes can be filtered out, reducing the number of iterations and the calculation complexity. Afterwards, the box with the highest score among the remaining boxes whose mask is 1 can be selected as the next target box, and the overlap filtering above is repeated until the last box with mask 1 has been taken out, which reduces the NMS running time.
  • filtering the boxes first with the method of the embodiments of the present application reduces the number of subsequent NMS operations; compared with the traditional approach of using all boxes for NMS operations, it reduces the number of iterations, reduces the computational complexity, and improves target detection efficiency.
  • FIG. 2 is a schematic flowchart of another embodiment of a target detection method provided by an embodiment of the present application.
  • the target detection method described in this embodiment includes the following steps:
  • the i-th frame is any frame whose mask is 1.
  • the preset vector register is used to take a second-type frame of a preset dimension, where the second-type frame is the vector frame corresponding to the i-th frame.
  • in this embodiment, the image to be processed is obtained and input into a preset convolutional neural network to obtain M first-type frames, each of which corresponds to a score, with M an integer greater than 1. The M first-type frames are sorted by score from high to low, the masks of all frames are set to 1, one frame is selected from the sorted M first-type frames as the target frame, and its mask is set to 0. The overlap area between the i-th frame and the target frame is determined, where the i-th frame is any frame whose mask is 1; when the overlap area is greater than the preset threshold, the mask of the i-th frame is set to 0.
  • the area value of the target frame is calculated using a scalar register, and the preset vector register is used to take a second-type frame of a preset dimension, the second-type frame being the vector frame corresponding to the i-th frame.
  • the target overlap area between the second-type frame and the target frame is calculated using a vector operation method, where the target overlap area is a vector; the vector area of the second-type frame is calculated using a vector operation method; a preset comparison formula is determined according to the target overlap area, the vector area, and the preset threshold; and the corresponding mask of the second-type frame is set to 0 according to the preset comparison formula. During the target detection process, some frames can thus be filtered out, which reduces the number of iterations and the computational complexity.
  • FIG. 3A is a schematic structural diagram of an embodiment of a target detection device according to an embodiment of the present application.
  • the target detection device described in this embodiment includes: an acquisition unit 301, an input unit 302, a sorting unit 303, a selection unit 304, a determination unit 305, and a setting unit 306, as follows:
  • the obtaining unit 301 is used to obtain an image to be processed
  • the input unit 302 is configured to input the image to be processed into a preset convolutional neural network to obtain M first-type frames, each of the first-type frames corresponds to a score, and M is an integer greater than 1;
  • the sorting unit 303 is configured to sort the M first-type frames according to the order of the scores of each frame in the M first-type frames from high to low;
  • the selection unit 304 is used to set the masks of all frames to 1, select one frame from the M first-class frames after sorting as the target frame, and set the mask of the target frame to 0;
  • the determining unit 305 is configured to determine an overlapping area between the i-th frame and the target frame, and the i-th frame is any frame whose mask is 1.
  • the setting unit 306 is configured to set the mask of the i-th frame to 0 when the overlapping area is greater than a preset threshold.
  • the image to be processed is acquired and input into a preset convolutional neural network to obtain M first-type frames, each of which corresponds to a score, with M an integer greater than 1. The M first-type frames are sorted by score from high to low, the masks of all frames are set to 1, one frame is selected from the sorted M first-type frames as the target frame, and its mask is set to 0. The overlap area between the i-th frame and the target frame is determined, where the i-th frame is any frame whose mask is 1; when the overlap area is greater than the preset threshold, the mask of the i-th frame is set to 0. During the target detection process, some frames can be filtered out, which reduces the number of iterations and the calculation complexity.
  • the above obtaining unit 301 can be used to implement the method described in step 101 above
  • the input unit 302 can be used to implement the method described in step 102 above
  • the sorting unit 303 can be used to implement the method described in step 103 above
  • the selection unit 304 can be used to implement the method described in step 104 above
  • the determination unit 305 can be used to implement the method described in step 105 above
  • the setting unit 306 can be used to implement the method described in step 106 above, and so on.
  • the selection unit 304 is specifically configured to:
  • a frame with the highest score is selected from the M first-class frames after sorting as the target frame.
  • the electronic device includes a vector register.
  • FIG. 3B is a modified structure of the target detection device shown in FIG. 3A; compared with FIG. 3A, it may further include a calculation unit 307 and an execution unit 308, as follows:
  • the calculation unit 307 is configured to calculate the area value of the target frame using a scalar register
  • the obtaining unit 301 is configured to use the preset vector register to obtain a second type frame of a preset dimension, where the second type frame is a vector frame corresponding to the i-th frame;
  • the determining unit 305 is configured to calculate a target overlapping area between the second type frame and the target frame using a vector operation method, and the target overlapping area is a vector;
  • the calculation unit 307 is also used to calculate the vector area of the second type frame by using a vector operation method
  • the execution unit 308 is further configured to determine a preset comparison formula according to the target overlap area, the vector area, and the preset threshold, and to set the corresponding mask of the second-type frame to 0 according to the preset comparison formula.
  • the calculation unit 307 is specifically configured to:
  • the area value s_a of the target frame is calculated as s_a = (x_1a - x_0a) * (y_1a - y_0a), where (x_0a, y_0a) and (x_1a, y_1a) are the coordinates of the two vertices of a diagonal of the target frame.
  • the execution unit 308 is specifically configured to:
  • a preset comparison formula is determined according to the target overlap area, the vector area, and the preset threshold, and the corresponding mask of the second-type frame is set to 0 according to the preset comparison formula; the execution unit 308 is specifically configured to:
  • construct the preset comparison formula as follows: S_overlap is compared with (s_a + S_B - S_overlap) * thres;
  • a preset comparison formula is determined according to the target overlap area, the vector area, and the preset threshold, and the corresponding mask of the second-type frame is set to 0 according to the preset comparison formula; the determining unit is specifically configured to:
  • construct the preset comparison formula as follows: S_overlap is compared with min(s_a, S_B) * thres, specifically: the k-th element of S_overlap is compared with the corresponding k-th element of min(s_a, S_B) * thres; if it is greater, the mask of the k-th element of the second-type frame is set to 0, otherwise the mask of the k-th element of the second-type frame is kept at 1, where k is any element position in S_overlap.
  • each program module of the target detection device in this embodiment may be specifically implemented according to the method in the above method embodiment, and the specific implementation process may refer to the related description of the above method embodiment, which will not be repeated here.
  • FIG. 4 is a schematic structural diagram of an embodiment of an electronic device according to an embodiment of the present application.
  • the electronic device described in this embodiment includes: at least one input device 1000; at least one output device 2000; at least one processor 3000, such as a CPU; and a memory 4000; the above input device 1000, output device 2000, processor 3000, and memory 4000 are connected through a bus 5000.
  • the input device 1000 may specifically be a touch panel, physical buttons, or a mouse.
  • the above output device 2000 may specifically be a display screen.
  • the above-mentioned memory 4000 may be a high-speed RAM memory or a non-volatile memory (non-volatile memory), such as a magnetic disk memory.
  • the above memory 4000 is used to store a set of program codes, and the above input device 1000, output device 2000, and processor 3000 are used to call the program codes stored in the memory 4000, and perform the following operations:
  • the aforementioned processor 3000 is used for:
  • Input the image to be processed into a preset convolutional neural network to obtain M first-type frames, each first-type frame corresponds to a score, and M is an integer greater than 1;
  • the i-th frame is any frame whose mask is 1;
  • the mask of the i-th frame is set to 0.
  • the image to be processed is acquired and input into a preset convolutional neural network to obtain M first-type frames, each of which corresponds to a score, with M being an integer greater than 1; the M first-type frames are sorted by score from high to low; the masks of all frames are set to 1; one frame is selected from the sorted M first-type frames as the target frame and its mask is set to 0; the overlap area between the i-th frame and the target frame is determined, where the i-th frame is any frame whose mask is 1; and when the overlap area is greater than the preset threshold, the mask of the i-th frame is set to 0. In this way some frames can be filtered out, which reduces the number of iterations and the calculation complexity.
  • the processor 3000 is specifically used to:
  • a frame with the highest score is selected from the M first-class frames after sorting as the target frame.
  • the electronic device includes a vector register
  • the processor 3000 is further specifically used to:
  • the target overlap area is a vector
  • a preset comparison formula is determined according to the target overlap area, the vector area, and the preset threshold, and the corresponding mask of the second type frame is set to 0 according to the preset comparison formula.
  • the processor 3000 is further specifically used to:
  • where (x_0a, y_0a) and (x_1a, y_1a) are the two vertex coordinates of a diagonal of the target frame, and s_a is the area value of the target frame.
  • the processor 3000 is specifically used to:
  • in terms of determining a preset comparison formula according to the target overlap area, the vector area and the preset threshold, and setting the corresponding mask of the second-type frame to 0 according to the preset comparison formula, the above processor 3000 is specifically used to:
  • the preset comparison formula is constructed as follows:
  • in terms of determining a preset comparison formula according to the target overlap area, the vector area and the preset threshold, and setting the corresponding mask of the second-type frame to 0 according to the preset comparison formula, the above processor 3000 is specifically used to:
  • the preset comparison formula is constructed as follows:
  • S_overlap is compared with min(s_a, S_B)*thres, specifically: the k-th element of S_overlap is compared with the corresponding k-th element of min(s_a, S_B)*thres; if it is greater, the mask of the k-th element of the second-type frame is set to 0; otherwise, that mask is kept at 1, where k is any element position in S_overlap.
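The element-wise comparison described above can be sketched as follows. This is an illustrative sketch only: plain Python lists stand in for the vector registers, and the numeric values are invented for the example.

```python
# Sketch of the element-wise mask update: the k-th element of S_overlap is
# compared with the k-th element of min(s_a, S_B) * thres, and the mask of
# the corresponding second-type frame is set to 0 when it is greater.

def update_masks(s_overlap, s_a, s_b, thres, mask):
    for k in range(len(s_overlap)):            # k: any element position in S_overlap
        if s_overlap[k] > min(s_a, s_b[k]) * thres:
            mask[k] = 0                        # frame k is suppressed
        # otherwise mask[k] is kept at 1
    return mask

s_overlap = [81.0, 0.0, 30.0]     # overlap of each lane with the target frame
s_b       = [100.0, 100.0, 50.0]  # vector areas of the second-type frames
print(update_masks(s_overlap, s_a=100.0, s_b=s_b, thres=0.5, mask=[1, 1, 1]))
# → [0, 1, 0]
```

On real vector hardware the loop body is a single masked compare over all lanes; the scalar loop here only makes the per-element rule explicit.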
  • An embodiment of the present application further provides a computer storage medium, where the computer storage medium may store a program, and when the program is executed, it includes some or all steps of any one of the target detection methods described in the foregoing method embodiments.
  • An embodiment of the present application provides a computer program product, wherein the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute part or all of the steps of any target detection method described in the embodiments of the present application.
  • the computer program product may be a software installation package.

Abstract

A target detection method and a related product. The method comprises: obtaining an image to be processed (101); inputting the image to be processed into a preset convolutional neural network to obtain M first-type frames, wherein each first-type frame corresponds to a score, and M is an integer greater than 1 (102); sorting the M first-type frames in the descending order of the scores of the M first-type frames (103); setting the mask of all the frames to be 1, selecting one of the sorted M first-type frames as a target frame, and setting the mask of the target frame to be 0 (104); determining an overlapping area between an ith frame and the target frame, wherein the ith frame is any one frame with the mask being 1 (105); and when the overlapping area is greater than a preset threshold, setting the mask of the ith frame to be 0 (106). By means of the method, the complexity of calculation can be reduced, and an NMS operation time can be shortened.

Description

目标检测方法及相关产品Target detection methods and related products
本申请要求于2018年12月29日提交中国专利局,申请号为201811645347.6、发明名称为“目标检测方法及相关产品”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。This application requires the priority of the Chinese patent application filed on December 29, 2018, with the application number 201811645347.6 and the invention titled "Target Detection Method and Related Products", the entire contents of which are incorporated by reference in this application.
技术领域Technical field
本申请涉及目标检测技术领域,具体涉及一种目标检测方法及相关产品。This application relates to the field of target detection technology, and in particular to a target detection method and related products.
背景技术Background technique
随着电子技术的快速发展，电子设备（如手机、平板电脑等）越来越智能化，例如，电子设备可以实现拍照，能够实现目标检测，但是，检测算法中，常采用非极大值抑制（non maximum suppression，NMS）方法滤除重叠的框（检测出来的一个物体，就是一个框）。而NMS算法由于其本身的迭代-遍历-消除的算法性质，需要逐个遍历，迭代次数多，且计算复杂度高。With the rapid development of electronic technology, electronic devices (such as mobile phones, tablet computers, etc.) are becoming more and more intelligent; for example, electronic devices can take pictures and perform target detection. However, detection algorithms often use the non-maximum suppression (NMS) method to filter out overlapping frames (each detected object corresponds to one frame). Due to its inherent iterate-traverse-eliminate nature, the NMS algorithm must traverse the frames one by one, which requires many iterations and has high computational complexity.
发明内容Summary of the invention
本申请实施例提供了一种目标检测方法及相关产品,可以减少迭代次数,降低计算复杂度。The embodiments of the present application provide a target detection method and related products, which can reduce the number of iterations and reduce the calculation complexity.
本申请实施例第一方面提供了一种目标检测方法,应用于电子设备,包括:The first aspect of the embodiments of the present application provides a target detection method, which is applied to an electronic device and includes:
获取待处理图像;Get the image to be processed;
将所述待处理图像输入到预设卷积神经网络,得到M个第一类框,每一第一类框对应一个得分,M为大于1的整数;Input the image to be processed into a preset convolutional neural network to obtain M first-type frames, each first-type frame corresponds to a score, and M is an integer greater than 1;
依据所述M个第一类框中每一框的得分从高到低顺序对所述M个第一类框进行排序;Sort the M first-type frames according to the order of the scores of each frame in the M first-type frames from high to low;
设置所有框mask为1,从排序后的所述M个第一类框中选取一个框作为目标框,所述目标框的mask置为0;Set the masks of all frames to 1, select one frame from the M first-class frames after sorting as the target frame, and set the mask of the target frame to 0;
确定第i个框与所述目标框之间的重叠面积,所述第i个框为任一mask为1的框;Determine the overlapping area between the i-th frame and the target frame, the i-th frame is any frame whose mask is 1;
在所述重叠面积大于预设阈值时,将所述第i个框的mask设置为0。When the overlapping area is greater than a preset threshold, the mask of the i-th frame is set to 0.
本申请实施例第二方面提供了一种目标检测装置,包括:A second aspect of an embodiment of the present application provides a target detection device, including:
获取单元,用于获取待处理图像;An acquisition unit for acquiring an image to be processed;
输入单元,用于将所述待处理图像输入到预设卷积神经网络,得到M个第一类框,每一第一类框对应一个得分,M为大于1的整数;An input unit, configured to input the image to be processed into a preset convolutional neural network to obtain M first-type frames, each first-type frame corresponds to a score, and M is an integer greater than 1;
排序单元,用于依据所述M个第一类框中每一框的得分从高到低顺序对所述M个第一类框进行排序;A sorting unit, configured to sort the M first-type frames according to the order of the scores of each frame in the M first-type frames from high to low;
选取单元,用于设置所有框mask为1,从排序后的所述M个第一类框中选取一个框作为目标框,所述目标框的mask置为0;The selection unit is used to set the masks of all frames to 1, select one frame from the M first-class frames after sorting as the target frame, and set the mask of the target frame to 0;
确定单元,用于确定第i个框与所述目标框之间的重叠面积,所述第i个框为任一mask为1的框;A determining unit, configured to determine an overlapping area between the i-th frame and the target frame, and the i-th frame is any frame whose mask is 1;
设置单元,用于在所述重叠面积大于预设阈值时,将所述第i个框的mask设置为0。The setting unit is configured to set the mask of the i-th frame to 0 when the overlapping area is greater than a preset threshold.
第三方面，本申请实施例提供一种电子设备，包括处理器、存储器以及一个或多个程序，其中，上述一个或多个程序被存储在上述存储器中，并且被配置由上述处理器执行，上述程序包括用于执行本申请实施例第一方面中的步骤的指令。In a third aspect, an embodiment of the present application provides an electronic device, including a processor, a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, and the programs include instructions for performing the steps in the first aspect of the embodiments of the present application.
第四方面，本申请实施例提供了一种计算机可读存储介质，其中，上述计算机可读存储介质存储用于电子数据交换的计算机程序，其中，上述计算机程序使得计算机执行如本申请实施例第一方面中所描述的部分或全部步骤。In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to execute part or all of the steps described in the first aspect of the embodiments of the present application.
第五方面，本申请实施例提供了一种计算机程序产品，其中，上述计算机程序产品包括存储了计算机程序的非瞬时性计算机可读存储介质，上述计算机程序可操作来使计算机执行如本申请实施例第一方面中所描述的部分或全部步骤。该计算机程序产品可以为一个软件安装包。In a fifth aspect, an embodiment of the present application provides a computer program product, wherein the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute part or all of the steps described in the first aspect of the embodiments of the present application. The computer program product may be a software installation package.
实施本申请实施例,具备如下有益效果:The implementation of the embodiments of the present application has the following beneficial effects:
可以看出，通过本申请实施例所描述的目标检测方法及相关产品，获取待处理图像，将待处理图像输入到预设卷积神经网络，得到M个第一类框，每一第一类框对应一个得分，M为大于1的整数，依据M个第一类框中每一框的得分从高到低顺序对M个第一类框进行排序，设置所有框mask为1，从排序后的M个第一类框中选取一个框作为目标框，目标框的mask置为0，确定第i个框与目标框之间的重叠面积，第i个框为任一mask为1的框，在重叠面积大于预设阈值时，将第i个框的mask设置为0，在目标检测过程中，可以过滤掉一些框，从而可以减少迭代次数，降低了计算复杂度。It can be seen that, with the target detection method and related products described in the embodiments of the present application, the image to be processed is acquired and input into a preset convolutional neural network to obtain M first-type frames, where each first-type frame corresponds to a score and M is an integer greater than 1; the M first-type frames are sorted by score from high to low; the masks of all frames are set to 1; one frame is selected from the sorted M first-type frames as the target frame and its mask is set to 0; the overlap area between the i-th frame and the target frame is determined, the i-th frame being any frame whose mask is 1; and when the overlap area is greater than a preset threshold, the mask of the i-th frame is set to 0. During target detection, some frames can thus be filtered out, which reduces the number of iterations and the computational complexity.
附图说明BRIEF DESCRIPTION
为了更清楚地说明本申请实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。In order to more clearly explain the technical solutions in the embodiments of the present application, the following will briefly introduce the drawings used in the description of the embodiments. Obviously, the drawings in the following description are some embodiments of the present application. Those of ordinary skill in the art can obtain other drawings based on these drawings without creative work.
图1A是本申请实施例提供的一种目标检测方法的实施例流程示意图;1A is a schematic flowchart of an embodiment of a target detection method provided by an embodiment of the present application;
图1B是本申请实施例提供的框的演示示意图;FIG. 1B is a schematic diagram of a block provided by an embodiment of the present application;
图1C是本申请实施例提供的框的重叠区域的演示示意图;FIG. 1C is a schematic diagram illustrating the overlapping area of the frame provided by the embodiment of the present application;
图2是本申请实施例提供的一种目标检测方法的另一实施例流程示意图;2 is a schematic flowchart of another embodiment of a target detection method provided by an embodiment of the present application;
图3A是本申请实施例提供的一种目标检测装置的实施例结构示意图;3A is a schematic structural diagram of an embodiment of a target detection device provided by an embodiment of the present application;
图3B是本申请实施例提供的另一种目标检测装置的实施例结构示意图;FIG. 3B is a schematic structural diagram of another embodiment of a target detection device provided by an embodiment of the present application;
图4是本申请实施例提供的一种电子设备的实施例结构示意图。4 is a schematic structural diagram of an embodiment of an electronic device provided by an embodiment of the present application.
具体实施方式detailed description
需要说明的是，本申请实施例中的电子设备可与多个摄像头连接，每一摄像头均可用于抓拍视频图像，每一摄像头均可有一个与之对应的位置标记，或者，可有一个与之对应的编号。通常情况下，摄像头可设置在公共场所，例如，学校、博物馆、十字路口、步行街、写字楼、车库、机场、医院、地铁站、车站、公交站台、超市、酒店、娱乐场所等等。摄像头在拍摄到视频图像后，可将该视频图像保存到电子设备所在系统的存储器。存储器中可存储有多个图像库，每一图像库可包含同一人的不同视频图像，当然，每一图像库还可以用于存储一个区域的视频图像或者某个指定摄像头拍摄的视频图像。It should be noted that the electronic device in the embodiments of the present application can be connected to multiple cameras, each camera can be used to capture video images, and each camera can have a corresponding position mark or a corresponding number. Generally, cameras can be installed in public places, such as schools, museums, intersections, pedestrian streets, office buildings, garages, airports, hospitals, subway stations, stations, bus platforms, supermarkets, hotels, entertainment venues, and so on. After a camera captures a video image, the video image can be saved to the memory of the system where the electronic device is located. Multiple image libraries can be stored in the memory, and each image library can contain different video images of the same person; of course, each image library can also be used to store the video images of an area or the video images taken by a specified camera.
进一步可选地，本申请实施例中，摄像头拍摄的每一帧视频图像均对应一个属性信息，属性信息为以下至少一种：视频图像的拍摄时间、视频图像的位置、视频图像的属性参数（格式、大小、分辨率等）、视频图像的编号和视频图像中的人物特征属性。上述视频图像中的人物特征属性可包括但不仅限于：视频图像中的人物个数、人物位置、人物角度值、年龄、图像质量等等。Further optionally, in the embodiments of the present application, each frame of video image captured by a camera corresponds to attribute information, which is at least one of the following: the shooting time of the video image, the position of the video image, the attribute parameters of the video image (format, size, resolution, etc.), the number of the video image, and the character attributes in the video image. The character attributes in the video image may include, but are not limited to: the number of characters in the video image, the positions of the characters, the angle values of the characters, age, image quality, and so on.
进一步需要说明的是，每一摄像头采集的视频图像通常为动态人脸图像，因而，本申请实施例中可以对人脸图像的角度值信息进行规划，上述角度值信息可包括但不仅限于：水平转动角度值、俯仰角或者倾斜度。例如，可定义动态人脸图像数据要求两眼间距不小于30像素，建议60像素以上。水平转动角度值不超过±30°、俯仰角不超过±20°、倾斜角不超过±45°。建议水平转动角度值不超过±15°、俯仰角不超过±10°、倾斜角不超过±15°。例如，还可对人脸图像是否被其他物体遮挡进行筛选，通常情况下，饰物不应遮挡脸部主要区域，饰物如深色墨镜、口罩和夸张首饰等，当然，也有可能摄像头上面布满灰尘，导致人脸图像被遮挡。本申请实施例中的视频图像的图像格式可包括但不仅限于：BMP，JPEG，JPEG2000，PNG等等，其大小可以在10-30KB之间，每一视频图像还可以对应一个拍摄时间、以及拍摄该视频图像的摄像头统一编号、与人脸图像对应的全景大图的链接等信息（人脸图像和全局图像建立特点对应性关系文件）。It should be further noted that the video image collected by each camera is usually a dynamic face image. Therefore, in the embodiments of the present application, the angle value information of the face image may be planned, and the angle value information may include, but is not limited to: the horizontal rotation angle value, the pitch angle, or the inclination. For example, it can be specified that dynamic face image data requires the distance between the eyes to be no less than 30 pixels, with more than 60 pixels recommended; the horizontal rotation angle should not exceed ±30°, the pitch angle should not exceed ±20°, and the tilt angle should not exceed ±45°, while a horizontal rotation angle within ±15°, a pitch angle within ±10°, and a tilt angle within ±15° are recommended. Face images can also be screened for occlusion by other objects: normally, accessories such as dark sunglasses, masks, and exaggerated jewelry should not cover the main area of the face; of course, the camera may also be covered with dust, causing the face image to be blocked. The image formats of the video images in the embodiments of the present application may include, but are not limited to: BMP, JPEG, JPEG2000, PNG, and so on, with sizes between 10-30KB; each video image may also correspond to information such as a shooting time, the unified number of the camera that shot the video image, and a link to the panoramic image corresponding to the face image (a feature correspondence file is established between the face image and the global image).
本申请实施例，在设备上要求很低，仅需要能够拍摄RGB图像或视频的单个摄像头即可完成数据的采集与点云的生成，再将点云数据与原始RGB图像送入后续封装好的流程中即可实现场景的三维重建。基于单摄像头景深预测的场景三维重建技术可分为：视频流获取、图像预处理、深度特征提取与场景深度图生成、基于深度图的点云数据生成、RGB图像与点云数据匹配融合、三维物体表面生成六个模块。其中视频流获取、图像预处理以及后面的RGB图像与点云数据匹配融合、三维物体表面生成技术相对成熟，本申请可优化从场景中生成点云数据的方法，大大降低了其对设备和计算能力的要求。The embodiments of the present application place very low requirements on equipment: a single camera capable of capturing RGB images or video is sufficient to complete data collection and point cloud generation, after which the point cloud data and the original RGB images are fed into a subsequent packaged pipeline to achieve three-dimensional reconstruction of the scene. Scene 3D reconstruction based on single-camera depth-of-field prediction can be divided into six modules: video stream acquisition, image preprocessing, depth feature extraction and scene depth map generation, depth-map-based point cloud data generation, RGB image and point cloud data matching and fusion, and 3D object surface generation. Among them, video stream acquisition, image preprocessing, and the later RGB image and point cloud data matching and fusion and 3D object surface generation are relatively mature; the present application can optimize the method of generating point cloud data from the scene, greatly reducing its requirements on equipment and computing power.
请参阅图1A,为本申请实施例提供的一种目标检测方法的实施例流程示意图。本实施例中所描述的目标检测方法,包括以下步骤:Please refer to FIG. 1A, which is a schematic flowchart of an embodiment of a target detection method provided by an embodiment of the present application. The target detection method described in this embodiment includes the following steps:
101、获取待处理图像。101. Acquire an image to be processed.
其中，本申请实施例中，应用于电子设备，具体地，可以应用于目标检测，待处理图像可以为包括目标的图像，该目标可以为以下至少一种：人、动物、车牌、车、建筑物等等，在此不做限定。The embodiments of the present application are applied to an electronic device and, specifically, can be applied to target detection. The image to be processed may be an image containing a target, and the target may be at least one of the following: a person, an animal, a license plate, a vehicle, a building, and so on, which is not limited here.
其中,待处理图像可以由摄像头拍摄,上述待处理图像可以由用户指定或者由摄像头拍摄得到。The image to be processed may be captured by a camera, and the image to be processed may be designated by a user or captured by the camera.
可选地,上述步骤101,获取目标人脸图像,可以包括如下步骤:Optionally, in step 101 above, acquiring the target face image may include the following steps:
11、获取目标环境参数;11. Obtain target environmental parameters;
12、按照预设的环境参数与拍摄参数之间的映射关系,确定所述目标环境参数对应的目标拍摄参数;12. Determine the target shooting parameters corresponding to the target environment parameters according to the mapping relationship between the preset environment parameters and the shooting parameters;
13、依据所述目标拍摄参数进行拍摄,得到所述待处理图像。13. Shoot according to the target shooting parameters to obtain the image to be processed.
其中，本申请实施例中，环境参数可以包括以下至少一种：温度、湿度、位置、磁场干扰强度、天气、环境光亮度、环境光源数量等等，在此不做限定。上述环境参数可以由环境传感器采集，环境传感器可以集成到电子设备中。环境传感器可以为以下至少一种：温度传感器、湿度传感器、定位装置、磁场检测传感器、处理器、环境光传感器、颜色传感器等等，在此不做限定，例如，温度传感器可以用于检测温度，湿度传感器可以用于检测湿度，全球定位系统GPS可以用于检测位置，磁场检测传感器可以用于检测磁场强度，处理器可以用于获取天气（例如，电子设备中安装天气APP，通过该天气APP获取天气），环境光传感器可以用于检测环境亮度，颜色传感器可以用于检测环境光源数量等等。In the embodiments of the present application, the environmental parameters may include at least one of the following: temperature, humidity, location, magnetic field interference intensity, weather, ambient light brightness, number of ambient light sources, and so on, which are not limited here. The above environmental parameters can be collected by environmental sensors, which can be integrated into the electronic device. The environmental sensor may be at least one of the following: a temperature sensor, a humidity sensor, a positioning device, a magnetic field detection sensor, a processor, an ambient light sensor, a color sensor, and so on, which are not limited here. For example, the temperature sensor can be used to detect temperature, the humidity sensor to detect humidity, the global positioning system (GPS) to detect position, and the magnetic field detection sensor to detect magnetic field strength; the processor can be used to obtain the weather (for example, a weather APP is installed in the electronic device, and the weather is obtained through the weather APP); the ambient light sensor can be used to detect ambient brightness, and the color sensor to detect the number of ambient light sources.
进一步地,拍摄参数可以为以下至少一种:曝光时长、拍摄模式(如海景模式、沙漠模式、夜景模式、全景模式等等)、感光度ISO、焦距、物距、光圈大小等等,在此不做限定。Further, the shooting parameters may be at least one of the following: exposure duration, shooting mode (such as seascape mode, desert mode, night scene mode, panorama mode, etc.), sensitivity ISO, focal length, object distance, aperture size, etc., here No limitation.
另外,电子设备中还可以预先存储预设的环境参数与拍摄参数之间的映射关系,如下提供一种环境参数与拍摄参数之间的映射关系,具体如下:In addition, the mapping relationship between the preset environmental parameters and the shooting parameters can also be pre-stored in the electronic device. The following provides a mapping relationship between the environmental parameters and the shooting parameters, as follows:
| 环境参数 Environmental parameter | 拍摄参数 Shooting parameter |
| --- | --- |
| 环境参数1 Environmental parameter 1 | 拍摄参数1 Shooting parameter 1 |
| 环境参数2 Environmental parameter 2 | 拍摄参数2 Shooting parameter 2 |
| ...... | ...... |
| 环境参数n Environmental parameter n | 拍摄参数n Shooting parameter n |
具体实现中，电子设备可以获取目标环境参数，进而，按照预设的环境参数与拍摄参数之间的映射关系，确定目标环境参数对应的目标拍摄参数，并依据目标拍摄参数进行拍摄，得到待处理图像，如此，可以得到与环境相宜的图像，提升了监控效率。In a specific implementation, the electronic device can obtain the target environmental parameters, determine the corresponding target shooting parameters according to the preset mapping relationship between environmental parameters and shooting parameters, and shoot according to the target shooting parameters to obtain the image to be processed. In this way, an image suited to the environment can be obtained, which improves monitoring efficiency.
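The preset environment-to-shooting-parameter mapping described above can be held as a simple lookup table. The sketch below is illustrative only: the concrete keys and values are invented, since the patent specifies only that such a preset mapping exists.

```python
# Minimal sketch of the preset mapping between environmental parameters and
# shooting parameters. All keys and values are hypothetical examples.

SHOOTING_PARAMS = {
    "night": {"mode": "night scene", "iso": 1600, "exposure_ms": 100},
    "sunny": {"mode": "panorama",    "iso": 100,  "exposure_ms": 5},
}

def params_for(environment, default=None):
    # Look up the target shooting parameters for the measured environment.
    return SHOOTING_PARAMS.get(environment, default)

print(params_for("night")["mode"])  # → night scene
```

A real device would key the table on measured sensor values (temperature, ambient brightness, etc.) rather than a single label.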
102、将所述待处理图像输入到预设卷积神经网络,得到M个第一类框,每一第一类框对应一个得分,M为大于1的整数。102. Input the image to be processed into a preset convolutional neural network to obtain M first-type frames, each of the first-type frames corresponds to a score, and M is an integer greater than 1.
其中，上述预设卷积神经网络可以预先设置。电子设备可以将待处理图像输入到预设卷积神经网络中，得到M个第一类框，每一第一类框对应一个得分，得分可以理解为对应的框存在目标的概率，得分越高，则框所在区域越有可能是目标，上述M为大于1的整数，具体实现中，可以取M个第一类框中每一框的对角线对应的2个坐标，并由该2个坐标用来标记框。如图1B所示，图1B示出了一个框，虚线表示该框的对角线，而(x_0a, y_0a)、(x_1a, y_1a)则表示该对角线对应的两个顶点。The above preset convolutional neural network may be set in advance. The electronic device can input the image to be processed into the preset convolutional neural network to obtain M first-type frames, each corresponding to a score. The score can be understood as the probability that the corresponding frame contains a target: the higher the score, the more likely the region covered by the frame is the target. M is an integer greater than 1. In a specific implementation, the two coordinates corresponding to the diagonal of each of the M first-type frames may be taken, and these two coordinates are used to mark the frame. As shown in FIG. 1B, the dotted line represents the diagonal of a frame, and (x_0a, y_0a) and (x_1a, y_1a) represent the two vertices corresponding to that diagonal.
103、依据所述M个第一类框中每一框的得分从高到低顺序对所述M个第一类框进行排序。103. Sort the M first-type frames according to the order of the scores of each frame in the M first-type frames from high to low.
具体实现中，电子设备获取M个第一类框中每一框的得分，并依据M个第一类框中每一框的得分对该M个第一类框进行排序，具体地，可以按得分由高到低的顺序对M个第一类框进行排序。In a specific implementation, the electronic device obtains the score of each of the M first-type frames and sorts the M first-type frames according to these scores, specifically, in descending order of score.
104、设置所有框mask为1,从排序后的所述M个第一类框中选取一个框作为目标框,所述目标框的mask置为0。104. Set the masks of all frames to 1, select one frame from the M first-class frames after sorting as the target frame, and set the mask of the target frame to 0.
其中，本申请实施例中，mask为掩膜，当对一个图像设置mask=1时，该图像的所有像素点的像素值为1，当对一个图像中的某个像素点设置mask=1时，该像素点的像素值为1，电子设备可以从排序后的M个第一类框中选取任一框作为目标框，当然，该任一框不为排序的最后一个框，并设置所有框为mask=1，即所有框的像素点的像素值为1，以便于后续计算面积，在选取目标框之后，可以将目标框的mask设置为0，即目标框中所有像素点的像素值为0。In the embodiments of the present application, a mask works as follows: when mask=1 is set for an image, the pixel value of all pixels of that image is 1; when mask=1 is set for a certain pixel, the pixel value of that pixel is 1. The electronic device can select any frame from the sorted M first-type frames as the target frame (of course, this frame is not the last one in the sorted order), and sets mask=1 for all frames, that is, the pixel values of the pixels of all frames are 1, so that areas can be computed later. After the target frame is selected, its mask can be set to 0, that is, the pixel value of all pixels in the target frame is 0.
可选地,上述步骤104,从排序后的所述M个第一类框中选取一个框作为目标框,可按照如下方式实施:Optionally, in the above step 104, selecting one frame from the M first-type frames after sorting as the target frame may be implemented as follows:
从排序后的所述M个第一类框中选取得分最高的一个框作为所述目标框。A frame with the highest score is selected from the M first-class frames after sorting as the target frame.
其中,电子设备可以从排序后的M个第一类框中选取得分最高的一个框作为目标框。Among them, the electronic device may select the frame with the highest score from the sorted M first-class frames as the target frame.
105、确定第i个框与所述目标框之间的重叠面积,所述第i个框为任一mask为1的框。105. Determine the overlapping area between the i-th frame and the target frame, where the i-th frame is any frame whose mask is 1.
其中，电子设备可以计算第i个框与目标框之间的重叠面积，具体地，可以计算两者之间重叠的像素点个数，上述第i个框为任一mask为1的框，当然也可以为M个第一类框中除了目标框之外且得分排序在目标框之后的任一mask为1的框。The electronic device can calculate the overlap area between the i-th frame and the target frame, specifically, the number of pixels where the two overlap. The i-th frame is any frame whose mask is 1; of course, it may also be any mask-1 frame among the M first-type frames, other than the target frame, whose score ranks after the target frame.
106、在所述重叠面积大于预设阈值时,将所述第i个框的mask设置为0。106. When the overlapping area is greater than a preset threshold, set the mask of the i-th frame to 0.
其中，上述预设阈值可以由用户自行设置或者系统默认。电子设备可以在重叠面积大于预设阈值时，将第i个框的mask设置为0，则相当于过滤掉第i个框，反之，则可以保留第i个框，执行i=i+1，重复步骤105-步骤106，可以将剩余的框采用NMS进行去重处理，得到至少一个框，将该对应的区域作为目标图像，即最终代表目标所在区域的图像。反之，在重叠面积小于预设阈值时，则可以保留第i框的mask为1。The above preset threshold can be set by the user or defaulted by the system. When the overlap area is greater than the preset threshold, the electronic device sets the mask of the i-th frame to 0, which is equivalent to filtering out the i-th frame; otherwise, the i-th frame is retained. Then i=i+1 is executed and steps 105-106 are repeated; the remaining frames can be deduplicated using NMS to obtain at least one frame, and the corresponding region is taken as the target image, i.e., the image that ultimately represents the region where the target is located. Conversely, when the overlap area is less than the preset threshold, the mask of the i-th frame remains 1.
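The iterative mask-based procedure of steps 103-106 can be sketched in Python as follows. This is a non-claimed illustration: the box coordinates, scores, and threshold are invented for the example, and the overlap test uses the min-area comparison min(s_a, S_B)*thres that the description gives for the vectorized variant.

```python
# Sketch of the mask-based NMS of steps 103-106 (illustrative only).
# Boxes are (x0, y0, x1, y1) corner tuples.

def mask_nms(boxes, scores, thres=0.5):
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)  # step 103
    mask = [1] * len(boxes)                      # step 104: all masks start at 1
    keep = []
    for t in order:                              # next highest-scoring frame still alive
        if mask[t] == 0:
            continue
        mask[t] = 0                              # target frame: its mask is set to 0
        keep.append(t)
        x0a, y0a, x1a, y1a = boxes[t]
        s_a = (x1a - x0a) * (y1a - y0a)
        for i in order:                          # step 105: every remaining mask==1 frame
            if mask[i] == 0:
                continue
            x0b, y0b, x1b, y1b = boxes[i]
            w = min(x1a, x1b) - max(x0a, x0b)
            h = min(y1a, y1b) - max(y0a, y0b)
            overlap = max(w, 0) * max(h, 0)
            s_b = (x1b - x0b) * (y1b - y0b)
            if overlap > min(s_a, s_b) * thres:  # step 106: filter the i-th frame
                mask[i] = 0
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
print(mask_nms(boxes, [0.9, 0.8, 0.7]))  # → [0, 2]: the near-duplicate frame is filtered
```

Because every frame whose mask drops to 0 is skipped on all later passes, each frame is visited far fewer times than in a naive pairwise traversal, which is the reduction in iterations the method aims at.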
可选地,所述电子设备包括矢量寄存器,在上述步骤106之后,还可以包括如下步骤:Optionally, the electronic device includes a vector register. After the above step 106, the electronic device may further include the following steps:
A1、用标量寄存器计算所述目标框的面积值;A1. Use the scalar register to calculate the area value of the target box;
A2、采用所述预设矢量寄存器取预设维度的第二类框,所述第二类框为所述第i个框对应的矢量框;A2. Use the preset vector register to take a second-type frame of a preset dimension, where the second-type frame is a vector frame corresponding to the i-th frame;
A3、用矢量运算方法计算所述第二类框与所述目标框之间的目标重叠面积,所述目标重叠面积为一矢量;A3. Calculate the target overlap area between the second type frame and the target frame using a vector operation method, and the target overlap area is a vector;
A4、用矢量运算方法计算所述第二类框的矢量面积;A4. Calculate the vector area of the frame of the second type using a vector operation method;
A5、根据所述目标重叠面积、所述矢量面积与所述预设阈值确定预设比对公式,并依据所述预设比对公式将所述第二类框的对应mask设置为0。A5. Determine a preset comparison formula according to the target overlap area, the vector area, and the preset threshold, and set the corresponding mask of the second type frame to 0 according to the preset comparison formula.
其中，上述预设维度可以由用户自行设置或者系统默认。本申请实施例中，电子设备可以通过矢量寄存器取64/32/16（跟矢量处理器的能力相关）个第二类框，即预设维度可以为64或者32或者16，电子设备可以采用预设矢量寄存器取预设维度的第二类框，第二类框为一个矢量框，具体地，即第i个框对应的矢量框，具体地，将第i框的参数（如面积）拓展（复制）为预设维度。The above preset dimension can be set by the user or defaulted by the system. In the embodiments of the present application, the electronic device can take 64/32/16 (depending on the capability of the vector processor) second-type frames through the vector register, that is, the preset dimension can be 64, 32, or 16. The electronic device uses the preset vector register to take a second-type frame of the preset dimension; the second-type frame is a vector frame, namely the vector frame corresponding to the i-th frame. Specifically, the parameters of the i-th frame (such as its area) are expanded (copied) to the preset dimension.
针对重叠面积的理解,如图1C所示,图1C中,黑色区域代表两者之间的重叠区域,1、2为一个框的一条对角线的两个顶点坐标,3、4为另一个框的一条对角线的两个顶点坐标,基于该1、2、3、4四个顶点可以计算两个框之间的重叠面积。For the understanding of the overlapping area, as shown in FIG. 1C, in FIG. 1C, the black area represents the overlapping area between the two, 1, 2 are the coordinates of two vertices of one diagonal of a frame, and 3, 4 are the other. The coordinates of the two vertices of a diagonal line of the frame, based on the four vertices 1, 2, 3, and 4, the overlapping area between the two frames can be calculated.
Further, the electronic device can determine the target overlap area between the second-type frame and the target frame, the target overlap area being a vector. Similarly, based on this principle, the vector area of the second-type frame can be calculated, specifically according to the following formula:

S_B = (X_1B - X_0B) * (Y_1B - Y_0B)

where S_B represents the vector area of the second-type frame, and (X_0B, Y_0B), (X_1B, Y_1B) are the two vertex coordinates of one diagonal of the second-type frame.
Further, the electronic device determines the preset comparison formula according to the target overlap area, the vector area and the preset threshold, and sets the mask of the i-th frame to 0 according to the preset comparison formula.
Optionally, in the above step A1, calculating the area value of the target frame may be implemented as follows:

Calculate the area value of the target frame according to the following formula:

s_a = (x_1a - x_0a) * (y_1a - y_0a)

where (x_0a, y_0a), (x_1a, y_1a) are the two vertex coordinates of one diagonal of the target frame, and s_a is the area value of the target frame, which is a scalar.
Optionally, in the above step A3, calculating the target overlap area between the second-type frame and the target frame using a vector operation may be implemented as follows:

Calculate the target overlap area between the second-type frame and the target frame according to the following formula:

S_overlap = max(0, min(x_1a, X_1B) - max(x_0a, X_0B)) * max(0, min(y_1a, Y_1B) - max(y_0a, Y_0B))

where (x_0a, y_0a), (x_1a, y_1a) are the two vertex coordinates of one diagonal of the target frame, (X_0B, Y_0B), (X_1B, Y_1B) are the two vertex coordinates of one diagonal of the second-type frame, and S_overlap represents the target overlap area between the second-type frame and the target frame. With reference to FIG. 1C, (x_0a, y_0a), (x_1a, y_1a) may be regarded as the vertices of one frame in FIG. 1C, and (X_0B, Y_0B), (X_1B, Y_1B) as the vertices of the other frame; based on these four vertices, the overlap area between the two frames can be calculated.
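The overlap formula above can be sketched in NumPy, where the clamping to zero handles frames that do not intersect at all; the function name and the (x0, y0, x1, y1) array layout are assumptions for illustration, not part of the patent:

```python
import numpy as np

def overlap_area(box_a, boxes_b):
    """Overlap area between a single frame a = (x0, y0, x1, y1) and a
    batch of frames B of shape (N, 4); returns a length-N vector."""
    x0a, y0a, x1a, y1a = box_a
    w = np.minimum(x1a, boxes_b[:, 2]) - np.maximum(x0a, boxes_b[:, 0])
    h = np.minimum(y1a, boxes_b[:, 3]) - np.maximum(y0a, boxes_b[:, 1])
    return np.maximum(w, 0) * np.maximum(h, 0)  # clamp to 0 when disjoint

boxes = np.array([[0, 0, 2, 2], [1, 1, 3, 3], [5, 5, 6, 6]], dtype=float)
print(overlap_area((0, 0, 2, 2), boxes))  # [4. 1. 0.]
```

The whole batch of second-type frames is processed in one vector expression, which is the point of the vector-register formulation.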
Optionally, in the above step A5, determining the preset comparison formula according to the target overlap area, the vector area and the preset threshold, and setting the corresponding mask of the second-type frame to 0 according to the preset comparison formula, may be implemented as follows:

Construct the preset comparison formula as:

(s_a + S_B - S_overlap) * thres

where s_a is a vector obtained by vectorizing the scalar s_a, specifically by expanding (copying) the area to the preset dimension, so that s_a has the same number of dimensions as S_overlap; thres is the preset threshold; and S_B represents the vector area of the second-type frame.

Compare S_overlap with (s_a + S_B - S_overlap) * thres, specifically: compare the j-th element of S_overlap with the corresponding j-th element of (s_a + S_B - S_overlap) * thres; if it is greater, set the mask of the j-th element of the second-type frame to 0, otherwise keep the mask of the j-th element of the second-type frame at 1, where j is any element position in S_overlap.
Optionally, in the above step A5, determining the preset comparison formula according to the target overlap area, the vector area and the preset threshold, and setting the corresponding mask of the second-type frame to 0 according to the preset comparison formula, may alternatively be implemented as follows:

Construct the preset comparison formula as:

min(s_a, S_B) * thres

where s_a is a vector obtained by vectorizing the scalar s_a, specifically by expanding (copying) the area to the preset dimension, so that s_a has the same number of dimensions as S_overlap; thres is the preset threshold; and S_B represents the vector area of the second-type frame.

Compare S_overlap with min(s_a, S_B) * thres, specifically: compare the k-th element of S_overlap with the corresponding k-th element of min(s_a, S_B) * thres; if it is greater, set the mask of the k-th element of the second-type frame to 0, otherwise keep the mask of the k-th element of the second-type frame at 1, where k is any element position in S_overlap.
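Both comparison modes above can be sketched as a single vectorized predicate; the names below are illustrative, not from the patent. Note that the division of conventional IoU is replaced by a multiplication with thres, as the patent describes:

```python
import numpy as np

def suppress_mask(s_a, s_b, s_overlap, thres, mode="union"):
    """Return True at each position whose overlap exceeds the threshold,
    i.e. where the corresponding mask should be cleared to 0.
    Division is avoided by multiplying the denominator by thres."""
    if mode == "union":
        bound = (s_a + s_b - s_overlap) * thres  # s_a broadcast over the vector
    else:  # "min"
        bound = np.minimum(s_a, s_b) * thres
    return s_overlap > bound

s_b = np.array([4.0, 4.0])
s_ov = np.array([3.0, 0.5])
print(suppress_mask(4.0, s_b, s_ov, 0.5, "union"))  # [ True False]
```

The scalar area s_a is broadcast across the vector, which plays the role of expanding (copying) it to the preset dimension.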
As an example, for any frame, as shown in FIG. 1B, take the coordinates (x_0a, y_0a), (x_1a, y_1a) of the two vertices of one diagonal of the frame in FIG. 1B; the frame can then be recorded as (x_0a, y_0a, x_1a, y_1a), corresponding respectively to the coordinates of its upper-left corner and lower-right corner (the coordinates of the upper-left corner of the image may default to (0, 0)). Each frame corresponds to a score, and the following steps can be performed:
1. Sort the M frames from largest to smallest according to their scores;

2. Set a mask for each frame, initialized to 1;

3. Take the frame a(x_0a, y_0a, x_1a, y_1a) whose mask is 1 and whose score is highest. If no such frame can be taken (all masks are 0), the NMS is complete; if one can be taken, its mask is set to 0 once taken, the frame is saved in the result as a frame satisfying the condition, and the area s_a of frame a is calculated;

4. Use the vector register to fetch 64/32/16 (depending on the capabilities of the vector processor) frames B(X_0B, Y_0B, X_1B, Y_1B), and calculate the overlap area S_overlap between B and a, and the area S_B of each frame B;

Note: the above S_overlap and S_B are both vectors.

5. Determine whether the preset threshold thres is exceeded (converting the division into a multiplication), and set the mask of each frame exceeding the threshold to 0;

Either of the following two comparison modes, union and min, may be used; which one to use can be decided by the user.

The union case is as follows:

Vector-compare S_overlap and (s_a + S_B - S_overlap) * thres;

If an element of S_overlap is greater than the corresponding element of (s_a + S_B - S_overlap) * thres, set the mask at the position corresponding to that element to 0; otherwise, set the mask at that position to 1;

The min case is as follows:

Vector-compare S_overlap and min(s_a, S_B) * thres;

If an element of S_overlap is greater than the corresponding element of min(s_a, S_B) * thres, set the mask at the position corresponding to that element to 0; otherwise, set the mask at that position to 1;

6. Repeat steps 4 and 5 until all frames after a have been traversed;

7. Return to step 3.
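The seven steps above can be sketched in NumPy as follows. This is an illustrative sketch, not the patented implementation: a real implementation would operate on the vector registers of the target processor, and all function and variable names here are assumptions:

```python
import numpy as np

def nms(boxes, scores, thres, mode="union"):
    """Mask-based NMS following steps 1-7.
    boxes: (M, 4) array of (x0, y0, x1, y1); returns indices of kept frames."""
    order = np.argsort(-scores)              # step 1: sort by score, descending
    boxes = boxes[order]
    mask = np.ones(len(boxes), dtype=bool)   # step 2: all masks start at 1
    keep = []
    for i in range(len(boxes)):              # step 3: next frame with mask == 1
        if not mask[i]:
            continue
        mask[i] = False
        keep.append(int(order[i]))
        a = boxes[i]
        s_a = (a[2] - a[0]) * (a[3] - a[1])
        rest = boxes[i + 1:]                 # step 4: remaining frames as one vector
        w = np.maximum(np.minimum(a[2], rest[:, 2]) - np.maximum(a[0], rest[:, 0]), 0)
        h = np.maximum(np.minimum(a[3], rest[:, 3]) - np.maximum(a[1], rest[:, 1]), 0)
        s_ov = w * h
        s_b = (rest[:, 2] - rest[:, 0]) * (rest[:, 3] - rest[:, 1])
        if mode == "union":                  # step 5: threshold test without division
            drop = s_ov > (s_a + s_b - s_ov) * thres
        else:
            drop = s_ov > np.minimum(s_a, s_b) * thres
        mask[i + 1:] &= ~drop                # steps 6-7: loop continues with next survivor
    return keep

boxes = np.array([[0, 0, 2, 2], [0, 0, 2, 2.2], [5, 5, 6, 6]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores, 0.5))  # [0, 2]
```

Where the sketch processes all remaining frames in one NumPy expression, the patent's method processes them 64/32/16 at a time, bounded by the vector-register width.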
Optionally, after the above step 106, the following steps may also be included:

The remaining frames are used in a non-maximum suppression operation to obtain at least one frame, and the area corresponding to the at least one frame is used as the target image.

In this way, by reducing the number of frames, the NMS operation efficiency can also be improved.
Further optionally, when the target image includes a face image, after the above step of using the area corresponding to the at least one frame as the target image, the method may further include the following steps:

B1. Perform feature point extraction on the target image to obtain a target feature point set;

B2. Determine the target feature point distribution density of the target image according to the target feature point set;

B3. Determine the target matching threshold corresponding to the target feature point distribution density according to a preset mapping relationship between feature point distribution density and matching threshold;

B4. Search a preset database according to the target matching threshold and the target image to obtain a target object that successfully matches the target image.

The mapping relationship between feature point distribution density and matching threshold may be stored in the electronic device in advance, and the preset database, which includes at least one face image, may also be established in advance. In a specific implementation, the electronic device can perform feature point extraction on the target image to obtain a target feature point set, and determine the target feature point distribution density of the target image from that set: target feature point distribution density = number of points in the target feature point set / area of the target image. Further, the target matching threshold corresponding to the target feature point distribution density can be determined according to the above mapping relationship, and the target image can be searched for in the preset database according to that threshold to obtain a target object that successfully matches the target image; that is, when the matching value between the target image and the face image of the target object is greater than the target matching threshold, the two can be considered successfully matched. In this way, the matching threshold can be adjusted dynamically, improving retrieval efficiency.
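The density-to-threshold mapping of steps B2-B3 might, for illustration, be realized as a simple lookup table; the density values and thresholds below are hypothetical and not taken from the patent:

```python
def match_threshold(num_points, image_area, density_map):
    """Look up the matching threshold for a face image from its feature-point
    density (density = number of feature points / image area).
    density_map: list of (min_density, threshold), sorted ascending."""
    density = num_points / image_area
    threshold = density_map[0][1]
    for min_density, t in density_map:
        if density >= min_density:
            threshold = t
    return threshold

# Hypothetical mapping: denser feature sets allow a stricter threshold.
density_map = [(0.0, 0.6), (0.01, 0.7), (0.05, 0.8)]
print(match_threshold(200, 10000, density_map))  # 0.7
```

A richer image yields more feature points per unit area, so the table can demand a higher matching value before declaring success.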
Further, the above step B4, searching a preset database according to the target matching threshold and the target image to obtain a target object that successfully matches the target image, may include the following steps:

B41. Perform contour extraction on the target image to obtain a target peripheral contour;

B42. Match the target feature point set with the feature point set of a face image x to obtain a first matching value, where the face image x is any face image in the preset database;

B43. Match the target peripheral contour with the peripheral contour of the face image x to obtain a second matching value;

B44. Obtain a first weight corresponding to feature point sets and a second weight corresponding to peripheral contours;

B45. Perform a weighted operation according to the first matching value, the second matching value, the first weight and the second weight to obtain a target matching value;

B46. When the target matching value is greater than the target matching threshold, confirm that the face image x is the target object;

B47. When the target matching value is less than or equal to the target matching threshold, confirm that the face image x is not the target object.

In a specific implementation, the electronic device can perform contour extraction on the target image to obtain the target peripheral contour, match the target feature point set with the feature point set of a face image x (any face image in the preset database) to obtain a first matching value, and match the target peripheral contour with the peripheral contour of the face image x to obtain a second matching value. It then obtains the first weight corresponding to feature point sets and the second weight corresponding to peripheral contours; both weights may be preset, with first weight + second weight = 1. Then, target matching value = first matching value * first weight + second matching value * second weight. When the target matching value is greater than the target matching threshold, the face image x is confirmed as the target object; conversely, when the target matching value is less than or equal to the target matching threshold, the face image x is confirmed not to be the target object. In this way, face recognition can be realized more accurately.
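The weighted fusion of steps B44-B47 can be sketched as follows; since first weight + second weight = 1, only the first weight needs to be passed in. Names are illustrative:

```python
def is_target(feat_match, contour_match, w_feat, threshold):
    """Weighted fusion of the two matching values (steps B44-B47).
    w_feat + w_contour = 1, so only the feature-set weight is passed in."""
    score = feat_match * w_feat + contour_match * (1.0 - w_feat)
    return score > threshold

print(is_target(0.9, 0.6, 0.7, 0.8))   # True:  0.9*0.7 + 0.6*0.3 = 0.81 > 0.8
print(is_target(0.9, 0.6, 0.7, 0.82))  # False
```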
Optionally, the following steps may also be included between the above steps 102 and 103:

C1. Perform image segmentation on the image to be processed to obtain at least one target area;

C2. Determine the overlap area of each of the M frames with the at least one target area to obtain multiple overlap areas;

C3. Select, from the multiple overlap areas, overlap areas greater than a preset area value to obtain N overlap areas, and obtain the N frames corresponding to the N overlap areas, where N is an integer less than or equal to M.

Then the above step 103, sorting the M first-type frames according to the score of each of the M first-type frames, may be implemented as follows:

Sort the N first-type frames according to the score of each of the N first-type frames.

The above preset area value may be set by the user or taken as a system default. In a specific implementation, image segmentation may first be performed on the image to be processed to obtain at least one target area, that is, an area where the target may initially be present; the overlap area of each of the M frames with the at least one target area is then determined to obtain multiple overlap areas; overlap areas greater than the preset area value are selected from these to obtain N overlap areas, and the N frames corresponding to the N overlap areas are obtained, where N is an integer less than or equal to M. In this way, the number of frames subsequently used for the NMS operation can be reduced, improving both the operation speed and the recognition accuracy.
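Steps C1-C3 can be sketched as a pre-filter over the candidate frames. This is an illustrative sketch under the assumption that each segmented target area is represented as an axis-aligned rectangle; names are not from the patent:

```python
import numpy as np

def prefilter_boxes(boxes, regions, min_area):
    """Keep only frames whose maximum overlap with any segmented target
    area exceeds min_area (steps C1-C3).
    boxes: (M, 4), regions: (R, 4), both as (x0, y0, x1, y1)."""
    keep = []
    for i, b in enumerate(boxes):
        w = np.maximum(np.minimum(b[2], regions[:, 2]) - np.maximum(b[0], regions[:, 0]), 0)
        h = np.maximum(np.minimum(b[3], regions[:, 3]) - np.maximum(b[1], regions[:, 1]), 0)
        if (w * h).max() > min_area:
            keep.append(i)
    return keep

boxes = np.array([[0, 0, 4, 4], [10, 10, 12, 12]], dtype=float)
regions = np.array([[1, 1, 5, 5]], dtype=float)
print(prefilter_boxes(boxes, regions, 2.0))  # [0]
```

Only the N surviving frames then enter the sorting and NMS stages, which is the source of the claimed speedup.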
It can be seen that, through the target detection method described in the embodiments of the present application, an image to be processed is acquired and input into a preset convolutional neural network to obtain M first-type frames, each corresponding to a score, where M is an integer greater than 1. The M first-type frames are sorted from high to low according to the score of each frame, the masks of all frames are set to 1, one frame is selected from the sorted M first-type frames as the target frame, and the mask of the target frame is set to 0. The overlap area between the i-th frame (any frame whose mask is 1) and the target frame is determined, and when the overlap area is greater than the preset threshold, the mask of the i-th frame is set to 0. In the target detection process, some frames can thus be filtered out, which reduces the number of iterations and lowers the computational complexity. Afterwards, from the remaining first-type frames, the frame whose mask is 1 and whose score is highest can again be selected as the target frame, and the above overlap-area filtering can be repeated until the last target frame whose mask is 1 has been taken out, which shortens the NMS running time.

In addition, in the embodiments of the present application, the frames are first filtered using the method of the embodiments of the present application, which reduces the number of frames subsequently used for the NMS operation. Compared with the traditional approach of directly using all frames for the NMS operation, this reduces the number of iterations, lowers the computational complexity, and improves target detection efficiency.
Consistent with the above, please refer to FIG. 2, which is a schematic flowchart of an embodiment of a target detection method provided by an embodiment of the present application. The target detection method described in this embodiment includes the following steps:

201. Acquire an image to be processed.

202. Input the image to be processed into a preset convolutional neural network to obtain M first-type frames, where each first-type frame corresponds to a score and M is an integer greater than 1.

203. Sort the M first-type frames from high to low according to the score of each of the M first-type frames.

204. Set the masks of all frames to 1, select one frame from the sorted M first-type frames as the target frame, and set the mask of the target frame to 0.

205. Determine the overlap area between the i-th frame and the target frame, where the i-th frame is any frame whose mask is 1.

206. When the overlap area is greater than a preset threshold, set the mask of the i-th frame to 0.

207. Calculate the area value of the target frame using a scalar register.

208. Use the preset vector register to fetch a second-type frame of a preset dimension, where the second-type frame is the vector frame corresponding to the i-th frame.

209. Calculate the target overlap area between the second-type frame and the target frame using a vector operation, the target overlap area being a vector.

210. Calculate the vector area of the second-type frame using a vector operation.

211. Determine a preset comparison formula according to the target overlap area, the vector area and the preset threshold, and set the corresponding mask of the second-type frame to 0 according to the preset comparison formula.

For the target detection method described in the above steps 201-211, reference may be made to the corresponding steps of the target detection method described in FIG. 1A.
It can be seen that, through the target detection method described in the embodiments of the present application, an image to be processed is acquired and input into a preset convolutional neural network to obtain M first-type frames, each corresponding to a score, where M is an integer greater than 1. The M first-type frames are sorted from high to low according to the score of each frame, the masks of all frames are set to 1, one frame is selected from the sorted M first-type frames as the target frame, and the mask of the target frame is set to 0. The overlap area between the i-th frame (any frame whose mask is 1) and the target frame is determined, and when the overlap area is greater than the preset threshold, the mask of the i-th frame is set to 0. The area value of the target frame is calculated using a scalar register, and the preset vector register is used to fetch a second-type frame of a preset dimension, the second-type frame being the vector frame corresponding to the i-th frame. The target overlap area between the second-type frame and the target frame is calculated using a vector operation, the target overlap area being a vector; the vector area of the second-type frame is calculated using a vector operation; and a preset comparison formula is determined according to the target overlap area, the vector area and the preset threshold, according to which the corresponding mask of the second-type frame is set to 0. In the target detection process, some frames can thus be filtered out, which reduces the number of iterations and lowers the computational complexity.
Consistent with the above, the following is a device for implementing the above target detection method, specifically as follows:

Please refer to FIG. 3, which is a schematic structural diagram of an embodiment of a target detection device provided by an embodiment of the present application. The target detection device described in this embodiment includes: an acquisition unit 301, an input unit 302, a sorting unit 303, a selection unit 304, a determination unit 305 and a setting unit 306, specifically as follows:

The acquisition unit 301 is configured to acquire an image to be processed;

The input unit 302 is configured to input the image to be processed into a preset convolutional neural network to obtain M first-type frames, where each first-type frame corresponds to a score and M is an integer greater than 1;

The sorting unit 303 is configured to sort the M first-type frames from high to low according to the score of each of the M first-type frames;

The selection unit 304 is configured to set the masks of all frames to 1, select one frame from the sorted M first-type frames as the target frame, and set the mask of the target frame to 0;

The determination unit 305 is configured to determine the overlap area between the i-th frame and the target frame, where the i-th frame is any frame whose mask is 1;

The setting unit 306 is configured to set the mask of the i-th frame to 0 when the overlap area is greater than a preset threshold.

It can be seen that, through the target detection device described in the embodiments of the present application, an image to be processed is acquired and input into a preset convolutional neural network to obtain M first-type frames, each corresponding to a score, where M is an integer greater than 1. The M first-type frames are sorted from high to low according to the score of each frame, the masks of all frames are set to 1, one frame is selected from the sorted M first-type frames as the target frame, and the mask of the target frame is set to 0. The overlap area between the i-th frame (any frame whose mask is 1) and the target frame is determined, and when the overlap area is greater than the preset threshold, the mask of the i-th frame is set to 0. In the target detection process, some frames can thus be filtered out, which reduces the number of iterations and lowers the computational complexity.

The above acquisition unit 301 can be used to implement the method described in step 101 above, the input unit 302 the method described in step 102, the sorting unit 303 the method described in step 103, the selection unit 304 the method described in step 104, the determination unit 305 the method described in step 105, and the setting unit 306 the method described in step 106, and so on below.
In a possible example, in terms of selecting one frame from the sorted M first-type frames as the target frame, the sorting unit 303 is specifically configured to:

select the frame with the highest score from the sorted M first-type frames as the target frame.

In a possible example, the electronic device includes a vector register. As shown in FIG. 3B, which is a further modified structure of the target detection device shown in FIG. 3A, the device may, compared with FIG. 3A, further include a calculation unit 307 and an execution unit 308, specifically as follows:
The calculation unit 307 is configured to calculate the area value of the target frame using a scalar register;

The acquisition unit 301 is configured to use the preset vector register to fetch a second-type frame of a preset dimension, where the second-type frame is the vector frame corresponding to the i-th frame;

The determination unit 305 is configured to calculate the target overlap area between the second-type frame and the target frame using a vector operation, the target overlap area being a vector;

The calculation unit 307 is further configured to calculate the vector area of the second-type frame using a vector operation;

The execution unit 308 is further configured to determine a preset comparison formula according to the target overlap area, the vector area and the preset threshold, and set the corresponding mask of the second-type frame to 0 according to the preset comparison formula.

In a possible example, in terms of calculating the area value of the target frame using the scalar register, the calculation unit 307 is specifically configured to:

calculate the area value of the target frame according to the following formula:

s_a = (x_1a - x_0a) * (y_1a - y_0a)

where (x_0a, y_0a), (x_1a, y_1a) are the two vertex coordinates of one diagonal of the target frame, and s_a is the area value of the target frame.
In a possible example, in terms of calculating the target overlap area between the second-type frame and the target frame using a vector operation, the execution unit 308 is specifically configured to:

calculate the target overlap area between the second-type frame and the target frame according to the following formula:

S_overlap = max(0, min(x_1a, X_1B) - max(x_0a, X_0B)) * max(0, min(y_1a, Y_1B) - max(y_0a, Y_0B))

where (X_0B, Y_0B), (X_1B, Y_1B) are the two vertex coordinates of one diagonal of the second-type frame, and S_overlap represents the target overlap area between the second-type frame and the target frame.
In a possible example, in terms of determining a preset comparison formula according to the target overlap area, the vector area and the preset threshold, and setting the corresponding mask of the second-type frame to 0 according to the preset comparison formula, the execution unit 308 is specifically configured to:

construct the preset comparison formula as:

(s_a + S_B - S_overlap) * thres

where s_a is a vector obtained by vectorizing s_a and has the same number of dimensions as S_overlap, thres is the preset threshold, and S_B represents the vector area of the second-type frame; and

compare S_overlap with (s_a + S_B - S_overlap) * thres, specifically: compare the j-th element of S_overlap with the corresponding j-th element of (s_a + S_B - S_overlap) * thres; if it is greater, set the mask of the j-th element of the second-type frame to 0, otherwise keep the mask of the j-th element of the second-type frame at 1, where j is any element position in S_overlap.
In a possible example, in terms of determining the preset comparison formula according to the target overlap area, the vector area and the preset threshold, and setting the corresponding mask of the second-type frame to 0 according to the preset comparison formula, the determination unit is specifically configured to:
construct the preset comparison formula as follows:
min(s_a, S_B) * thres, where s_a is a vector obtained by vectorizing the scalar area s_a, the number of dimensions of s_a being the same as that of S_overlap, thres is the preset threshold, and S_B denotes the vector area of the second-type frame;
compare S_overlap with min(s_a, S_B) * thres, specifically: compare the k-th element of S_overlap with the corresponding k-th element of min(s_a, S_B) * thres; if the former is greater, set the mask of the k-th element of the second-type frame to 0, otherwise keep the mask of the k-th element of the second-type frame at 1, k being any element position in S_overlap.
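Both comparison formulas reduce to one element-wise test. The sketch below (an illustration, not the patented implementation) broadcasts the scalar area s_a to the dimensions of S_overlap, as the text requires, and supports both the union-based bound (s_a + S_B - S_overlap) * thres, which is the usual IoU test rearranged to avoid division, and the min-based variant min(s_a, S_B) * thres:

```python
import numpy as np

def suppression_mask(s_a, S_B, S_overlap, thres, criterion="union"):
    """Return per-element masks: 0 where the overlap test suppresses a frame."""
    s_a_vec = np.full_like(S_overlap, s_a)   # vectorize s_a to match S_overlap
    if criterion == "union":
        bound = (s_a_vec + S_B - S_overlap) * thres
    else:                                    # min-based variant
        bound = np.minimum(s_a_vec, S_B) * thres
    return np.where(S_overlap > bound, 0, 1)
```

Multiplying the threshold onto the area term instead of dividing the overlap by it keeps the whole test in multiply-and-compare vector instructions.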
It can be understood that the functions of the program modules of the target detection apparatus of this embodiment may be specifically implemented according to the methods in the foregoing method embodiments; for the specific implementation process, reference may be made to the related description of the foregoing method embodiments, which is not repeated here.
Consistently with the above, please refer to FIG. 4, which is a schematic structural diagram of an embodiment of an electronic device provided by an embodiment of this application. The electronic device described in this embodiment includes: at least one input device 1000; at least one output device 2000; at least one processor 3000, such as a CPU; and a memory 4000. The input device 1000, the output device 2000, the processor 3000 and the memory 4000 are connected through a bus 5000.
The input device 1000 may specifically be a touch panel, a physical button, or a mouse.
The output device 2000 may specifically be a display screen.
The memory 4000 may be a high-speed RAM memory or a non-volatile memory, such as a magnetic disk memory. The memory 4000 is configured to store a set of program codes, and the input device 1000, the output device 2000 and the processor 3000 are configured to call the program codes stored in the memory 4000 to perform the following operations.
The processor 3000 is configured to:
acquire an image to be processed;
input the image to be processed into a preset convolutional neural network to obtain M first-type frames, each first-type frame corresponding to a score, M being an integer greater than 1;
sort the M first-type frames in descending order of the score of each of the M first-type frames;
set the masks of all frames to 1, select one frame from the sorted M first-type frames as a target frame, and set the mask of the target frame to 0;
determine the overlap area between the i-th frame and the target frame, the i-th frame being any frame whose mask is 1;
when the overlap area is greater than a preset threshold, set the mask of the i-th frame to 0.
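The operations above amount to a mask-based non-maximum suppression. The following self-contained sketch (NumPy, with a plain union-based overlap test; a hypothetical illustration rather than the on-device implementation) shows how the mask lets each target frame filter many remaining frames at once:

```python
import numpy as np

def masked_nms(boxes, scores, thres):
    """boxes: (M, 4) rows of (x0, y0, x1, y1); returns indices of kept frames."""
    order = np.argsort(scores)[::-1]           # sort the M frames by score, descending
    boxes = boxes[order]
    mask = np.ones(len(boxes), dtype=np.int8)  # all masks start at 1
    kept = []
    for t in range(len(boxes)):
        if mask[t] == 0:                       # already suppressed: skip
            continue
        mask[t] = 0                            # current target frame
        kept.append(int(order[t]))
        x0, y0, x1, y1 = boxes[t]
        s_a = (x1 - x0) * (y1 - y0)
        rest = np.nonzero(mask)[0]             # frames whose mask is still 1
        iw = np.minimum(x1, boxes[rest, 2]) - np.maximum(x0, boxes[rest, 0])
        ih = np.minimum(y1, boxes[rest, 3]) - np.maximum(y0, boxes[rest, 1])
        inter = np.maximum(iw, 0.0) * np.maximum(ih, 0.0)
        s_b = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        # Suppress every remaining frame whose overlap exceeds the threshold test.
        mask[rest[inter > (s_a + s_b - inter) * thres]] = 0
    return kept
```

Because each iteration clears the masks of a whole batch of overlapping frames, later iterations skip them entirely, which is the source of the reduced iteration count and computational complexity.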
It can be seen that, with the electronic device described in this embodiment of the application, an image to be processed is acquired and input into a preset convolutional neural network to obtain M first-type frames, each first-type frame corresponding to a score, M being an integer greater than 1; the M first-type frames are sorted in descending order of score; the masks of all frames are set to 1, one frame is selected from the sorted M first-type frames as a target frame and its mask is set to 0; the overlap area between the i-th frame (any frame whose mask is 1) and the target frame is determined; and when the overlap area is greater than a preset threshold, the mask of the i-th frame is set to 0. In the target detection process, some frames can thus be filtered out, which reduces the number of iterations and the computational complexity.
In a possible example, in terms of selecting one frame from the sorted M first-type frames as the target frame, the processor 3000 is specifically configured to:
select the frame with the highest score from the sorted M first-type frames as the target frame.
In a possible example, the electronic device includes a vector register, and the processor 3000 is further specifically configured to:
calculate the area value of the target frame with a scalar register;
use the preset vector register to fetch a second-type frame of a preset dimension, the second-type frame being the vector frame corresponding to the i-th frame;
calculate the target overlap area between the second-type frame and the target frame with a vector operation method, the target overlap area being a vector;
calculate the vector area of the second-type frame with the vector operation method;
determine a preset comparison formula according to the target overlap area, the vector area and the preset threshold, and set the corresponding mask of the second-type frame to 0 according to the preset comparison formula.
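Taken together, these five steps describe one pass of the inner loop: one scalar area for the target, then vector-register-wide chunks of the remaining frames. The sketch below simulates the preset-dimension vector register with a fixed chunk width (VLEN = 8 is an assumed value, not one stated in the text) and applies the union-based comparison formula:

```python
import numpy as np

VLEN = 8  # assumed "preset dimension" of the vector register

def suppress_with_target(target, boxes, mask, thres):
    """Clear mask entries of frames that overlap `target` too much,
    processing `boxes` (N x 4 rows of corners) in VLEN-wide chunks."""
    x0, y0, x1, y1 = target
    s_a = (x1 - x0) * (y1 - y0)                        # scalar-register area
    for start in range(0, len(boxes), VLEN):
        chunk = boxes[start:start + VLEN]              # second-type (vector) frame
        iw = np.minimum(x1, chunk[:, 2]) - np.maximum(x0, chunk[:, 0])
        ih = np.minimum(y1, chunk[:, 3]) - np.maximum(y0, chunk[:, 1])
        s_overlap = np.maximum(iw, 0.0) * np.maximum(ih, 0.0)
        s_b = (chunk[:, 2] - chunk[:, 0]) * (chunk[:, 3] - chunk[:, 1])
        hit = s_overlap > (s_a + s_b - s_overlap) * thres
        mask[start:start + VLEN][hit] = 0              # element-wise mask clear
    return mask
```

Processing VLEN frames per comparison is what replaces VLEN scalar IoU tests with a handful of vector instructions.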
In a possible example, in terms of calculating the area value of the target frame with the scalar register, the processor 3000 is further specifically configured to:
calculate the area value of the target frame according to the following formula,
where (x_0a, y_0a) and (x_1a, y_1a) are the coordinates of the two vertices of a diagonal of the target frame, and s_a is the area value of the target frame.
In a possible example, in terms of calculating the target overlap area between the second-type frame and the target frame with the vector operation method, the processor 3000 is specifically configured to:
calculate the target overlap area between the second-type frame and the target frame according to the following formula,
where (X_0B, Y_0B) and (X_1B, Y_1B) are the coordinates of the two vertices of a diagonal of the second-type frame, and S_overlap denotes the target overlap area between the second-type frame and the target frame.
In a possible example, in terms of determining the preset comparison formula according to the target overlap area, the vector area and the preset threshold, and setting the corresponding mask of the second-type frame to 0 according to the preset comparison formula, the processor 3000 is specifically configured to:
construct the preset comparison formula as follows:
(s_a + S_B - S_overlap) * thres, where s_a is a vector obtained by vectorizing the scalar area s_a, the number of dimensions of s_a being the same as that of S_overlap, thres is the preset threshold, and S_B denotes the vector area of the second-type frame;
compare S_overlap with (s_a + S_B - S_overlap) * thres, specifically: compare the j-th element of S_overlap with the corresponding j-th element of (s_a + S_B - S_overlap) * thres; if the former is greater, set the mask of the j-th element of the second-type frame to 0, otherwise keep the mask of the j-th element of the second-type frame at 1, j being any element position in S_overlap.
In a possible example, in terms of determining the preset comparison formula according to the target overlap area, the vector area and the preset threshold, and setting the corresponding mask of the second-type frame to 0 according to the preset comparison formula, the processor 3000 is specifically configured to:
construct the preset comparison formula as follows:
min(s_a, S_B) * thres, where s_a is a vector obtained by vectorizing the scalar area s_a, the number of dimensions of s_a being the same as that of S_overlap, thres is the preset threshold, and S_B denotes the vector area of the second-type frame;
compare S_overlap with min(s_a, S_B) * thres, specifically: compare the k-th element of S_overlap with the corresponding k-th element of min(s_a, S_B) * thres; if the former is greater, set the mask of the k-th element of the second-type frame to 0, otherwise keep the mask of the k-th element of the second-type frame at 1, k being any element position in S_overlap.
An embodiment of this application further provides a computer storage medium, where the computer storage medium may store a program, and the program, when executed, includes some or all of the steps of any of the target detection methods described in the foregoing method embodiments.
An embodiment of this application provides a computer program product, where the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to perform some or all of the steps described in any of the target detection methods described in the embodiments of this application. The computer program product may be a software installation package.

Claims (10)

  1. A target detection method, characterized in that the method is applied to an electronic device and comprises:
    acquiring an image to be processed;
    inputting the image to be processed into a preset convolutional neural network to obtain M first-type frames, each first-type frame corresponding to a score, M being an integer greater than 1;
    sorting the M first-type frames in descending order of the score of each of the M first-type frames;
    setting the masks of all frames to 1, selecting one frame from the sorted M first-type frames as a target frame, and setting the mask of the target frame to 0;
    determining the overlap area between an i-th frame and the target frame, the i-th frame being any frame whose mask is 1; and
    when the overlap area is greater than a preset threshold, setting the mask of the i-th frame to 0.
  2. The method according to claim 1, characterized in that selecting one frame from the sorted M first-type frames as the target frame comprises:
    selecting the frame with the highest score from the sorted M first-type frames as the target frame.
  3. The method according to claim 1 or 2, characterized in that the electronic device comprises a vector register, and the method further comprises:
    calculating the area value of the target frame with a scalar register;
    using the preset vector register to fetch a second-type frame of a preset dimension, the second-type frame being the vector frame corresponding to the i-th frame;
    calculating the target overlap area between the second-type frame and the target frame with a vector operation method, the target overlap area being a vector;
    calculating the vector area of the second-type frame with the vector operation method; and
    determining a preset comparison formula according to the target overlap area, the vector area and the preset threshold, and setting the corresponding mask of the second-type frame to 0 according to the preset comparison formula.
  4. The method according to claim 3, characterized in that calculating the area value of the target frame with the scalar register comprises:
    calculating the area value of the target frame according to the following formula,
    where (x_0a, y_0a) and (x_1a, y_1a) are the coordinates of the two vertices of a diagonal of the target frame, and s_a is the area value of the target frame.
  5. The method according to claim 4, characterized in that calculating the target overlap area between the second-type frame and the target frame with the vector operation method comprises:
    calculating the target overlap area between the second-type frame and the target frame according to the following formula,
    where (X_0B, Y_0B) and (X_1B, Y_1B) are the coordinates of the two vertices of a diagonal of the second-type frame, and S_overlap denotes the target overlap area between the second-type frame and the target frame.
  6. The method according to claim 5, characterized in that determining the preset comparison formula according to the target overlap area, the vector area and the preset threshold, and setting the corresponding mask of the second-type frame to 0 according to the preset comparison formula, comprises:
    constructing the preset comparison formula as follows:
    (s_a + S_B - S_overlap) * thres, where s_a is a vector obtained by vectorizing the scalar area s_a, the number of dimensions of s_a being the same as that of S_overlap, thres is the preset threshold, and S_B denotes the vector area of the second-type frame; and
    comparing S_overlap with (s_a + S_B - S_overlap) * thres, specifically: comparing the j-th element of S_overlap with the corresponding j-th element of (s_a + S_B - S_overlap) * thres; if the former is greater, setting the mask of the j-th element of the second-type frame to 0, otherwise keeping the mask of the j-th element of the second-type frame at 1, j being any element position in S_overlap.
  7. The method according to claim 5, characterized in that determining the preset comparison formula according to the target overlap area, the vector area and the preset threshold, and setting the corresponding mask of the second-type frame to 0 according to the preset comparison formula, comprises:
    constructing the preset comparison formula as follows:
    min(s_a, S_B) * thres, where s_a is a vector obtained by vectorizing the scalar area s_a, the number of dimensions of s_a being the same as that of S_overlap, thres is the preset threshold, and S_B denotes the vector area of the second-type frame; and
    comparing S_overlap with min(s_a, S_B) * thres, specifically: comparing the k-th element of S_overlap with the corresponding k-th element of min(s_a, S_B) * thres; if the former is greater, setting the mask of the k-th element of the second-type frame to 0, otherwise keeping the mask of the k-th element of the second-type frame at 1, k being any element position in S_overlap.
  8. A target detection apparatus, characterized by comprising:
    an acquisition unit, configured to acquire an image to be processed;
    an input unit, configured to input the image to be processed into a preset convolutional neural network to obtain M first-type frames, each first-type frame corresponding to a score, M being an integer greater than 1;
    a sorting unit, configured to sort the M first-type frames in descending order of the score of each of the M first-type frames;
    a selection unit, configured to set the masks of all frames to 1, select one frame from the sorted M first-type frames as a target frame, and set the mask of the target frame to 0;
    a determination unit, configured to determine the overlap area between an i-th frame and the target frame, the i-th frame being any frame whose mask is 1; and
    a setting unit, configured to set the mask of the i-th frame to 0 when the overlap area is greater than a preset threshold.
  9. An electronic device, characterized by comprising a processor and a memory, wherein the memory is configured to store one or more programs configured to be executed by the processor, the programs comprising instructions for performing the steps in the method according to any one of claims 1-7.
  10. A computer-readable storage medium storing a computer program, the computer program being executed by a processor to implement the method according to any one of claims 1-7.
PCT/CN2019/114330 2018-12-29 2019-10-30 Target detection method and related product WO2020134528A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811645347.6A CN109815843B (en) 2018-12-29 2018-12-29 Image processing method and related product
CN201811645347.6 2018-12-29

Publications (1)

Publication Number Publication Date
WO2020134528A1

Family

ID=66603279

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/114330 WO2020134528A1 (en) 2018-12-29 2019-10-30 Target detection method and related product

Country Status (2)

Country Link
CN (1) CN109815843B (en)
WO (1) WO2020134528A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112464938A (en) * 2020-12-18 2021-03-09 深圳赛安特技术服务有限公司 License plate detection and identification method, device, equipment and storage medium
CN112699808A (en) * 2020-12-31 2021-04-23 深圳市华尊科技股份有限公司 Dense target detection method, electronic equipment and related product
CN113192048A (en) * 2021-05-17 2021-07-30 广州市勤思网络科技有限公司 Multi-mode fused people number identification and statistics method
CN113283307A (en) * 2021-04-30 2021-08-20 北京雷石天地电子技术有限公司 Method and system for identifying object in video and computer storage medium
CN113449373A (en) * 2021-07-21 2021-09-28 深圳须弥云图空间科技有限公司 Overlap detection method and device and electronic equipment
CN115050129A (en) * 2022-06-27 2022-09-13 北京睿家科技有限公司 Data processing method and system for intelligent access control
CN117274814A (en) * 2023-10-08 2023-12-22 北京香田智能科技有限公司 Tobacco field image processing method

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109815843B (en) * 2018-12-29 2021-09-14 深圳云天励飞技术有限公司 Image processing method and related product
CN112101061A (en) * 2019-06-17 2020-12-18 富士通株式会社 Target detection method and device and image processing equipment
CN111652158A (en) * 2020-06-04 2020-09-11 浙江大华技术股份有限公司 Target object detection method and device, storage medium and electronic device
CN111951601B (en) * 2020-08-05 2021-10-26 智慧互通科技股份有限公司 Method and device for identifying parking positions of distribution vehicles
CN111709951B (en) * 2020-08-20 2020-11-13 成都数之联科技有限公司 Target detection network training method and system, network, device and medium
CN116824059A (en) * 2021-03-02 2023-09-29 武汉联影智融医疗科技有限公司 Target surface reconstruction method, device, equipment and medium based on three-dimensional image

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101325690A (en) * 2007-06-12 2008-12-17 上海正电科技发展有限公司 Method and system for detecting human flow analysis and crowd accumulation process of monitoring video flow
US20150278631A1 (en) * 2014-03-28 2015-10-01 International Business Machines Corporation Filtering methods for visual object detection
CN107909005A (en) * 2017-10-26 2018-04-13 西安电子科技大学 Personage's gesture recognition method under monitoring scene based on deep learning
CN108921017A (en) * 2018-05-24 2018-11-30 北京飞搜科技有限公司 Method for detecting human face and system
CN109815843A (en) * 2018-12-29 2019-05-28 深圳云天励飞技术有限公司 Object detection method and Related product

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106228137A (en) * 2016-07-26 2016-12-14 广州市维安科技股份有限公司 A kind of ATM abnormal human face detection based on key point location
CN106485215B (en) * 2016-09-29 2020-03-06 西交利物浦大学 Face shielding detection method based on deep convolutional neural network
CN108416250B (en) * 2017-02-10 2021-06-22 浙江宇视科技有限公司 People counting method and device
CN108171694B (en) * 2017-12-28 2021-05-14 开立生物医疗科技(武汉)有限公司 Method, system and equipment for detecting nodule based on convolutional neural network
CN108898610B (en) * 2018-07-20 2020-11-20 电子科技大学 Object contour extraction method based on mask-RCNN


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
NIE, SHANLAN: "Inshore Ship Detection Based on Mask R-CNN", IEEE, 5 November 2018 (2018-11-05), https://ieeexplore.ieee.org/document/8519123, XP033438312 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112464938A (en) * 2020-12-18 2021-03-09 深圳赛安特技术服务有限公司 License plate detection and identification method, device, equipment and storage medium
CN112464938B (en) * 2020-12-18 2024-04-12 深圳赛安特技术服务有限公司 License plate detection and identification method, device, equipment and storage medium
CN112699808A (en) * 2020-12-31 2021-04-23 深圳市华尊科技股份有限公司 Dense target detection method, electronic equipment and related product
CN113283307A (en) * 2021-04-30 2021-08-20 北京雷石天地电子技术有限公司 Method and system for identifying object in video and computer storage medium
CN113192048A (en) * 2021-05-17 2021-07-30 广州市勤思网络科技有限公司 Multi-mode fused people number identification and statistics method
CN113449373A (en) * 2021-07-21 2021-09-28 深圳须弥云图空间科技有限公司 Overlap detection method and device and electronic equipment
CN115050129A (en) * 2022-06-27 2022-09-13 北京睿家科技有限公司 Data processing method and system for intelligent access control
CN117274814A (en) * 2023-10-08 2023-12-22 北京香田智能科技有限公司 Tobacco field image processing method
CN117274814B (en) * 2023-10-08 2024-02-20 北京香田智能科技有限公司 Tobacco field image processing method

Also Published As

Publication number Publication date
CN109815843B (en) 2021-09-14
CN109815843A (en) 2019-05-28

Similar Documents

Publication Publication Date Title
WO2020134528A1 (en) Target detection method and related product
US10198823B1 (en) Segmentation of object image data from background image data
US9965865B1 (en) Image data segmentation using depth data
US10217195B1 (en) Generation of semantic depth of field effect
CN109961009B (en) Pedestrian detection method, system, device and storage medium based on deep learning
WO2019218824A1 (en) Method for acquiring motion track and device thereof, storage medium, and terminal
US8750573B2 (en) Hand gesture detection
US8792722B2 (en) Hand gesture detection
WO2020063139A1 (en) Face modeling method and apparatus, electronic device and computer-readable medium
CN107256377B (en) Method, device and system for detecting object in video
CN109727264A (en) Image generating method, the training method of neural network, device and electronic equipment
CN112052831B (en) Method, device and computer storage medium for face detection
CN110084299B (en) Target detection method and device based on multi-head fusion attention
TW202013252A (en) License plate recognition system and license plate recognition method
WO2018210047A1 (en) Data processing method, data processing apparatus, electronic device and storage medium
WO2023082784A1 (en) Person re-identification method and apparatus based on local feature attention
WO2020134818A1 (en) Image processing method and related product
WO2023011013A1 (en) Splicing seam search method and apparatus for video image, and video image splicing method and apparatus
CN109816745A (en) Human body thermodynamic chart methods of exhibiting and Related product
WO2019080743A1 (en) Target detection method and apparatus, and computer device
WO2022174523A1 (en) Method for extracting gait feature of pedestrian, and gait recognition method and system
Pintore et al. Recovering 3D existing-conditions of indoor structures from spherical images
CN109840885B (en) Image fusion method and related product
CN109785439B (en) Face sketch image generation method and related products
CN109447022A (en) A kind of lens type recognition methods and device

Legal Events

- 121 EP: the EPO has been informed by WIPO that EP was designated in this application (ref document number: 19901541; country of ref document: EP; kind code of ref document: A1)
- NENP: non-entry into the national phase (ref country code: DE)
- 122 EP: PCT application non-entry in European phase (ref document number: 19901541; country of ref document: EP; kind code of ref document: A1)