CN115115978A - Object identification method and device, storage medium and processor


Info

Publication number: CN115115978A
Application number: CN202210663391.XA
Authority: CN (China)
Original language: Chinese (zh)
Prior art keywords: detection, frame, frame number, detection frame, frames
Inventors: 张黎 (Zhang Li), 鹿强 (Lu Qiang), 吴健宇 (Wu Jianyu)
Original and current assignee: FAW Group Corp
Legal status: Pending
Classifications

    • G06V20/40: Scenes; scene-specific elements in video content
    • G06V10/75: Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; coarse-fine approaches, e.g. multi-scale approaches; context analysis; selection of dictionaries
    • G06V10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V20/56: Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V2201/07: Indexing scheme relating to image or video recognition or understanding; target detection

Abstract

The invention discloses an object identification method and device, a storage medium, and a processor. The method comprises: acquiring a plurality of detection frames of an object in a video, wherein each detection frame indicates the position of the object in the video; determining the time frame number of each of the plurality of detection frames, wherein the time frame number characterizes how long the detection frame appears continuously in the video; determining a target detection frame among the plurality of detection frames based on the time frame numbers and a frame number difference threshold; and identifying the object in the video based on the target detection frame. The invention solves the technical problem of low object-identification accuracy.

Description

Object identification method and device, storage medium and processor
Technical Field
The invention relates to the field of vehicles, and in particular to an object identification method and device, a storage medium, and a processor.
Background
In autonomous driving, the position, size, orientation, category, and other attributes of an object in space can be accurately determined from the results of detecting objects on the road, enabling three-dimensional modeling, path planning, and similar tasks.
In the related art, objects are generally detected with Non-Maximum Suppression (NMS), but that method is only suitable for two-dimensional scenes, so detection in three-dimensional space suffers from the technical problem of low object-identification accuracy.
In view of this problem of low object-identification accuracy in the related art, no effective solution has yet been proposed.
Disclosure of Invention
The embodiments of the invention provide an object identification method and device, a storage medium, and a processor, which at least solve the technical problem of low target-identification accuracy.
According to one aspect of the embodiments of the invention, an object identification method is provided, including: acquiring a plurality of detection frames of an object in a video, wherein each detection frame indicates the position of the object in the video; determining the time frame number of each of the plurality of detection frames, wherein the time frame number characterizes how long the detection frame appears continuously in the video; determining a target detection frame among the plurality of detection frames based on the time frame numbers and a frame number difference threshold; and identifying the object in the video based on the target detection frame.
Optionally, determining the target detection frame among the plurality of detection frames based on the time frame numbers and the frame number difference threshold includes: determining the intersection-over-union (IoU) of a first detection frame and a second detection frame, the first and second detection frames being any two of the plurality of detection frames at the same moment; in response to the IoU being greater than an IoU threshold, acquiring the time frame number of the first detection frame and the time frame number of the second detection frame; and determining the target detection frame from the first and second detection frames based on their time frame numbers and the frame number difference threshold.
Optionally, determining the target detection frame from the first and second detection frames based on their time frame numbers and the frame number difference threshold includes: determining a first frame number difference between the time frame number of the first detection frame and that of the second detection frame; and determining the target detection frame from the two based on the first frame number difference and the frame number difference threshold.
Optionally, determining the target detection frame based on the first frame number difference and the frame number difference threshold includes: in response to the absolute value of the first frame number difference being greater than the frame number difference threshold, determining the detection frame with the longer time frame number as the target detection frame.
Optionally, determining the target detection frame based on the first frame number difference and the frame number difference threshold includes: in response to the absolute value of the first frame number difference being not greater than the frame number difference threshold, determining the detection frame with the higher matching degree as the target detection frame, wherein the matching degree characterizes how well the corresponding detection frame matches the object.
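By way of a non-limiting illustration, this decision cascade can be sketched in a few lines of Python. The patent discloses no code, so Detection, choose_boxes, iou_threshold, and time_threshold below are hypothetical names; time_threshold stands in for the frame number difference threshold and iou_threshold for the IoU threshold:

```python
from dataclasses import dataclass

@dataclass
class Detection:
    frame_cnt: int   # time frame number: consecutive frames this box has appeared
    score: float     # matching degree of the detection frame with the object

def choose_boxes(b1, b2, iou, iou_threshold, time_threshold):
    """Resolve one pair of same-moment detection frames per the rules above."""
    if iou <= iou_threshold:
        return [b1, b2]                      # no significant overlap: keep both
    diff = b1.frame_cnt - b2.frame_cnt       # first frame number difference
    if abs(diff) > time_threshold:
        return [b1] if diff > 0 else [b2]    # keep the longer-lived detection
    return [b1] if b1.score >= b2.score else [b2]  # else keep the better match
```

Under this rule, persistence is trusted first and the matching degree only breaks near-ties, which is what filters out briefly appearing overlaps.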
Optionally, the matching degree of each detection frame at each moment in the video is obtained, yielding at least one matching degree per detection frame; the quotient of the sum of the detection frame's matching degrees and their number is determined as the target matching degree of the detection frame.
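A minimal sketch of that averaging, again with hypothetical naming (target_matching_degree is not a term from any disclosed code):

```python
def target_matching_degree(per_frame_scores):
    """Mean of a detection frame's per-frame matching degrees (scores)."""
    return sum(per_frame_scores) / len(per_frame_scores)

# e.g. a box matched with scores 0.8, 0.9 and 0.7 over three frames:
# target_matching_degree([0.8, 0.9, 0.7]) -> 0.8
```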
Optionally, the frame number difference threshold is determined based on historical detection data processed for the object.
According to another aspect of the embodiments of the invention, an object identification device is also provided, including: an acquisition unit configured to acquire a plurality of detection frames of an object in a video, wherein each detection frame indicates the position of the object in the video; a first determining unit configured to determine the time frame number of each of the plurality of detection frames, wherein the time frame number characterizes how long the detection frame appears continuously in the video; a second determining unit configured to determine a target detection frame among the plurality of detection frames based on the time frame numbers and a frame number difference threshold; and an identification unit configured to identify the object in the video based on the target detection frame.
According to another aspect of the embodiments of the invention, a computer-readable storage medium is also provided. The computer-readable storage medium includes a stored program, wherein, when the program runs, the device on which the storage medium resides is controlled to execute the object identification method of the embodiments of the invention.
According to another aspect of the embodiments of the invention, a processor is also provided. The processor is configured to run a program, wherein the program, when running, executes the object identification method of the embodiments of the invention.
In the embodiments of the invention, a plurality of detection frames of an object in a video are acquired, each indicating the position of the object in the video; the time frame number of each detection frame, characterizing how long it appears continuously in the video, is determined; a target detection frame is determined among the plurality of detection frames based on the time frame numbers and a frame number difference threshold; and the object is identified in the video based on the target detection frame. That is, by comparing the time frame numbers of the plurality of detection frames, the embodiments filter out overlapping objects that appear only briefly, thereby improving the accuracy of object identification and solving the technical problem of low object-identification accuracy.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flow chart of a method of identifying an object according to an embodiment of the invention;
FIG. 2 is a flow chart of a method for acquiring a detection frame according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a non-maxima suppression identification process according to the related art;
FIG. 4(a) is a schematic diagram of a processing result without the non-maxima suppression of an embodiment of the present invention;
FIG. 4(b) is a schematic diagram of a processing result with the non-maxima suppression of an embodiment of the present invention;
FIG. 5 is a flowchart of a processing method for non-maximum suppression according to an embodiment of the invention;
FIG. 6 is a schematic diagram of an object recognition device according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in other sequences than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
In accordance with an embodiment of the present invention, an embodiment of an object recognition method is provided. It should be noted that the steps illustrated in the flowcharts of the accompanying drawings may be performed in a computer system, such as a set of computer-executable instructions, and that, while a logical order is shown in each flowchart, in some cases the steps may be performed in an order different from that described here.
FIG. 1 is a flowchart of an object recognition method according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:
step S102, a plurality of detection frames of the object in the video are obtained, wherein each detection frame is used for representing the position of the object in the video.
In the technical solution provided in step S102 of the present invention, a time period of images may be acquired by various sensors and the like to obtain a video to be processed, and the video may be processed to obtain a plurality of detection frames of an object in the video, where the video may be a time period of images, and may be a time period of images acquired by an acquisition device such as a camera, for example, a continuous time period of pictures captured by a vehicle-mounted camera of a vehicle during a driving process; the object may be an object existing in the video, for example, a truck on a highway section, a pedestrian at an intersection, or the like, and the video acquisition mode and the type of the object in the video are not specifically limited herein; the detection frame can be an inclined frame (rotation box, rbox for short), can be a three-dimensional detection frame (3D rbox) of a three-dimensional body, for example, can be a cube, a cuboid, and also can be a two-dimensional detection frame (2D rbox) of a two-dimensional plane, for example, can be a rectangle.
Optionally, the plurality of detection frames of the object may be acquired by sensors installed at different positions; for example, three-dimensional detection frames may be acquired by a laser radar (lidar) or a vision (camera) sensor, and two-dimensional detection frames by a millimeter-wave radar.
For example, when a vehicle drives through a roundabout, a construction zone, a downtown block, or a straight road section, a video of the drive is captured over a period of time. Each frame of the video can then be processed by devices such as the lidar, the millimeter-wave radar, and the camera to obtain the detection frame corresponding to each object in each frame, yielding a plurality of detection frames.
Optionally, a three-dimensional detection frame acquired by the camera or lidar may be projected onto the ground to obtain a two-dimensional detection frame, from which the center point, size, orientation, and similar information of the detection frame are obtained; this information can represent the position of the object in the video.
In the related art, only two-dimensional detection frames can be handled by the non-maximum suppression detection method, which limits the filtering of three-dimensional objects in a scene.
Step S104: determine the time frame number of each of the plurality of detection frames in the video, where the time frame number characterizes how long each detection frame appears continuously in the video.
In the technical solution provided in step S104, the number of time frames in which each detection frame continuously appears in the video may be determined. The time frame number may be denoted frame_cnt and characterizes the length of time each detection frame appears continuously in the video, for example 1 second or 2 seconds; these values are only for illustration and are not specifically limited here.
For example, if the detection frame corresponding to an object appears for three consecutive frames starting from the second frame of the video, the time frame number of that detection frame is three frames.
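As a hedged illustration of this bookkeeping (consecutive_frame_counts is a hypothetical helper, not disclosed in the patent), frame_cnt can be maintained as a running count that resets whenever the detection frame disappears:

```python
def consecutive_frame_counts(appearances):
    """Per-frame frame_cnt values for one detection frame; appearances[i] is
    True if the box is present in video frame i."""
    counts, run = [], 0
    for seen in appearances:
        run = run + 1 if seen else 0   # reset the streak when the box vanishes
        counts.append(run)
    return counts

# A box first seen at the video's second frame and present for three frames:
# consecutive_frame_counts([False, True, True, True, False]) -> [0, 1, 2, 3, 0]
```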
Step S106: determine a target detection frame among the plurality of detection frames based on the time frame numbers and the frame number difference threshold.
In the technical solution provided in step S106, the frame number difference between two detection frames is determined from the time frame number of each, the difference is compared with the frame number difference threshold, and the comparison result is used to determine the target detection frame among the plurality of detection frames. The target detection frame may be the detection frame that matches the object best. The frame number difference threshold, which may be called the time threshold and denoted time_threshold, may be a value set empirically or according to the actual situation.
For example, starting from the second frame of the video, if the detection frame of a first object appears for three consecutive frames, the first object's time frame number is three frames; if the detection frame of a second object appears for five consecutive frames, the second object's time frame number is five frames. With a frame number difference threshold of one frame, the first and second objects can be filtered and screened based on their time frame numbers and the set threshold, thereby determining the target detection frame among the plurality of detection frames.
In the related art, detection operates on images a single time frame at a time and ignores the influence of the other time frames within a period on the perception result. The embodiments of the invention introduce that information and set a time threshold on the object's time frame number, so overlapping objects that appear only briefly are filtered out and the accuracy of object identification is improved.
Step S108: identify the object in the video based on the target detection frame.
In the technical solution of step S108, information such as the position and orientation of the object in each frame of the video is identified based on the target detection frame.
Through steps S102 to S108, a plurality of detection frames of an object in a video are acquired, each indicating the position of the object in the video; the time frame number of each detection frame, characterizing how long it appears continuously in the video, is determined; a target detection frame is determined among the plurality of detection frames based on the time frame numbers and a frame number difference threshold; and the object is identified in the video based on the target detection frame. This achieves the technical effect of improving the accuracy of target identification and solves the technical problem of low target-identification accuracy.
The above-described method of this embodiment is further described below.
As an alternative embodiment, determining the target detection frame among the plurality of detection frames based on the time frame numbers and the frame number difference threshold includes: determining the intersection-over-union (IoU) of a first detection frame and a second detection frame, the first and second detection frames being any two of the plurality of detection frames at the same moment; in response to the IoU being greater than the IoU threshold, acquiring the time frame number of each; and determining the target detection frame from the two based on their time frame numbers and the frame number difference threshold.
In this embodiment, any two detection frames at the same moment may be selected from the plurality of detection frames as the first and second detection frames, and their IoU determined. If the IoU exceeds the IoU threshold, the two frames overlap; their time frame numbers are then determined, and the target detection frame is chosen between them based on those time frame numbers and the frame number difference threshold. Here the IoU characterizes the degree of overlap between detection frames; the IoU threshold, denoted T, may be a value set according to the actual situation or obtained empirically; and the first and second detection frames may be any two detection frames in the same video frame.
Optionally, the intersection of the first and second detection frames is computed, and the IoU is determined from the area of the convex polygon formed by the resulting intersection point set. If the IoU is greater than the IoU threshold, the two detection frames overlap and are screened by time frame number: the time frame numbers of the two frames are determined, and the target detection frame is chosen between them based on those time frame numbers and the frame number difference threshold.
Optionally, a first detection frame and a second detection frame are selected from the plurality of detection frames, the intersection operation between them yields a corresponding intersection point set, and the IoU is obtained from the area of the convex polygon formed by that point set, i.e. it can be calculated by the following formula:
IoU = area_r1r2 / (area_r1 + area_r2 - area_r1r2)
where area_r1 and area_r2 are the areas of the first and second detection frames respectively, and area_r1r2 is the area of the convex polygon formed by their intersection.
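By way of illustration only, the rotated-box IoU of this formula can be computed by converting each 2D rbox into a polygon and intersecting the polygons. The sketch below assumes the shapely library is available; rbox_to_polygon and rotated_iou are hypothetical names:

```python
import math
from shapely.geometry import Polygon

def rbox_to_polygon(cx, cy, length, width, yaw):
    """Corner polygon of an oriented rectangle (2D rbox) on the ground plane."""
    c, s = math.cos(yaw), math.sin(yaw)
    corners = [( length / 2,  width / 2), ( length / 2, -width / 2),
               (-length / 2, -width / 2), (-length / 2,  width / 2)]
    return Polygon([(cx + x * c - y * s, cy + x * s + y * c) for x, y in corners])

def rotated_iou(rbox1, rbox2):
    """IoU = area_r1r2 / (area_r1 + area_r2 - area_r1r2)."""
    p1, p2 = rbox_to_polygon(*rbox1), rbox_to_polygon(*rbox2)
    area_r1r2 = p1.intersection(p2).area   # convex overlap region of the boxes
    return area_r1r2 / (p1.area + p2.area - area_r1r2)
```

For example, rotated_iou((0, 0, 4, 2, 0.0), (1, 0, 4, 2, 0.0)) evaluates to 0.6 for two axis-aligned 4 by 2 boxes offset by one unit.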
Optionally, if the IoU is greater than the threshold T, the time frame numbers of the first and second detection frames are determined, and the target detection frame is chosen between them based on those time frame numbers and the frame number difference threshold.
As an alternative embodiment, determining the target detection frame from the first and second detection frames based on their time frame numbers and the frame number difference threshold includes: determining a first frame number difference between the two time frame numbers; and determining the target detection frame based on the first frame number difference and the frame number difference threshold.
In this embodiment, the time frame numbers of the first and second detection frames are determined, their difference is taken to obtain the first frame number difference, and the first frame number difference is judged against the frame number difference threshold to choose the target detection frame between the two.
As an alternative embodiment, determining the target detection frame based on the first frame number difference and the frame number difference threshold includes: in response to the absolute value of the first frame number difference being greater than the frame number difference threshold, determining the detection frame with the longer time frame number as the target detection frame.
In this embodiment, whether the absolute value of the first frame number difference exceeds the frame number difference threshold is judged; if it does, the detection frame with the longer time frame number is taken as the target detection frame.
Optionally, if the IoU of the two frames exceeds the IoU threshold, they overlap and are screened by time frame number: their frame number difference is determined to obtain the first frame number difference, and if its absolute value exceeds the frame number difference threshold, the two frames overlap but their durations of appearance differ greatly, so the detection frame with the longer time frame number is kept as the target detection frame. Optionally, this comparison may be iterated over the plurality of detection frames until the target detection frame is determined among them.
As an alternative embodiment, determining the target detection frame based on the first frame number difference and the frame number difference threshold includes: in response to the absolute value of the first frame number difference being not greater than the frame number difference threshold, determining the detection frame with the higher target matching degree as the target detection frame, where the target matching degree characterizes how well the corresponding detection frame matches the object.
In this embodiment, if the absolute value of the first frame number difference does not exceed the frame number difference threshold, the detection frame with the higher target matching degree is taken as the target detection frame. The target matching degree characterizes how well the corresponding detection frame matches the object and may be the mean of the frame's matching values over a period of time, for example a mean confidence (score) value.
Optionally, if the IoU of the two frames exceeds the IoU threshold, they overlap and are screened by time frame number: their frame number difference is determined to obtain the first frame number difference, and if its absolute value does not exceed the frame number difference threshold, the two frames have persisted for comparable lengths of time, so the detection frame with the higher target matching degree is kept as the target detection frame. Optionally, this comparison is likewise iterated over the plurality of detection frames to determine the target detection frame.
In the embodiments of the invention, the overlap relation between detection frames is determined and frames with low target matching degree are filtered out, so that the corresponding three-dimensional objects are filtered. This effectively alleviates unstable detection results caused by repeated, jumping, or mis-associated detections, improving the fusion performance for multi-sensor three-dimensional objects and the accuracy of object detection.
As an optional embodiment, the matching degree of each detection frame at each moment in the video is obtained, yielding at least one matching degree per detection frame; the quotient of the sum of these matching degrees and their number is determined as the target matching degree of the detection frame.
In this embodiment, the matching degree of each detection frame at each moment in the video is obtained, all matching degrees of a detection frame are summed, and the quotient of that sum and the number of matching degrees, i.e. the mean, is taken as the frame's target matching degree. The matching degree characterizes how well the detection frame matches the object to be recognized in each video frame and may be denoted score.
In the related art, non-maximum suppression is a post-processing module in target detection frameworks, mainly used to delete highly redundant detection frames: during detection, a plurality of detection frames are generated for each object in a frame of image, and non-maximum suppression removes the redundancy among each object's detection frames to obtain the final target detection frame.
Optionally, in the embodiments of the invention, a plurality of detection frames of an object may be obtained by several sensors: the detections obtained by the millimeter-wave radar are filtered with a speed threshold, and the three-dimensional detection frames obtained by the vision sensor and the lidar are projected onto the ground to obtain two-dimensional detection frames together with the object's center point, size, orientation, and similar information. The number of frames in which each detection frame appears (frame_cnt) and the matching degree (score) of each frame are determined; all matching degrees of a detection frame are summed, the quotient of the sum and their number is taken as the frame's target matching degree, and the per-frame matching degree of the same detection frame is replaced by this target matching degree.
Optionally, the detection frames of each video frame may be sorted by target matching degree into a detection frame sequence (rbox_list), and the frames in the sequence compared to obtain the target detection frame. The sorting only ensures that the frames are processed in order without omission, so the sorting method is not specifically limited.
In the embodiments of the invention, setting the target matching degree of a detection frame to the mean of its matching degrees over all frames effectively alleviates jumping of associated targets and unstable association results in fusion, improving result fusion in multi-sensor object detection.
As an alternative embodiment, the frame number difference threshold is determined based on historical detection data processed for the object.
In this embodiment, the threshold may be selected through multiple experiments so as to determine an appropriate frame number difference threshold.
In the embodiments of the invention, comparing the time frame numbers of the detection frames filters out overlapping objects that appear only briefly, thereby improving the accuracy of target identification and solving the technical problem of low target-identification accuracy.
Example 2
The technical solutions of the embodiments of the present invention will be illustrated below with reference to preferred embodiments.
The core functional modules in autonomous driving can be divided into perception, decision, and control. Only when the perception module senses the surrounding environment accurately can it provide an analysis basis for the downstream modules, which makes the accuracy of object detection important. Moreover, different sensors perceive different object attributes and ranges, and sensors of the same type installed at different positions cover different sensing areas, so the perception results of the sensors are fused to obtain more accurate information about surrounding objects.
Currently, three-dimensional object detection is one of the key technologies of the autonomous-driving perception module: it provides the position, size, orientation, category, and other information of objects in three-dimensional space, supporting modeling and path planning in that space.
In the related art, when a sensor detects a three-dimensional object, overlapping detection frames occur, causing false and missed detections and thus greatly degrading the association precision and robustness of multi-sensor fusion.
To address this, an embodiment of the present invention provides a detection method based on inclined Non-Maximum Suppression (INMS). Based on oriented boxes, the method intersects the currently traversed detection frame with each remaining detection frame to obtain the corresponding intersection point set and calculates the pairwise IoU from the area of the convex polygon formed by that set. Plain INMS, however, filters detections mostly from single-frame observations and ignores the effect of the remaining timestamp information on the perception result, so its object-detection accuracy remains low.
One related technology provides a sensor data filtering and fusion method for autonomous driving, divided into a spatial filtering-fusion method and a temporal filtering-fusion method. In spatial filtering-fusion, after the perception system obtains a frame of raw data from a sensor, the data points are spatially clustered, valid clusters are determined, and noise is eliminated; the clusters are then tracked by association to obtain an object's position, speed, history, and prediction information; feature information is estimated, and the length, width, and orientation of the clustered objects are computed. Temporal filtering-fusion then classifies raw targets into confirmed, suspicious, and false targets. The method need not consider the types and number of sensors, works with sensor data of large or small noise, has a small computation load, and is simple, flexible, and effective; but it, too, filters detections mostly from single-frame observations, ignores the effect of the remaining timestamp information on the perception result, and thus still suffers from low object-detection accuracy.
Another related technology provides a non-maximum suppression method for multi-target detection. The category and confidence score of each detected target are obtained, and the targets are sorted by confidence score into a detection queue; the target with the highest confidence score, denoted target A, is taken, and whether its confidence score meets a category-specific precondition is judged; if it does, whether target A's detection frame overlaps those of other targets is judged; if it does overlap, the overlapping target is denoted target B, whether A and B belong to the same category is judged, and a corresponding suppression algorithm decides from that judgment whether target B is suppressed.
Yet another related technology provides a 3D target detection method based on multi-information fusion: a driving image and point cloud data are collected and preprocessed; a 2D target frame and its score are obtained from the driving image with a 2D detector; candidate 3D target frames are obtained from the point cloud with a 3D detector; and the candidate 3D frames are screened using the correspondence between the 2D and candidate 3D frames and the 2D scores to obtain the 3D target frames. By introducing visual information, the method compensates for the sparsity of the point cloud and constrains the 3D frames with 2D detection results, improving the recall of 3D target frames and reducing the probability of false and missed detections.
To solve the above problems, embodiments of the present invention provide an INMS-based multi-sensor three-dimensional target detection filtering and fusion method, addressing the loss of fusion-association precision caused by overlapping frames in multi-sensor three-dimensional detection. Detection frames are acquired by multiple sensors including a lidar, a millimeter-wave radar, and a vision sensor, then preprocessed and filtered: the detections of the vision sensor and the millimeter-wave radar are filtered with a score threshold and a speed threshold respectively, where both thresholds may be values set in advance from the actual situation or from experience. The detected 3D rbox is projected onto the ground to obtain a 2D rbox, yielding the center point, size, and orientation of the detection frame. Against the related art's habit of filtering from single-frame observations while ignoring the effect of the remaining timestamp information on the perception result, the information of the remaining timestamps is introduced into the INMS processing of a single detection frame: a time threshold is set on the detection frame's duration of appearance, overlapping targets that appear only briefly are filtered out, and the matching degree (score) of a detection frame is set to the mean over all frames in which it appears. This effectively alleviates overlapping and jumping detections and unstable association results, improving multi-sensor fusion performance for target detection.
Embodiments of the present invention are further described below.
The embodiment of the invention mainly comprises two aspects: acquisition of the detection frames and their processing.
FIG. 2 is a flowchart of a method for acquiring detection frames according to an embodiment of the present invention. As shown in FIG. 2, the acquisition of detection frames may include the following steps:
Step S201: acquire data from each sensor.
In this embodiment, the sensors may include a laser radar (lidar), a millimeter-wave radar (radar), and a vision sensor (camera), all of which acquire detection data.
Step S202: filter the acquired detection data with a score threshold or a speed threshold.
In this embodiment, the acquired targets are preprocessed and filtered.
Optionally, the three-dimensional detections (3D rbox) obtained by the lidar may be projected onto the ground to obtain two-dimensional detection frames (2D rbox), together with each rbox's center point, size, and orientation.
Optionally, the detections obtained by the camera may be filtered with a score threshold: the three-dimensional detection frames (3D rbox) are filtered by the score threshold, the remaining detections are projected onto a plane to obtain the corresponding planar detection frames, and each frame's center point, size, and orientation are obtained. The score threshold may be an empirical value or one obtained from the actual situation or experimental tests; its determination is not specifically limited here.
Optionally, the detections obtained by the radar may be filtered with a speed threshold, yielding the detection frames to be processed.
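A non-limiting sketch of this preprocessing follows; the dictionary layout, the prefilter and project_to_ground names, and the default threshold values are illustrative assumptions, since the patent leaves both thresholds to empirical tuning:

```python
def prefilter(camera_dets, radar_dets, score_threshold=0.3, speed_threshold=0.5):
    """Keep camera detections above a confidence score and radar detections
    above a speed magnitude; both thresholds are tuned empirically."""
    kept_camera = [d for d in camera_dets if d["score"] > score_threshold]
    kept_radar = [d for d in radar_dets if abs(d["speed"]) > speed_threshold]
    return kept_camera, kept_radar

def project_to_ground(box3d):
    """Project a 3D rbox onto the ground plane: drop the height dimension,
    keep the center, footprint size, and yaw of the box."""
    cx, cy, _cz = box3d["center"]
    length, width, _height = box3d["size"]
    return {"center": (cx, cy), "size": (length, width), "yaw": box3d["yaw"]}
```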
Step S203: input the data screened from all sensors into the inclined non-maximum suppression module and process it.
The screened detection frames are input into the improved inclined non-maximum suppression, which processes the data.
In the related art, non-maximum suppression is a post-processing module of the target detection framework used to delete highly redundant detection frames. FIG. 3 is a schematic diagram of a non-maximum suppression identification process according to the related art; as shown in FIG. 3, a plurality of detection frames are generated for object A and object B during detection, and the essence of non-maximum suppression is to de-duplicate each object's detection frames to obtain the final detection result, i.e. the target detection frame with the highest final matching degree. However, conventional target detection uses horizontal rectangular frames, while the three-dimensional targets output by the sensors have uncertain orientation, so the embodiment of the invention applies inclined non-maximum suppression to oriented rectangular frames. Within the inclined non-maximum suppression, the information of the remaining timestamps is introduced for a single target: a time threshold is set on the target's duration of appearance, some overlapping targets that appear only briefly are filtered out, and each target's score is set to the mean of its scores over all frames in which it appears. FIG. 4(a) is a schematic diagram of a processing result without this non-maximum suppression, and FIG. 4(b) of a processing result with it. As shown in FIGS. 4(a) and 4(b), the embodiment of the invention alleviates jumping of associated targets and unstable association results in fusion, and filtering the overlap by time frame number effectively improves multi-sensor fusion of three-dimensional targets.
FIG. 5 is a flowchart of a processing method for non-maximum suppression according to an embodiment of the present invention. As shown in FIG. 5, the INMS processing according to an embodiment of the invention may include the following steps:
Step S501: process the detection frames filtered by the sensors.
Optionally, the time frame number (frame_cnt) of each detection frame and the confidence of each frame are obtained, the mean confidence of each detection frame is determined, and the per-frame score of each detection frame is replaced by the corresponding mean confidence.
Optionally, all detection frames in the same video frame are sorted by confidence into a sorted list (rbox_list). It should be noted that the sort may be ascending or descending; its main purpose is to ensure that all detection frames are processed and none is omitted during processing.
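A one-function sketch of that ordering, under the assumption that each detection carries its mean score (build_rbox_list is a hypothetical name):

```python
def build_rbox_list(detections):
    """Sort one video frame's detections by mean score (descending here;
    per the text above, the direction of the sort is immaterial)."""
    return sorted(detections, key=lambda d: d.score, reverse=True)
```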
Step S502: screen the detection frames with the IoU threshold.
Optionally, starting from the first detection frame in rbox_list, the pairwise IoU is determined in turn: each detection frame is intersected with the other detection frames to obtain the corresponding intersection point set, and the IoU is determined from the area of the convex polygon formed by that set, i.e. by the following formula:
IoU = area_r1r2 / (area_r1 + area_r2 - area_r1r2)
where area_r1 and area_r2 are the areas of the two detection frames and area_r1r2 is the area of the convex polygon formed by their intersection.
Optionally, if the IoU is greater than the IoU threshold (T), step S503 is performed; if not, the detection frame is retained.
Step S503: calculate the frame number difference between the detection frames and filter the overlapping detection frames with the frame number difference threshold.
Optionally, if the IoU is greater than the IoU threshold, the two detection frames may overlap, and the frame number difference between them is calculated.
Optionally, the frame number difference is compared with the frame number difference threshold (time_threshold): if the difference is not greater than the threshold, the detection frame with the higher matching degree is retained and the one with the lower matching degree removed; if the difference is greater than the threshold, the detection frame with the longer time frame number is retained and the shorter-lived one removed.
In the related art, detection operates on images a single time frame at a time and ignores the influence of the other time frames within a period on the perception result; the embodiment of the invention introduces that information and sets a time threshold on the object's time frame number, filtering out overlapping objects that appear only briefly and improving the accuracy of object identification.
Optionally, the operations of steps S502 and S503 are iterated over the plurality of detection frames until all detection frames in rbox_list have been filtered.
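Putting steps S501 to S503 together, the following sketch shows one plausible form of the loop; it is illustrative only. inms_filter and Detection are hypothetical names, iou_fn may be any rotated-box IoU routine such as the shapely sketch in Embodiment 1, and the tie-breaking mirrors step S503:

```python
from dataclasses import dataclass

@dataclass
class Detection:
    rbox: tuple       # (cx, cy, length, width, yaw) on the ground plane
    frame_cnt: int    # consecutive frames this detection has appeared
    score: float      # mean matching degree over all frames of appearance

def inms_filter(rbox_list, iou_fn, iou_threshold, time_threshold):
    """Iterate over the score-sorted rbox_list and suppress overlapping,
    short-lived or low-scoring detection frames (steps S502-S503)."""
    suppressed = [False] * len(rbox_list)
    for i, a in enumerate(rbox_list):
        if suppressed[i]:
            continue
        for j in range(i + 1, len(rbox_list)):
            if suppressed[j]:
                continue
            b = rbox_list[j]
            if iou_fn(a.rbox, b.rbox) <= iou_threshold:
                continue                       # no significant overlap
            diff = a.frame_cnt - b.frame_cnt
            if abs(diff) > time_threshold:     # persistence differs markedly
                loser = j if diff > 0 else i
            else:                              # comparable persistence
                loser = j if a.score >= b.score else i
            suppressed[loser] = True
            if loser == i:
                break                          # current frame removed; move on
    return [d for k, d in enumerate(rbox_list) if not suppressed[k]]
```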
Step S504: obtain the filtered targets and perform multi-sensor data fusion.
The filtered targets are obtained, multi-sensor data fusion is performed, and the repeatedly filtered detection frames are processed to obtain the object to be identified.
In the related art, the NMS algorithm only handles two-dimensional horizontal rectangular frames and is limited in filtering spatial objects in a scene; the embodiment of the invention provides an INMS-based detection algorithm that realizes multi-sensor three-dimensional target detection filtering.
Further, against the related art's filtering from single-frame observations while ignoring the effect of the remaining timestamp information on the perception result, the embodiment of the invention improves the existing INMS: the information of the remaining timestamps is introduced for a single target, a time threshold is set on the target's duration of appearance, some overlapping targets that appear only briefly are filtered out, and the matching degree of a detection frame is set to the mean of its matching degrees over all frames of appearance, improving the stability of the filtering.
In the embodiment of the invention, comparing the time frame numbers of the plurality of detection frames filters out overlapping objects that appear only briefly, thereby improving the accuracy of target identification and solving the technical problem of low target-identification accuracy.
Example 3
According to an embodiment of the present invention, an object identification apparatus is further provided. It should be noted that this apparatus can be used to execute the object identification method of embodiment 1.
Fig. 6 is a schematic diagram of an object identification apparatus according to an embodiment of the present invention. As shown in fig. 6, the object identification apparatus 600 may include: an acquisition unit 602, a first determining unit 604, a second determining unit 606, and an identification unit 608.
The acquisition unit 602 is configured to acquire a plurality of detection frames of an object in a video, where each detection frame is used to indicate a position of the object in the video.
The first determining unit 604 is configured to respectively determine the time frame number of each of the plurality of detection frames in the video, where the time frame number is used to represent the time length for which each detection frame continuously appears in the video.
The second determining unit 606 is configured to determine a target detection frame among the plurality of detection frames based on the time frame number and the frame number difference threshold.
The identification unit 608 is configured to identify the object in the video based on the target detection frame.
Optionally, the second determining unit 606 includes a first determining module, a second determining module, and a judging module, which are configured to: determine the intersection ratio of a first detection frame and a second detection frame among the plurality of detection frames, where the first detection frame and the second detection frame are any two of the plurality of detection frames at the same moment; acquire, in response to the intersection ratio being greater than the intersection ratio threshold, the time frame number of the first detection frame and the time frame number of the second detection frame; and determine the target detection frame from the first detection frame and the second detection frame based on the time frame number of the first detection frame, the time frame number of the second detection frame, and the frame number difference threshold.
Optionally, the first determining module includes a first determining submodule configured to: determine a first frame number difference between the time frame number of the first detection frame and the time frame number of the second detection frame; and determine the target detection frame from the first detection frame and the second detection frame based on the first frame number difference and the frame number difference threshold.
Optionally, the first determining submodule is further configured to determine, in response to the absolute value of the first frame number difference being greater than the frame number difference threshold, the detection frame with the longer time frame number of the first detection frame and the second detection frame as the target detection frame.
Optionally, the first determining submodule is further configured to determine, in response to the absolute value of the first frame number difference being not greater than the frame number difference threshold, the detection frame with the higher matching degree of the first detection frame and the second detection frame as the target detection frame, where the matching degree is used to characterize how well the corresponding detection frame matches the object.
Optionally, the apparatus further includes a third determining unit configured to: acquire the matching degree of each detection frame at each moment in the video to obtain at least one matching degree of each detection frame; and determine the quotient of the sum of the at least one matching degree of the detection frame and the number of the at least one matching degree (i.e., the mean matching degree) as the target matching degree of the detection frame.
Optionally, the apparatus further includes a fourth determining unit configured to determine the frame number difference threshold based on historical detection data processed for the object.
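One plausible way for the fourth determining unit to derive the frame number difference threshold from historical detection data is to take a percentile of historically observed frame-number differences; the percentile statistic is an assumption made for illustration, since the patent leaves the derivation open.

def frame_diff_threshold(historical_frame_diffs, percentile=90):
    # Choose the threshold so that `percentile` percent of the historical
    # frame-number differences fall at or below it.
    diffs = sorted(historical_frame_diffs)
    if not diffs:
        raise ValueError('no historical data')
    index = min(len(diffs) - 1, int(len(diffs) * percentile / 100))
    return diffs[index]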
In the embodiment of the invention, the acquisition unit acquires a plurality of detection frames of an object in a video, where each detection frame represents a position of the object in the video; the first determining unit respectively determines the time frame number of each detection frame in the video, where the time frame number represents the time length for which each detection frame continuously appears in the video; the second determining unit determines a target detection frame among the plurality of detection frames based on the time frame number and the frame number difference threshold; and the identification unit identifies the object in the video based on the target detection frame. That is, the embodiment of the present invention filters out overlapping objects with a short occurrence time by comparing the time frame numbers of the plurality of detection frames, thereby achieving the technical effect of improving the accuracy of target identification and solving the technical problem of low target identification accuracy.
Example 4
According to an embodiment of the present invention, there is also provided a computer-readable storage medium including a stored program, wherein the program, when executed, performs the object identification method described in embodiment 1.
Example 5
According to an embodiment of the present invention, there is also provided a processor configured to run a program, wherein the program, when running, executes the object identification method described in embodiment 1.
Example 6
According to an embodiment of the present invention, there is also provided a vehicle configured to run a program, wherein the program, when running, executes the object identification method described in embodiment 1.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and refinements can be made without departing from the principle of the present invention, and these modifications and refinements should also fall within the protection scope of the present invention.

Claims (10)

1. A method for identifying an object, comprising:
acquiring a plurality of detection frames of an object in a video, wherein each detection frame is used for representing the position of the object in the video;
respectively determining the time frame number of each of the plurality of detection frames in the video, wherein the time frame number is used for representing the time length for which each detection frame continuously appears in the video;
determining a target detection frame among the plurality of detection frames based on the time frame number and the frame number difference threshold;
identifying the object in the video based on the target detection frame.
2. The method of claim 1, wherein determining the target detection frame among the plurality of detection frames based on the time frame number and the frame number difference threshold comprises:
determining the intersection ratio of a first detection frame in the plurality of detection frames and a second detection frame in the plurality of detection frames, wherein the first detection frame and the second detection frame are any two detection frames in the plurality of detection frames at the same time;
acquiring, in response to the intersection ratio being greater than an intersection ratio threshold, the time frame number of the first detection frame and the time frame number of the second detection frame;
determining the target detection frame from the first detection frame and the second detection frame based on the time frame number of the first detection frame, the time frame number of the second detection frame, and the frame number difference threshold.
3. The method of claim 2, wherein determining the target detection frame from the first detection frame and the second detection frame based on the time frame number of the first detection frame, the time frame number of the second detection frame, and the frame number difference threshold comprises:
determining a first frame number difference between the time frame number of the first detection frame and the time frame number of the second detection frame;
determining the target detection frame from the first detection frame and the second detection frame based on the first frame number difference and the frame number difference threshold.
4. The method of claim 3, wherein determining the target detection frame from the first detection frame and the second detection frame based on the first frame number difference and the frame number difference threshold comprises:
determining, in response to the absolute value of the first frame number difference being greater than the frame number difference threshold, the detection frame with the longer time frame number of the first detection frame and the second detection frame as the target detection frame.
5. The method of claim 3, wherein determining the target detection frame from the first detection frame and the second detection frame based on the first frame number difference and the frame number difference threshold comprises:
determining, in response to the absolute value of the first frame number difference being not greater than the frame number difference threshold, the detection frame with the higher target matching degree of the first detection frame and the second detection frame as the target detection frame, wherein the target matching degree is used to characterize how well the corresponding detection frame matches the object.
6. The method of claim 1, further comprising:
acquiring the matching degree of each detection frame corresponding to each moment in the video to obtain at least one matching degree of each detection frame;
and determining the quotient of the sum of the at least one matching degree of the detection frame and the number of the at least one matching degree as the target matching degree of the detection frame.
7. The method of claim 1, further comprising:
determining the frame number difference threshold based on historical detection data processed for the object.
8. An apparatus for identifying an object, comprising:
an acquisition unit, configured to acquire a plurality of detection frames of an object in a video, where each detection frame is used to represent a position of the object in the video;
a first determining unit, configured to determine time frame numbers of a plurality of detection frames in the video respectively, where the time frame numbers are used to characterize a time length of each detection frame appearing in the video continuously;
a second determining unit, configured to determine a target detection frame among the plurality of detection frames based on the time frame number and the frame number difference threshold;
an identification unit configured to identify the object in the video based on the target detection frame.
9. A computer-readable storage medium, comprising a stored program, wherein the program, when executed, controls an apparatus in which the computer-readable storage medium is located to perform the method of any one of claims 1 to 7.
10. A vehicle, characterized by being configured to perform the method of any one of claims 1 to 7.
CN202210663391.XA 2022-06-13 2022-06-13 Object identification method and device, storage medium and processor Pending CN115115978A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210663391.XA CN115115978A (en) 2022-06-13 2022-06-13 Object identification method and device, storage medium and processor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210663391.XA CN115115978A (en) 2022-06-13 2022-06-13 Object identification method and device, storage medium and processor

Publications (1)

Publication Number Publication Date
CN115115978A true CN115115978A (en) 2022-09-27

Family

ID=83328664

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210663391.XA Pending CN115115978A (en) 2022-06-13 2022-06-13 Object identification method and device, storage medium and processor

Country Status (1)

Country Link
CN (1) CN115115978A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117037045A (en) * 2023-10-08 2023-11-10 成都考拉悠然科技有限公司 Anomaly detection system based on fusion clustering and deep learning
CN117037045B (en) * 2023-10-08 2024-04-26 成都考拉悠然科技有限公司 Anomaly detection system based on fusion clustering and deep learning


Similar Documents

Publication Publication Date Title
KR101647370B1 (en) road traffic information management system for g using camera and radar
CN106611512B (en) Method, device and system for processing starting of front vehicle
JP3367170B2 (en) Obstacle detection device
WO2016129403A1 (en) Object detection device
EP2372642B1 (en) Method and system for detecting moving objects
US11093762B2 (en) Method for validation of obstacle candidate
KR101735365B1 (en) The robust object tracking method for environment change and detecting an object of interest in images based on learning
CN114677554A (en) Statistical filtering infrared small target detection tracking method based on YOLOv5 and Deepsort
WO2014132490A1 (en) Vehicle specifications measurement processing device, vehicle specifications measuring method, and recording medium
CN115376109A (en) Obstacle detection method, obstacle detection device, and storage medium
CN110544271B (en) Parabolic motion detection method and related device
CN117130010A (en) Obstacle sensing method and system for unmanned vehicle and unmanned vehicle
US20200394802A1 (en) Real-time object detection method for multiple camera images using frame segmentation and intelligent detection pool
JP2002367077A (en) Device and method for deciding traffic congestion
CN114913470B (en) Event detection method and device
CN115115978A (en) Object identification method and device, storage medium and processor
CN114675295A (en) Method, device and equipment for judging obstacle and storage medium
CN114359346A (en) Point cloud data processing method and device, nonvolatile storage medium and processor
Shin et al. Vehicle classification by road lane detection and model fitting using a surveillance camera
CN107292916B (en) Target association method, storage device and direct recording and broadcasting interactive terminal
CN114879177B (en) Target analysis method and device based on radar information
CN115063594B (en) Feature extraction method and device based on automatic driving
CN112906424B (en) Image recognition method, device and equipment
JP2006113738A (en) Device and method for detecting object
CN116721248A (en) Target detection method and target detection device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination