CN117930169A - Target detection method, terminal device and computer readable storage medium - Google Patents

Target detection method, terminal device and computer readable storage medium

Info

Publication number
CN117930169A
CN117930169A (application CN202311831193.0A)
Authority
CN
China
Prior art keywords
detection frame
dimensional
image
dimensional detection
projection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311831193.0A
Other languages
Chinese (zh)
Inventor
邵池
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Ubtech Technology Co ltd
Original Assignee
Shenzhen Ubtech Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Ubtech Technology Co ltd filed Critical Shenzhen Ubtech Technology Co ltd
Priority to CN202311831193.0A priority Critical patent/CN117930169A/en
Publication of CN117930169A publication Critical patent/CN117930169A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01S - RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S 7/00 - Details of systems according to groups G01S13/00, G01S15/00, G01S17/00
    • G01S 7/02 - Details of systems according to group G01S13/00
    • G01S 7/41 - Details of systems according to group G01S13/00 using analysis of echo signal for target characterisation; Target signature; Target cross-section
    • G01S 7/418 - Theoretical aspects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/74 - Image or video pattern matching; Proximity measures in feature spaces
    • G06V 10/761 - Proximity, similarity or dissimilarity measures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/60 - Type of objects
    • G06V 20/64 - Three-dimensional objects
    • G06V 20/647 - Three-dimensional objects by matching two-dimensional images to three-dimensional objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 - Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Image Analysis (AREA)

Abstract

The present application relates to the field of image processing technologies, and in particular, to a target detection method, a terminal device, and a computer readable storage medium. The method comprises the following steps: acquiring radar point cloud data of a target environment at a moment t and a plurality of first images obtained by shooting the target environment, wherein each first image corresponds to one shooting visual angle; detecting a target object in the target environment according to the radar point cloud data to obtain at least one three-dimensional detection frame; detecting the target object in each first image to obtain at least one two-dimensional detection frame in each first image; carrying out matching processing according to the three-dimensional detection frame and the two-dimensional detection frame to obtain a matching result; and determining a final detection frame of the target object according to the matching result. By the method, the detection precision and reliability of target detection can be effectively improved.

Description

Target detection method, terminal device and computer readable storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a target detection method, a terminal device, and a computer readable storage medium.
Background
With the development of computer vision and perception technology, object detection has become an indispensable technology in many application fields. Target detection refers to detecting and identifying various target objects in an image, such as pedestrians, vehicles, and the like, by analyzing the image.
In a conventional target detection method, a two-dimensional image is collected by a camera, and an image detection process is performed on the two-dimensional image to detect a target object in the image. However, the existing target detection method has low detection precision and poor reliability.
Disclosure of Invention
The embodiment of the application provides a target detection method, terminal equipment and a computer readable storage medium, which can effectively improve the detection precision and reliability of target detection.
In a first aspect, an embodiment of the present application provides a target detection method, including:
acquiring radar point cloud data of a target environment at a moment t and a plurality of first images obtained by shooting the target environment, wherein each first image corresponds to one shooting visual angle;
Detecting a target object in the target environment according to the radar point cloud data to obtain at least one three-dimensional detection frame;
detecting the target object in each first image to obtain at least one two-dimensional detection frame in each first image;
Matching processing is carried out according to the three-dimensional detection frame and the two-dimensional detection frame, and a matching result is obtained;
And determining a final detection frame of the target object according to the matching result.
In the embodiment of the application, target detection is performed on a plurality of first images obtained by shooting the target environment from different shooting visual angles, so that even when the target object is unclear or cannot be captured at a certain shooting visual angle, a more accurate target detection result can still be obtained, and the reliability of the target detection result is improved. Furthermore, by combining the radar point cloud data, the depth information of the target object is fused, which can effectively improve the detection accuracy of target detection. In this way, the detection precision and reliability of target detection can be effectively improved.
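Purely as an illustrative sketch of the flow described above (the callables below are hypothetical placeholders and not part of this disclosure), the five steps can be organised as follows:

```python
def detect_targets(radar_point_cloud, first_images,
                   detector_3d, detector_2d, matcher):
    # Illustrative orchestration only: detector_3d, detector_2d and matcher are
    # hypothetical callables standing in for the trained models and the matching
    # step described in the text.
    boxes_3d = detector_3d(radar_point_cloud)                  # at least one 3D detection frame
    boxes_2d = [detector_2d(image) for image in first_images]  # 2D frames per first image
    matching_result = matcher(boxes_3d, boxes_2d)              # match 3D and 2D detection frames
    return matching_result                                     # final detection frames follow from it
```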
In a possible implementation manner of the first aspect, the performing a matching process according to the three-dimensional detection frame and the two-dimensional detection frame to obtain a matching result includes:
Projecting each three-dimensional detection frame onto each first image to obtain a corresponding two-dimensional projection detection frame of each three-dimensional detection frame in each first image;
and carrying out matching processing according to the projection detection frame and the two-dimensional detection frame to obtain a matching result.
In the embodiment of the application, the data of the three-dimensional detection frame is generally smaller than the data of the two-dimensional detection frame, so that the three-dimensional detection frame is projected into the image, the data processing amount can be effectively reduced, and the target detection efficiency is improved.
In a possible implementation manner of the first aspect, the projecting each three-dimensional detection frame onto each first image to obtain a corresponding two-dimensional projection detection frame of each three-dimensional detection frame in each first image includes:
projecting the three-dimensional coordinates of each vertex of the three-dimensional detection frame onto the first image to obtain corresponding two-dimensional projection coordinates of each vertex of the three-dimensional detection frame in the first image;
And determining the projection detection frame corresponding to the three-dimensional detection frame in the first image according to the projection coordinates corresponding to each vertex of the three-dimensional detection frame in the first image.
In the embodiment of the application, performing the projection based on the vertices of the three-dimensional detection frame makes the calculation more convenient and reduces the algorithm complexity.
In a possible implementation manner of the first aspect, the projecting the three-dimensional coordinate of each vertex of the three-dimensional detection frame onto the first image to obtain the two-dimensional projection coordinate corresponding to each vertex of the three-dimensional detection frame in the first image includes:
Acquiring pose data of the three-dimensional detection frame, wherein the pose data are used for representing the position and the form of the three-dimensional detection frame;
Calculating the three-dimensional coordinates of each vertex of the three-dimensional detection frame according to the pose data of the three-dimensional detection frame;
Acquiring a conversion matrix between a radar coordinate system corresponding to the radar point cloud data and a camera coordinate system corresponding to the first image;
And projecting the three-dimensional coordinates of each vertex of the three-dimensional detection frame onto the first image according to the transformation matrix to obtain the projection coordinates.
In a possible implementation manner of the first aspect, the determining the projection detection frame corresponding to the three-dimensional detection frame in the first image according to the projection coordinates corresponding to each vertex of the three-dimensional detection frame in the first image includes:
calculating the minimum abscissa, the maximum abscissa, the minimum ordinate and the maximum ordinate of the projection coordinates corresponding to all the vertexes of the three-dimensional detection frame respectively;
and determining the projection detection frame corresponding to the three-dimensional detection frame according to the minimum abscissa, the maximum abscissa, the minimum ordinate and the maximum ordinate corresponding to the three-dimensional detection frame.
In a possible implementation manner of the first aspect, the matching result includes a target detection frame corresponding to each of the projection detection frames;
The matching processing is performed according to the projection detection frame and the two-dimensional detection frame to obtain a matching result, and the matching processing comprises the following steps:
for each projection detection frame, calculating the similarity between the projection detection frame and each two-dimensional detection frame in each first image;
And acquiring target detection frames corresponding to the projection detection frames from all the two-dimensional detection frames according to the similarity.
In a possible implementation manner of the first aspect, the obtaining, according to the similarity, a target detection frame corresponding to the projection detection frame from all the two-dimensional detection frames includes:
obtaining candidate detection frames from all the two-dimensional detection frames according to the similarity, wherein the similarity corresponding to the candidate detection frames is larger than a preset threshold;
And determining a target detection frame corresponding to the projection detection frame from all the candidate detection frames according to the similarity, wherein the similarity corresponding to the target detection frame is the maximum value of the similarities corresponding to all the candidate detection frames.
In the embodiment of the application, the two-dimensional detection frame which is most matched with the three-dimensional detection frame can be determined according to the similarity between the projection detection frame and the two-dimensional detection frame of the three-dimensional detection frame by traversing the first image and the three-dimensional detection frame, so that the two-dimensional detection result which is matched with the three-dimensional detection result is determined. By the method, the organic fusion of the two-dimensional detection and the three-dimensional detection is realized, and false detection targets in the two-dimensional detection result and the three-dimensional detection result can be effectively filtered, so that the accuracy and the reliability of target detection are improved.
In a possible implementation manner of the first aspect, the detecting, according to the radar point cloud data, a target object in the target environment, to obtain at least one three-dimensional detection box includes:
Inputting the radar point cloud data into a trained three-dimensional detection model to obtain at least one three-dimensional detection frame;
the detecting the target object in each first image to obtain at least one two-dimensional detection frame in each first image comprises the following steps:
And inputting the first image into a trained two-dimensional detection model to obtain at least one two-dimensional detection frame in the first image.
In the embodiment of the application, the three-dimensional detection is performed by using the trained three-dimensional detection model, and the two-dimensional image detection is performed by using the trained two-dimensional detection model, so that the detection precision can be improved and the detection efficiency can be improved.
In a second aspect, an embodiment of the present application provides an object detection apparatus, including:
The data acquisition unit is used for acquiring radar point cloud data of a target environment at the moment t and a plurality of first images obtained by shooting the target environment, wherein each first image corresponds to one shooting visual angle;
The three-dimensional detection unit is used for detecting a target object in the target environment according to the radar point cloud data to obtain at least one three-dimensional detection frame;
the two-dimensional detection unit is used for detecting the target object in each first image to obtain at least one two-dimensional detection frame in each first image;
The result matching unit is used for carrying out matching processing according to the three-dimensional detection frame and the two-dimensional detection frame to obtain a matching result;
and the target tracking unit is used for determining a final detection frame of the target object according to the matching result.
In a third aspect, an embodiment of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the object detection method according to any one of the first aspects when executing the computer program.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium storing a computer program which, when executed by a processor, implements the object detection method according to any one of the first aspects.
In a fifth aspect, an embodiment of the present application provides a computer program product, which, when run on a terminal device, causes the terminal device to perform the object detection method according to any one of the first aspects above.
It will be appreciated that the advantages of the second to fifth aspects may be found in the relevant description of the first aspect, and are not described here again.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a target detection method according to an embodiment of the present application;
Fig. 2 is a schematic view of a photographing angle according to an embodiment of the present application;
FIG. 3 is a schematic diagram of vertices of a three-dimensional inspection box provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of detection frame matching provided by an embodiment of the present application;
FIG. 5 is a block diagram of an object detection apparatus according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in the present description and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining", or "in response to detecting". Similarly, the phrases "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, as "upon determining", "in response to determining", "upon detecting [the described condition or event]", or "in response to detecting [the described condition or event]".
Furthermore, the terms "first," "second," "third," and the like in the description of the present specification and in the appended claims, are used for distinguishing between descriptions and not necessarily for indicating or implying a relative importance.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise.
With the development of computer vision and perception technology, object detection has become an indispensable technology in many application fields. Target detection refers to detecting and identifying various target objects in an image, such as pedestrians, vehicles, and the like, by analyzing the image.
In a conventional target detection method, a two-dimensional image is collected by a camera, and an image detection process is performed on the two-dimensional image to detect a target object in the image. However, the existing target detection method has low detection precision and poor reliability.
Based on the above, the embodiment of the application provides a target detection method. In the embodiment of the application, target detection is performed on a plurality of first images obtained by shooting the target environment from different shooting visual angles, so that even when the target object is unclear or cannot be captured at a certain shooting visual angle, a more accurate target detection result can still be obtained, and the reliability of the target detection result is improved. Furthermore, by combining the radar point cloud data, the depth information of the target object is fused, which can effectively improve the detection accuracy of target detection. In this way, the detection precision and reliability of target detection can be effectively improved.
Referring to fig. 1, a flowchart of a target detection method according to an embodiment of the present application is provided, by way of example and not limitation, and the method may include the following steps:
S101, acquiring radar point cloud data of a target environment at the moment t and a plurality of first images obtained by shooting the target environment.
Wherein each first image corresponds to a shooting visual angle.
In the embodiment of the application, in order to ensure the accuracy of detection, the first image and the radar point cloud data are acquired at the same time (such as time t).
By way of example, referring to fig. 2, a schematic diagram of shooting view angles according to an embodiment of the present application is shown. As an example and not by way of limitation, as shown in fig. 2, in some application scenarios, three cameras may be provided to capture the target environment from the left, middle, and right sides, respectively. The radar may be provided at the left camera, at the middle camera, or at the right camera. Fig. 2 (a) shows an image taken by the middle camera; fig. 2 (b) shows an image taken by the left camera; fig. 2 (c) shows an image taken by the right camera. As can be seen from fig. 2, the target objects contained in the images obtained from different shooting view angles may be different. As shown in fig. 2 (a) and (c), due to the difference in shooting angle, one of the images contains a person while the other does not.
Because the same target environment is shot through a plurality of shooting visual angles, the situation that the shooting is unclear or a target object cannot be shot due to the shooting visual angles can be effectively reduced. In addition, the characteristic information of the target environment under different angles can be obtained through shooting through a plurality of shooting visual angles, and the target detection precision is improved.
S102, detecting a target object in the target environment according to the radar point cloud data to obtain at least one three-dimensional detection frame.
In some embodiments, step S102 may include:
And inputting the radar point cloud data into the trained three-dimensional detection model to obtain at least one three-dimensional detection frame.
In the embodiment of the application, the three-dimensional detection frame can comprise the center point coordinates, the shape data and the rotation matrix of the detection frame. The rotation matrix is used for representing the posture of the three-dimensional detection frame relative to the coordinate axes of the radar coordinate system. For example, if the three-dimensional detection frame is a rectangular parallelepiped detection frame, its shape data may include a length value, a width value, and a height value. If the three-dimensional detection frame is a cylindrical detection frame, its shape data may include a bottom surface center coordinate, a height value, and a bottom surface radius.
It should be noted that, in order to facilitate the subsequent projection, the shape of the three-dimensional detection frame is required to be consistent with the shape of the two-dimensional detection frame. For example, if the three-dimensional detection frame is a cuboid, the two-dimensional detection frame is a rectangle; if the three-dimensional detection frame is a cylinder, the two-dimensional detection frame is rectangular; if the three-dimensional detection frame is a sphere, the two-dimensional detection frame is circular.
In the embodiment of the application, before the three-dimensional detection model is applied, the three-dimensional detection model is trained to obtain the trained three-dimensional detection model, and the trained three-dimensional detection model is used for image detection, so that the detection precision can be improved, and the detection efficiency can be improved.
In some implementations, the process of training the three-dimensional detection model may include:
Acquiring multiple groups of sample data, wherein each group of sample data comprises radar point cloud data detected by a radar and a real tag of a target object corresponding to the sample data, and the real tag can comprise a number and/or an object class; training the three-dimensional detection model according to the sample data until the detection precision of the three-dimensional detection model reaches the preset precision, and obtaining the trained three-dimensional detection model.
In one example of a training model, sample data is input into a three-dimensional detection model, a predictive label of a detected target object is output, and the predictive label may include a confidence that the target object belongs to a certain object class; calculating a loss value according to the confidence coefficient; if the loss value is greater than or equal to a preset threshold value, updating model parameters of the three-dimensional detection model according to the loss value; and continuing to train the updated three-dimensional detection model according to the sample data until the loss value is smaller than a preset threshold value, and obtaining the trained three-dimensional detection model.
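As a non-limiting illustration of this training scheme, a minimal loop could look as follows, assuming a PyTorch-style model, optimizer and loss function (all hypothetical; the embodiment does not prescribe a framework, and the threshold and epoch values are placeholders):

```python
def train_until_threshold(model, optimizer, loss_fn, sample_loader,
                          loss_threshold=0.05, max_epochs=100):
    # Sketch of the scheme above: keep updating the model parameters according to
    # the loss value until the loss is smaller than the preset threshold.
    # loss_threshold and max_epochs are illustrative values only.
    for _ in range(max_epochs):
        for point_cloud, real_label in sample_loader:
            pred_label = model(point_cloud)            # predicted label with confidence
            loss = loss_fn(pred_label, real_label)     # loss value from the confidence
            if loss.item() < loss_threshold:           # preset threshold reached: stop
                return model
            optimizer.zero_grad()
            loss.backward()                            # update parameters from the loss value
            optimizer.step()
    return model
```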
It should be noted that the foregoing is only an example of training a model, and in practical application, other training manners, such as controlling the number of iterations, may also be used. In addition, any model capable of realizing three-dimensional target detection can be applied to the embodiment of the present application, and the specific model structure of the three-dimensional detection model is not particularly limited in the embodiment of the present application.
S103, detecting the target object in each first image to obtain at least one two-dimensional detection frame in each first image.
In some embodiments, step S103 may include:
And inputting the first image into a trained two-dimensional detection model to obtain at least one two-dimensional detection frame in the first image.
In the embodiment of the application, the two-dimensional detection frame can comprise the center point coordinates and the shape data of the detection frame. For example, if the two-dimensional detection frame is a rectangular detection frame, its shape data may include a length value and a height value. If the two-dimensional detection frame is a circular detection frame, its shape data may include a circle center coordinate and a radius.
In the embodiment of the application, the two-dimensional detection model is trained before being applied, so that the trained two-dimensional detection model is obtained, and the trained two-dimensional detection model is utilized for image detection, so that the detection precision can be improved, and the detection efficiency can be improved.
In some implementations, the process of training the two-dimensional detection model may include:
Acquiring a plurality of sample images, wherein each sample image carries a real label of a target object, and the real label can comprise a number and/or an object category; training the two-dimensional detection model according to the sample image until the detection precision of the two-dimensional detection model reaches the preset precision, and obtaining the trained two-dimensional detection model.
In one example of a training model, a sample image is input into a two-dimensional detection model, a predictive label of a detected target object is output, and the predictive label can include a confidence that the target object belongs to a certain object class; calculating a loss value according to the confidence coefficient; if the loss value is greater than or equal to a preset threshold value, updating model parameters of the two-dimensional detection model according to the loss value; and training the updated two-dimensional detection model according to the sample image until the loss value is smaller than a preset threshold value, and obtaining the trained two-dimensional detection model.
It should be noted that the foregoing is only an example of training a model, and in practical application, other training manners, such as controlling the number of iterations, may also be used. In addition, any model capable of realizing two-dimensional target detection can be applied to the embodiment of the present application, and the specific model structure of the two-dimensional detection model is not particularly limited in the embodiment of the present application.
S104, carrying out matching processing according to the three-dimensional detection frame and the two-dimensional detection frame to obtain a matching result.
In the embodiment of the application, the matching processing aims to find the two-dimensional detection frame most similar to the three-dimensional detection frame. It can be understood that if the three-dimensional detection frame is matched with the two-dimensional detection frame, the target object corresponding to the three-dimensional detection frame is identical to the target object corresponding to the two-dimensional detection frame.
In some embodiments, step S104 may include:
S201, projecting each three-dimensional detection frame onto each first image to obtain a corresponding two-dimensional projection detection frame of each three-dimensional detection frame in each first image.
S202, matching processing is carried out according to the projection detection frame and the two-dimensional detection frame, and a matching result is obtained.
It can be appreciated that in other embodiments, the two-dimensional detection frames may be projected into the radar point cloud data to obtain a corresponding three-dimensional projection detection frame of each two-dimensional detection frame in the radar point cloud data, and matching processing may then be carried out according to the three-dimensional projection detection frames and the three-dimensional detection frames to obtain a matching result. However, since there are multiple first images, and each first image may include multiple two-dimensional detection frames, projecting the two-dimensional detection frames into the radar point cloud data would involve a large data processing amount and may reduce the target detection efficiency. In the manner described in the above steps S201 to S202, since the data of the three-dimensional detection frame is generally smaller than the data of the two-dimensional detection frames, projecting the three-dimensional detection frame into the images can effectively reduce the data processing amount and facilitate the improvement of the target detection efficiency.
In some embodiments, step S201 may include:
projecting the three-dimensional coordinates of each vertex of the three-dimensional detection frame onto the first image to obtain corresponding two-dimensional projection coordinates of each vertex of the three-dimensional detection frame in the first image;
And determining the projection detection frame corresponding to the three-dimensional detection frame in the first image according to the projection coordinates corresponding to each vertex of the three-dimensional detection frame in the first image.
It will be appreciated that in other embodiments, the projection may be performed based on the center coordinates and shape data of the three-dimensional detection frame. For example, the center coordinates of the three-dimensional detection frame are projected onto the first image; the lengths of the line segments corresponding to the length value, the height value and the width value on the first image are calculated; and the projection detection frame is then determined according to the projection coordinates of the center coordinates on the first image and the lengths of the line segments. However, this method requires line-segment calculations and is cumbersome, whereas projecting points is relatively simple; therefore, the vertex-projection manner of this embodiment is more convenient to calculate and reduces the algorithm complexity.
In some implementations, the step of projecting the three-dimensional coordinates onto the first image may include:
Acquiring pose data of the three-dimensional detection frame, wherein the pose data are used for representing the position and the form of the three-dimensional detection frame;
Calculating the three-dimensional coordinates of each vertex of the three-dimensional detection frame according to the pose data of the three-dimensional detection frame;
Acquiring a conversion matrix between a radar coordinate system corresponding to the radar point cloud data and a camera coordinate system corresponding to the first image;
And projecting the three-dimensional coordinates of each vertex of the three-dimensional detection frame onto the first image according to the transformation matrix to obtain the projection coordinates.
In the embodiment of the application, the pose data can comprise a central coordinate, a length value, a height value, a width value and a rotation matrix of the three-dimensional detection frame. The center coordinates, the length values, the height values and the width values are used for representing the positions of the three-dimensional detection frames, and the length values, the height values, the width values and the rotation matrix are used for representing the shapes of the three-dimensional detection frames.
By way of example, referring to fig. 3, a schematic diagram of the vertices of a three-dimensional detection frame according to an embodiment of the present application is shown. By way of example and not limitation, for the three-dimensional detection frame shown in fig. 3, assume that its pose data is (x, y, z, l, w, h, R), where (x, y, z) represents the center coordinates of the three-dimensional detection frame, l represents the length value of the three-dimensional detection frame, w represents the width value of the three-dimensional detection frame, h represents the height value of the three-dimensional detection frame, and R represents the rotation matrix of the three-dimensional detection frame. The three-dimensional coordinates of each vertex of the three-dimensional detection frame may be calculated as follows:
corner 0: r (x-l/2, y-w/2, z-h/2)
Corner 1: r (x+l/2, y-w/2, z-h/2)
Corner 2: r (x+l/2, y+w/2, z-h/2)
Corner 3: r (x-l/2, y+w/2, z-h/2)
Corner 4: r (x-l/2, y-w/2, z+h/2)
Corner point 5: r (x+l/2, y-w/2, z+h/2)
Corner 6: r (x+l/2, y+w/2, z+h/2)
Corner 7: r (x-l/2, y+w/2, z+h/2)
Where, represents matrix multiplication.
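For illustration, the eight corner formulas listed above can be transcribed directly as follows (numpy is used here only as a convenience; it is not required by the embodiment):

```python
import numpy as np

def box_corners(x, y, z, l, w, h, R):
    # Corners 0..7 in the order listed above, i.e. R * (x ± l/2, y ± w/2, z ± h/2),
    # transcribed directly from the formulas in the text. R is the 3x3 rotation matrix.
    signs = [(-1, -1, -1), (+1, -1, -1), (+1, +1, -1), (-1, +1, -1),
             (-1, -1, +1), (+1, -1, +1), (+1, +1, +1), (-1, +1, +1)]
    R = np.asarray(R, dtype=float)
    corners = [R @ np.array([x + sx * l / 2, y + sy * w / 2, z + sz * h / 2])
               for sx, sy, sz in signs]
    return np.stack(corners)        # shape (8, 3): one row of (X, Y, Z) per corner
```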
In the embodiment of the present application, one way to obtain the conversion matrix includes: calibrating the internal parameters of the camera, and calibrating the conversion matrix between the camera and the radar.
In some implementations, the camera is pre-calibrated to obtain camera parameters. For example, a calibration plate with a checkerboard is prepared, the size of the checkerboard is known, and the calibration plate is photographed at different angles by a camera to obtain a group of images; detecting characteristic points (such as calibration plate corner points) in the image to obtain pixel coordinate values of the calibration plate corner points, and calculating to obtain physical coordinate values of the calibration plate corner points according to the known checkerboard size and the origin of the world coordinate system; and according to the relation between the physical coordinate values and the pixel coordinate values, obtaining a camera internal parameter matrix, a camera external parameter matrix and a distortion coefficient.
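Purely for illustration, one common checkerboard-based intrinsic calibration routine, sketched here with OpenCV, is shown below; the pattern size, square size and image paths are assumptions and not part of the embodiment:

```python
import glob
import cv2
import numpy as np

# Checkerboard with 9x6 inner corners and 25 mm squares; these values and the
# image directory are illustrative assumptions.
pattern_size, square_size = (9, 6), 0.025
object_template = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
object_template[:, :2] = np.mgrid[0:pattern_size[0],
                                  0:pattern_size[1]].T.reshape(-1, 2) * square_size

object_points, image_points, image_size = [], [], None
for path in glob.glob("calibration_images/*.png"):        # hypothetical calibration set
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern_size)
    if found:
        object_points.append(object_template)             # physical coordinate values
        image_points.append(corners)                      # pixel coordinate values
        image_size = gray.shape[::-1]

# Camera internal parameter matrix K and distortion coefficients from the correspondences.
if image_size is not None:
    ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        object_points, image_points, image_size, None, None)
```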
It should be noted that, the calibration method of the camera internal parameter and the calibration method of the conversion matrix in the embodiment of the application are not particularly limited.
By way of example, a specific implementation of the projection is as follows:
Given the radar point cloud coordinates P = (X, Y, Z) and the known extrinsic matrix C_T_L between the radar coordinate system and the camera coordinate system, the camera coordinates C = (X_c, Y_c, Z_c) can be obtained using the formula C = C_T_L * P, where * represents matrix multiplication. The projection of the point cloud on the image can then be obtained according to the following conversion formula from camera coordinates to pixel coordinates:
u = f_x * X_c / Z_c + u_0, v = f_y * Y_c / Z_c + v_0
where (u, v) denotes the pixel coordinates, and f_x, f_y, u_0, v_0 are the camera internal parameters.
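A small sketch of this projection, assuming a 4x4 radar-to-camera extrinsic matrix C_T_L and the pinhole intrinsics f_x, f_y, u_0 and v_0 named above:

```python
import numpy as np

def project_point(point_radar, C_T_L, fx, fy, u0, v0):
    # Project one radar point P = (X, Y, Z) onto the image: first C = C_T_L * P
    # in homogeneous form, then the pinhole conversion to pixel coordinates.
    P = np.append(np.asarray(point_radar, dtype=float), 1.0)   # homogeneous radar coordinates
    Xc, Yc, Zc = (np.asarray(C_T_L, dtype=float) @ P)[:3]      # camera coordinates
    if Zc <= 0:                                                # point behind the camera plane
        return None
    u = fx * Xc / Zc + u0
    v = fy * Yc / Zc + v0
    return u, v                                                # pixel coordinates
```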
In some implementations, the step of determining the projected detection frame may include:
calculating the minimum abscissa, the maximum abscissa, the minimum ordinate and the maximum ordinate of the projection coordinates corresponding to all the vertexes of the three-dimensional detection frame respectively;
and determining the projection detection frame corresponding to the three-dimensional detection frame according to the minimum abscissa, the maximum abscissa, the minimum ordinate and the maximum ordinate corresponding to the three-dimensional detection frame.
Continuing with the example shown in fig. 3, vertex A of the projection detection frame may be determined from the minimum abscissa (the abscissa of vertices 0 and 4 of the three-dimensional detection frame) and the minimum ordinate (the ordinate of vertices 0 and 1 of the three-dimensional detection frame); vertex B of the projection detection frame may be determined from the maximum abscissa (the abscissa of vertices 2 and 6 of the three-dimensional detection frame) and the minimum ordinate; vertex C of the projection detection frame may be determined from the minimum abscissa and the maximum ordinate (the ordinate of vertices 6 and 7 of the three-dimensional detection frame); and vertex D of the projection detection frame may be determined from the maximum abscissa and the maximum ordinate.
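A short illustrative sketch of this step, assuming the eight projection coordinates of the vertices are available as (u, v) pairs:

```python
import numpy as np

def projection_detection_frame(projected_vertices):
    # projected_vertices: the eight (u, v) projection coordinates of the 3D box vertices.
    # Returns (min abscissa, min ordinate, max abscissa, max ordinate), from which the
    # four vertices A, B, C, D of the projection detection frame follow directly.
    uv = np.asarray(projected_vertices, dtype=float)    # shape (8, 2)
    u_min, v_min = uv.min(axis=0)
    u_max, v_max = uv.max(axis=0)
    return u_min, v_min, u_max, v_max
```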
In some embodiments, the matching result includes a target detection frame corresponding to each of the projection detection frames. Accordingly, step S202 may include:
for each projection detection frame, calculating the similarity between the projection detection frame and each two-dimensional detection frame in each first image;
And acquiring target detection frames corresponding to the projection detection frames from all the two-dimensional detection frames according to the similarity.
One way of calculating the similarity may be to calculate the intersection ratio (intersection over union) between the projection detection frame and the two-dimensional detection frame, where the intersection ratio refers to the ratio of the intersection area to the union area of the two detection frames.
Of course, the similarity may be calculated in other ways, for example, a ratio of a coincidence area between the projection detection frame and the two-dimensional detection frame to a total area of the projection detection frame or the two-dimensional detection frame may be used as the similarity. In the embodiment of the application, the calculation mode of the similarity is not particularly limited.
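For illustration, the intersection ratio of two axis-aligned boxes in (u_min, v_min, u_max, v_max) form can be computed as follows:

```python
def intersection_ratio(box_a, box_b):
    # Intersection-over-union of two axis-aligned boxes given as
    # (u_min, v_min, u_max, v_max); one of the similarity measures mentioned above.
    u1, v1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])   # overlap lower corner
    u2, v2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])   # overlap upper corner
    inter = max(0.0, u2 - u1) * max(0.0, v2 - v1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```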
In some implementations, the determining of the target detection box may include:
obtaining candidate detection frames from all the two-dimensional detection frames according to the similarity, wherein the similarity corresponding to the candidate detection frames is larger than a preset threshold;
And determining a target detection frame corresponding to the projection detection frame from all the candidate detection frames according to the similarity, wherein the similarity corresponding to the target detection frame is the maximum value of the similarities corresponding to all the candidate detection frames.
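A sketch of the two-stage selection just described (threshold filtering followed by taking the maximum), assuming a similarity function such as the intersection ratio above; the threshold value is illustrative:

```python
def select_target_detection_frame(projection_frame, two_d_frames,
                                  similarity, preset_threshold=0.5):
    # Keep candidates whose similarity exceeds the preset threshold, then take the
    # candidate with the maximum similarity as the target detection frame.
    # preset_threshold=0.5 is an illustrative value, not one fixed by the embodiment.
    scored = [(similarity(projection_frame, frame), frame) for frame in two_d_frames]
    candidates = [(s, frame) for s, frame in scored if s > preset_threshold]
    if not candidates:
        return None          # no two-dimensional detection frame matches this projection frame
    return max(candidates, key=lambda pair: pair[0])[1]
```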
By way of example, referring to fig. 4, a schematic diagram of detection frame matching provided by an embodiment of the present application is shown. By way of example and not limitation, continuing with the application scenario shown in fig. 2, i.e. 3 shooting angles of view are set, correspondingly 3 first images may be obtained per detection. As shown in fig. 4, a certain three-dimensional detection frame is projected into the 3 first images respectively to obtain a projection detection frame, such as detection frame 00 shown in fig. 4 (a) and (b). Since the pixel coordinates after the projection of the three-dimensional detection frame are out of the pixel coordinate range of the 3rd first image, the projection detection frame of the three-dimensional detection frame is not displayed in the 3rd first image, as shown in fig. 4 (c).
For the 1st first image, as shown in fig. 4 (a), two two-dimensional detection frames are included, namely detection frame 11 and detection frame 12. For the 2nd first image, as shown in fig. 4 (b), two two-dimensional detection frames are included, namely detection frame 21 and detection frame 22. The similarity between the projection detection frame and each two-dimensional detection frame is calculated respectively, namely the similarity s1 between detection frame 00 and detection frame 11, the similarity s2 between detection frame 00 and detection frame 12, the similarity s3 between detection frame 00 and detection frame 21, and the similarity s4 between detection frame 00 and detection frame 22. If the similarity s1 and the similarity s3 are greater than the preset threshold, and the similarity s2 and the similarity s4 are less than the preset threshold, the detection frame 11 corresponding to the similarity s1 and the detection frame 21 corresponding to the similarity s3 are used as candidate detection frames. Among the candidate detection frames, if s3 is greater than s1, the detection frame 21 corresponding to s3 is determined as the target detection frame matched with detection frame 00.
In the embodiment of the application, the two-dimensional detection frame which is most matched with the three-dimensional detection frame can be determined according to the similarity between the projection detection frame and the two-dimensional detection frame of the three-dimensional detection frame by traversing the first image and the three-dimensional detection frame, so that the two-dimensional detection result which is matched with the three-dimensional detection result is determined. By the method, the organic fusion of the two-dimensional detection and the three-dimensional detection is realized, and false detection targets in the two-dimensional detection result and the three-dimensional detection result can be effectively filtered, so that the accuracy and the reliability of target detection are improved.
S105, determining a final detection frame of the target object according to the matching result.
The matching result comprises a target detection frame corresponding to each projection detection frame. In some implementations, the projected detection frame included in the matching result may be determined as the final detection frame of the target object, or the target detection frame included in the matching result may be determined as the final detection frame of the target object.
In the embodiment of the application, target detection is performed on a plurality of first images obtained by shooting the target environment from different shooting visual angles, so that even when the target object is unclear or cannot be captured at a certain shooting visual angle, a more accurate target detection result can still be obtained, and the reliability of the target detection result is improved. Furthermore, by combining the radar point cloud data, the depth information of the target object is fused, which can effectively improve the detection accuracy of target detection. In this way, the detection precision and reliability of target detection can be effectively improved.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
Corresponding to the target detection method described in the above embodiments, fig. 5 is a block diagram of the structure of the target detection apparatus provided in the embodiment of the present application, and for convenience of explanation, only the portion related to the embodiment of the present application is shown.
Referring to fig. 5, the apparatus includes:
the data acquisition unit 51 is configured to acquire radar point cloud data of a target environment at a time t, and a plurality of first images obtained by capturing the target environment, where each of the first images corresponds to one capturing view angle.
And the three-dimensional detection unit 52 is configured to detect a target object in the target environment according to the radar point cloud data, so as to obtain at least one three-dimensional detection frame.
And a two-dimensional detection unit 53, configured to detect the target object in each of the first images, so as to obtain at least one two-dimensional detection frame in each of the first images.
And a result matching unit 54, configured to perform matching processing according to the three-dimensional detection frame and the two-dimensional detection frame, so as to obtain a matching result.
And the target tracking unit 55 is used for determining a final detection frame of the target object according to the matching result.
Optionally, the result matching unit 54 is further configured to:
Projecting each three-dimensional detection frame onto each first image to obtain a corresponding two-dimensional projection detection frame of each three-dimensional detection frame in each first image;
and carrying out matching processing according to the projection detection frame and the two-dimensional detection frame to obtain a matching result.
Optionally, the result matching unit 54 is further configured to:
projecting the three-dimensional coordinates of each vertex of the three-dimensional detection frame onto the first image to obtain corresponding two-dimensional projection coordinates of each vertex of the three-dimensional detection frame in the first image;
And determining the projection detection frame corresponding to the three-dimensional detection frame in the first image according to the projection coordinates corresponding to each vertex of the three-dimensional detection frame in the first image.
Optionally, the result matching unit 54 is further configured to:
Acquiring pose data of the three-dimensional detection frame, wherein the pose data are used for representing the position and the form of the three-dimensional detection frame;
Calculating the three-dimensional coordinates of each vertex of the three-dimensional detection frame according to the pose data of the three-dimensional detection frame;
Acquiring a conversion matrix between a radar coordinate system corresponding to the radar point cloud data and a camera coordinate system corresponding to the first image;
And projecting the three-dimensional coordinates of each vertex of the three-dimensional detection frame onto the first image according to the transformation matrix to obtain the projection coordinates.
Optionally, the result matching unit 54 is further configured to:
calculating the minimum abscissa, the maximum abscissa, the minimum ordinate and the maximum ordinate of the projection coordinates corresponding to all the vertexes of the three-dimensional detection frame respectively;
and determining the projection detection frame corresponding to the three-dimensional detection frame according to the minimum abscissa, the maximum abscissa, the minimum ordinate and the maximum ordinate corresponding to the three-dimensional detection frame.
Optionally, the matching result includes a target detection frame corresponding to each projection detection frame. Accordingly, the result matching unit 54 is further configured to:
for each projection detection frame, calculating the similarity between the projection detection frame and each two-dimensional detection frame in each first image;
And acquiring target detection frames corresponding to the projection detection frames from all the two-dimensional detection frames according to the similarity.
Optionally, the result matching unit 54 is further configured to:
obtaining candidate detection frames from all the two-dimensional detection frames according to the similarity, wherein the similarity corresponding to the candidate detection frames is larger than a preset threshold;
And determining a target detection frame corresponding to the projection detection frame from all the candidate detection frames according to the similarity, wherein the similarity corresponding to the target detection frame is the maximum value of the similarities corresponding to all the candidate detection frames.
Optionally, the three-dimensional detection unit 52 is further configured to:
And inputting the radar point cloud data into the trained three-dimensional detection model to obtain at least one three-dimensional detection frame.
Optionally, the two-dimensional detection unit 53 is further configured to:
And inputting the first image into a trained two-dimensional detection model to obtain at least one two-dimensional detection frame in the first image.
It should be noted that, because the content of information interaction and execution process between the above devices/units is based on the same concept as the method embodiment of the present application, specific functions and technical effects thereof may be referred to in the method embodiment section, and will not be described herein.
In addition, the object detection device shown in fig. 5 may be a software unit, a hardware unit, or a unit combining soft and hard, which are built in an existing terminal device, or may be integrated into the terminal device as an independent pendant, or may exist as an independent terminal device.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
Fig. 6 is a schematic structural diagram of a terminal device according to an embodiment of the present application. As shown in fig. 6, the terminal device 6 of this embodiment includes: at least one processor 60 (only one shown in fig. 6), a memory 61, and a computer program 62 stored in the memory 61 and executable on the at least one processor 60, the processor 60 implementing the steps in any of the various target detection method embodiments described above when executing the computer program 62.
The terminal equipment can be computing equipment such as a desktop computer, a notebook computer, a palm computer, a cloud server and the like. The terminal device may include, but is not limited to, a processor, a memory. It will be appreciated by those skilled in the art that fig. 6 is merely an example of the terminal device 6 and is not meant to be limiting as to the terminal device 6, and may include more or fewer components than shown, or may combine certain components, or different components, such as may also include input-output devices, network access devices, etc.
The processor 60 may be a central processing unit (CPU). The processor 60 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 61 may, in some embodiments, be an internal storage unit of the terminal device 6, such as a hard disk or memory of the terminal device 6. In other embodiments, the memory 61 may also be an external storage device of the terminal device 6, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the terminal device 6. Further, the memory 61 may also include both an internal storage unit and an external storage device of the terminal device 6. The memory 61 is used for storing an operating system, an application program, a boot loader, data and other programs, such as the program code of the computer program. The memory 61 may also be used for temporarily storing data that has been output or is to be output.
Embodiments of the present application also provide a computer readable storage medium storing a computer program which, when executed by a processor, performs the steps of the respective method embodiments described above.
The embodiments of the present application provide a computer program product enabling a terminal device to carry out the steps of the method embodiments described above when the computer program product is run on the terminal device.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow of the methods of the above embodiments of the present application may be implemented by a computer program instructing related hardware. The computer program may be stored in a computer readable storage medium, and when executed by a processor, the computer program may implement the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer readable medium may include at least: any entity or device capable of carrying the computer program code to the apparatus/terminal device, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, such as a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk. In some jurisdictions, in accordance with legislation and patent practice, computer readable media may not include electrical carrier signals and telecommunications signals.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts that are not described or detailed in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other manners. For example, the apparatus/terminal device embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical function division, and there may be additional divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate parts may or may not be physically separate, and the parts shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (10)

1. A method of detecting an object, comprising:
acquiring radar point cloud data of a target environment at a moment t and a plurality of first images obtained by shooting the target environment, wherein each first image corresponds to one shooting angle of view;
detecting a target object in the target environment according to the radar point cloud data to obtain at least one three-dimensional detection frame;
detecting the target object in each first image to obtain at least one two-dimensional detection frame in each first image;
performing matching processing according to the three-dimensional detection frame and the two-dimensional detection frame to obtain a matching result;
and determining a final detection frame of the target object according to the matching result.
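By way of a non-limiting illustration only, the overall flow of the method of claim 1 may be sketched as follows; the callables detect_3d, detect_2d, match and fuse are hypothetical placeholders for the trained detection models and the matching and fusion logic, and their names and signatures are assumptions rather than part of the claimed method.

def detect_targets(point_cloud, first_images, detect_3d, detect_2d, match, fuse):
    # Obtain at least one three-dimensional detection frame from the radar point cloud data.
    boxes_3d = detect_3d(point_cloud)
    # Obtain at least one two-dimensional detection frame in each first image.
    boxes_2d_per_image = [detect_2d(image) for image in first_images]
    # Matching processing between the three-dimensional and two-dimensional detection frames.
    matching_result = match(boxes_3d, boxes_2d_per_image)
    # Determine the final detection frame of the target object from the matching result.
    return fuse(boxes_3d, matching_result)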
2. The method of claim 1, wherein the performing matching processing according to the three-dimensional detection frame and the two-dimensional detection frame to obtain a matching result includes:
projecting each three-dimensional detection frame onto each first image to obtain a corresponding two-dimensional projection detection frame of each three-dimensional detection frame in each first image;
and performing matching processing according to the projection detection frame and the two-dimensional detection frame to obtain a matching result.
3. The method of claim 2, wherein projecting each of the three-dimensional detection frames onto each of the first images to obtain a corresponding two-dimensional projection detection frame of each of the three-dimensional detection frames in each of the first images comprises:
projecting the three-dimensional coordinates of each vertex of the three-dimensional detection frame onto the first image to obtain corresponding two-dimensional projection coordinates of each vertex of the three-dimensional detection frame in the first image;
and determining the projection detection frame corresponding to the three-dimensional detection frame in the first image according to the projection coordinates corresponding to each vertex of the three-dimensional detection frame in the first image.
4. The method of claim 3, wherein projecting the three-dimensional coordinates of each vertex of the three-dimensional inspection frame onto the first image to obtain the corresponding two-dimensional projection coordinates of each vertex of the three-dimensional inspection frame in the first image, comprises:
acquiring pose data of the three-dimensional detection frame, wherein the pose data are used for representing the position and the form of the three-dimensional detection frame;
calculating the three-dimensional coordinates of each vertex of the three-dimensional detection frame according to the pose data of the three-dimensional detection frame;
acquiring a transformation matrix between a radar coordinate system corresponding to the radar point cloud data and a camera coordinate system corresponding to the first image;
and projecting the three-dimensional coordinates of each vertex of the three-dimensional detection frame onto the first image according to the transformation matrix to obtain the projection coordinates.
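As a non-limiting numerical sketch of claim 4, the snippet below assumes that the pose data consist of a box centre, box dimensions and a yaw angle, that the transformation matrix from the radar coordinate system to the camera coordinate system is a 4x4 homogeneous matrix, and that a 3x3 pinhole intrinsic matrix K is available; none of these representations are required by the claim.

import numpy as np

def box_vertices(center, size, yaw):
    # Compute the eight 3D vertices of a detection frame from assumed pose data:
    # center = (x, y, z), size = (l, w, h), yaw = rotation about the vertical axis.
    l, w, h = size
    x = np.array([ 1,  1, -1, -1,  1,  1, -1, -1]) * l / 2
    y = np.array([ 1, -1, -1,  1,  1, -1, -1,  1]) * w / 2
    z = np.array([-1, -1, -1, -1,  1,  1,  1,  1]) * h / 2
    rot = np.array([[np.cos(yaw), -np.sin(yaw), 0],
                    [np.sin(yaw),  np.cos(yaw), 0],
                    [0,            0,           1]])
    return (rot @ np.vstack([x, y, z])).T + np.asarray(center)   # (8, 3)

def project_vertices(vertices, radar_to_cam, K):
    # Project the 3D vertices (radar coordinate system) onto the first image.
    # radar_to_cam: 4x4 homogeneous radar-to-camera transformation; K: 3x3 intrinsic matrix.
    homo = np.hstack([vertices, np.ones((len(vertices), 1))])     # (8, 4)
    cam = (radar_to_cam @ homo.T)[:3]                             # camera-frame points, (3, 8)
    pix = K @ cam
    pix = pix[:2] / pix[2:3]                                      # perspective division
    return pix.T                                                  # (8, 2) projection coordinates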
5. The object detection method according to claim 3, wherein the determining the projection detection frame corresponding to the three-dimensional detection frame in the first image from the projection coordinates corresponding to each vertex of the three-dimensional detection frame in the first image includes:
calculating the minimum abscissa, the maximum abscissa, the minimum ordinate and the maximum ordinate of the projection coordinates corresponding to all the vertices of the three-dimensional detection frame respectively;
and determining the projection detection frame corresponding to the three-dimensional detection frame according to the minimum abscissa, the maximum abscissa, the minimum ordinate and the maximum ordinate corresponding to the three-dimensional detection frame.
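The min/max reduction of claim 5 can be sketched as follows, with the projection detection frame represented as (x_min, y_min, x_max, y_max); this tuple convention is assumed for illustration.

import numpy as np

def projection_box(projection_coordinates):
    # projection_coordinates: (8, 2) array of projected vertices of one three-dimensional detection frame.
    pts = np.asarray(projection_coordinates)
    x_min, y_min = pts.min(axis=0)   # minimum abscissa and minimum ordinate
    x_max, y_max = pts.max(axis=0)   # maximum abscissa and maximum ordinate
    return float(x_min), float(y_min), float(x_max), float(y_max)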
6. The method of claim 2, wherein the matching result includes a target detection frame corresponding to each of the projection detection frames;
the performing matching processing according to the projection detection frame and the two-dimensional detection frame to obtain a matching result includes:
for each projection detection frame, calculating the similarity between the projection detection frame and each two-dimensional detection frame in each first image;
and acquiring target detection frames corresponding to the projection detection frames from all the two-dimensional detection frames according to the similarity.
7. The method of claim 6, wherein the obtaining the target detection frame corresponding to the projection detection frame from all the two-dimensional detection frames according to the similarity comprises:
obtaining candidate detection frames from all the two-dimensional detection frames according to the similarity, wherein the similarity corresponding to the candidate detection frames is larger than a preset threshold;
and determining a target detection frame corresponding to the projection detection frame from all the candidate detection frames according to the similarity, wherein the similarity corresponding to the target detection frame is the maximum value of the similarities corresponding to all the candidate detection frames.
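Claims 6 and 7 do not fix a particular similarity measure; intersection-over-union (IoU) between axis-aligned boxes is used below purely as an illustrative assumption, together with the preset-threshold filter and maximum-similarity selection of claim 7.

def iou(box_a, box_b):
    # Intersection-over-union of two boxes given as (x_min, y_min, x_max, y_max).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def match_projection_box(proj_box, two_d_boxes, threshold=0.5):
    # Keep only candidate detection frames whose similarity exceeds the preset threshold,
    # then return the one with the maximum similarity (or None if no candidate qualifies).
    candidates = [(iou(proj_box, b), b) for b in two_d_boxes]
    candidates = [(s, b) for s, b in candidates if s > threshold]
    return max(candidates, key=lambda sb: sb[0])[1] if candidates else None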
8. The method of claim 1, wherein detecting the target object in the target environment according to the radar point cloud data to obtain at least one three-dimensional detection frame comprises:
inputting the radar point cloud data into a trained three-dimensional detection model to obtain at least one three-dimensional detection frame;
the detecting the target object in each first image to obtain at least one two-dimensional detection frame in each first image comprises the following steps:
and inputting the first image into a trained two-dimensional detection model to obtain at least one two-dimensional detection frame in the first image.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 8 when executing the computer program.
10. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the method according to any one of claims 1 to 8.
CN202311831193.0A 2023-12-27 2023-12-27 Target detection method, terminal device and computer readable storage medium Pending CN117930169A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311831193.0A CN117930169A (en) 2023-12-27 2023-12-27 Target detection method, terminal device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311831193.0A CN117930169A (en) 2023-12-27 2023-12-27 Target detection method, terminal device and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN117930169A true CN117930169A (en) 2024-04-26

Family

ID=90756541

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311831193.0A Pending CN117930169A (en) 2023-12-27 2023-12-27 Target detection method, terminal device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN117930169A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination