CN113505732A - Visual target determination method and device, storage medium and electronic device

Visual target determination method and device, storage medium and electronic device

Info

Publication number
CN113505732A
CN113505732A
Authority
CN
China
Prior art keywords
target
visual
radar
information
depth
Prior art date
Legal status
Pending
Application number
CN202110845758.5A
Other languages
Chinese (zh)
Inventor
陈向阳
李冬冬
李乾坤
殷俊
王凯
Current Assignee
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202110845758.5A
Publication of CN113505732A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Radar Systems Or Details Thereof (AREA)

Abstract

The invention discloses a visual target determination method and device, a storage medium, and an electronic device, wherein the method includes the following steps: detecting a region of interest of an image through a radar sensor to obtain first information of a radar target in the region of interest, and detecting the region of interest through an image sensor to obtain a plurality of second information of a plurality of visual targets in the region of interest; calculating target distances between the radar target and the plurality of visual targets according to the first position information and the plurality of second position information, and calculating depth difference values between the radar target and the plurality of visual targets according to the first depth information and the plurality of second depth information; and determining, among the plurality of visual targets, a first visual target matched with the radar target according to the target distances and the depth difference values.

Description

Visual target determination method and device, storage medium and electronic device
Technical Field
The invention relates to the field of communication, in particular to a visual target determination method and device, a storage medium and an electronic device.
Background
In modern society, security is valued more and more by the public, security products emerge in an endless stream, the application fields of security keep expanding, and security technology is developing rapidly. Millimeter wave radar-based area monitoring has been a major research hotspot over the last five years. Traditional security terminal equipment is mainly the visible light camera, but a visible light camera cannot work at night; an infrared camera can complement a visible light camera, but this undoubtedly increases cost and operational difficulty. In addition, optical sensors are affected by weather, and the monitoring effect is unsatisfactory in heavy fog or in rain and snow. A millimeter wave radar actively transmits electromagnetic waves and receives return signals of the same frequency, so its detection probability for moving objects or objects with a large radar reflection area is very high, while its detection probability for static objects is low. The millimeter wave radar can work around the clock and is less affected by weather.
A millimeter wave radar monitors many kinds of targets; the targets in which the user is interested must be extracted from them, while uninteresting targets and false targets should be terminated or filtered as soon as possible. One purpose of target trajectory classification is to filter such targets. For example, in a park, a force-3 wind occasionally blows, and shaking trees form a low-speed target trajectory moving within a small range; such a target is neither a person, a vehicle, nor an animal, so it does not need to be reported, and a trajectory termination method should be called as soon as possible to delete the target trajectory. Likewise, if the user is interested in people or cars and a small dog passes through the park, the trajectory should be terminated in time, since the dog is not a target of interest. If the trajectory is formed by a pedestrian, the radar outputs the pedestrian's trajectory information to the camera, and the camera takes pictures or records video according to the spatial trajectory information provided by the radar.
In the prior art, when matching a radar target with visual camera targets, the coordinates of the center points of the circumscribed rectangular frames of the visually detected targets are traversed, and the point closest to the projection point of the radar target in the visual camera coordinate system is regarded as the target detected by both the radar and the camera.
For the problem in the related art that matching between a visual target and a radar target is performed by distance alone, which directly impairs the matching accuracy, no effective solution has been proposed.
Disclosure of Invention
The embodiment of the invention provides a visual target determination method and device, a storage medium and an electronic device, so as to at least solve the problem in the related art that matching between a visual target and a radar target is performed by distance alone, which directly impairs the matching accuracy.
According to an embodiment of the present invention, there is provided a method of determining a visual target, including: detecting a region of interest of an image through a radar sensor to obtain first information of a radar target in the region of interest, and detecting the region of interest through an image sensor to obtain a plurality of second information of a plurality of visual targets in the region of interest, wherein the first information includes: first position information of the radar target and first depth information of the radar target, and the second information includes: second position information of a visual target and second depth information of the visual target, wherein the first depth information indicates the distance between the radar target and the radar sensor, and the second depth information indicates the distance between the visual target and the image sensor; calculating target distances between the radar target and the plurality of visual targets according to the first position information and the plurality of second position information, and calculating depth difference values between the radar target and the plurality of visual targets according to the first depth information and the plurality of second depth information; and determining, among the plurality of visual targets, a first visual target matched with the radar target according to the target distances and the depth difference values.
In an exemplary embodiment, determining the first visual target among the plurality of visual targets according to the target distance and the depth difference includes: performing weighted summation on each target distance and the corresponding depth difference to obtain a plurality of first values; and determining the smallest of the plurality of first values as the target data, and determining the visual target corresponding to the target data as the first visual target.
In an exemplary embodiment, performing weighted summation on the target distance and the depth difference to obtain the plurality of first values includes determining the first value K by the following formula: K = α·m + β·n, where α is a non-negative weighting value corresponding to the target distance, β is a non-negative weighting value corresponding to the depth difference, m is the target distance, and n is the depth difference.
In an exemplary embodiment, calculating the target distances between the radar target and the plurality of visual targets according to the first position information and the plurality of second position information includes determining the target distance m by the following formula:

m = \sqrt{(u_{ci} - u_{ri})^2 + (v_{ci} - v_{ri})^2}

and calculating the depth difference values between the radar target and the plurality of visual targets according to the first depth information and the plurality of second depth information includes determining the depth difference n by the following formula:

n = \sqrt{(d_{ci} - d_{ri})^2}

where u_{ci} is the abscissa value of the visual target, u_{ri} is the abscissa value of the radar target, v_{ci} is the ordinate value of the visual target, v_{ri} is the ordinate value of the radar target, d_{ci} is the second depth information of the visual target, and d_{ri} is the first depth information of the radar target.
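As an illustration of these two formulas only, here is a minimal sketch in Python; the function names and the use of scalar depths are assumptions for the example, not definitions from the patent:

```python
import math

def target_distance(u_c, v_c, u_r, v_r):
    # m: Euclidean distance in pixel coordinates between the visual
    # target center (u_ci, v_ci) and the radar target center (u_ri, v_ri)
    return math.hypot(u_c - u_r, v_c - v_r)

def depth_difference(d_c, d_r):
    # n: sqrt((d_ci - d_ri)^2), i.e. the absolute difference between
    # the visual target depth and the radar-measured depth
    return abs(d_c - d_r)
```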
In an exemplary embodiment, after the region of interest of the image is detected by the radar sensor to obtain the first information of the radar target in the region of interest, and before the region of interest is detected by the image sensor to obtain the second information of the plurality of visual targets in the region of interest, the method further includes: mapping the radar target onto a visual image acquired by the image sensor according to the first position information of the radar target; determining a center point of the radar target in the visual image, and constructing the region of interest on the visual image with the center point of the radar target as its center; and detecting the region of interest through a target detection model to obtain the plurality of visual targets.
In an exemplary embodiment, before obtaining the first information of the radar target and the second information of the plurality of visual targets of the region of interest, the method further includes: acquiring radar data collected by the radar sensor and image data collected by the image sensor, wherein the radar data includes first time data and the image data includes second time data; and binding target radar data and target image data for which the time difference between the first time data and the second time data is smaller than a preset threshold to obtain multimedia data, wherein the region of interest is indicated in the multimedia data.
According to another embodiment of the present invention, there is also provided a visual target determining apparatus, including an obtaining module, configured to detect a region of interest of an image through a radar sensor, to obtain first information of the radar target in the region of interest, and detect the region of interest through an image sensor, to obtain a plurality of second information of a plurality of visual targets in the region of interest, where the first information includes: first position information of the radar target and first depth information of the radar target, the second information including: second position information of a visual target and second depth information of the visual target, wherein the first depth information is used for indicating the distance between the radar target and the radar sensor, and the second depth information is used for indicating the distance between the visual target and the image sensor; a calculating module, configured to calculate target distances of the radar target and the plurality of visual targets according to the first position information and the plurality of second position information, and calculate depth differences of the radar target and the plurality of visual targets according to the first depth information and the plurality of second depth information; and the determining module is used for determining a first visual target matched with the radar target in a plurality of visual targets according to the target distance and the depth difference value.
In an exemplary embodiment, the determining module is further configured to perform weighted summation on each target distance and the corresponding depth difference to obtain a plurality of first values; determine the smallest of the plurality of first values as the target data; and determine the visual target corresponding to the target data as the first visual target.
According to a further embodiment of the present invention, a computer-readable storage medium is also provided, in which a computer program is stored, wherein the computer program is configured to carry out the steps of any of the above-described method embodiments when executed.
According to yet another embodiment of the present invention, there is also provided an electronic device, including a memory in which a computer program is stored and a processor configured to execute the computer program to perform the steps in any of the above method embodiments.
The invention thus provides a method of determining a visual target, including: detecting a region of interest of an image through a radar sensor to obtain first information of a radar target in the region of interest, and detecting the region of interest through an image sensor to obtain a plurality of second information of a plurality of visual targets in the region of interest, wherein the first information includes: first position information of the radar target and first depth information of the radar target, and the second information includes: second position information of a visual target and second depth information of the visual target, wherein the first depth information indicates the distance between the radar target and the radar sensor, and the second depth information indicates the distance between the visual target and the image sensor; calculating target distances between the radar target and the plurality of visual targets according to the first position information and the plurality of second position information, and calculating depth difference values between the radar target and the plurality of visual targets according to the first depth information and the plurality of second depth information; and determining, among the plurality of visual targets, a first visual target matched with the radar target according to the target distances and the depth difference values. In other words, the first visual target matched with the radar target is determined according to both the distances between the radar target and the plurality of visual targets and the depth differences between the radar target and the plurality of visual targets, so that matching no longer depends on distance alone, which improves the matching accuracy.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
fig. 1 is a block diagram of a hardware configuration of a mobile terminal of a method for determining a visual target according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method of visual target determination according to an embodiment of the present invention;
FIG. 3 is a schematic matching diagram of a method of visual target determination according to an alternative embodiment of the present invention;
FIG. 4 is a flow chart of a method for visual target determination in accordance with an alternative embodiment of the present invention;
fig. 5 is a block diagram of a device for determining a visual target according to an embodiment of the present invention.
Detailed Description
The invention will be described in detail hereinafter with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
The method provided by the embodiment of the application can be executed in a mobile terminal or a similar computing device. Taking execution in a mobile terminal as an example, fig. 1 is a block diagram of the hardware structure of a mobile terminal for a method of determining a visual target according to an embodiment of the present invention. As shown in fig. 1, the mobile terminal may include one or more processors 102 (only one is shown in fig. 1; the processor 102 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA) and a memory 104 for storing data; in an exemplary embodiment, it may also include a transmission device 106 for communication functions and an input-output device 108. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration and does not limit the structure of the mobile terminal. For example, the mobile terminal may include more or fewer components than shown in fig. 1, or have a different configuration with equivalent or greater functionality than that shown in fig. 1.
The memory 104 may be used for storing computer programs, for example, software programs and modules of application software, such as a computer program corresponding to the method for determining the visual target of the mobile terminal in the embodiment of the present invention, and the processor 102 executes various functional applications and data processing by running the computer program stored in the memory 104, so as to implement the method described above. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the mobile terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the mobile terminal. In one example, the transmission device 106 includes a Network adapter (NIC), which can be connected to other Network devices through a base station so as to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is used for communicating with the internet in a wireless manner.
In this embodiment, a method for determining a visual target is provided, which is applied to the mobile terminal described above, and fig. 2 is a flowchart of a method for determining a visual target according to an embodiment of the present invention, where the flowchart includes the following steps:
step S202, detecting an area of interest of an image through a radar sensor to obtain first information of a radar target in the area of interest, and detecting the area of interest through an image sensor to obtain a plurality of second information of a plurality of visual targets in the area of interest, wherein the first information comprises: first position information of the radar target and first depth information of the radar target, the second information including: second position information of a visual target and second depth information of the visual target, wherein the first depth information is used for indicating the distance between the radar target and the radar sensor, and the second depth information is used for indicating the distance between the visual target and the image sensor;
step S204, respectively calculating target distances of the radar target and the plurality of visual targets according to the first position information and the plurality of second position information, and respectively calculating depth difference values of the radar target and the plurality of visual targets according to the first depth information and the plurality of second depth information;
step S206, according to the target distance and the depth difference value, a first visual target matched with the radar target is determined in the plurality of visual targets.
Through the steps, the radar sensor is used for detecting the region of interest of the image to obtain first information of the radar target in the region of interest, and the image sensor is used for detecting the region of interest to obtain a plurality of second information of a plurality of visual targets in the region of interest, wherein the first information comprises: first position information of the radar target and first depth information of the radar target, the second information including: second position information of a visual target and second depth information of the visual target, wherein the first depth information is used for indicating the distance between the radar target and the radar sensor, and the second depth information is used for indicating the distance between the visual target and the image sensor; calculating target distances of the radar target and the plurality of visual targets respectively according to the first position information and the plurality of second position information, and calculating depth difference values of the radar target and the plurality of visual targets respectively according to the first depth information and the plurality of second depth information; according to the target distance and the depth difference value, a first visual target matched with the radar target is determined in the plurality of visual targets, namely the first visual target matched with the radar target is determined according to the distance between the radar target and the plurality of visual targets and the depth difference value between the radar target and the plurality of visual targets.
It should be noted that the first position information is preferably the position information of the center point of the radar target, and the second position information is preferably the position information of the center point of the visual target; when the radar target and/or the visual target corresponds to an irregularly shaped object, the radar target and/or the visual target is fitted with a minimum circumscribed rectangle, and the position information of the center point of that minimum circumscribed rectangle is used as the position information of the center point of the radar target and/or the visual target.
Specifically, the first depth information of the radar target is determined by the radar sensor, and the second depth information of the visual target is determined by using a deep neural network model.
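For illustration, one plausible way to obtain the second depth information is to run a monocular depth network over the visual image and sample the predicted depth map at the center of each detection box. The sketch below assumes a precomputed `depth_map` array; it is not an API defined by the patent:

```python
import numpy as np

def visual_target_depth(depth_map: np.ndarray, box) -> float:
    # box = (x, y, w, h): minimum circumscribed rectangle of a visual target
    x, y, w, h = box
    u_c = int(np.clip(x + w / 2, 0, depth_map.shape[1] - 1))  # center u_ci
    v_c = int(np.clip(y + h / 2, 0, depth_map.shape[0] - 1))  # center v_ci
    return float(depth_map[v_c, u_c])  # d_ci sampled at the box center
```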
In an exemplary embodiment, determining the first visual target among the plurality of visual targets according to the target distance and the depth difference includes: performing weighted summation on each target distance and the corresponding depth difference to obtain a plurality of first values; and determining the smallest of the plurality of first values as the target data, and determining the visual target corresponding to the target data as the first visual target.
That is to say, all visual targets in the region of interest are traversed; for each visual target, the depth difference between its second depth information and the first depth information of the radar target, and the distance between the visual target and the radar target, are calculated; the depth difference and the distance are weighted and summed to obtain a plurality of first values; the first values are sorted, and the visual target corresponding to the smallest first value is determined as the first visual target. At this point, the object corresponding to the first visual target can be considered identical to the object corresponding to the radar target.
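A sketch of this traversal, under the assumption that each target carries a center point in pixel coordinates and an estimated depth (the `Target` structure and the default weights are illustrative only):

```python
import math
from dataclasses import dataclass

@dataclass
class Target:
    u: float      # abscissa of the center point in pixel coordinates
    v: float      # ordinate of the center point in pixel coordinates
    depth: float  # distance from the target to its sensor

def match_radar_target(radar: Target, visual_targets, alpha=0.5, beta=0.5):
    """Return the visual target minimizing K = alpha*m + beta*n, or None."""
    best, best_k = None, float("inf")
    for vt in visual_targets:  # traverse all visual targets in the region of interest
        m = math.hypot(vt.u - radar.u, vt.v - radar.v)  # target distance m
        n = abs(vt.depth - radar.depth)                 # depth difference n
        k = alpha * m + beta * n                        # first value K
        if k < best_k:
            best, best_k = vt, k
    return best
```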
In an exemplary embodiment, performing weighted summation on the target distance and the depth difference to obtain the plurality of first values includes determining the first value K by the following formula: K = α·m + β·n, where α is a non-negative weighting value corresponding to the target distance, β is a non-negative weighting value corresponding to the depth difference, m is the target distance, and n is the depth difference.
In the embodiment of the present invention, the first value K is determined by this formula; that is, the depth difference and the distance are weighted and summed, and the weighting values α and β are chosen according to the requirements of the scene.
In an exemplary embodiment, calculating the target distances between the radar target and the plurality of visual targets according to the first position information and the plurality of second position information includes determining the target distance m by the following formula:

m = \sqrt{(u_{ci} - u_{ri})^2 + (v_{ci} - v_{ri})^2}

and calculating the depth difference values between the radar target and the plurality of visual targets according to the first depth information and the plurality of second depth information includes determining the depth difference n by the following formula:

n = \sqrt{(d_{ci} - d_{ri})^2}

Specifically, u_{ci} is the abscissa value of the visual target, u_{ri} is the abscissa value of the radar target, v_{ci} is the ordinate value of the visual target, and v_{ri} is the ordinate value of the radar target; further, u_{ci} and v_{ci} are preferably the coordinates of the center point of the visual target, and u_{ri} and v_{ri} the coordinates of the center point of the radar target. d_{ci} is the second depth information of the visual target and d_{ri} is the first depth information of the radar target; further, the depth difference n can equivalently be determined by |d_{ci} - d_{ri}|.
In an exemplary embodiment, after the region of interest of the image is detected by the radar sensor to obtain the first information of the radar target in the region of interest, and before the region of interest is detected by the image sensor to obtain the plurality of second information of the plurality of visual targets in the region of interest, the radar target is mapped onto the visual image acquired by the image sensor according to the first position information of the radar target; the center point of the radar target is determined in the visual image, and the region of interest is constructed on the visual image with the center point of the radar target as its center; the region of interest is then detected through a target detection model to obtain the plurality of visual targets.
That is to say, the radar target is mapped onto the visual image through a mapping function, and a region of interest of a specified size is constructed with the radar target as its center; there may be more than one visual target in the region of interest, and target detection is performed on the region of interest through the target detection model to obtain one or more visual targets. The target detection model may be a Yolo v4 model, which has higher detection precision and faster detection speed than a Yolo v3 model.
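For illustration, a possible construction of the fixed-size region of interest centered on the projected radar point, clamped to the image bounds; the 300x300 size and all names here are assumptions, not values from the patent:

```python
def build_roi(u_r, v_r, img_w, img_h, roi_w=300, roi_h=300):
    # Place a roi_w x roi_h window centered at the radar projection (u_r, v_r)
    # and shift it so it stays inside the image
    # (assumes roi_w <= img_w and roi_h <= img_h).
    x0 = max(0, min(int(u_r - roi_w / 2), img_w - roi_w))
    y0 = max(0, min(int(v_r - roi_h / 2), img_h - roi_h))
    return x0, y0, roi_w, roi_h  # (x, y, w, h) of the region of interest
```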
In an exemplary embodiment, acquiring the first information of the radar target and the second information of the plurality of visual targets of the region of interest is preceded by acquiring radar data collected by the radar sensor and image data collected by the image sensor, wherein the radar data includes first time data and the image data includes second time data; target radar data and target image data for which the time difference between the first time data and the second time data is smaller than a preset threshold are combined to obtain multimedia data, and the region of interest is indicated in the multimedia data.
Specifically, the radar data collected by the radar sensor and the image data collected by the image sensor are acquired. In the process of collecting radar data and image data, data delay has a significant influence on the labeling result; therefore, in transmitting the radar data and image data from the image sensor and radar sensor to the computer processor (SOC), the number of intermediate nodes should be kept as small as possible. According to the first time data and second time data in the radar data and image data, target radar data and target image data whose time difference between the first time data and the second time data is smaller than a preset threshold are bound, realizing time synchronization of the target radar data and the target image data.
Specifically, time synchronization is divided into software time synchronization and hardware time synchronization. The software time synchronization method has a larger error but low cost, and it is flexible and configurable. The hardware time synchronization method has a small error but high cost; it needs an additional circuit design and is not easy to modify. A software time synchronization method is adopted for low-speed target tracking scenes, and a hardware time synchronization method for high-speed target tracking scenes. After the time synchronization process is executed, the information in the target radar data and the target image data corresponds to almost the same moment, and the target radar data and the target image data are bound and then output together.
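As a sketch of the software time synchronization described here: radar frames and image frames can be bound whenever their timestamps differ by less than a preset threshold. The frame representation and the 50 ms threshold are assumptions for the example:

```python
def bind_by_timestamp(radar_frames, image_frames, max_dt=0.05):
    """Pair each (t, data) radar frame with the nearest-in-time image frame
    when the timestamp difference is below max_dt seconds."""
    pairs = []
    for t_r, radar_data in radar_frames:
        nearest = min(image_frames, key=lambda f: abs(f[0] - t_r), default=None)
        if nearest is not None and abs(nearest[0] - t_r) < max_dt:
            pairs.append(((t_r, radar_data), nearest))  # bound multimedia data
    return pairs
```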
In order to better understand the process of the above method for determining a visual target, the following describes a flow of the above method for determining a visual target with reference to an optional embodiment, but the flow is not limited to the technical solution of the embodiment of the present invention.
To better describe the process of the method of determining the visual target above, the following terms are first explained:
monocular depth estimation: using the depth neural network model MonoDepth2, the distance between the visual target acquired by the monocular camera and the image sensor was estimated.
Millimeter wave radar: the measurement period, i.e., the signal transceiving period, is set as required, generally to 0.1 second with an operating frequency of 10 Hz. The millimeter wave radar obtains measurement information of a radar target, including at least one of the following: distance, angle, radial velocity, and radar reflection area (RCS).
In an alternative embodiment, a millimeter wave radar (corresponding to the radar sensor in the above embodiments) acquires the speed and position of the radar target, and a camera (corresponding to the image sensor in the above embodiments) acquires the appearance detail information of the visual target. A target object in the real world is represented in the form of point cloud coordinates in the radar coordinate system; in the visual image, the image is processed through a neural network model (corresponding to the target detection model in the above embodiments), and the pixels containing the target object are marked with a rectangular frame. If a target object is observed by the millimeter wave radar and the camera at the same time, performing target matching between the radar-coordinate-system point cloud target and the camera-coordinate-system pixel target yields both the appearance detail information and the speed and position information of the target object.
Fig. 3 is a matching schematic diagram of the method for determining a visual target according to the alternative embodiment of the present invention. As shown in fig. 3, the outer solid-line rectangle represents the visual image, the five-pointed star represents the projection of the radar target onto the visual image after transformation by the mapping function, and a region of interest to be detected of a specified size is constructed with the five-pointed star as its center; the dashed rectangle represents this region of interest. The circles represent two different visual targets, and each circle's circumscribed rectangle is a visual target obtained by performing target detection on the region of interest through the neural network model.
Further, the center point (u_{ci}, v_{ci}) of the rectangular frame corresponding to each visual target is taken as the position information of that visual target. Traversing the position information (u_{ci}, v_{ci}) of the visual targets, the distance to the radar target center point (u_{ri}, v_{ri}) is calculated as

m = \sqrt{(u_{ci} - u_{ri})^2 + (v_{ci} - v_{ri})^2}.

The depth information of the radar target measured by the millimeter wave radar, namely the distance d_{ri} from the radar target to the millimeter wave radar, is obtained, and depth estimation is performed on each visual target acquired by the camera using the deep neural network MonoDepth2 to obtain the distance d_{ci} from the visual target to the camera. Traversing each visual target, the difference between the depth information d_{ci} and d_{ri} is calculated as

n = \sqrt{(d_{ci} - d_{ri})^2},

and the weighted sum K = α·m + β·n is computed. The visual target with the minimum weighted value is taken as the first visual target matched with the radar target.
Fig. 4 is a flowchart of a method for determining a visual target according to an alternative embodiment of the present invention, and as shown in fig. 4, the process of the method for determining a visual target may be implemented by the following steps:
step S401: collecting radar data and video data;
Radar data collected by the millimeter wave radar and video data collected by the camera are acquired. In the process of collecting radar data and video data, data delay has a significant influence on the labeling result; therefore, in transmitting radar data and video data from the millimeter wave radar and the camera to the computer processor (SOC), the number of intermediate nodes is kept as small as possible, which improves data transmission efficiency and reduces time delay.
Step S402: time synchronization;
Specifically, time synchronization is divided into software time synchronization and hardware time synchronization. The software time synchronization method has a larger error but low cost, and it is flexible and configurable. The hardware time synchronization method has a small error but high cost; it needs an additional circuit design and is not easy to modify. A software time synchronization method is adopted for low-speed target tracking scenes, and a hardware time synchronization method for high-speed target tracking scenes. After the time synchronization process is executed, the information in the radar data and the video data corresponds to almost the same moment, and the radar data and the image data are bound and then output together.
Step S403: extracting radar data from the bound radar data and image data;
step S404: predicting and updating the radar data using a Kalman filter, and filtering out interference signals (a sketch of this predict/update cycle is given after this flow);
step S405: acquiring a radar target from the filtered radar data;
step S406: according to the coordinate system mapping function, mapping the radar target coordinates (u_{ri}, v_{ri}) to the pixel coordinate system corresponding to the image data;
step S407: acquiring camera image data;
step S408: generating a region of interest to be detected of a specified size with (u_{ri}, v_{ri}) as its center;
step S409: in the region of interest to be detected, performing visual target detection through a trained first deep neural network model to obtain the minimum circumscribed rectangle information of each visual target, including parameters such as coordinates, length, and width;
step S410: determining the depth information d_{ci} of the plurality of visual targets acquired by the camera using a second deep neural network model;
step S411: traversing the center points (u_{ci}, v_{ci}) of the rectangular boxes of the visual targets, and calculating the distance between each center point and the radar mapping target center point (u_{ri}, v_{ri}):

m = \sqrt{(u_{ci} - u_{ri})^2 + (v_{ci} - v_{ri})^2};

step S412: calculating the difference between the depth information of each visual target and the depth information of the radar target provided by the radar:

n = \sqrt{(d_{ci} - d_{ri})^2};

step S413: performing weighted summation on the distance and depth difference values found in step S411 and step S412,

K = α·m + β·n,

and taking the visual target with the minimum weighted value as the first visual target matched with the radar target;
step S414: end.
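As referenced at step S404, a generic constant-velocity Kalman predict/update cycle for a 2-D radar track might look as follows; the state layout and the noise matrices are assumptions for the sketch, not parameters from the patent:

```python
import numpy as np

class ConstantVelocityKalman:
    """Kalman filter over the state [x, y, vx, vy] with position measurements."""
    def __init__(self, dt=0.1):
        self.x = np.zeros(4)              # state estimate
        self.P = np.eye(4)                # state covariance
        self.F = np.eye(4)
        self.F[0, 2] = self.F[1, 3] = dt  # constant-velocity transition model
        self.H = np.eye(2, 4)             # we measure position only
        self.Q = 0.01 * np.eye(4)         # process noise (assumed)
        self.R = 0.1 * np.eye(2)          # measurement noise (assumed)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x

    def update(self, z):
        # z: measured (x, y) radar position; interference that deviates far
        # from the prediction produces a large residual y and is smoothed out
        y = np.asarray(z) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x
```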
In the embodiment of the invention, when the coordinates of the center points of the circumscribed rectangles of the visual targets are traversed, the depth information of the visual targets acquired by the monocular camera is estimated using a deep neural network model, the distance between the projection point of the radar target in the visual camera coordinate system and the center point of each rectangular frame is calculated, the depth difference between the monocular visual target depth and the radar target depth is calculated, the distance and the depth difference are weighted and summed, and the visual target with the minimum weighted value is taken as the first visual target matched with the radar target.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
In this embodiment, a device for determining a visual target is further provided, and the device is used to implement the foregoing embodiments and preferred embodiments, which have already been described and are not described again. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.
FIG. 5 is a block diagram of an apparatus for determining a visual target according to an embodiment of the present invention; as shown in fig. 5, includes:
an obtaining module 52, configured to detect a region of interest of an image through a radar sensor to obtain first information of a radar target in the region of interest, and detect the region of interest through an image sensor to obtain a plurality of second information of a plurality of visual targets in the region of interest, where the first information includes: first position information of the radar target and first depth information of the radar target, the second information including: second position information of a visual target and second depth information of the visual target, wherein the first depth information is used for indicating the distance between the radar target and the radar sensor, and the second depth information is used for indicating the distance between the visual target and the image sensor;
a calculating module 54, configured to calculate target distances of the radar target and the plurality of visual targets according to the first position information and the plurality of second position information, and calculate depth difference values of the radar target and the plurality of visual targets according to the first depth information and the plurality of second depth information;
a determining module 56, configured to determine, according to the target distance and the depth difference, a first visual target matching the radar target from among a plurality of visual targets.
Through the above modules, detecting an area of interest of an image through a radar sensor to obtain first information of a radar target in the area of interest, and detecting the area of interest through an image sensor to obtain a plurality of second information of a plurality of visual targets in the area of interest, wherein the first information includes: first position information of the radar target and first depth information of the radar target, the second information including: second position information of a visual target and second depth information of the visual target, wherein the first depth information is used for indicating the distance between the radar target and the radar sensor, and the second depth information is used for indicating the distance between the visual target and the image sensor; calculating target distances of the radar target and the plurality of visual targets respectively according to the first position information and the plurality of second position information, and calculating depth difference values of the radar target and the plurality of visual targets respectively according to the first depth information and the plurality of second depth information; according to the target distance and the depth difference value, a first visual target matched with the radar target is determined in the plurality of visual targets, namely the first visual target matched with the radar target is determined according to the distance between the radar target and the plurality of visual targets and the depth difference value between the radar target and the plurality of visual targets.
It should be noted that the first position information is preferably the position information of the center point of the radar target, and the second position information is preferably the position information of the center point of the visual target; when the radar target and/or the visual target corresponds to an irregularly shaped object, the radar target and/or the visual target is fitted with a minimum circumscribed rectangle, and the position information of the center point of that minimum circumscribed rectangle is used as the position information of the center point of the radar target and/or the visual target.
Specifically, the first depth information of the radar target is determined by the radar sensor, and the second depth information of the visual target is determined by using a deep neural network model.
In an exemplary embodiment, the determining module is further configured to perform weighted summation on each target distance and the corresponding depth difference to obtain a plurality of first values; determine the smallest of the plurality of first values as the target data; and determine the visual target corresponding to the target data as the first visual target.
That is to say, all visual targets in the region of interest are traversed; for each visual target, the depth difference between its second depth information and the first depth information of the radar target, and the distance between the visual target and the radar target, are calculated; the depth difference and the distance are weighted and summed to obtain a plurality of first values; the first values are sorted, and the visual target corresponding to the smallest first value is determined as the first visual target. At this point, the object corresponding to the first visual target can be considered identical to the object corresponding to the radar target.
In an exemplary embodiment, the calculation module is further configured to determine the first value K by the following formula: K = α·m + β·n, where α is a non-negative weighting value corresponding to the target distance, β is a non-negative weighting value corresponding to the depth difference, m is the target distance, and n is the depth difference.
In the embodiment of the present invention, the first value K is determined by this formula; that is, the depth difference and the distance are weighted and summed, and the weighting values α and β are chosen according to the requirements of the scene.
In an exemplary embodiment, the calculation module is further configured to determine the target distance m by the following formula:

m = \sqrt{(u_{ci} - u_{ri})^2 + (v_{ci} - v_{ri})^2}

and to calculate the depth difference values between the radar target and the plurality of visual targets according to the first depth information and the plurality of second depth information, including determining the depth difference n according to the following formula:

n = \sqrt{(d_{ci} - d_{ri})^2}

Specifically, u_{ci} is the abscissa value of the visual target, u_{ri} is the abscissa value of the radar target, v_{ci} is the ordinate value of the visual target, and v_{ri} is the ordinate value of the radar target; further, u_{ci} and v_{ci} are preferably the coordinates of the center point of the visual target, and u_{ri} and v_{ri} the coordinates of the center point of the radar target. d_{ci} is the second depth information of the visual target and d_{ri} is the first depth information of the radar target; further, the depth difference n can equivalently be determined by |d_{ci} - d_{ri}|.
In an exemplary embodiment, the determining module is further configured to map the radar target onto the visual image acquired by the image sensor according to the first position information of the radar target; determine the center point of the radar target in the visual image, and construct the region of interest on the visual image with the center point of the radar target as its center; and detect the region of interest through a target detection model to obtain the plurality of visual targets.
That is to say, the radar target is mapped onto the visual image through a mapping function, and a region of interest of a specified size is constructed with the radar target as its center; there may be more than one visual target in the region of interest, and target detection is performed on the region of interest through the target detection model to obtain one or more visual targets. The target detection model may be a Yolo v4 model, which has higher detection accuracy and faster detection speed than a Yolo v3 model.
In an exemplary embodiment, the acquisition module is further configured to acquire radar data acquired by the radar sensor and image data acquired by the image sensor, where the radar data includes: first temporal data, the image data comprising: second time data; and combining the target radar data and the target image data of which the time difference between the first time data and the second time data is smaller than a preset threshold value to obtain multimedia data, wherein the region of interest is indicated in the multimedia data.
Specifically, the radar data collected by the radar sensor and the image data collected by the image sensor are acquired. In the process of collecting radar data and image data, data delay has a significant influence on the labeling result; therefore, in transmitting the radar data and image data from the image sensor and radar sensor to the computer processor (SOC), the number of intermediate nodes should be kept as small as possible. According to the first time data and second time data in the radar data and image data, target radar data and target image data whose time difference between the first time data and the second time data is smaller than a preset threshold are bound, realizing time synchronization of the target radar data and the target image data.
Specifically, time synchronization is divided into software time synchronization and hardware time synchronization. The software time synchronization method has a larger error but low cost, and it is flexible and configurable. The hardware time synchronization method has a small error but high cost; it needs an additional circuit design and is not easy to modify. A software time synchronization method is adopted for low-speed target tracking scenes, and a hardware time synchronization method for high-speed target tracking scenes. After the time synchronization process is executed, the information in the target radar data and the target image data corresponds to almost the same moment, and the target radar data and the target image data are bound and then output together.
An embodiment of the present invention further provides a storage medium including a stored program, wherein the program executes any one of the methods described above.
In an exemplary embodiment, in the present embodiment, the storage medium may be configured to store a computer program for executing the steps of:
s1, detecting a region of interest of an image by a radar sensor to obtain first information of a radar target in the region of interest, and detecting the region of interest by an image sensor to obtain a plurality of second information of a plurality of visual targets in the region of interest, where the first information includes: first position information of the radar target and first depth information of the radar target, the second information including: second position information of a visual target and second depth information of the visual target, wherein the first depth information is used for indicating the distance between the radar target and the radar sensor, and the second depth information is used for indicating the distance between the visual target and the image sensor;
s2, respectively calculating the target distance of the radar target and the plurality of visual targets according to the first position information and the plurality of second position information, and respectively calculating the depth difference value of the radar target and the plurality of visual targets according to the first depth information and the plurality of second depth information;
s3, determining a first visual target matched with the radar target in the plurality of visual targets according to the target distance and the depth difference value.
In an exemplary embodiment, in the present embodiment, the storage medium may include, but is not limited to: various media capable of storing computer programs, such as a usb disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Embodiments of the present invention also provide an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.
In an exemplary embodiment, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
In an exemplary embodiment, in the present embodiment, the processor may be configured to execute the following steps by a computer program:
s1, detecting the interested area of the image through a radar sensor to obtain first information of a radar target in the interested area, and detecting the interested area through an image sensor to obtain a plurality of second information of a plurality of visual targets in the interested area,
wherein the first information comprises: first position information of the radar target and first depth information of the radar target, the second information including: second position information of a visual target and second depth information of the visual target, wherein the first depth information is used for indicating the distance between the radar target and the radar sensor, and the second depth information is used for indicating the distance between the visual target and the image sensor;
s2, respectively calculating the target distance of the radar target and the plurality of visual targets according to the first position information and the plurality of second position information, and respectively calculating the depth difference value of the radar target and the plurality of visual targets according to the first depth information and the plurality of second depth information;
s3, determining a first visual target matched with the radar target in the plurality of visual targets according to the target distance and the depth difference value.
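Purely as an illustration, the following Python sketch shows how steps S1 to S3 could fit together; the function name, argument shapes, and default weights are the editor's assumptions, not part of the disclosed implementation:

    import math

    def match_radar_to_visual(radar, visuals, alpha=0.5, beta=0.5):
        # radar: (u_r, v_r, d_r) pixel position and depth of the radar target (S1)
        # visuals: iterable of (u_c, v_c, d_c) for the detected visual targets (S1)
        best_target, best_value = None, float("inf")
        for u_c, v_c, d_c in visuals:
            m = math.hypot(u_c - radar[0], v_c - radar[1])  # target distance (S2)
            n = abs(d_c - radar[2])                         # depth difference (S2)
            k = alpha * m + beta * n                        # weighted first value
            if k < best_value:                              # smallest value wins (S3)
                best_target, best_value = (u_c, v_c, d_c), k
        return best_target

Here the target distance is taken as the Euclidean pixel distance and the depth difference as an absolute difference, matching the formulas given in claims 3 and 4 below.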
For specific examples in this embodiment, reference may be made to the examples described in the above embodiments and optional implementations; details are not repeated here.
It will be apparent to those skilled in the art that the modules or steps of the invention described above may be implemented with a general-purpose computing device; they may be concentrated on a single computing device or distributed across a network of computing devices, and in one exemplary embodiment they may be implemented with program code executable by a computing device, so that the program code may be stored in a storage device and executed by a computing device. In some cases, the steps shown or described may be performed in an order different from that given here, or the modules or steps may be fabricated separately as individual integrated circuit modules, or several of them may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above descriptions are merely preferred embodiments of the present invention and are not intended to limit it; those skilled in the art may make various modifications and changes. Any modification, equivalent replacement, or improvement made within the principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A method for determining a visual target, comprising:
detecting a region of interest of an image through a radar sensor to obtain first information of a radar target in the region of interest, and detecting the region of interest through an image sensor to obtain a plurality of second information of a plurality of visual targets in the region of interest, wherein the first information comprises: first position information of the radar target and first depth information of the radar target, and the second information comprises: second position information of a visual target and second depth information of the visual target, wherein the first depth information indicates the distance between the radar target and the radar sensor, and the second depth information indicates the distance between the visual target and the image sensor;
calculating target distances between the radar target and each of the plurality of visual targets according to the first position information and the plurality of second position information, and calculating depth difference values between the radar target and each of the plurality of visual targets according to the first depth information and the plurality of second depth information; and
determining, among the plurality of visual targets, a first visual target matched with the radar target according to the target distances and the depth difference values.
2. The method of claim 1, wherein determining, among the plurality of visual targets, the first visual target matched with the radar target according to the target distances and the depth difference values comprises:
performing weighted summation on each target distance and the corresponding depth difference value to obtain a plurality of first values; and
determining the smallest of the plurality of first values as target data, and determining the visual target corresponding to the target data as the first visual target.
3. The method of claim 2, wherein performing weighted summation on each target distance and the corresponding depth difference value to obtain the plurality of first values comprises:
determining each first value K by the following formula:
K = α × m + β × n, where α is a non-negative weighting value corresponding to the target distance, β is a non-negative weighting value corresponding to the depth difference value, m is the target distance, and n is the depth difference value.
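In code form, the weighted summation of claim 3 combined with the minimum-value selection of claim 2 could be sketched as follows (a Python illustration under the editor's assumptions, not the disclosed implementation):

    def first_values(distances, depth_diffs, alpha, beta):
        # one weighted sum K = alpha*m + beta*n per candidate visual target
        return [alpha * m + beta * n for m, n in zip(distances, depth_diffs)]

    def pick_first_visual(values):
        # index of the smallest first value; its visual target is the match
        return min(range(len(values)), key=values.__getitem__)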
4. The method of claim 1, wherein calculating the target distances between the radar target and each of the plurality of visual targets according to the first position information and the plurality of second position information comprises:
determining the target distance m by the following formula:
m = √((u_ci − u_ri)² + (v_ci − v_ri)²);
and calculating the depth difference values between the radar target and each of the plurality of visual targets according to the first depth information and the plurality of second depth information comprises:
determining the depth difference value n by the following formula:
n = |d_ci − d_ri|;
wherein u_ci is the abscissa value of the visual target, u_ri is the abscissa value of the radar target, v_ci is the ordinate value of the visual target, v_ri is the ordinate value of the radar target, d_ci is the second depth information of the visual target, and d_ri is the first depth information of the radar target.
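Assuming the reconstructed formulas above (Euclidean pixel distance and absolute depth difference), a direct Python translation would be:

    import math

    def target_distance(u_c, v_c, u_r, v_r):
        # m: Euclidean distance between the visual and radar targets in the image plane
        return math.sqrt((u_c - u_r) ** 2 + (v_c - v_r) ** 2)

    def depth_difference(d_c, d_r):
        # n: absolute difference between the visual depth and the radar depth
        return abs(d_c - d_r)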
5. The method of claim 1, wherein after the first information of the radar target is obtained through the radar sensor, and before the region of interest is detected through the image sensor to obtain the plurality of second information of the plurality of visual targets, the method further comprises:
mapping the radar target to a visual image acquired by the image sensor according to the first position information of the radar target;
determining a central point of the radar target in the visual image, and constructing the region of interest on the visual image with the central point of the radar target as its center; and
detecting the region of interest through a target detection model to obtain the plurality of visual targets.
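A minimal sketch of the region-of-interest construction described in claim 5, assuming the radar target has already been projected to pixel coordinates and that the region is a fixed-size rectangle clipped to the image; the half-sizes, image dimensions, and detector call are hypothetical, not part of the disclosure:

    def build_roi(center_u, center_v, half_w=64, half_h=64, img_w=1920, img_h=1080):
        # rectangle centered on the mapped radar target, clipped to the image bounds
        left = max(0, center_u - half_w)
        top = max(0, center_v - half_h)
        right = min(img_w, center_u + half_w)
        bottom = min(img_h, center_v + half_h)
        return (left, top, right, bottom)

    # visual_targets = detection_model.detect(image, roi=build_roi(u_r, v_r))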
6. The method of claim 5, wherein before the first information of the radar target and the plurality of second information of the plurality of visual targets are obtained for the region of interest, the method further comprises:
acquiring radar data collected by the radar sensor and image data collected by the image sensor, wherein the radar data comprises first time data, and the image data comprises second time data; and
combining target radar data and target image data for which the time difference between the first time data and the second time data is smaller than a preset threshold to obtain multimedia data, wherein the region of interest is indicated in the multimedia data.
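One way to realize the time alignment of claim 6 is nearest-timestamp pairing within a threshold; the following Python sketch assumes each frame is a (timestamp_in_seconds, payload) tuple and a 50 ms threshold, both of which are the editor's assumptions:

    def pair_frames(radar_frames, image_frames, max_dt=0.05):
        # combine each radar frame with the image frame closest in time,
        # keeping the pair only if the gap is below the preset threshold
        pairs = []
        for t_r, radar in radar_frames:
            t_i, image = min(image_frames, key=lambda frame: abs(frame[0] - t_r))
            if abs(t_i - t_r) < max_dt:
                pairs.append((radar, image))
        return pairs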
7. An apparatus for determining a visual target, comprising:
an obtaining module, configured to detect a region of interest of an image through a radar sensor to obtain first information of a radar target in the region of interest, and to detect the region of interest through an image sensor to obtain a plurality of second information of a plurality of visual targets in the region of interest, wherein the first information includes: first position information of the radar target and first depth information of the radar target, and the second information includes: second position information of a visual target and second depth information of the visual target, wherein the first depth information indicates the distance between the radar target and the radar sensor, and the second depth information indicates the distance between the visual target and the image sensor;
a calculating module, configured to calculate target distances between the radar target and each of the plurality of visual targets according to the first position information and the plurality of second position information, and to calculate depth difference values between the radar target and each of the plurality of visual targets according to the first depth information and the plurality of second depth information; and
a determining module, configured to determine, among the plurality of visual targets, a first visual target matched with the radar target according to the target distances and the depth difference values.
8. The apparatus of claim 7, wherein the determining module is further configured to perform weighted summation on each target distance and the corresponding depth difference value to obtain a plurality of first values; determine the smallest of the plurality of first values as target data; and determine the visual target corresponding to the target data as the first visual target.
9. A computer-readable storage medium, comprising a stored program, wherein the program is operable to perform the method of any one of claims 1 to 6.
10. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method of any of claims 1 to 6 by means of the computer program.
CN202110845758.5A 2021-07-26 2021-07-26 Visual target determination method and device, storage medium and electronic device Pending CN113505732A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110845758.5A CN113505732A (en) 2021-07-26 2021-07-26 Visual target determination method and device, storage medium and electronic device

Publications (1)

Publication Number Publication Date
CN113505732A 2021-10-15

Family

ID=78014588

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110845758.5A Pending CN113505732A (en) 2021-07-26 2021-07-26 Visual target determination method and device, storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN113505732A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107238834A (en) * 2016-01-19 2017-10-10 德尔福技术有限公司 Target Tracking System for use radar/vision fusion of automotive vehicle
KR20190060341A (en) * 2017-11-24 2019-06-03 재단법인대구경북과학기술원 Radar-camera fusion system and target detecting method using the same
CN109886308A (en) * 2019-01-25 2019-06-14 中国汽车技术研究中心有限公司 One kind being based on the other dual sensor data fusion method of target level and device
CN111965636A (en) * 2020-07-20 2020-11-20 重庆大学 Night target detection method based on millimeter wave radar and vision fusion
CN112215306A (en) * 2020-11-18 2021-01-12 同济大学 Target detection method based on fusion of monocular vision and millimeter wave radar

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination