CN117636231A - Target detection method and device, electronic equipment and storage medium

Target detection method and device, electronic equipment and storage medium

Info

Publication number
CN117636231A
Authority
CN
China
Prior art keywords
target
building
area
depth
determining
Prior art date
Legal status
Pending
Application number
CN202210998109.3A
Other languages
Chinese (zh)
Inventor
王陈
Current Assignee
Zhejiang Uniview Technologies Co Ltd
Original Assignee
Zhejiang Uniview Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Uniview Technologies Co Ltd filed Critical Zhejiang Uniview Technologies Co Ltd
Priority to CN202210998109.3A
Publication of CN117636231A


Abstract

The embodiment of the application discloses a target detection method and device, an electronic device, and a storage medium. The method comprises the following steps: acquiring an image of a building through a depth camera to obtain a depth image, and determining position information of a target in the depth image; determining the positional relationship between the target and a predicted landing area according to the position information of the target, wherein the predicted landing area is the area through which an object is predicted to pass when thrown from the building; and if the target is located within the predicted landing area, determining that the target is an object thrown from the building. This technical scheme reduces the interference of distracting objects in the environment with target detection and avoids misjudging high-altitude thrown objects, thereby improving the accuracy and real-time performance of detecting objects thrown from buildings.

Description

Target detection method and device, electronic equipment and storage medium
Technical Field
The embodiment of the application relates to the technical field of security protection, in particular to a target detection method, a target detection device, electronic equipment and a storage medium.
Background
Objects thrown from high-rise buildings have been called "a pain hanging over the city". Such behavior has attracted wide attention because of the serious harm it poses to society. Managing and curbing high-altitude throwing is closely tied to public safety, social stability, and the safety of people's lives.
At present, schemes for detecting objects thrown from height are strongly affected by the environment: for example, people or objects moving inside the building may be taken as targets in the monitoring picture, which interferes with the detection of thrown objects and reduces detection accuracy. Moreover, current detection schemes need to compare multiple image frames in real time, which consumes a large amount of processing resources.
Disclosure of Invention
The application provides a target detection method and device, an electronic device, and a storage medium, so as to improve the accuracy of detecting objects thrown from buildings.
In one embodiment, an embodiment of the present application provides a target detection method, including:
acquiring an image of a building through a depth camera to obtain a depth image, and determining the position information of a target in the depth image;
determining the positional relationship between the target and a predicted landing area according to the position information of the target; wherein the predicted landing area is the area through which an object is predicted to pass when thrown from the building;
and if the target is located within the predicted landing area, determining that the target is an object thrown from the building.
In one embodiment, an embodiment of the present application provides an object detection apparatus, including:
The position information determining module is used for acquiring images of the building through the depth camera to obtain a depth image and determining position information of a target in the depth image;
the position relation determining module is used for determining the positional relationship between the target and a predicted landing area according to the position information of the target; wherein the predicted landing area is the area through which an object is predicted to pass when thrown from the building;
and the throwing target determining module is used for determining that the target is an object thrown from the building if the target is located within the predicted landing area.
In one embodiment, an embodiment of the present application provides an electronic device, including:
one or more processors;
a storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the target detection method as described in any of the embodiments above.
In one embodiment, the present application further provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the object detection method as described in any of the above embodiments.
The embodiment of the application provides a target detection method and device, an electronic device, and a storage medium. In this scheme, a depth camera captures an image of a building to obtain a depth image, and the position information of a target in the depth image is determined; the positional relationship between the target and a predicted landing area is determined according to the position information of the target, the predicted landing area being the area through which an object is predicted to pass when thrown from the building; and if the target is located within the predicted landing area, the target is determined to be an object thrown from the building. By determining the positional relationship between the target and the predicted landing area, the technical scheme detects the target in three-dimensional space, which reduces the interference of other objects in the environment with target detection and improves the accuracy and real-time performance of target detection.
Drawings
The above and other features, advantages, and aspects of embodiments of the present disclosure will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.
FIG. 1 is a flow chart of a target detection method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a predicted landing zone provided in one embodiment of the present application;
FIG. 3 is a flowchart of another object detection method according to an embodiment of the present disclosure;
FIG. 4 is a schematic view of an image area of a building provided in an embodiment of the present application;
FIG. 5 is a flowchart of a further object detection method according to an embodiment of the present application;
FIG. 6 is a schematic plan view of a building according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a high-altitude thrown object scenario according to an embodiment of the present disclosure;
FIG. 8 is a schematic structural diagram of an object detection device according to an embodiment of the present disclosure;
FIG. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The present application is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the application and not limiting thereof. Furthermore, embodiments and features of embodiments in this application may be combined with each other without conflict. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present application are shown in the drawings.
Before discussing exemplary embodiments in more detail, it should be mentioned that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart depicts steps as a sequential process, many of the steps may be performed in parallel, concurrently, or simultaneously with other steps. Furthermore, the order of the steps may be rearranged. A process may be terminated when its operations are completed, but may also have additional steps not shown in the figures. The processes may correspond to methods, functions, procedures, subroutines, and the like.
For a better understanding of the embodiments of the present application, the following description of the related art is provided.
Fig. 1 is a flowchart of a target detection method provided in an embodiment of the present application. The embodiment is applicable to detecting a target, typically to determining whether an object in a depth image is an object thrown from height. Specifically, the target detection method may be performed by a target detection device, which may be implemented in software and/or hardware and integrated in an electronic device.
As shown in fig. 1, the method specifically includes the following steps:
S110, acquiring an image of a building through a depth camera to obtain a depth image, and determining the position information of a target in the depth image.
The target is an object appearing in the depth image, and it may be obtained by filtering according to the depth information of each pixel in the depth image. For example, a preliminary filtering is performed on the depth information of each pixel to determine the depth values of pixels belonging to stationary objects. Specifically, the depth values of each pixel in at least two frames of depth images acquired by the depth camera are analyzed; if the depth value of a pixel does not change across at least two frames, or across the depth images acquired within a preset time period, the region formed by such pixels is determined to be the region where a stationary object is located, and the depth values of the corresponding pixels in that region are recorded. During actual detection, an object corresponding to pixels whose depth values are smaller than the recorded depth values is determined to be a target. For example, for a pixel (20, 10) in the depth image whose depth value remains 20 across at least two frames or within the preset time period, the value 20 is recorded as the stationary-object depth of that pixel. If, in the currently acquired depth image, the depth value of that pixel becomes 15, the pixel can be determined to belong to a target appearing in front of the building, and the region formed by all pixels satisfying this condition is determined to be a target. Specifically, when multiple pixels in the depth image satisfy the condition, a clustering algorithm may be used to group them, with each cluster treated as one target, so that multiple targets can be distinguished. This scheme can detect the situation where a target is thrown off the building. The depth camera may be arranged around the building, for example at the front, rear, side, or another position of the building. In this embodiment, the depth camera may be placed on the ground or close to it, so that the difference between the depth values of the building and of the ground in the depth image is large, which makes the building easier to recognize. One relative position of the depth camera and the building is shown in fig. 2, where the depth camera is arranged in front of the building and photographs the building from the front. If an object is thrown from the building, such as the bottle in fig. 2, the depth camera can acquire a depth image containing the thrown object, and it can then be determined from the depth image whether the object was thrown from the building.
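The following is a minimal sketch of the static-background filtering and clustering described above, assuming the depth frames are available as 2-D numpy arrays. The function names, thresholds, and the use of scipy's connected-component labelling as the clustering step are illustrative choices, not details taken from the patent.

```python
import numpy as np
from scipy import ndimage

def build_static_depth(frames, tol=2):
    """Record the depth of pixels whose value stays (almost) constant across the frames."""
    stack = np.stack([f.astype(np.float32) for f in frames])   # shape (N, H, W)
    stable = (stack.max(axis=0) - stack.min(axis=0)) <= tol
    return np.where(stable, stack.mean(axis=0), np.nan)        # NaN where nothing is stable

def detect_targets(depth, static_depth, margin=3, min_pixels=20):
    """Pixels noticeably closer to the camera than the recorded static depth are candidates."""
    with np.errstate(invalid="ignore"):
        candidate = depth < (static_depth - margin)            # NaN comparisons give False
    labels, n = ndimage.label(candidate)                       # group candidate pixels into clusters
    targets = []
    for i in range(1, n + 1):
        ys, xs = np.nonzero(labels == i)
        if len(xs) < min_pixels:                               # drop isolated noise pixels
            continue
        cx, cy = int(xs.mean()), int(ys.mean())                # centre pixel of the cluster
        targets.append({"pixel": (cx, cy), "depth": float(depth[cy, cx])})
    return targets
```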
The depth camera images the building to obtain depth images. A depth image may also contain objects in the background area outside the building. From the depth image, the coordinates of the target can be determined in a three-dimensional coordinate system whose origin is the depth camera: the image in the xoy plane is a two-dimensional image, and the z-axis value is the depth value. In this embodiment, for ease of understanding, a coordinate conversion may be performed to transform coordinates with the depth camera as origin into coordinates with a point on the building as origin, for example the foot of the perpendicular dropped from the depth camera onto the building; the coordinate system is established with the point of that perpendicular foot closest to the depth camera as the origin, as shown in fig. 2. The position information of the target can then be determined from its coordinates in this three-dimensional coordinate system. A scheme in which an image collector acquires only a planar image can determine only the planar coordinates of the target, and it is difficult to determine the target's specific position in three-dimensional space. In this application, the depth image is acquired by the depth camera and the three-dimensional position of the target in the depth image is determined, so the position of the target is determined more accurately, which improves the accuracy of the subsequent detection of the target's movement.
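As a hedged illustration of the coordinate conversion mentioned above, the sketch below back-projects a pixel and its depth value into camera coordinates with a pinhole model and then translates the origin to a reference point on the building. The intrinsics fx, fy, cx, cy and the reference point's camera coordinates (measured at installation) are assumptions, and the building axes are assumed parallel to the camera axes; the patent does not prescribe this particular parameterization.

```python
import numpy as np

def pixel_to_camera(u, v, depth, fx, fy, cx, cy):
    """Pinhole back-projection: pixel (u, v) plus its depth value -> 3-D camera coordinates."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth], dtype=float)

def camera_to_building(p_cam, building_origin_in_cam):
    """Shift the origin to a reference point on the building, e.g. the foot of the
    perpendicular dropped from the camera onto the building."""
    return p_cam - np.asarray(building_origin_in_cam, dtype=float)
```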
S120, determining the positional relationship between the target and a predicted landing area according to the position information of the target; wherein the predicted landing area is the area through which an object is predicted to pass when thrown from the building.
For example, the area through which an object passes when thrown from the building may be predicted in advance and taken as the predicted landing area. As shown in fig. 2, if an object is thrown off the building, it falls along a parabolic trajectory, and the area containing the entire parabolic trajectory of the falling object is taken as the predicted landing area. Since the object may be thrown from any position on the building, the area in front of the side of the building facing the depth camera, in a plane perpendicular to the ground, can be taken as the predicted landing area, such as the area enclosed by the broken line in fig. 2. The extent of the predicted landing area in the x-direction and the y-direction may be determined from the extent of the building plane perpendicular to the ground, and its extent in the z-direction may be determined from the actual situation, for example from the z-coordinate of the landing point when an object is thrown off the building.
Specifically, the three-dimensional coordinate range of the predicted landing area in the coordinate system is determined, that is, its coordinate range in the x-direction, the y-direction, and the z-direction. Whether the three-dimensional coordinates of the target fall within this range is then determined from the position information of the target, namely its three-dimensional coordinates in the coordinate system, so as to determine the positional relationship between the target and the predicted landing area.
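A minimal sketch of this range check, assuming the predicted landing area can be represented as an axis-aligned box of coordinate ranges in the building coordinate system (an illustrative simplification):

```python
def in_predicted_landing_area(target_xyz, x_range, y_range, z_range):
    """Return True if the target's 3-D coordinates fall inside the predicted landing area."""
    x, y, z = target_xyz
    return (x_range[0] <= x <= x_range[1]
            and y_range[0] <= y <= y_range[1]
            and z_range[0] <= z <= z_range[1])
```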
S130, if the target is located within the predicted landing area, determining that the target is an object thrown from the building.
For example, if there is a floating or moving object between the building and the depth camera, or a moving object inside the building, that object may affect the accuracy of target detection. For instance, if an object falls inside the building, its falling track is also from top to bottom and may easily be misjudged as a high-altitude thrown object. In the embodiment of the application, whether the target is thrown outward from the building is judged more accurately in three-dimensional space according to the positional relationship between the target and the predicted landing area, so that interference from moving objects inside the building or other moving objects in the environment outside the building is avoided. In the embodiment of the application, if the target is located within the predicted landing area, the target is determined to be an object thrown from the building.
Specifically, if the target is located inside the predicted landing area, such as the bottle in fig. 2, which lies within the predicted landing area enclosed by the dashed line, the target is determined to be an object thrown from the building; if the target is located outside the predicted landing area, it is determined not to be an object thrown from the building. Compared with schemes that only detect a change in target height to judge whether an object was thrown from a building, the scheme of this embodiment can exclude interference from objects whose height changes but which were not actually thrown from the building, improving the accuracy of target detection. In addition, detecting the height change of a target in two adjacent image frames in real time requires a large amount of image recognition and computation, which increases the processing load and reduces detection efficiency. In this scheme, only the positional relationship between the target and the predicted landing area needs to be determined from the position information of the target; the required processing is small and only one frame of depth image needs to be processed, so the efficiency of target detection is improved.
In this embodiment, S130 is further optimized: if the target is located within the predicted landing area, determining that the target is an object thrown from the building includes: if the target is located within the predicted landing area and the height value of the target is determined to be decreasing according to at least two frames of depth images containing the target, determining that the target is an object thrown from the building.
Specifically, on the basis of the above scheme, to further verify that the target was thrown from the building, at least two frames of depth images containing the target can be acquired and the height change of the target determined from them. In the earlier depth image, a target whose depth value is smaller than the recorded depth value of the stationary object is identified and its position in the image is determined; in the later depth image, the target is identified again in the same way and its position determined. If the position of the target in the later frame lies below its position in the earlier frame, optionally on the same vertical line, and this relationship holds across the analyzed frames, the height value of the target is determined to be gradually decreasing. If the target is located within the predicted landing area and its height value is determined to be gradually decreasing, the target is determined to be falling within the predicted landing area and therefore to be an object thrown from the building. This excludes interference from objects that are within the predicted landing area but are not falling, such as birds flying near the building, thereby improving the accuracy of target detection.
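A sketch of this additional verification, assuming the target's 3-D position has been extracted from several consecutive depth frames; the drift tolerance is an illustrative assumption:

```python
def is_falling(positions, max_horizontal_drift=0.3):
    """positions: list of (x, y, z) target coordinates, oldest first, with y as the height value.
    The target counts as falling if its height decreases monotonically while it stays
    roughly on the same vertical line."""
    if len(positions) < 2:
        return False
    heights = [p[1] for p in positions]
    xs = [p[0] for p in positions]
    descending = all(later < earlier for earlier, later in zip(heights, heights[1:]))
    same_column = (max(xs) - min(xs)) <= max_horizontal_drift
    return descending and same_column
```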
The embodiment of the application provides a target detection method and device, an electronic device, and a storage medium. In this scheme, a depth camera captures an image of a building to obtain a depth image, and the position information of a target in the depth image is determined; the positional relationship between the target and a predicted landing area is determined according to the position information of the target, the predicted landing area being the area through which an object is predicted to pass when thrown from the building; and if the target is located within the predicted landing area, the target is determined to be an object thrown from the building. By determining the positional relationship between the target and the predicted landing area, the technical scheme detects the target in three-dimensional space, which reduces the interference of other objects in the environment with target detection and improves the accuracy and real-time performance of target detection.
Fig. 3 is a flowchart of another object detection method according to an embodiment of the present application. This embodiment is optimized based on the above embodiment. It should be noted that technical details not described in detail in this embodiment may be found in any of the above embodiments.
Specifically, as shown in fig. 3, the method specifically includes the following steps:
S210, acquiring an image of a building through a depth camera to obtain a depth image, and determining the position information of a target in the depth image.
S220, if the pixel coordinates of the target are located in the image area of the building in the depth image and the depth value of the target is located in the depth value range of the predicted landing area, determining that the target is located in the predicted landing area.
The position information includes the pixel coordinates and the depth value of the target, which may be those of the target's center pixel. The image area of the building is the area formed by the pixels corresponding to the building in the depth image; as shown in fig. 4, the closed area in fig. 4 is the image area of the building. The image area of the building can be determined by screening according to the depth value of each pixel in the depth image. The depth value range of the predicted landing area may be determined according to the actual situation, for example by assuming that objects are thrown outward from the building and predicting the depth values of the landing points of different thrown objects, and/or by determining the range from the maximum and minimum depth values of the landing points when the same object is thrown with different forces. If the pixel coordinates of the target are located within the image area of the building, such as within the closed area in fig. 4, and its depth value is within the depth value range of the predicted landing area, the target is determined to be within the predicted landing area. If the depth value of the target is not within that range, the target is determined not to be located within the predicted landing area, but at another location that the depth camera can capture.
Specifically, in this embodiment, once a target is identified in the depth image, it can first be determined whether the pixel coordinates of the target lie within the image area of the building. If so, it is further determined whether the depth value of the target lies within the depth value range of the predicted landing area, so as to exclude interference from objects inside the building or from suspended or flying objects outside it. If not, it can be determined that the target is not an object thrown from the building, and the subsequent depth value comparison is skipped, which reduces the processing load and improves detection efficiency.
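A sketch of this two-stage check, assuming the building's image area is available as a pixel contour (for example from the edge detection described in the next embodiment) and the predicted landing area's depth range as a (min, max) pair. The use of OpenCV's point-in-polygon test is an implementation choice, not something the patent specifies.

```python
import cv2
import numpy as np

def within_predicted_area(pixel, depth, building_contour, depth_range):
    """First test the pixel coordinates against the building's image area, then the depth value."""
    contour = np.asarray(building_contour, dtype=np.float32)
    inside = cv2.pointPolygonTest(contour, (float(pixel[0]), float(pixel[1])), False) >= 0
    if not inside:
        return False                      # skip the depth comparison entirely
    return depth_range[0] <= depth <= depth_range[1]
```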
S230, if the target is located within the predicted landing area, determining that the target is an object thrown from the building.
In the target detection method provided by this embodiment, optimization is performed on the basis of the above embodiment: if the pixel coordinates of the target are located within the image area of the building in the depth image and the depth value of the target is within the depth value range of the predicted landing area, the target is determined to be located within the predicted landing area. In this way, whether the target was thrown from the building is determined more comprehensively by combining the target's position in the image plane with its depth value, which improves the accuracy and efficiency of target detection.
Fig. 5 is a flowchart of another object detection method according to an embodiment of the present application. This embodiment is optimized based on the above embodiment. It should be noted that technical details not described in detail in this embodiment may be found in any of the above embodiments.
Specifically, as shown in fig. 5, the method specifically includes the following steps:
s310, acquiring an image of the building through a depth camera to obtain a depth image.
S320, determining an image area of the building in the depth image according to gradient values of depth values between adjacent pixel points in the depth image.
For example, besides the area where the building is located, the depth image may also contain a background area, such as sky, trees, or roads. In this embodiment, the edges of the building are detected from the depth image, and the area enclosed by those edges is used as the image area of the building, so that it can later be judged whether a target was thrown off the building.
Specifically, the distance from the building to the depth camera differs from the distance from the background to the depth camera, i.e. the depth value of the building differs from that of the background. The edges of the building, and hence its image area, can therefore be determined from the gradient of the depth values. In the depth image, the depth values at the edge of the building facing the camera jump sharply relative to the surrounding background, so the gradient of depth values between adjacent pixels is computed, the pixels belonging to the building edge are screened out according to the gradient, and the area enclosed by those pixels is taken as the image area of the building, as shown in fig. 4, where the closed area is the image area of the building.
Specifically, the depth image can first be filtered to improve its smoothness and remove noise points, i.e. pixels whose depth value jumps sharply. For example, if a pixel's depth value is much larger or smaller than the depth values of its surrounding pixels, the pixel is determined to be a noise point and is excluded from further analysis. For instance, if the depth value of a certain pixel is 40 while, within a 9x9 region centered on it, all other pixels have depth values of 10, or within the range [10-2, 10+2], the pixel is determined to be a noise point and is not analyzed further.

For the remaining pixels of the depth image, the gradient of depth values between adjacent pixels is determined and compared with a preset gradient threshold. In this embodiment, adjacent pixels are treated as pixel pairs. A small gradient threshold is preset, the gradients of adjacent pixel pairs are compared with it, and the pairs whose gradient exceeds the threshold are retained. The threshold is then increased, the retained pairs are compared with the increased threshold, and the pairs exceeding it are retained; this continues until the retained pixel pairs can form a polygon whose area, as a proportion of the area of the depth image, exceeds a preset area threshold.

When the depth camera is installed, its position and angle are adjusted so that the proportion of the building's image area in the captured depth image exceeds the preset area threshold, which allows the camera to monitor throwing behavior from the building fully and comprehensively. When identifying the image area of the building from pixel depth values, if the screened pixels can form a polygon and the ratio of the polygon's area to the area of the depth image exceeds the preset area threshold, the area corresponding to the polygon is determined to be the image area of the building, which excludes interference from smaller polygonal regions such as balconies and windows.

In another implementation, the gradients of depth values between adjacent pixels are determined, the pixel pairs with the largest gradients are selected, and it is checked whether they can form a polygon; if not, the pairs with the next largest gradients are added and the check is repeated, and so on, until the selected pixel pairs can form a polygon.
In this scheme, although in theory the pixel pairs with the largest gradients can be regarded as the pixel pairs on the building edge, in practice the depth values may contain acquisition errors, and the pairs with the largest gradients alone may not form a polygon. The pairs selected at the next-largest gradient, or at further reduced gradients, are therefore combined to form the polygon and thereby determine the image area of the building. When adjacent pixel pairs are screened by gradient in this way, the pixel of each pair closer to the center of the polygon can be taken as an edge pixel of the building, and the area enclosed by these edge pixels is taken as the image area of the building.
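The sketch below is a rough, heavily simplified rendering of the edge-based segmentation described above: compute the gradient of the depth values, start from a small threshold, and raise it until the surviving edge pixels reduce to essentially one closed outline enclosing a large enough share of the image. The connected-component and contour heuristics are approximations of "the screened pixel pairs can form a polygon", not the patent's exact procedure.

```python
import cv2
import numpy as np

def building_image_area(depth, area_ratio_threshold=0.3, start=1.0, step=1.0):
    """Return the contour of the building's image area, or None if no suitable polygon is found."""
    gy, gx = np.gradient(depth.astype(np.float32))
    grad = np.hypot(gx, gy)                               # magnitude of the depth gradient
    h, w = depth.shape
    threshold = start
    while threshold < float(grad.max()):
        edges = (grad > threshold).astype(np.uint8)
        n_labels, _ = cv2.connectedComponents(edges)
        contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        if contours:
            largest = max(contours, key=cv2.contourArea)
            # "Forms a polygon": essentially one connected edge curve whose enclosed area
            # is a large enough share of the depth image.
            if n_labels - 1 <= 1 and cv2.contourArea(largest) / (h * w) > area_ratio_threshold:
                return largest
        threshold += step                                 # raise the gradient threshold and retry
    return None
```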
S330, determining the planar areas corresponding to different planes of the building within the image area of the building, according to the gradient of depth values between adjacent pixels in the area to be detected within the image area and the average gradient of depth values between adjacent pixels lying in the same plane of the building in the depth image.
The area to be detected is the part of the image area that has not yet been assigned to a planar area corresponding to a plane of the building. For example, as shown in fig. 4, if no planar area has been determined yet, the entire image area of the building is taken as the area to be detected. As shown in fig. 6, if planar area A has already been determined, the part of the image area other than planar area A is taken as the area to be detected.
For example, as shown in fig. 2, the walls of a building are not necessarily all planes parallel to the xoy plane; some may be perpendicular to the xoz plane but at an angle to the xoy plane. Wall surfaces facing different directions correspond to different predicted landing areas. In this embodiment, the planar areas corresponding to different planes of the building within its image area are identified from the depth values of the pixels in the area to be detected and the average gradient of those depth values. For example, identifying the building image area obtained in fig. 4 yields the planar areas shown in fig. 6: planar area A, planar area B, and planar area C. Specifically, the gradients of depth values between adjacent pixels within one plane are the same, so the average gradient of the pixels belonging to the same plane can be determined; the pixels are then screened against this average gradient to obtain the pixels belonging to the same plane, and thus the planar areas corresponding to the different planes of the building within its image area.
In this embodiment, the determination of the average gradient in S330 is further specified. It includes: determining the vertical average gradient of the area to be detected according to the depth value of the midpoint of its upper edge, the depth value of the midpoint of its lower edge, and the distance between these two midpoints; determining the horizontal average gradient according to the average of the gradient values between the right-adjacent and left-adjacent pixels of each pixel on the middle line of the area to be detected, the middle line being the line connecting the midpoint of the upper edge and the midpoint of the lower edge; and determining the average gradient from the vertical average gradient and the horizontal average gradient. For example, as shown in fig. 4, the entire building area is taken as the area to be detected, point A is the midpoint of its upper edge and point B the midpoint of its lower edge. The vertical average gradient is the absolute value of the difference between the depth values of point A and point B divided by the pixel distance between them, where the pixel distance is the distance between the pixel corresponding to point A and the pixel corresponding to point B. For example, if the pixel corresponding to point A is (x_a, y_a) and the pixel corresponding to point B is (x_b, y_b), the pixel distance between A and B is √((x_a - x_b)² + (y_a - y_b)²). For the line connecting A and B, the difference between the depth value of the right-adjacent pixel and that of the left-adjacent pixel of each pixel on line AB is computed as the gradient value of that pixel, and the average of these gradient values over line AB is taken as the horizontal average gradient. The sum of the vertical average gradient and the horizontal average gradient is taken as the average gradient.
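A sketch of this average-gradient computation, assuming the area to be detected is given as a boolean mask over the depth image; the midpoints of its upper and lower edges play the roles of point A and point B above.

```python
import numpy as np

def average_gradient(depth, region_mask):
    """Vertical average gradient along line AB plus horizontal average gradient across it."""
    ys, xs = np.nonzero(region_mask)
    top, bottom = int(ys.min()), int(ys.max())
    ax = int(xs[ys == top].mean())               # midpoint of the upper edge (point A)
    bx = int(xs[ys == bottom].mean())            # midpoint of the lower edge (point B)
    pixel_dist = np.hypot(ax - bx, top - bottom)
    vertical = abs(float(depth[top, ax]) - float(depth[bottom, bx])) / pixel_dist

    # Horizontal part: mean of (right neighbour - left neighbour) along the middle line AB.
    line_y = np.arange(top, bottom + 1)
    line_x = np.linspace(ax, bx, len(line_y)).round().astype(int)
    right = depth[line_y, np.clip(line_x + 1, 0, depth.shape[1] - 1)].astype(float)
    left = depth[line_y, np.clip(line_x - 1, 0, depth.shape[1] - 1)].astype(float)
    horizontal = float(np.mean(right - left))
    return vertical + horizontal                 # sum of the vertical and horizontal averages
```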
In this embodiment, S330 is further optimized. Determining the planar areas corresponding to different planes of the building within the image area, according to the gradient of depth values between adjacent pixels in the area to be detected and the average gradient of depth values between adjacent pixels in the same plane of the building, includes: subtracting a preset adjustment parameter from the average gradient to obtain the first endpoint of a gradient range, and adding the preset adjustment parameter to the average gradient to obtain the second endpoint, thereby determining the gradient range; taking the area formed by adjacent pixels whose gradient values fall within this range as the planar area corresponding to one plane of the building within the image area; and taking the remaining part of the image area as the new area to be detected, determining its average gradient and gradient range, and determining the planar area corresponding to one plane of the building within it, until the image area of the building has been divided into the planar areas corresponding to the different planes of the building.
The preset adjustment parameter can be determined according to the actual situation and represents an acceptable error range. Specifically, if the average gradient is T and the preset adjustment parameter is k, the gradient range is (T-k, T+k). If the gradient value between adjacent pixels lies within (T-k, T+k), those pixels are determined to lie in the same plane of the building, so the area formed by adjacent pixels whose gradient values fall within the range is taken as the planar area corresponding to that plane within the image area of the building. In this way, the planar areas belonging to the same plane of the building within the area to be detected are determined. The plane identification step is then repeated, with the part of the image area not yet assigned to a planar area as the new area to be detected, until the image area of the building has been divided into planar areas corresponding to the different planes; as shown in fig. 6, the image area is divided into planar areas A, B, and C.
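A sketch of this iterative plane segmentation, reusing the average_gradient() sketch above. The per-pixel gradient here mirrors the average-gradient definition (vertical part as an absolute row difference, horizontal part as a right-minus-left column difference), which is an approximation rather than the patent's exact formulation; k is the preset adjustment parameter and min_region is an illustrative stopping size.

```python
import numpy as np

def split_into_planes(depth, building_mask, k=0.5, min_region=500):
    """Peel planar regions off the building's image area one plane at a time."""
    gy, gx = np.gradient(depth.astype(np.float32))
    local_grad = np.abs(gy) + 2.0 * gx            # per-pixel analogue of vertical + horizontal gradient
    remaining = building_mask.copy()
    planes = []
    while remaining.sum() > min_region:
        t = average_gradient(depth, remaining)    # average gradient of the current area to detect
        plane = remaining & (local_grad > t - k) & (local_grad < t + k)
        if plane.sum() < min_region:              # nothing consistent left: stop
            break
        planes.append(plane)
        remaining = remaining & ~plane            # the rest becomes the new area to detect
    return planes
```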
S340, determining a predicted landing area according to the plane area and the depth value range.
The depth value range is determined according to the predicted distance from the landing point to the building when a target is thrown from the building.
For example, an object thrown from the building falls along a parabolic trajectory, and the depth camera captures it while its pixel coordinates are still within the image area of the building. The landing point lies at some distance from the building, and the depth value range can be determined from that distance. Specifically, experiments can be carried out in advance with different objects: with the area in front of the building confirmed to be free of people and property, different objects are thrown from the building, the landing positions are recorded, and the depth value of each landing point relative to the depth camera is determined from its position. Alternatively, under the same safety conditions, different objects, or the same object thrown with different amounts of force, are thrown from the building and the landing positions recorded. The minimum depth value is taken as the lower limit of the depth value range and the maximum depth value as its upper limit. The range of the predicted landing area in the x-direction and y-direction can thus be determined from the planar area, and its range in the z-direction from the depth value range, thereby determining the predicted landing area.
In this embodiment, S340 is further optimized. Determining the predicted landing area according to the planar area and the depth value range includes: taking, as the predicted landing area, the region whose outline under the viewing angle of the depth camera coincides with the outline of at least one planar area and whose depth values satisfy the depth value range.
For example, as shown in fig. 6, assuming the planar areas under the viewing angle of the depth camera are planar area A, planar area B, and planar area C, the outline of at least one of them can be taken as the outline of the predicted landing area under the camera's viewing angle, and the predicted landing area is then determined in combination with the depth value range. As shown in fig. 7, the outline of the closed region indicated by the arrow corresponding to the maximum depth value of the predicted landing area, or of the closed region indicated by the arrow corresponding to its minimum depth value, is the outline of the predicted landing area, and the region between these two closed regions is the predicted landing area.
S350, determining the position information of the target in the depth image.
S360, determining the positional relationship between the target and the predicted landing area according to the position information of the target; wherein the predicted landing area is the area through which an object is predicted to pass when thrown from the building.
S370, if the target is located within the predicted landing area, determining that the target is an object thrown from the building.
For example, as shown in fig. 7, if the target appears in the predicted landing area and its height value decreases, the target is falling and is determined to be an object thrown from the building. In this embodiment, to further improve detection accuracy, the position information of the target in each frame can be determined from at least two frames of depth images so as to obtain the target's descending trajectory; if the trajectory matches a parabolic trajectory, the target is determined to be an object thrown from the building.
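A sketch of the trajectory check mentioned above: fit the target's height against time over several frames and accept the track as parabolic if the quadratic term points downward with a curvature near -g/2 (free fall). Units of metres and seconds and the tolerance value are assumptions for illustration.

```python
import numpy as np

def matches_parabola(timestamps, heights, g=9.8, tolerance=0.5):
    """timestamps in seconds, heights in metres; needs at least three frames."""
    if len(timestamps) < 3:
        return False
    a, b, c = np.polyfit(np.asarray(timestamps, float), np.asarray(heights, float), 2)
    expected = -g / 2.0
    return a < 0 and abs(a - expected) / abs(expected) < tolerance
```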
In the target detection method provided by this embodiment, optimization is performed on the basis of the above embodiment. The image area of the building in the depth image is determined according to the gradient of depth values between adjacent pixels; the planar areas corresponding to different planes of the building within that image area are determined according to the gradient of depth values between adjacent pixels in the area to be detected and the average gradient of depth values between adjacent pixels in the same plane of the building, the area to be detected being the part of the image area not yet assigned to a planar area; and the predicted landing area is determined from the planar area and the depth value range. In this way, the planes of the building are determined automatically, accurately, and efficiently from the depth image, and the predicted landing area is established to detect whether a target is an object thrown from the building. The whole process is completed automatically, which improves detection efficiency, and establishing the predicted landing area excludes interference from objects moving inside the building, which improves the accuracy of target detection.
Fig. 8 is a schematic structural diagram of an object detection device according to an embodiment of the present application. The object detection device provided in this embodiment includes:
the location information determining module 410 is configured to acquire a depth image from a building through a depth camera, and determine location information of a target in the depth image;
a position relation determining module 420, configured to determine a position relation between the target and the predicted landing area according to position information of the target; wherein the predicted landing zone is a zone through which the predicted object passes when thrown from the building;
and a throwing target determining module 430, configured to determine that the target is a target thrown from the building if the target is located in the predicted drop area.
In an embodiment of the present application, the location information includes pixel coordinates and a depth value of the target;
the positional relationship determination module 420 includes:
and the comparison unit is used for determining that the target is positioned in the predicted landing area if the pixel coordinates of the target are positioned in the image area of the building in the depth image and the depth value of the target is positioned in the depth value range of the predicted landing area.
In an embodiment of the present application, the apparatus further includes:
The image area determining module is used for determining an image area of a building in the depth image according to gradient values of depth values between adjacent pixel points in the depth image;
the plane area determining module is used for determining plane areas corresponding to different planes on the building in the image area of the building according to gradient values of depth values of adjacent pixel points in the to-be-detected area in the image area of the building and average gradients of depth values of adjacent pixel points in the same plane on the building in the depth image; the area to be detected is an area which is not currently determined to be a plane area corresponding to different planes on the building in the image area;
the landing area determining module is used for determining a predicted landing area according to the plane area and the depth value range; wherein the depth value range is determined according to the distance from the landing point to the building when the predicted target is thrown from the building.
In an embodiment of the present application, the apparatus further includes:
the vertical average gradient determining module is used for determining the vertical average gradient of the to-be-detected area according to the depth value of the midpoint of the upper edge, the depth value of the midpoint of the lower edge and the distance between the midpoint of the upper edge and the midpoint of the lower edge of the to-be-detected area in the image area of the building;
The horizontal average gradient determining module is used for determining a horizontal average gradient according to the average value of gradient values of a right adjacent pixel point and a left adjacent pixel point of the pixel point on the middle line of the region to be detected; the middle line is a connecting line of the midpoint of the upper edge and the midpoint of the lower edge of the region to be detected;
and the average gradient determining module is used for determining the average gradient according to the vertical average gradient and the horizontal average gradient.
In an embodiment of the present application, a building plane determination module includes:
the gradient range determining unit is used for subtracting a preset adjusting parameter from the average gradient as a first end point of the gradient range, adding the preset adjusting parameter from the average gradient as a second end point of the gradient range, and determining the gradient range;
and the plane determining unit is used for taking the area formed by the adjacent pixel points with gradient values within the gradient range as a plane area corresponding to the same plane on the building in the image area of the building.
And the loop execution unit is used for taking other areas in the image area as new areas to be detected, determining the average gradient and gradient range in the new areas to be detected, and determining the plane area corresponding to the same plane on the building in the areas to be detected until the image area of the building is divided into plane areas corresponding to different planes on the building.
In this embodiment of the present application, the landing area determining module is specifically configured to:
and overlapping the outline of the depth camera under the view angle with the outline of at least one plane area, and taking the area with the depth value meeting the depth value range as a predicted landing area.
In this embodiment of the present application, the cast-out target determining module 430 is specifically configured to determine that the target is a target cast out of the building if the target is located in the predicted drop area and it is determined that the height value of the target is reduced according to at least two frames of depth images of the target.
The object detection device provided by the embodiment of the application can be used for executing the object detection method provided by any embodiment, and has corresponding functions and beneficial effects.
Fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device 10 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device 10 may also represent various forms of mobile equipment, such as personal digital assistants, cellular telephones, smartphones, user equipment, wearable devices (e.g., helmets, eyeglasses, watches, etc.), and other similar computing equipment. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the application described and/or claimed herein.
As shown in fig. 9, the electronic device 10 includes at least one processor 11, and a memory, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, etc., communicatively connected to the at least one processor 11, in which the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks, wireless networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the various methods and processes described above, such as the target detection method.
In some embodiments, the object detection method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. One or more steps of the methods described above may be performed when the computer program is loaded into RAM 13 and executed by processor 11. Alternatively, in other embodiments, the processor 11 may be configured to perform the target detection method in any other suitable way (e.g. by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, field Programmable Gate Arrays (FPGAs), application Specific Integrated Circuits (ASICs), application Specific Standard Products (ASSPs), systems On Chip (SOCs), load programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs, the one or more computer programs may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out the methods of the present application may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this application, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device 10, the electronic device 10 having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the electronic device 10. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the Internet.
The computing system may include clients and servers. A client and a server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system and overcomes the drawbacks of high management difficulty and weak service scalability found in traditional physical hosts and VPS (Virtual Private Server) services.
It should be appreciated that steps in the various flows shown above may be reordered, added, or deleted. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present application are achieved; no limitation is imposed herein.
The above embodiments do not limit the scope of protection of the present application. It should be apparent to those skilled in the art that various modifications, combinations, sub-combinations, and substitutions are possible, depending on design requirements and other factors. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present application shall be included within the scope of protection of the present application.

Claims (10)

1. A method of target detection, the method comprising:
acquiring an image of a building through a depth camera to obtain a depth image, and determining the position information of a target in the depth image;
determining a positional relationship between the target and a predicted landing area according to the position information of the target; wherein the predicted landing area is an area through which an object is predicted to pass when thrown from the building;
and if the target is located within the predicted landing area, determining that the target is a target thrown from the building.
2. The method of claim 1, wherein the position information includes pixel coordinates and a depth value of the target;
and wherein determining the positional relationship between the target and the predicted landing area according to the position information of the target comprises:
if the pixel coordinates of the target are located within an image area of the building in the depth image and the depth value of the target falls within a depth value range of the predicted landing area, determining that the target is located within the predicted landing area.
3. The method of claim 1, wherein determining the predicted landing area comprises:
determining an image area of the building in the depth image according to gradient values of depth values between adjacent pixel points in the depth image;
determining, in the image area of the building, plane areas corresponding to different planes of the building according to gradient values of depth values of adjacent pixel points in an area to be detected within the image area of the building and an average gradient of depth values of adjacent pixel points belonging to the same plane of the building in the depth image; wherein the area to be detected is an area within the image area that has not yet been determined to be a plane area corresponding to any plane of the building; and
determining the predicted landing area according to the plane areas and a depth value range; wherein the depth value range is determined according to the predicted distance from the landing point to the building when a target is thrown from the building.
4. The method according to claim 3, wherein determining the average gradient comprises:
determining a vertical average gradient of the area to be detected according to the depth value at the midpoint of the upper edge of the area to be detected within the image area of the building, the depth value at the midpoint of its lower edge, and the distance between the two midpoints;
determining a horizontal average gradient according to the average of the gradient values of the pixel points immediately to the right and immediately to the left of each pixel point on the middle line of the area to be detected; wherein the middle line is the line connecting the midpoint of the upper edge and the midpoint of the lower edge of the area to be detected; and
determining the average gradient from the vertical average gradient and the horizontal average gradient.
5. The method according to claim 3, wherein determining, in the image area of the building, the plane areas corresponding to different planes of the building according to the gradient values of depth values of adjacent pixel points in the area to be detected within the image area of the building and the average gradient of depth values of adjacent pixel points belonging to the same plane of the building in the depth image comprises:
subtracting a preset adjustment parameter from the average gradient to obtain a first end point of a gradient range, and adding the preset adjustment parameter to the average gradient to obtain a second end point of the gradient range, thereby determining the gradient range;
taking an area formed by adjacent pixel points whose gradient values fall within the gradient range as a plane area corresponding to one plane of the building within the image area of the building; and
taking the remaining areas in the image area as new areas to be detected, determining the average gradient and the gradient range for each new area to be detected, and determining the plane area corresponding to one plane of the building within that area to be detected, until the image area of the building has been divided into plane areas corresponding to the different planes of the building.
6. The method according to claim 3, wherein determining the predicted landing area according to the plane areas and the depth value range comprises:
overlapping, under the view angle of the depth camera, an outline with the outline of at least one plane area, and taking the area whose depth value meets the depth value range as the predicted landing area.
7. The method of claim 1, wherein determining that the target is a target thrown from the building if the target is located within the predicted landing area comprises:
if the target is located within the predicted landing area and the height value of the target is determined to be decreasing from at least two frames of depth images containing the target, determining that the target is a target thrown from the building.
8. A target detection device, the device comprising:
a position information determining module, configured to acquire an image of a building through a depth camera to obtain a depth image, and to determine position information of a target in the depth image;
a positional relationship determining module, configured to determine a positional relationship between the target and a predicted landing area according to the position information of the target; wherein the predicted landing area is an area through which an object is predicted to pass when thrown from the building; and
a thrown-target determining module, configured to determine that the target is a target thrown from the building if the target is located within the predicted landing area.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the target detection method according to any one of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored, characterized in that the program, when executed by a processor, implements the target detection method according to any one of claims 1-7.
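
To make the gradient-based plane segmentation of claims 4 and 5 easier to follow, here is a minimal Python sketch. It assumes the depth image is a NumPy array and the area to be detected is given as a boolean mask; the column chosen for the middle line, the per-pixel gradient direction used for the range test, and the way the vertical and horizontal components are combined into one average gradient are illustrative assumptions rather than details taken from the claims.

import numpy as np

def average_gradient(depth, region_mask):
    """Claim-4-style estimate of the average depth gradient of an area to be detected.
    depth: H x W depth image; region_mask: boolean mask of the area to be detected."""
    ys, xs = np.nonzero(region_mask)
    y_top, y_bot = int(ys.min()), int(ys.max())
    # Middle line: approximated here as a vertical line through the mean column of the
    # region (the claim connects the midpoints of the upper and lower edges).
    x_mid = int(np.clip(round(float(xs.mean())), 1, depth.shape[1] - 2))
    # Vertical average gradient from the upper-edge and lower-edge midpoints.
    vertical = (float(depth[y_bot, x_mid]) - float(depth[y_top, x_mid])) / max(y_bot - y_top, 1)
    # Horizontal average gradient: mean of the gradients to the left and right of each
    # pixel on the middle line.
    horiz = [0.5 * (float(depth[y, x_mid]) - float(depth[y, x_mid - 1])
                    + float(depth[y, x_mid + 1]) - float(depth[y, x_mid]))
             for y in range(y_top, y_bot + 1)]
    horizontal = float(np.mean(horiz))
    return 0.5 * (vertical + horizontal)  # how the two components are combined is an assumption

def plane_area_mask(depth, region_mask, adjust):
    """Claim-5-style gradient-range test: pixels of the area to be detected whose depth
    gradient falls within [average - adjust, average + adjust] form one plane area."""
    avg = average_gradient(depth, region_mask)
    grad = np.gradient(depth.astype(float), axis=0)  # depth gradient between vertically adjacent pixels
    return region_mask & (grad >= avg - adjust) & (grad <= avg + adjust)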
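
Building on that sketch, the predicted landing area of claim 6 can be approximated as the set of pixels that belong to at least one plane area of the building, as seen under the depth camera's view angle, and whose depth value falls within the depth value range derived from how far from the building a thrown object is expected to land. The union-and-threshold logic below is one interpretation of the claim wording, not a definitive implementation.

import numpy as np

def predicted_landing_area(depth, plane_masks, depth_range):
    """Combine the plane areas of the building with the depth value range of the
    expected landing zone into a single boolean mask of the predicted landing area."""
    d_min, d_max = depth_range
    on_a_plane = np.zeros(depth.shape, dtype=bool)
    for mask in plane_masks:  # plane_masks: list of boolean masks, one per plane area
        on_a_plane |= mask
    return on_a_plane & (depth >= d_min) & (depth <= d_max)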
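
Finally, the membership and confirmation checks of claims 2 and 7 reduce to simple comparisons once the predicted landing area and the target's track are available. The sketch below assumes the track is a list of (pixel coordinates, depth value, height value) tuples collected from successive depth frames and that the building's image area is given as a 2D boolean array; all names are illustrative.

def in_predicted_landing_area(pixel_xy, depth_value, building_mask, depth_range):
    """Claim-2-style test: the pixel coordinates lie inside the building's image area and
    the depth value lies inside the depth value range of the predicted landing area."""
    x, y = pixel_xy
    d_min, d_max = depth_range
    return bool(building_mask[y, x]) and d_min <= depth_value <= d_max

def is_thrown_from_building(track, building_mask, depth_range):
    """Claim-7-style test: the target lies in the predicted landing area and its height
    value decreases across at least two frames of depth images.
    track: list of (pixel_xy, depth_value, height_value), oldest frame first."""
    if len(track) < 2:
        return False
    latest_xy, latest_depth, _ = track[-1]
    if not in_predicted_landing_area(latest_xy, latest_depth, building_mask, depth_range):
        return False
    heights = [h for _, _, h in track]
    return all(later < earlier for earlier, later in zip(heights, heights[1:]))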
CN202210998109.3A 2022-08-19 2022-08-19 Target detection method and device, electronic equipment and storage medium Pending CN117636231A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210998109.3A CN117636231A (en) 2022-08-19 2022-08-19 Target detection method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210998109.3A CN117636231A (en) 2022-08-19 2022-08-19 Target detection method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117636231A 2024-03-01

Family

ID=90015208

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210998109.3A Pending CN117636231A (en) 2022-08-19 2022-08-19 Target detection method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117636231A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination