CN111951328A - Object position detection method, device, equipment and storage medium

Object position detection method, device, equipment and storage medium

Info

Publication number
CN111951328A
Authority
CN
China
Prior art keywords: target, image, target image, determining, sub
Prior art date
Legal status: Pending
Application number
CN202010778896.1A
Other languages
Chinese (zh)
Inventor
顾会建
戴一凡
王宝宗
史宏涛
路萍
章烨
Current Assignee
Tsinghua University
Suzhou Automotive Research Institute of Tsinghua University
Original Assignee
Tsinghua University
Suzhou Automotive Research Institute of Tsinghua University
Priority date: 2020-08-05
Filing date: 2020-08-05
Publication date
Application filed by Tsinghua University and Suzhou Automotive Research Institute of Tsinghua University
Priority to CN202010778896.1A
Publication of CN111951328A


Classifications

    • G06T 7/70: Image analysis; determining position or orientation of objects or cameras
    • G06N 3/045: Neural networks; architecture; combinations of networks
    • G06N 3/08: Neural networks; learning methods
    • G06T 7/10: Image analysis; segmentation; edge detection
    • G06T 2207/20081: Indexing scheme for image analysis; training/learning
    • G06T 2207/20084: Indexing scheme for image analysis; artificial neural networks [ANN]


Abstract

An embodiment of the invention discloses an object position detection method, apparatus, device, and storage medium. The method comprises: acquiring a target image to be detected; performing image segmentation on the target image to obtain the target sub-images corresponding to the target image; performing target object detection on each target sub-image based on a target detection network model, and determining a target detection result corresponding to each target sub-image; and fusing the target detection results corresponding to the target image to determine the target position information of the target object. With this technical solution, missed detections caused by scaling are avoided when detecting high-resolution images, and the recall and accuracy of object position detection are improved.

Description

Object position detection method, device, equipment and storage medium
Technical Field
The embodiments of the present invention relate to computer technologies, and in particular, to a method, an apparatus, a device, and a storage medium for detecting a position of an object.
Background
With the rapid development of computer technology, target objects in images can be detected and identified automatically, and their position information obtained.
At present, object positions can be detected with a deep-learning-based target detection network model. However, when the resolution of the image to be detected is far higher than the input resolution of the target detection network model, the image must be scaled down substantially to match that input resolution before it can be fed into the model for object position detection. Such heavy scaling discards a large amount of image information, which can cause missed detections and greatly reduce the accuracy of object position detection.
Disclosure of Invention
The embodiment of the invention provides an object position detection method, apparatus, device, and storage medium, so as to improve the recall and accuracy of object position detection on high-resolution images.
In a first aspect, an embodiment of the present invention provides an object position detection method, including:
acquiring a target image to be detected;
performing image segmentation on the target image to obtain each target sub-image corresponding to the target image;
performing target object detection on each target sub-image based on a target detection network model, and determining a target detection result corresponding to each target sub-image;
and fusing the target detection results corresponding to the target image to determine target position information corresponding to the target object.
In a second aspect, an embodiment of the present invention further provides an object position detection apparatus, including:
the target image acquisition module is used for acquiring a target image to be detected;
the target image segmentation module is used for carrying out image segmentation on the target image to obtain each target sub-image corresponding to the target image;
the target object detection module is used for detecting a target object for each target sub-image based on a target detection network model and determining a target detection result corresponding to each target sub-image;
and the target position information determining module is used for fusing all target detection results corresponding to the target images and determining target position information corresponding to the target object.
In a third aspect, an embodiment of the present invention further provides an apparatus, where the apparatus includes:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the object position detection method provided by any embodiment of the invention.
In a fourth aspect, the embodiments of the present invention further provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the object position detection method according to any of the embodiments of the present invention.
The embodiment of the invention has the following advantages or beneficial effects:
The target image to be detected is segmented into the target sub-images corresponding to it, so that each sub-image meets the input-resolution requirement of the target detection network model. Target object detection can then be performed on each target sub-image based on the model, and a target detection result determined for each sub-image. The target detection results corresponding to the target image are fused to determine the target position information of the target object in the target image. In this way, position detection is performed over the complete target image, missed detections caused by image scaling are avoided, and the recall and accuracy of object position detection are greatly improved.
Drawings
Fig. 1 is a flowchart of an object position detection method according to an embodiment of the present invention;
FIG. 2 is an example of segmentation of a target image according to an embodiment of the present invention;
fig. 3 is a flowchart of an object position detection method according to a second embodiment of the present invention;
fig. 4 is an example of a shooting mode by an unmanned aerial vehicle according to a second embodiment of the present invention;
fig. 5 is a schematic structural diagram of an object position detection apparatus according to a third embodiment of the present invention;
fig. 6 is a schematic structural diagram of an apparatus according to a fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of an object position detection method according to an embodiment of the present invention. The method is applicable to object position detection on high-resolution images, and especially to images acquired by an unmanned aerial vehicle. It may be performed by an object position detection apparatus, which can be implemented in software and/or hardware and integrated into a device with data processing capability. As shown in Fig. 1, the method specifically includes the following steps:
and S110, acquiring a target image to be detected.
The target image may be a high-resolution image captured by a camera. The image resolution of the target image is higher than the image resolution detectable by the target detection network model. For example, the image resolution of the target image photographed by the drone is 4096 × 2160, and the image resolution detectable by the target detection network model is 608 × 608. The target image may or may not include the target object image. If the target image includes the target object image, the position information of the target object in the target image can be detected.
Specifically, a target scene can be shot by using a high-resolution camera to obtain a clear high-resolution image as a target image to be detected, so that the accuracy of target object detection is improved.
And S120, carrying out image segmentation on the target image to obtain each target sub-image corresponding to the target image.
A target sub-image is a partial image area of the target image obtained after division; there are typically several of them. The target sub-images may or may not be of equal size, which can be set according to specific business requirements. The resolution of a target sub-image is matched to the input resolution of the target detection network model: it is either equal to that input resolution or within a preset range around it.
Specifically, the target image may be segmented uniformly or non-uniformly based on its resolution and the input resolution of the target detection network model, dividing the high-resolution target image into target sub-images of suitable resolution. The model can then detect object positions in each sub-image directly, avoiding the missed detections that image-information loss from scaling would cause.
Exemplarily, S120 may include: determining a target image segmentation parameter according to target image resolution information corresponding to a target image and input image resolution information corresponding to a target detection network model; and according to the target image segmentation parameters, carrying out image segmentation on the target image to obtain each target sub-image corresponding to the target image.
Wherein the target image resolution information may include: width information and height information of the target image resolution. The input image resolution information may include: width information and height information of the image resolution input to the target detection network model. The target image segmentation parameters may include: the number of target image divisions in the width direction and the number of target image divisions in the height direction.
Specifically, the target image may be divided uniformly or non-uniformly into target sub-images of appropriate resolution according to the target image segmentation parameters. For example, the target image may be divided uniformly in the width direction based on the number of divisions in the width direction, and uniformly in the height direction based on the number of divisions in the height direction, so that target sub-images of the same size are obtained, which makes the subsequent target detection more convenient.
S130, detecting the target object of each target sub-image based on the target detection network model, and determining a target detection result corresponding to each target sub-image.
The target object may refer to any object to be identified, and there may be one or more target objects. For example, in an unmanned driving scenario, the target objects may be different types of cars, trucks, pedestrians, and so on. The target detection network model refers to a network model trained in advance to identify and locate target objects; it may be, but is not limited to, a residual network model such as ResNet101 or ResNet50, or a YOLO (You Only Look Once) network model such as YOLOv4. The target detection result characterizes whether the target object exists in the target sub-image and, if so, its position information. It should be noted that the input resolution required by the target detection network model is fixed, and its specific value may be determined by the sample data; for example, the required input resolution is 608 × 608.
Specifically, the target detection network model may be trained in advance on sample data containing the target objects to be identified, so that the trained model can detect the target objects accurately; for example, it may be trained on the COCO (Common Objects in Context) dataset. After training, each target sub-image is input into the model, and the target detection result for each sub-image is obtained from the model's output. Note that if the resolution of a segmented target sub-image equals the input resolution required by the model, the sub-image keeps its original size during detection; if not, only a slight scaling adjustment is needed. Compared with heavily scaling the whole high-resolution target image, slightly adjusting a target sub-image does not lose a large amount of image information, so a more accurate detection result is obtained and the approach generalizes better across application scenes.
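As a minimal sketch of this per-sub-image detection step (in Python, under stated assumptions: the `model` callable, the `Detection` tuple layout, and the function name are ours for illustration; only the 608 × 608 input resolution comes from the text):

    from typing import Callable, List, Tuple

    import cv2
    import numpy as np

    # (x, y, w, h, score, class) in sub-image coordinates: an assumed layout
    Detection = Tuple[float, float, float, float, float, int]

    def detect_sub_images(
        sub_images: List[np.ndarray],
        model: Callable[[np.ndarray], List[Detection]],
        input_size: Tuple[int, int] = (608, 608),  # model input resolution per the text
    ) -> List[List[Detection]]:
        """Run the target detection network model on every target sub-image."""
        results = []
        for tile in sub_images:
            h, w = tile.shape[:2]
            if (w, h) == input_size:
                results.append(model(tile))  # original size kept, no scaling at all
                continue
            # Otherwise only a slight rescale is needed, unlike scaling the full
            # image; boxes are mapped back to the tile's original resolution.
            sx, sy = w / input_size[0], h / input_size[1]
            dets = model(cv2.resize(tile, input_size))
            results.append([(x * sx, y * sy, bw * sx, bh * sy, s, c)
                            for (x, y, bw, bh, s, c) in dets])
        return results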
And S140, fusing the target detection results corresponding to the target images, and determining target position information corresponding to the target object.
The target position information may refer to the coordinate position information of the target object in the target image. For example, a coordinate system is established with the upper left corner of the target image as the origin, and the target position information of the target object is expressed in this coordinate system; it may include the coordinates of the upper left corner of the target detection frame containing the target object, together with the width and height of that frame.
Specifically, after the target detection result of each target sub-image corresponding to the target image is obtained, the target detection results may be fused and spliced. For example, the position information of a detected target object in its target sub-image is converted into coordinates in the target image, yielding the target position information of the target object in the target image. The position of the target object is thus detected over the complete target image by way of image segmentation, which avoids missed detections caused by image-information loss and greatly improves the detection accuracy of the object position.
Exemplarily, S140 may include: acquiring first position information of a target object in corresponding target sub-images based on each target detection result corresponding to the target image; and performing coordinate conversion on the first position information corresponding to the target sub-image according to the second position information of the reference pixel points in the target sub-image in the target image, and determining the target position information of the target object in the target image.
The first position information of the target object in a target sub-image is the coordinate position information of the target object in a first coordinate system, established with the upper left corner of that sub-image as the origin. In this embodiment, any pixel in the target sub-image may serve as the reference pixel; for example, the first pixel at the upper left corner of the sub-image (i.e. the origin of the first coordinate system) may be used, which simplifies the coordinate conversion and improves detection efficiency. The second position information of the reference pixel in the target image is its coordinate position in a second coordinate system, established with the upper left corner of the target image as the origin. The target position information is then the coordinate position information of the target object in the second coordinate system.
Specifically, the target sub-images containing the target object are identified from the target detection results. For each such sub-image, a coordinate transformation from the first coordinate system to the second is determined from the second position information of the reference pixel in the target image and its position in the sub-image itself (the first coordinate system). Applying this transformation to the first position information yields the position of the target object in the second coordinate system, i.e. its target position information in the target image. The position of the target object in the high-resolution image is thus obtained accurately, which resolves the poor detection performance of a target detection network model on high-resolution images and greatly improves the detection effect.
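A minimal sketch of this fusion step follows, using the top-left reference pixel and the box layout assumed in the earlier sketch; the duplicate-suppression remark is our assumption, since the text specifies only fusion and coordinate conversion.

    def fuse_detections(tile_results, tile_offsets):
        """Map each sub-image's detections into the target-image coordinate system.

        tile_offsets[i] is the (x0, y0) position, in the second coordinate
        system, of sub-image i's top-left reference pixel.
        """
        fused = []
        for dets, (x0, y0) in zip(tile_results, tile_offsets):
            for (x, y, w, h, score, cls) in dets:
                # First coordinate system -> second: add the reference offset.
                fused.append((x + x0, y + y0, w, h, score, cls))
        # Overlapping sub-images can detect the same object twice; a standard
        # non-maximum suppression pass over `fused` would merge such duplicates
        # (our assumption; the text specifies only fusion and conversion).
        return fused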
According to this technical solution, the target image to be detected is segmented into its target sub-images so that each sub-image meets the input-resolution requirement of the target detection network model. The target object is detected in each sub-image based on the model, the target detection results corresponding to the target image are fused, and the target position information of the target object is determined. Position detection is thereby performed over the complete target image, missed detections caused by image-information loss are avoided, and the detection accuracy of the object position is greatly improved.
On the basis of the above technical solution, determining a target image segmentation parameter according to target image resolution information corresponding to a target image and input image resolution information corresponding to a target detection network model may include: determining the target image segmentation times in the width direction according to the target image resolution width information corresponding to the target image and the input image resolution width information corresponding to the target detection network model; and determining the target image segmentation times in the height direction according to the target image resolution height information corresponding to the target image and the input image resolution height information corresponding to the target detection network model.
Specifically, the target image resolution width may be divided by the input image resolution width, the quotient rounded up, and the result taken as the number of divisions a in the width direction. Similarly, the target image resolution height may be divided by the input image resolution height and the quotient rounded up to give the number of divisions b in the height direction. The target image can then be divided into a × b target sub-images, which resolves the detection degradation caused by image scaling.
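A minimal sketch of this computation (the function name is ours; the 4096 × 2160 and 608 × 608 resolutions are the examples given earlier in the text):

    import math

    def split_counts(img_w: int, img_h: int, in_w: int, in_h: int):
        """Divisions a (width) and b (height): ceiling of the resolution ratios."""
        return math.ceil(img_w / in_w), math.ceil(img_h / in_h)

    # A 4096x2160 drone image with a 608x608 model input gives a=7 and b=4,
    # i.e. the target image is divided into 7 x 4 target sub-images.
    a, b = split_counts(4096, 2160, 608, 608)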
On the basis of the above technical solution, performing image segmentation on the target image according to the target image segmentation parameters to obtain each target sub-image corresponding to the target image, which may include: determining overlapping area information according to the size information of the target object; determining an image segmentation position based on the target image segmentation parameter and the image overlapping region information; and carrying out image segmentation on the target image based on the image segmentation position to obtain each target sub-image corresponding to the target image.
The size information of the target object may include the width and height of the target object in the captured image. The overlap area information refers to the length of the overlap between two target sub-images adjacent in the width or height direction, such as the distance EB and the distance FC in Fig. 2. The image segmentation positions include the division positions in the width direction and those in the height direction.
Specifically, the overlap area information can be determined from the maximum size of the target object, so that the overlap between two adjacent target sub-images fully covers the target object. This guarantees that after segmentation some target sub-image contains the complete target object, improving the accuracy and integrity of the position detection; for example, the maximum size of the target object may be taken directly as the overlap length. In the width direction, the division positions are determined from the number of divisions in that direction and the overlap length. In the segmentation example of Fig. 2, position B is the division position of the first target sub-image in the width direction; to guarantee the overlap, position E is taken as the division position of the second sub-image, and the remaining width-direction positions are determined likewise, so that every two adjacent sub-images share an overlap of length EB in the width direction. In the height direction, the division positions are determined analogously from the number of divisions and the overlap length: position C is the division position of the first sub-image, position F that of the second, and so on, so that every two adjacent sub-images also share an overlap of length FC in the height direction. Segmenting the target image at these positions yields target sub-images that overlap in both the width and height directions; in Fig. 2, the image area ABCD is the first target sub-image and the area EGHM is the second in the width direction. Detecting the target object in each overlapping sub-image and fusing the results makes it easier to obtain complete target position information, further ensuring the accuracy and efficiency of target object position detection.
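The sketch below shows one way such overlapped division positions could be computed; the 608-pixel tile size matches the model input cited in the text, while the 64-pixel overlap is an illustrative assumption (the text only requires the overlap to be at least the target object's maximum size).

    import numpy as np

    def axis_origins(img_len: int, tile_len: int, overlap: int) -> list:
        """Division positions along one axis; adjacent tiles share `overlap` px."""
        step = tile_len - overlap
        origins = list(range(0, max(img_len - tile_len, 0) + 1, step))
        if origins[-1] + tile_len < img_len:   # make sure the far edge is covered
            origins.append(img_len - tile_len)
        return origins

    def split_with_overlap(image: np.ndarray, tile_w=608, tile_h=608, overlap=64):
        """Segment the target image into overlapping target sub-images."""
        xs = axis_origins(image.shape[1], tile_w, overlap)
        ys = axis_origins(image.shape[0], tile_h, overlap)
        tiles, offsets = [], []
        for y0 in ys:
            for x0 in xs:
                tiles.append(image[y0:y0 + tile_h, x0:x0 + tile_w])
                offsets.append((x0, y0))   # reference pixel in the target image
        return tiles, offsets

The returned offsets are exactly the second position information of each sub-image's top-left reference pixel, which the fusion sketch above consumes.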
Example two
Fig. 3 is a flowchart of an object position detection method according to a second embodiment of the present invention. On the basis of the above embodiments, this embodiment describes in detail how the travel-track information of a target object on a target road is extracted from road video shot for that road by an unmanned aerial vehicle. Explanations of terms identical or corresponding to those of the above embodiments are omitted.
Referring to fig. 3, the object position detection method provided in this embodiment includes the following steps:
s310, acquiring a road video shot by the unmanned aerial vehicle aiming at a target road, and taking each frame of road image in the road video as a target image.
The target road may refer to a road on which the target object travels. For example, an expressway may be taken as a target road to extract travel locus information of a vehicle traveling on the expressway as a target object.
Specifically, the unmanned aerial vehicle can be used to shoot video of the target road, and object position detection is performed frame by frame, taking each road image in the video as the target image, i.e. performing the operations of steps S320-S340. Compared with collecting road video with a camera fixedly installed at the roadside, shooting with a mobile unmanned aerial vehicle removes the constraints of camera installation height and shooting angle, so that longer track information can be extracted later. It also avoids the target object's apparent size changing markedly as it approaches and leaves the camera, so the extracted track information reflects the actual operation of the target object more fully. At the same time, mutual occlusion between target objects is avoided, for example a large vehicle occluding a small one, which prevents missed detections or inaccurate detection boxes and further improves detection accuracy.
And S320, carrying out image segmentation on the target image to obtain each target sub-image corresponding to the target image.
S330, detecting the target object of each target sub-image based on the target detection network model, and determining a target detection result corresponding to each target sub-image.
For example, in an expressway driving scene, vehicles on the target road may be detected as the target objects so as to extract their running tracks. In this embodiment, the VisDrone2019 unmanned aerial vehicle dataset may be used as sample data to train the target detection network model, so that the trained model detects target objects in high-resolution drone images more accurately, further improving the recall and accuracy of object position detection.
And S340, fusing the target detection results corresponding to the target images, and determining target position information corresponding to the target object.
S350, performing prediction regression and position matching on the target position information of the target object in each frame of target image based on a preset target tracking algorithm, and determining target track information corresponding to the target object.
The preset target tracking algorithm is an algorithm for identifying and tracking the same target object across frames; it may be, but is not limited to, the Deep SORT multi-target tracking algorithm. Specifically, once the target position information for each frame is obtained, prediction regression and position matching can be performed between every two adjacent frames. For example, the target position information in one frame can be regressed forward using information such as the target object's type, driving speed, and acceleration, and the prediction matched against the target position information of the next frame, thereby determining the positions belonging to the same target object in every frame. The resulting sequence of positions serves as the target track information of the target object on the target road. This yields rich automatic-driving test-scenario data, provides accurate data support for autonomous-driving development and scenario testing, and accelerates the development and verification of autonomous-driving technology.
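A minimal sketch of this tracking stage follows, assuming a Deep SORT-style tracker behind a hypothetical `update` method (Kalman prediction followed by matching); the interface and names are ours, not a real library's API.

    from collections import defaultdict

    def extract_trajectories(frames, detect_frame, tracker):
        """Collect per-object track information over a road video.

        detect_frame: segmentation + detection + fusion for one frame (S320-S340).
        tracker.update: predicts each existing track forward, matches it to the
        frame's detections, and returns (track_id, box) pairs -- an assumed API.
        """
        trajectories = defaultdict(list)   # track_id -> [(frame_idx, box), ...]
        for idx, frame in enumerate(frames):
            boxes = detect_frame(frame)
            for track_id, box in tracker.update(boxes):
                trajectories[track_id].append((idx, box))
        return trajectories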
According to this technical solution, road video is shot over the target road by an unmanned aerial vehicle and the target object is detected in every frame. Prediction regression and position matching are then applied to the per-frame target position information with a preset target tracking algorithm, extracting accurate target track information for the target object on the target road. Rich and accurate automatic-driving test-scenario data can thus be obtained quickly, providing accurate data support for autonomous-driving development and scenario testing and accelerating its development and verification.
On the basis of the above technical solution, before S310, the method may further include: based on the preset shooting angle and the preset shooting height, the camera of the unmanned aerial vehicle is controlled to shoot along the diagonal direction of the target road, so that the target road is located on the diagonal of the shot target image.
Specifically, Fig. 4 shows an example of the drone shooting mode. As shown in Fig. 4, shooting along the diagonal of the target road maximizes the road length captured in each frame, giving a longer road coverage, allowing longer driving tracks to be extracted, and improving the reference value of the test data.
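To make the diagonal advantage concrete, here is an illustrative calculation of ours (not from the patent), using the 4096 × 2160 resolution cited in the text:

    import math

    w, h = 4096, 2160              # drone image resolution cited in the text
    diagonal = math.hypot(w, h)    # about 4630.6 px along the image diagonal
    gain = diagonal / w - 1        # about 13% more road span than the width alone
    print(f"diagonal: {diagonal:.1f} px, gain over width: {gain:.1%}")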
For example, the preset shooting angle may be 90 degrees, perpendicular to the road surface of the target road, and the preset shooting height may range from 300 to 350 meters inclusive. As shown in Fig. 4, with the drone's camera shooting vertically downward at 90 degrees, the apparent size of the photographed target object does not change, ensuring that the obtained track information carries no bias. With another shooting angle, for example 45 degrees, the captured road surface is longer, but the apparent size of the target object changes, so the obtained track information deviates noticeably from reality; when the target object's acceleration and speed are later computed, the deviation is large and may even violate the vehicle dynamics model. Shooting vertically downward at 90 degrees therefore further ensures the accuracy of the extracted track information. Extensive practice shows that with a shooting height between 300 and 350 meters and an image resolution of 4096 × 2160, a detection effect meeting the requirements is obtained, which further improves the accuracy of track extraction.
The following is an embodiment of an object position detection apparatus provided in an embodiment of the present invention, and the apparatus and the object position detection method in the foregoing embodiments belong to the same inventive concept, and details that are not described in detail in the embodiment of the object position detection apparatus may refer to the embodiment of the object position detection method described above.
Example three
Fig. 5 is a schematic structural diagram of an object position detection apparatus according to a third embodiment of the present invention, which is applicable to the case of detecting the object position of an image with high resolution. As shown in fig. 5, the apparatus specifically includes: a target image acquisition module 510, a target image segmentation module 520, a target object detection module 530, and a target location information determination module 540.
The target image acquiring module 510 is configured to acquire a target image to be detected; a target image segmentation module 520, configured to perform image segmentation on the target image to obtain each target sub-image corresponding to the target image; a target object detection module 530, configured to perform target object detection on each target sub-image based on the target detection network model, and determine a target detection result corresponding to each target sub-image; and the target position information determining module 540 is configured to perform fusion processing on each target detection result corresponding to the target image, and determine target position information corresponding to the target object.
According to this technical solution, the target image to be detected is segmented into its target sub-images so that each sub-image meets the input-resolution requirement of the target detection network model. The target object is detected in each sub-image based on the model, the target detection results corresponding to the target image are fused, and the target position information of the target object is determined. Position detection is thereby performed over the complete target image, missed detections caused by image-information loss are avoided, and the recall and accuracy of object position detection are greatly improved.
Optionally, the target image segmentation module 520 includes:
the target image segmentation parameter determining unit is used for determining a target image segmentation parameter according to target image resolution information corresponding to a target image and input image resolution information corresponding to a target detection network model;
and the target image segmentation unit is used for carrying out image segmentation on the target image according to the target image segmentation parameters to obtain each target sub-image corresponding to the target image.
Optionally, the target image segmentation parameter determining unit is specifically configured to:
determining the target image segmentation times in the width direction according to the target image resolution width information corresponding to the target image and the input image resolution width information corresponding to the target detection network model; and determining the target image segmentation times in the height direction according to the target image resolution height information corresponding to the target image and the input image resolution height information corresponding to the target detection network model.
Optionally, the target image segmentation unit is specifically configured to:
determining overlapping area information according to the size information of the target object; determining an image segmentation position based on the target image segmentation parameter and the image overlapping region information; and carrying out image segmentation on the target image based on the image segmentation position to obtain each target sub-image corresponding to the target image.
Optionally, the target location information determining module 540 is specifically configured to:
acquiring first position information of a target object in corresponding target sub-images based on each target detection result corresponding to the target image; and performing coordinate conversion on the first position information corresponding to the target sub-image according to the second position information of the reference pixel points in the target sub-image in the target image, and determining the target position information of the target object in the target image.
Optionally, the target image obtaining module 510 is specifically configured to: acquiring a road video shot by an unmanned aerial vehicle for a target road, and taking each frame of road image in the road video as a target image;
the device also includes: and the target track information determining module is used for performing predictive regression and position matching on the target position information of the target object in each frame of target image based on a preset target tracking algorithm after fusing the target detection results corresponding to the target images and determining the target position information corresponding to the target object.
Optionally, the apparatus further comprises:
and the control module is used for controlling the camera of the unmanned aerial vehicle to shoot along the diagonal direction of the target road based on the preset shooting angle and the preset shooting height before acquiring the road video shot by the unmanned aerial vehicle for the target road so as to enable the target road to be located on the diagonal of the shot target image.
Optionally, the preset shooting angle is: is perpendicular to the road surface of the target road by 90 degrees; the value range of the preset shooting height is as follows: the preset shooting height is greater than or equal to 300 meters and less than or equal to 350 meters.
The object position detection device provided by the embodiment of the invention can execute the object position detection method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of executing the object position detection method.
It should be noted that, in the embodiment of the object position detecting device, the included units and modules are only divided according to functional logic, but are not limited to the above division, as long as the corresponding functions can be realized; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
Example four
Fig. 6 is a schematic structural diagram of an apparatus according to a fourth embodiment of the present invention. Fig. 6 illustrates a block diagram of an exemplary device 12 suitable for use in implementing embodiments of the present invention. The device 12 shown in fig. 6 is only an example and should not bring any limitations to the functionality and scope of use of the embodiments of the present invention.
As shown in FIG. 6, device 12 is in the form of a general purpose computing device. The components of device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.
Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM)30 and/or cache memory 32. Device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 6, and commonly referred to as a "hard drive"). Although not shown in FIG. 6, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. System memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in system memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 42 generally carry out the functions and/or methodologies of the described embodiments of the invention.
Device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with device 12, and/or with any devices (e.g., network card, modem, etc.) that enable device 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. Also, the device 12 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet) via the network adapter 20. As shown, the network adapter 20 communicates with the other modules of the device 12 via the bus 18. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processing unit 16 executes various functional applications and data processing by running a program stored in the system memory 28, for example, to implement the steps of an object position detection method provided by the embodiment of the present invention, the method including:
acquiring a target image to be detected;
carrying out image segmentation on the target image to obtain each target sub-image corresponding to the target image;
performing target object detection on each target sub-image based on the target detection network model, and determining a target detection result corresponding to each target sub-image;
and fusing the target detection results corresponding to the target image to determine target position information corresponding to the target object.
Of course, those skilled in the art can understand that the processor can also implement the technical solution of the object position detection method provided in any embodiment of the present invention.
Example five
The present embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of a method of detecting a position of an object as provided in any of the embodiments of the present invention, the method comprising:
acquiring a target image to be detected;
carrying out image segmentation on the target image to obtain each target sub-image corresponding to the target image;
performing target object detection on each target sub-image based on the target detection network model, and determining a target detection result corresponding to each target sub-image;
and fusing the target detection results corresponding to the target image to determine target position information corresponding to the target object.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer-readable storage medium may be, for example but not limited to: an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, or C++, as well as conventional procedural programming languages such as the "C" programming language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter case, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It will be understood by those skilled in the art that the modules or steps of the invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of computing devices, and optionally they may be implemented by program code executable by a computing device, such that it may be stored in a memory device and executed by a computing device, or it may be separately fabricated into various integrated circuit modules, or it may be fabricated by fabricating a plurality of modules or steps thereof into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (11)

1. An object position detection method, characterized by comprising:
acquiring a target image to be detected;
performing image segmentation on the target image to obtain each target sub-image corresponding to the target image;
performing target object detection on each target sub-image based on a target detection network model, and determining a target detection result corresponding to each target sub-image;
and fusing the target detection results corresponding to the target image to determine target position information corresponding to the target object.
2. The method of claim 1, wherein performing image segmentation on the target image to obtain each target sub-image corresponding to the target image comprises:
determining a target image segmentation parameter according to target image resolution information corresponding to the target image and input image resolution information corresponding to the target detection network model;
and according to the target image segmentation parameters, carrying out image segmentation on the target image to obtain each target sub-image corresponding to the target image.
3. The method of claim 2, wherein determining the target image segmentation parameters according to the target image resolution information corresponding to the target image and the input image resolution information corresponding to the target detection network model comprises:
determining the target image segmentation times in the width direction according to the target image resolution width information corresponding to the target image and the input image resolution width information corresponding to the target detection network model;
and determining the target image segmentation times in the height direction according to the target image resolution height information corresponding to the target image and the input image resolution height information corresponding to the target detection network model.
4. The method according to claim 2, wherein performing image segmentation on the target image according to the target image segmentation parameters to obtain each target sub-image corresponding to the target image comprises:
determining overlapping area information according to the size information of the target object;
determining an image segmentation position based on the target image segmentation parameter and the image overlap region information;
and carrying out image segmentation on the target image based on the image segmentation position to obtain each target sub-image corresponding to the target image.
5. The method according to claim 1, wherein performing fusion processing on each target detection result corresponding to the target image to determine target position information corresponding to the target object comprises:
acquiring first position information of the target object in each corresponding target sub-image based on each target detection result corresponding to the target image;
and performing coordinate conversion on the first position information corresponding to each target sub-image according to second position information, in the target image, of a reference pixel point of that target sub-image, and determining the target position information of the target object in the target image.
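The coordinate conversion of claim 5 amounts to a translation by the second position information of the reference pixel, assumed here to be the sub-image's top-left corner expressed in target-image coordinates (the corner choice and the box format are assumptions):

    def to_global(box, offset):
        # box: (x, y, w, h, score) in sub-image coordinates (assumed format).
        # offset: (x_off, y_off), the sub-image's reference pixel expressed
        # in target-image coordinates.
        x, y, w, h, score = box
        x_off, y_off = offset
        return (x + x_off, y + y_off, w, h, score)

    # to_global((10, 20, 50, 30, 0.9), (568, 0)) -> (578, 20, 50, 30, 0.9)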
6. The method according to any one of claims 1 to 5, wherein acquiring the target image to be detected comprises:
acquiring a road video of a target road shot by an unmanned aerial vehicle, and taking each frame of road image in the road video as a target image;
after performing fusion processing on each target detection result corresponding to the target image and determining the target position information corresponding to the target object, the method further comprises:
performing predictive regression and position matching on the target position information of the target object in each frame of target image based on a preset target tracking algorithm, and determining target track information corresponding to the target object.
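The claim leaves the preset target tracking algorithm open. As a hedged illustration of the position-matching step alone (production trackers such as SORT also implement the predictive-regression part, typically with a Kalman filter), a greedy nearest-centroid matcher might look as follows; all names and the 50-pixel gate are invented:

    def match_tracks(prev_tracks, detections, max_dist=50.0):
        # prev_tracks: {track_id: (cx, cy)} centroids from the previous frame.
        # detections: [(cx, cy), ...] target positions in the current frame.
        # Returns {track_id: (cx, cy)}; unmatched detections would start new
        # tracks (omitted in this sketch).
        matched, unused = {}, list(detections)
        for tid, (px, py) in prev_tracks.items():
            if not unused:
                break
            best = min(unused, key=lambda c: (c[0] - px) ** 2 + (c[1] - py) ** 2)
            if (best[0] - px) ** 2 + (best[1] - py) ** 2 <= max_dist ** 2:
                matched[tid] = best
                unused.remove(best)
        return matched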
7. The method of claim 6, further comprising, before acquiring the road video of the target road shot by the unmanned aerial vehicle:
controlling a camera of the unmanned aerial vehicle to shoot along a diagonal direction of the target road based on a preset shooting angle and a preset shooting height, so that the target road lies on a diagonal of the captured target image.
8. The method of claim 7, wherein the preset shooting angle is 90 degrees, i.e., the camera is perpendicular to the road surface of the target road;
and the preset shooting height ranges from 300 meters to 350 meters, inclusive.
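The shooting constraints of claims 7-8 reduce to a simple parameter check; the numeric bounds are from claim 8, while the function and parameter names are invented for illustration:

    def valid_shooting_params(angle_deg, height_m):
        # Claim 8: camera perpendicular to the road surface (90 degrees),
        # flight height within [300, 350] meters inclusive.
        return angle_deg == 90 and 300 <= height_m <= 350

    # valid_shooting_params(90, 320) -> True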
9. An object position detection device, characterized by comprising:
the target image acquisition module is used for acquiring a target image to be detected;
the target image segmentation module is used for carrying out image segmentation on the target image to obtain each target sub-image corresponding to the target image;
the target object detection module is used for detecting a target object for each target sub-image based on a target detection network model and determining a target detection result corresponding to each target sub-image;
and the target position information determining module is used for fusing all target detection results corresponding to the target images and determining target position information corresponding to the target object.
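Read as software, the four modules of claim 9 map one-to-one onto the steps of claim 1. A skeletal composition reusing the helpers sketched above; the class, the method names, and the assumption that the image source exposes a read() method are all invented:

    class ObjectPositionDetector:
        def __init__(self, run_detector):
            self.run_detector = run_detector    # target detection network model

        def acquire(self, source):              # target image acquisition module
            return source.read()

        def segment(self, image):               # target image segmentation module
            return split_with_overlap(image, 1024, 1024, 100)

        def detect(self, sub_images):           # target object detection module
            return [(self.run_detector(sub), off) for sub, off in sub_images]

        def fuse(self, detections):             # target position information determining module
            return [to_global(box, off) for boxes, off in detections for box in boxes]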
10. An apparatus, characterized in that the apparatus comprises:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the object position detection method of any one of claims 1-8.
11. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the object position detection method according to any one of claims 1 to 8.
CN202010778896.1A 2020-08-05 2020-08-05 Object position detection method, device, equipment and storage medium Pending CN111951328A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010778896.1A CN111951328A (en) 2020-08-05 2020-08-05 Object position detection method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111951328A 2020-11-17

Family

ID=73338027

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010778896.1A Pending CN111951328A (en) 2020-08-05 2020-08-05 Object position detection method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111951328A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113344900A (en) * 2021-06-25 2021-09-03 北京市商汤科技开发有限公司 Airport runway intrusion detection method, airport runway intrusion detection device, storage medium and electronic equipment
CN113792604A (en) * 2021-08-16 2021-12-14 中科巨匠人工智能技术(广州)有限公司 Mouse detection algorithm based on artificial intelligence characteristic diagram segmentation

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102550011A (en) * 2009-11-26 2012-07-04 株式会社日立制作所 Image capture system, image capture method, and storage medium for storing image capture program
US20120257086A1 (en) * 2009-11-26 2012-10-11 Hitachi, Ltd. Imaging system, imaging method, and storage medium storing imaging program
CN109583267A (en) * 2017-09-28 2019-04-05 京东方科技集团股份有限公司 Vehicle object detection method, vehicle object detecting device and vehicle
CN108320510A (en) * 2018-04-03 2018-07-24 深圳市智绘科技有限公司 One kind being based on unmanned plane video traffic information statistical method and system
CN110473216A (en) * 2019-08-22 2019-11-19 联想(北京)有限公司 The detection method and device of object in a kind of image
CN110796104A (en) * 2019-11-01 2020-02-14 深圳市道通智能航空技术有限公司 Target detection method and device, storage medium and unmanned aerial vehicle

Similar Documents

Publication Publication Date Title
US11379699B2 (en) Object detection method and apparatus for object detection
CN109242903B (en) Three-dimensional data generation method, device, equipment and storage medium
CN109492507B (en) Traffic light state identification method and device, computer equipment and readable medium
CN109284348B (en) Electronic map updating method, device, equipment and storage medium
US20200104612A1 (en) Method and apparatus for detecting obstacle, electronic device, vehicle and storage medium
JP7080266B2 (en) AI-based inspection in transportation
CN109543680B (en) Method, apparatus, device, and medium for determining location of point of interest
US20200175673A1 (en) Method and device for detecting defect of meal box, server, and storage medium
CN110263713B (en) Lane line detection method, lane line detection device, electronic device, and storage medium
CN110135396B (en) Ground mark identification method, device, equipment and medium
CN111402414A (en) Point cloud map construction method, device, equipment and storage medium
CN110335313B (en) Audio acquisition equipment positioning method and device and speaker identification method and system
US11164370B2 (en) Information processing apparatus and accumulated images selecting method
CN115204044A (en) Method, apparatus and medium for generating trajectory prediction model and processing trajectory information
CN111951328A (en) Object position detection method, device, equipment and storage medium
CN111402413A (en) Three-dimensional visual positioning method and device, computing equipment and storage medium
CN109300322B (en) Guideline drawing method, apparatus, device, and medium
US20210350142A1 (en) In-train positioning and indoor positioning
CN109345567B (en) Object motion track identification method, device, equipment and storage medium
CN113281780B (en) Method and device for marking image data and electronic equipment
CN112215036B (en) Cross-mirror tracking method, device, equipment and storage medium
CN113763466A (en) Loop detection method and device, electronic equipment and storage medium
CN110111018B (en) Method, device, electronic equipment and storage medium for evaluating vehicle sensing capability
EP4250245A1 (en) System and method for determining a viewpoint of a traffic camera
CN109934185B (en) Data processing method and device, medium and computing equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination