CN114581798A - Target detection method and device, flight equipment and computer readable storage medium - Google Patents

Target detection method and device, flight equipment and computer readable storage medium

Info

Publication number
CN114581798A
Authority
CN
China
Prior art keywords
target detection
image
video
frame
real
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210149587.7A
Other languages
Chinese (zh)
Inventor
赵晓丹
李勇
潘屹峰
黄吴蒙
董哲盟
周成虎
Current Assignee
Guangzhou Imapcloud Intelligent Technology Co ltd
Original Assignee
Guangzhou Imapcloud Intelligent Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Imapcloud Intelligent Technology Co ltd filed Critical Guangzhou Imapcloud Intelligent Technology Co ltd
Priority to CN202210149587.7A priority Critical patent/CN114581798A/en
Publication of CN114581798A publication Critical patent/CN114581798A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the invention provides a target detection method and device, a flight device, and a computer-readable storage medium, relating to the technical field of computer vision. First, a frame image set corresponding to a real-time inspection video is obtained, and target detection is then performed in sequence on each frame image in the set using a target detection algorithm to obtain position information and classification information of a target detection object. Next, a detection result image corresponding to each frame image is obtained from the position information and classification information of the target detection object. Finally, a target video with the target detection object marked is synthesized from the detection result images in sequence. The flight device can thus directly process the collected real-time inspection video to obtain a target video with the target detection object marked, which improves the efficiency of obtaining inspection video processing results and meets the user's real-time processing requirements.

Description

Target detection method and device, flight equipment and computer readable storage medium
Technical Field
The invention relates to the technical field of computer vision, in particular to a target detection method, a target detection device, flight equipment and a computer-readable storage medium.
Background
In the prior art, when an unmanned aerial vehicle performs an inspection task, the drone usually collects the video to be detected, the video is copied to an analysis server via an SD card after the drone has finished collecting and returned, and the analysis server then performs target detection on the video. As a result, the inspection task result is obtained with low efficiency, and the user's real-time requirements cannot be met.
Disclosure of Invention
The present invention is directed to a target detection method, apparatus, flight device and computer-readable storage medium that can alleviate the above problems in the prior art.
Embodiments of the invention may be implemented as follows:
in a first aspect, the present invention provides a target detection method applied to a flight device, the method including:
acquiring a frame image set corresponding to a real-time inspection video;
sequentially carrying out target detection on each frame image in the frame image set by using a target detection algorithm to obtain position information and classification information of a target detection object;
obtaining a detection result image corresponding to each frame of image according to the position information and the classification information of the target detection object, wherein the position and the category of the target detection object are marked in the detection result image;
and synthesizing the target video marked with the target detection object according to each frame of detection result image in sequence.
In an optional implementation manner, the step of sequentially performing target detection on each frame image in the frame image set by using a target detection algorithm to obtain position information and classification information of a target detection object includes:
preprocessing each frame image;
extracting features in the preprocessed image to obtain a feature image;
using down-sampling to enhance semantic features and up-sampling to enhance positioning features for all features in the feature image;
and performing regression prediction processing on all the features to obtain the position information and the classification information of the target detection object.
In an optional embodiment, before the step of acquiring a frame image set corresponding to the real-time inspection video, the method further includes:
reading a weight file obtained by pre-training, wherein the weight file comprises weight information of each node in the neural network corresponding to the target detection algorithm;
and updating the weight of each node in the neural network according to the weight file so as to complete the initialization of the target detection algorithm.
In an optional embodiment, the step of obtaining a frame image set corresponding to the real-time inspection video includes:
acquiring the real-time inspection video;
and decoding the real-time inspection video to obtain a frame image set corresponding to the real-time inspection video.
In an alternative embodiment, the method further comprises:
and sequentially pushing each frame of detection result image to a user side so that the user side plays videos according to the received each frame of detection result image.
In a second aspect, the present invention provides a target detection apparatus applied to a flight device, the apparatus comprising:
the video acquisition module is used for acquiring a frame image set corresponding to the real-time inspection video;
the data processing module is used for sequentially carrying out target detection on each frame image in the frame image set by utilizing a target detection algorithm to obtain position information and classification information of a target detection object;
the data processing module is further configured to obtain a detection result image corresponding to each frame of image according to the position information and the classification information of the target detection object, where the position and the category of the target detection object are marked in the detection result image;
and the data processing module is also used for synthesizing the target video marked with the target detection object according to each frame of detection result image in sequence.
In an optional implementation manner, the data processing module is specifically configured to:
preprocessing each frame image;
extracting features in the preprocessed image to obtain a feature image;
using down-sampling to enhance semantic features and up-sampling to enhance positioning features for all features in the feature image;
and performing regression prediction processing on all the features to obtain the position information and the classification information of the target detection object.
In an optional embodiment, the system further includes a model initialization module, where the model initialization module is specifically configured to:
reading a weight file obtained by pre-training, wherein the weight file comprises weight information of each node in the neural network corresponding to the target detection algorithm;
and updating the weight of each node in the neural network according to the weight file so as to complete the initialization of the target detection algorithm.
In a third aspect, the invention provides a flight device, which comprises a video acquisition unit and a processing unit, wherein the video acquisition unit is in communication connection with the processing unit;
the video acquisition unit is used for acquiring the inspection video, encoding the inspection video into inspection video data and sending the inspection video data to the processing unit;
the processing unit is configured to perform the object detection method according to any of the preceding embodiments.
In a fourth aspect, the present invention provides a computer-readable storage medium storing a computer program for execution by a processor to implement the object detection method of any one of the preceding embodiments.
An embodiment of the invention provides a target detection method and device, a flight device, and a computer-readable storage medium. First, a frame image set corresponding to the real-time inspection video is obtained, and target detection is performed in sequence on each frame image in the set using a target detection algorithm to obtain position information and classification information of a target detection object. Next, a detection result image corresponding to each frame image is obtained from the position information and classification information of the target detection object. Finally, a target video with the target detection object marked is synthesized from the detection result images. The beneficial effects are as follows: the flight device can directly process the collected real-time inspection video to obtain a target video with the target detection object marked, which improves the efficiency of obtaining inspection video processing results and meets the user's real-time processing requirements.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present invention and should therefore not be regarded as limiting the scope; for those skilled in the art, other related drawings can be derived from these drawings without inventive effort.
Fig. 1 is a schematic structural diagram of a flight device according to an embodiment of the present invention.
Fig. 2 is a schematic flow chart of a target detection method according to an embodiment of the present invention.
Fig. 3 is a second schematic flowchart of a target detection method according to an embodiment of the present invention.
Fig. 4 is a schematic structural diagram of a target detection algorithm according to an embodiment of the present invention.
Fig. 5 is a third schematic flowchart of a target detection method according to an embodiment of the present invention.
Fig. 6 is a schematic network structure diagram of the Yolov5s algorithm.
Fig. 7 is a fourth flowchart of a target detection method according to an embodiment of the present invention.
Fig. 8 is a schematic structural diagram of an object detection apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined or explained in subsequent figures.
In the description of the present invention, it should be noted that terms such as "upper", "lower", "inner" and "outer", if used, indicate orientations or positional relationships based on those shown in the drawings, or those in which the product of the invention is usually placed. They are used only for convenience and simplicity of description, and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation; they should therefore not be construed as limiting the present invention.
It should be noted that the features of the embodiments of the present invention may be combined with each other without conflict.
In the prior art, a user sometimes needs an unmanned aerial vehicle to perform a cruise task, after which the collected video data is processed to obtain an inspection result. Usually, the drone collects the video to be detected, the video is copied to an analysis server via an SD card after the drone has finished collecting and returned, and the analysis server performs target detection on the video. However, the inspection task result is obtained with poor timeliness, and the user's real-time requirements cannot be met.
In view of the above, embodiments of the present invention provide a target detection method to improve the above problems, which will be described in detail below with reference to the accompanying drawings.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a flight device according to an embodiment of the present invention. The flight device 100 can include a video capture unit 110 and a processing unit 120, the video capture unit 110 communicatively coupled to the processing unit 120.
The video collecting unit 110 is configured to collect the inspection video, and encode the inspection video into inspection video data, and transmit the inspection video data to the processing unit 120.
The processing unit 120 is adapted to perform the steps of the method embodiments described below.
In alternative examples, the flying apparatus 100 may be, but is not limited to, a drone, a patrol robot, and the like.
When the flight device 100 is an unmanned aerial vehicle, the drone can be used to perform inspection tasks, for example patrolling whether anyone is loitering or swimming in the area around a reservoir or river, or inspecting whether the insulators of a substation are damaged or abnormal. When the flight device 100 is a patrol robot, the robot can be used for real-time video inspection, such as checking whether a person is not wearing a mask.
In alternative examples, processing unit 120 may be a data processing module that may be, but is not limited to, an Nvidia family edge computing development board, an Atlas 200AI acceleration module, or the like.
In a possible implementation, when the flight device is an unmanned aerial vehicle, its video acquisition unit may be a camera module mounted on the drone, and its processing unit may be an edge computing module mounted on the drone. Compared with an Nvidia-series edge computing development board, the Atlas 200 AI acceleration module has a higher recognition speed as an edge computing module and better meets real-time requirements, while the development board has a lower cost; in practical applications the choice can be made according to the actual situation, which is not limited herein.
It will be appreciated that the configuration shown in Fig. 1 is merely illustrative, and that the flight device 100 may include more or fewer components than shown in Fig. 1, or have a different configuration from that shown in Fig. 1. The components shown in Fig. 1 may be implemented in hardware, software, or a combination thereof.
Referring to fig. 2, fig. 2 is a schematic flow chart of a target detection method according to an embodiment of the present invention. The execution subject of the method is the flight equipment, and the method comprises the following steps:
s202, acquiring a frame image set corresponding to the real-time patrol video.
In this embodiment, a frame image set corresponding to the real-time inspection video needs to be obtained first, where the frame image set may include multiple frame images forming the real-time inspection video.
S203, sequentially carrying out target detection on each frame image in the frame image set by using a target detection algorithm to obtain position information and classification information of a target detection object.
In this embodiment, each frame image in the frame image set is sequentially input into the neural network structure corresponding to the target detection algorithm, and then the position information and classification information of the target detection object in each frame image are output.
And S204, obtaining a detection result image corresponding to each frame of image according to the position information and the classification information of the target detection object.
In this embodiment, for any frame image in the frame image set, the position information and the classification information of the target detection object obtained from that frame image may be superimposed on the image to obtain the corresponding detection result image. That is, the position and category of the target detection object are marked in the detection result image.
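For illustration only (not part of the claimed invention), the superposition in step S204 amounts to drawing the predicted bounding box onto the frame image. A minimal numpy sketch of that box-drawing step follows; the function name draw_box and the fixed border color are illustrative assumptions:

```python
import numpy as np

def draw_box(image, x1, y1, x2, y2, color=(255, 0, 0)):
    """Mark a bounding box on an H x W x 3 image by painting its border pixels."""
    out = image.copy()
    out[y1:y2 + 1, x1, :] = color   # left edge
    out[y1:y2 + 1, x2, :] = color   # right edge
    out[y1, x1:x2 + 1, :] = color   # top edge
    out[y2, x1:x2 + 1, :] = color   # bottom edge
    return out

frame = np.zeros((64, 64, 3), dtype=np.uint8)
result = draw_box(frame, 10, 20, 40, 50)
```

A real implementation would additionally render the class label text next to the box; only the geometric marking is sketched here.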
And S205, synthesizing a target video of the marked target detection object according to each frame of detection result image.
In this embodiment, each frame of the obtained detection result image may be sequentially synthesized into a target video, where the target video includes the marked target detection object.
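The synthesis in step S205 is essentially re-assembling the per-frame detection result images in order at a playback frame rate; the actual video encoding (e.g. H.264) is omitted. A plain-Python sketch of the ordering and timestamping logic, with all names assumed for illustration:

```python
def synthesize(result_frames, fps=25.0):
    """Order detection-result frames by index and assign playback timestamps."""
    ordered = sorted(result_frames, key=lambda f: f["index"])
    return [{"index": f["index"],
             "timestamp": f["index"] / fps,   # seconds from start of video
             "image": f["image"]} for f in ordered]

frames = [{"index": 2, "image": "f2"}, {"index": 0, "image": "f0"},
          {"index": 1, "image": "f1"}]
video = synthesize(frames, fps=25.0)
```

In a deployed system the ordered frames would be handed to a video encoder rather than kept as a list.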
The target detection method provided by the embodiment of the invention comprises the steps of firstly obtaining a frame image set corresponding to a real-time inspection video, and then sequentially carrying out target detection on each frame image in the frame image set by using a target detection algorithm to obtain the position information and the classification information of a target detection object. And then, according to the position information and the classification information of the target detection object, obtaining a detection result image corresponding to each frame of image. And finally synthesizing the target video marked with the target detection object according to each frame of detection result image. Therefore, the flight equipment can directly process the collected real-time routing inspection video to obtain the target video of the marked target detection object, the efficiency of routing inspection video processing results is improved, and the real-time processing requirements of users are met.
It is understood that object detection is a computer vision technique for identifying and locating objects in an image or video. Object detection can be understood as two parts: target localization and target classification. Target localization means predicting the exact location of an object in the image (which may be marked with a bounding box), while target classification means identifying the category (e.g., person, vehicle, etc.) to which it belongs.
It can be understood that the target detection algorithm may be, but is not limited to, algorithms such as Yolov3, Yolov5 and Fast RCNN. The training of the target detection algorithm may be performed on a server using a labeled data set; after training, a weight file corresponding to the target detection algorithm is obtained, which may include the weight information of each node in the neural network corresponding to the target detection algorithm.
In an alternative embodiment, the weight file may be stored in the flight device in advance, and read and applied in real time when the cruise task requires target detection. Referring to fig. 3, fig. 3 is a second schematic flowchart of a target detection method according to an embodiment of the present invention. Before step S202, the method may further include the following steps:
and S200, reading a weight file obtained by pre-training.
In this embodiment, the weight file may have the .pt file extension, and may be trained in advance and stored in the processing unit of the flight device.
S201, updating the weight of each node in the neural network according to the weight file to complete the initialization of the target detection algorithm.
In this embodiment, after reading the weight file obtained by training in advance, the weights of the nodes in the neural network may be updated according to the weight file.
In an alternative example, when the target detection algorithm is Yolov5 and the processing unit of the flight device is an Atlas 200 AI acceleration module, the weight file obtained through pre-training may, after being read, be converted into an om model; the om model is then loaded to update the weight of each node in the neural network, and when the target detection algorithm is subsequently applied, the om model can be used directly to detect and identify the target detection object in the image. In this way, combining the Atlas 200 AI acceleration module with the Yolov5 algorithm and pre-loading the converted om model can increase the speed of target detection on each frame image in the frame image set and further meet the user's timeliness requirements.
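The initialization in steps S200 and S201 is a weight restore: each node in the network takes its value from the pre-trained weight file. A toy, dictionary-based sketch of that restore step (real Yolov5 weights are stored as a PyTorch .pt state dict, and the Atlas om conversion is not modeled here; all names are illustrative):

```python
def load_weights(network, weight_file):
    """Update each node's weight in place from the pre-trained weight file."""
    for node, weight in weight_file.items():
        if node not in network:
            raise KeyError(f"weight file has unknown node: {node}")
        network[node] = weight
    return network

# Toy network: node name -> weight value, all uninitialized.
network = {"conv1.w": 0.0, "conv1.b": 0.0, "fc.w": 0.0}
pretrained = {"conv1.w": 0.52, "conv1.b": -0.11, "fc.w": 1.3}
load_weights(network, pretrained)
```

The mismatch check mirrors what a framework does when a state dict does not match the model architecture.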
It should be noted that the Yolov5 algorithm comes in four variants: Yolov5s, Yolov5m, Yolov5l and Yolov5x. Sorted by recognition speed, Yolov5s > Yolov5m > Yolov5l > Yolov5x, i.e. the recognition speed decreases in that order; sorted by recognition accuracy, Yolov5s < Yolov5m < Yolov5l < Yolov5x, i.e. the recognition accuracy increases in that order. In an alternative example, when the target detection algorithm is Yolov5, any one of Yolov5s, Yolov5m, Yolov5l and Yolov5x may be used, and the specific variant can be chosen by weighing the user's recognition speed and recognition accuracy requirements.
Referring to fig. 4, the structure of the target detection algorithm includes four parts: input end, backbone network, neck network, prediction end. The process of object detection for each frame image in the set of frame images is described below in conjunction with fig. 4.
In an optional implementation manner, the target detection is performed on each frame of image step by step through a neural network structure corresponding to a target detection algorithm. Referring to fig. 5, taking the target detection algorithm as Yolov5S as an example, for step S203, it may include the following sub-steps:
s203-a, preprocessing is carried out aiming at each frame image.
The input data is each frame image, and preprocessing can adjust the size of each frame image at the input end. In an alternative example, the preprocessing may scale the input image to a size of 608 × 608 × 3.
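The scaling in S203-a can be sketched as a nearest-neighbour resize in numpy; note that Yolov5's actual letterbox preprocessing also pads the image to preserve the aspect ratio, which this simplified sketch omits:

```python
import numpy as np

def preprocess(image, size=608):
    """Nearest-neighbour resize of an H x W x 3 frame to size x size x 3."""
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size   # source row for each output row
    cols = np.arange(size) * w // size   # source column for each output column
    return image[rows][:, cols]

frame = np.random.randint(0, 256, (720, 1280, 3), dtype=np.uint8)
net_input = preprocess(frame)
```

Whatever the camera resolution, the network then always sees a fixed 608 × 608 × 3 tensor.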
S203-b, extracting the features in the preprocessed image to obtain a feature image.
It will be appreciated that a plurality of target detection objects may be included in the image, and each target detection object may correspond to a set of features. The backbone network can extract the features in the preprocessed image to obtain a feature image.
S203-c, adopting down-sampling to enhance semantic features and adopting up-sampling to enhance positioning features for all features in the feature image.
In this embodiment, the neck network may employ downsampling to enhance semantic features, and upsampling to enhance localization features.
S203-d, performing regression prediction processing on all the features to obtain position information and classification information of the target detection object.
In this embodiment, the prediction side may include a convolutional layer and a fully-connected layer, and the position information and the classification information of the target detection object can be obtained by performing regression prediction processing on all the features by the convolutional layer and the fully-connected layer.
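Per candidate box, the raw output of such a prediction head is commonly a vector of box coordinates, an objectness score and per-class scores; position information and classification information fall out by splitting this vector and taking the arg-max class. A schematic decode under that assumed layout (the layout is the common Yolo convention, not taken from the patent):

```python
def decode_prediction(pred, class_names):
    """Split one raw prediction vector into position info and classification info."""
    x, y, w, h, objectness = pred[:5]
    class_scores = pred[5:]
    best = max(range(len(class_scores)), key=lambda i: class_scores[i])
    box = (x - w / 2, y - h / 2, x + w / 2, y + h / 2)  # centre form -> corners
    return {"box": box,
            "label": class_names[best],
            "confidence": objectness * class_scores[best]}

# x, y, w, h, objectness, then scores for 3 hypothetical classes
pred = [100.0, 80.0, 40.0, 20.0, 0.9, 0.05, 0.85, 0.10]
det = decode_prediction(pred, ["person", "vehicle", "insulator"])
```

In the full pipeline this decode runs for every candidate box, followed by non-maximum suppression to keep one box per object.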
Taking the target detection algorithm Yolov5s as an example, please refer to fig. 6, and a network structure of the Yolov5s algorithm is described below.
The CBL module consists of a convolutional layer, a batch normalization layer and a Leaky ReLU activation function. The Res unit borrows the residual structure of the ResNet network and is used to build deeper networks. The CSP1_X module mirrors the CSPNet network structure and consists of a CBL module, X Res units and a convolutional layer. The CSP2_X module mirrors the CSPNet network structure and consists of a convolutional layer and 2X + 1 CBL modules. The Focus module concatenates (merges) multiple slice results and feeds them into a CBL module. The SPP module builds a spatial pyramid with max pooling at scales 1 × 1, 5 × 5, 9 × 9 and 13 × 13 and fuses the multi-scale features.
In the model training stage, the input end can carry out Mosaic data enhancement, self-adaptive anchor frame calculation and self-adaptive picture scaling, and when the trained model is used for target detection, only the self-adaptive picture scaling can be carried out at the input end.
The Focus structure and the CSP structure in the backbone network are combined for feature extraction, and the scale invariance of the model is increased through the SPP structure.
The FPN + PAN structure in the neck network enhances semantic features and positioning features, and a CSP2 structure designed by referring to CSPNet is adopted, so that the network feature fusion capability is enhanced.
The prediction end uses GIoU loss as the bounding-box loss function, and the target bounding box is selected from the candidate boxes using DIoU non-maximum suppression (NMS).
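For reference, GIoU extends plain IoU with a penalty based on the smallest enclosing box, so the loss term 1 - GIoU still provides a useful gradient when the predicted and ground-truth boxes do not overlap. A plain-Python sketch with boxes given as (x1, y1, x2, y2):

```python
def giou(a, b):
    """Generalized IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    # intersection rectangle (clamped to zero when the boxes are disjoint)
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = area_a + area_b - inter
    # smallest axis-aligned box enclosing both a and b
    enc = (max(a[2], b[2]) - min(a[0], b[0])) * (max(a[3], b[3]) - min(a[1], b[1]))
    return inter / union - (enc - union) / enc

loss = 1.0 - giou((0, 0, 2, 2), (1, 1, 3, 3))
```

Identical boxes give GIoU = 1 (loss 0); disjoint boxes give a negative GIoU, so the loss keeps pulling them together.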
In an optional implementation mode, the real-time inspection video is shot by a video acquisition unit of the flight equipment and is transmitted to a processing unit of the flight equipment. For step S202, it includes the sub-steps of:
s202-a, acquiring a real-time patrol video.
In this embodiment, after the video acquisition unit of the flight device captures the real-time inspection video, it can encode the video into a video stream format and push the stream to the processing unit in real time.
S202-b, decoding the real-time inspection video to obtain a frame image set corresponding to the real-time inspection video.
In this embodiment, after receiving the real-time inspection video in video stream format, the processing unit of the flight device may first decode it and then obtain the corresponding frame image set.
In an optional implementation mode, the target video can be sent to the user through a communication module of the flight device, so that the user can watch the detection result of the patrol video in real time. Referring to fig. 7, after step S205, the target detection method further includes the steps of:
s206, sequentially pushing each frame of detection result image to the user side so that the user side plays the video according to the received each frame of detection result image.
In this embodiment, each frame of detection result image is obtained through processing, and the detection result image can be pushed to the user side in real time, so that the user side performs video playing according to each frame of received detection result image. The user terminal may be a terminal or a client of the user.
In an alternative example, the processing unit of the flight device may include a communication module, which may be, but is not limited to, a network card, a 4G card, a 5G card, etc.; the communication module can push each obtained frame of detection result image to the user side in real time.
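The real-time push in S206 can be pictured as a producer/consumer pipeline: the processing unit pushes each detection result image into a channel and the user side consumes frames in arrival order. A stdlib sketch using queue.Queue as a stand-in for the actual 4G/5G transport (an assumption for illustration; all names are invented):

```python
from queue import Queue

def push_results(result_frames, channel):
    """Processing unit side: push each detection-result frame in order."""
    for frame in result_frames:
        channel.put(frame)
    channel.put(None)  # end-of-stream marker

def play(channel):
    """User side: consume frames in arrival order until end-of-stream."""
    played = []
    while (frame := channel.get()) is not None:
        played.append(frame)
    return played

channel = Queue()
push_results(["frame0", "frame1", "frame2"], channel)
received = play(channel)
```

Because the channel preserves order, the user side can render frames as they arrive instead of waiting for the whole video.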
It should be noted that the target detection method of the present invention is not limited to the specific sequence of the above steps. It should be understood that, in other embodiments, the order of some steps in the object detection method according to the present invention may be interchanged according to actual needs, or some steps may be omitted or deleted.
Based on the above-mentioned target detection method, an embodiment of the invention further provides a target detection apparatus 300, please refer to fig. 8, and fig. 8 is a schematic structural diagram of the target detection apparatus according to the embodiment of the invention. The execution subject of the object detection device 300 is a flight device, and the object detection device 300 includes a video acquisition module 320 and a data processing module 330.
The video obtaining module 320 is configured to obtain a frame image set corresponding to the real-time inspection video.
The data processing module 330 is configured to perform target detection on each frame image in the frame image set in sequence by using a target detection algorithm, so as to obtain position information and classification information of a target detection object.
The data processing module 330 is further configured to obtain a detection result image corresponding to each frame image according to the position information and the classification information of the target detection object, where the position and the category of the target detection object are marked in the detection result image.
The data processing module 330 is further configured to synthesize, sequentially from each frame of detection result image, the target video in which the target detection object is marked.
In this embodiment, the video acquiring module 320 may be configured to perform the above step S202, and the data processing module 330 may be configured to perform the above steps S203-S206. For the related contents of the video acquisition module 320 and the data processing module 330, reference may be made to the corresponding detailed description above.
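The marking described above — writing the target's position into the result image — can be sketched on a grayscale numpy array. The bounding-box layout `(x1, y1, x2, y2)` and the marker pixel value are illustrative assumptions, not the patent's actual rendering:

```python
import numpy as np

def draw_box(image, box, value=255):
    """Return a copy of a grayscale image with the target's bounding box
    outline marked; box = (x1, y1, x2, y2) in pixel coordinates."""
    x1, y1, x2, y2 = box
    out = image.copy()
    out[y1, x1:x2 + 1] = value   # top edge
    out[y2, x1:x2 + 1] = value   # bottom edge
    out[y1:y2 + 1, x1] = value   # left edge
    out[y1:y2 + 1, x2] = value   # right edge
    return out

img = np.zeros((8, 8), dtype=np.uint8)
result = draw_box(img, (1, 1, 5, 5))
print(result[1, 3], result[3, 3])  # → 255 0 (edge marked, interior untouched)
```

A production implementation would typically also render the category label next to the box (e.g. with OpenCV's `rectangle` and `putText`), but the principle — overlaying position and category onto each frame — is the same.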
Further, the data processing module 330 may be specifically configured to:
preprocessing each frame image;
extracting features in the preprocessed image to obtain a feature image;
enhancing semantic features by down-sampling and enhancing localization features by up-sampling for all the features in the feature image;
and performing regression prediction processing on all the characteristics to obtain the position information and the classification information of the target detection object.
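The down-sampling/up-sampling step can be illustrated with a minimal numpy sketch: 2x average pooling stands in for the semantic (down-sampling) branch, nearest-neighbour up-sampling for the localization branch, and the two are fused by addition. This FPN-style fusion pattern is an assumption for illustration, not the patent's exact network:

```python
import numpy as np

def downsample2(f):
    """2x average pooling: a coarser map with stronger semantic context."""
    h, w = f.shape
    return f[:h // 2 * 2, :w // 2 * 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample2(f):
    """2x nearest-neighbour up-sampling: a finer map with better localization."""
    return np.repeat(np.repeat(f, 2, axis=0), 2, axis=1)

feat = np.arange(16, dtype=float).reshape(4, 4)   # toy feature map
coarse = downsample2(feat)                        # (2, 2) semantic branch
fused = feat + upsample2(coarse)                  # (4, 4) fused feature map
print(coarse.shape, fused.shape)  # → (2, 2) (4, 4)
```

The point of the pattern: pooling aggregates context (each coarse cell averages a 2x2 neighbourhood), while up-sampling carries that context back to the full resolution where positions are predicted.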
Further, the object detection apparatus 300 further includes a model initialization module 310, and the model initialization module 310 may specifically be configured to:
reading a weight file obtained by pre-training, wherein the weight file comprises weight information of each node in the neural network corresponding to the target detection algorithm;
and updating the weight of each node in the neural network according to the weight file to finish the initialization of the target detection algorithm.
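The two initialization steps above can be sketched with a per-node weight dictionary and an `.npz` checkpoint file. The node names, shapes, and file format are illustrative assumptions (a real detector would typically use its framework's loader, e.g. PyTorch's `load_state_dict`):

```python
import os
import tempfile
import numpy as np

# Hypothetical two-node network: weights stored per node name.
network = {"conv1": np.zeros((3, 3)), "head": np.zeros(2)}

# A pre-trained "weight file" containing weight information for each node.
path = os.path.join(tempfile.mkdtemp(), "weights.npz")
np.savez(path, conv1=np.ones((3, 3)), head=np.array([0.5, -0.5]))

# Initialization: read the weight file and update each node's weights from it.
with np.load(path) as ckpt:
    for name in network:
        network[name] = ckpt[name]

print(float(network["conv1"].sum()), network["head"].tolist())  # → 9.0 [0.5, -0.5]
```

After this step the algorithm runs inference with the pre-trained weights; no training happens on the flight device itself.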
In this embodiment, the model initialization module 310 may be configured to perform the above steps S200 and S201; for the relevant contents of the model initialization module 310, reference may be made to the foregoing detailed description. The target detection apparatus 300 is used to execute the methods provided by the foregoing embodiments, and its implementation principles and technical effects are similar, so they are not described here again.
The above modules may be one or more integrated circuits configured to implement the above methods, for example: one or more Application Specific Integrated Circuits (ASICs), one or more Digital Signal Processors (DSPs), or one or more Field Programmable Gate Arrays (FPGAs), among others. For another example, when one of the above modules is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a Central Processing Unit (CPU), or another processor capable of calling program code. For another example, these modules may be integrated together and implemented in the form of a System-on-a-Chip (SoC).
Further, based on the above target detection method, an embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and the computer program is executed by a processor to perform the steps of the above method embodiments. The computer-readable storage medium may be, but is not limited to, various media that can store program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a PROM, an EPROM, an EEPROM, a flash memory, or an optical disk.
To sum up, the embodiments of the present invention provide a target detection method and apparatus, a flight device, and a computer-readable storage medium. A frame image set corresponding to a real-time inspection video is first obtained, and target detection is then performed sequentially on each frame image in the frame image set by using a target detection algorithm, to obtain position information and classification information of a target detection object. Next, a detection result image corresponding to each frame image is obtained according to the position information and the classification information of the target detection object. Finally, the target video in which the target detection object is marked is synthesized sequentially from each frame of detection result image. In this way, the flight device can directly process the collected real-time inspection video to obtain the target video in which the target detection object is marked, which improves the efficiency of obtaining inspection video processing results and meets users' real-time processing requirements.
The above description covers only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto; any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A target detection method, applied to a flight device, the method comprising the following steps:
acquiring a frame image set corresponding to a real-time inspection video;
sequentially carrying out target detection on each frame image in the frame image set by using a target detection algorithm to obtain position information and classification information of a target detection object;
obtaining a detection result image corresponding to each frame of image according to the position information and the classification information of the target detection object, wherein the position and the category of the target detection object are marked in the detection result image;
and synthesizing the target video marked with the target detection object according to each frame of detection result image in sequence.
2. The method according to claim 1, wherein the step of sequentially performing target detection on each frame image in the frame image set by using a target detection algorithm to obtain the position information and the classification information of the target detection object comprises:
preprocessing each frame image;
extracting features in the preprocessed image to obtain a feature image;
adopting down-sampling enhancement semantic features and adopting up-sampling enhancement positioning features for all the features in the feature image;
and performing regression prediction processing on all the features to obtain the position information and the classification information of the target detection object.
3. The method of claim 1, wherein prior to the step of obtaining a set of frame images corresponding to a real-time inspection video, the method further comprises:
reading a weight file obtained by pre-training, wherein the weight file comprises weight information of each node in the neural network corresponding to the target detection algorithm;
and updating the weight of each node in the neural network according to the weight file so as to complete the initialization of the target detection algorithm.
4. The method of claim 1, wherein the step of obtaining a set of frame images corresponding to the real-time inspection video comprises:
acquiring the real-time inspection video;
and decoding the real-time inspection video to obtain the frame image set corresponding to the real-time inspection video.
5. The method of claim 1, further comprising:
and sequentially pushing each frame of detection result image to a user side so that the user side plays videos according to the received each frame of detection result image.
6. A target detection apparatus, applied to a flight device, the apparatus comprising:
the video acquisition module is used for acquiring a frame image set corresponding to the real-time inspection video;
the data processing module is used for sequentially carrying out target detection on each frame image in the frame image set by utilizing a target detection algorithm to obtain position information and classification information of a target detection object;
the data processing module is further configured to obtain a detection result image corresponding to each frame of image according to the position information and the classification information of the target detection object, where the position and the category of the target detection object are marked in the detection result image;
and the data processing module is also used for synthesizing the target video marked with the target detection object according to each frame of detection result image in sequence.
7. The apparatus of claim 6, wherein the data processing module is specifically configured to:
preprocessing each frame of image;
extracting features in the preprocessed image to obtain a feature image;
adopting down-sampling enhancement semantic features and adopting up-sampling enhancement positioning features for all the features in the feature image;
and performing regression prediction processing on all the features to obtain the position information and the classification information of the target detection object.
8. The apparatus of claim 6, further comprising a model initialization module, the model initialization module specifically configured to:
reading a weight file obtained by pre-training, wherein the weight file comprises weight information of each node in the neural network corresponding to the target detection algorithm;
and updating the weight of each node in the neural network according to the weight file so as to complete the initialization of the target detection algorithm.
9. A flight device, comprising a video acquisition unit and a processing unit, the video acquisition unit being communicatively connected to the processing unit;
the video acquisition unit is used for acquiring an inspection video, encoding the inspection video into inspection video data, and sending the inspection video data to the processing unit;
and the processing unit is used for performing the target detection method of any one of claims 1 to 5.
10. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and the computer program is executed by a processor to implement the target detection method of any one of claims 1 to 5.
CN202210149587.7A 2022-02-18 2022-02-18 Target detection method and device, flight equipment and computer readable storage medium Pending CN114581798A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210149587.7A CN114581798A (en) 2022-02-18 2022-02-18 Target detection method and device, flight equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN114581798A true CN114581798A (en) 2022-06-03

Family

ID=81774400

Country Status (1)

Country Link
CN (1) CN114581798A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109919008A (en) * 2019-01-23 2019-06-21 平安科技(深圳)有限公司 Moving target detecting method, device, computer equipment and storage medium
CN112257569A (en) * 2020-10-21 2021-01-22 青海城市云大数据技术有限公司 Target detection and identification method based on real-time video stream
CN112686207A (en) * 2021-01-22 2021-04-20 北京同方软件有限公司 Urban street scene target detection method based on regional information enhancement
CN113326754A (en) * 2021-05-21 2021-08-31 深圳市安软慧视科技有限公司 Smoking behavior detection method and system based on convolutional neural network and related equipment

Similar Documents

Publication Publication Date Title
US9594984B2 (en) Business discovery from imagery
CN108256404B (en) Pedestrian detection method and device
KR20190069457A (en) IMAGE BASED VEHICLES LOSS EVALUATION METHOD, DEVICE AND SYSTEM,
CN112364843A (en) Plug-in aerial image target positioning detection method, system and equipment
CN115546640A (en) Cloud detection method and device for remote sensing image, electronic equipment and storage medium
CN114556425A (en) Positioning method, positioning device, unmanned aerial vehicle and storage medium
CN116503399A (en) Insulator pollution flashover detection method based on YOLO-AFPS
CN113837257A (en) Target detection method and device
CN109829421B (en) Method and device for vehicle detection and computer readable storage medium
CN111353429A (en) Interest degree method and system based on eyeball turning
CN114842466A (en) Object detection method, computer program product and electronic device
CN110210314B (en) Face detection method, device, computer equipment and storage medium
CN114581798A (en) Target detection method and device, flight equipment and computer readable storage medium
CN114332509B (en) Image processing method, model training method, electronic device and automatic driving vehicle
CN115984647A (en) Remote sensing distributed collaborative reasoning method, device, medium and satellite for constellation
CN115984977A (en) Living body detection method and system
CN115984646A (en) Distributed target detection method and device for remote sensing cross-satellite observation and satellite
CN113420660B (en) Infrared image target detection model construction method, prediction method and system
CN115424346A (en) Human body sitting posture detection method and device, computer equipment and system
CN115393423A (en) Target detection method and device
CN114898350A (en) Bank card identification method, identification system, electronic equipment and storage medium
CN112966670A (en) Face recognition method, electronic device and storage medium
CN112861979A (en) Trademark identification method and device, computing equipment and computer storage medium
CN114882525B (en) Cross-modal pedestrian re-identification method based on modal specific memory network
CN116912488B (en) Three-dimensional panorama segmentation method and device based on multi-view camera

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination