CN112966546A - Embedded attitude estimation method based on unmanned aerial vehicle scout image - Google Patents
Embedded attitude estimation method based on unmanned aerial vehicle scout image
- Publication number
- CN112966546A CN112966546A CN202110004413.7A CN202110004413A CN112966546A CN 112966546 A CN112966546 A CN 112966546A CN 202110004413 A CN202110004413 A CN 202110004413A CN 112966546 A CN112966546 A CN 112966546A
- Authority
- CN
- China
- Prior art keywords
- aerial vehicle
- unmanned aerial
- scout image
- network
- data set
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses an embedded attitude estimation method based on unmanned aerial vehicle scout images, belonging to the field of image processing and machine vision. The method specifically comprises the following steps: acquiring an original unmanned aerial vehicle scout image data set and performing data enhancement processing on it; labeling the data set to obtain a labeled training data set; constructing a lightweight multi-stage hourglass network and training it with the training data set; and inputting an unmanned aerial vehicle scout image to be processed, preprocessing it, feeding the preprocessed image into the trained lightweight attitude estimation network to obtain a portrait feature map, and estimating the portrait attitude from that feature map. The technical scheme balances algorithm performance against deployment adaptability and solves several problems of attitude estimation in unmanned aerial vehicle video processing systems.
Description
Technical Field
The invention relates to the field of image processing and machine vision, and in particular to an embedded hourglass network for estimating the posture of small ground targets in unmanned aerial vehicle aerial video.
Background
In recent years, the unmanned aerial vehicle, as a new combat force, has played an irreplaceable role under intelligent combat conditions; unmanned aerial vehicle equipment technology is being vigorously developed and is of great strategic significance for improving the combat capability of troops. Attitude estimation is one of the key technologies for unmanned aerial vehicles performing reconnaissance and strike tasks, providing strong support for quickly and accurately identifying a target's intention, route of advance, and so on. An efficient and accurate attitude estimation algorithm can effectively reduce the burden on ground operators and improve reconnaissance capability and rapid-response combat efficiency.
Traditional algorithms for estimating the posture of small ground targets in unmanned aerial vehicle reconnaissance mainly obtain the coordinates of human body key points through image processing, from which a human skeleton model or contour model is derived that expresses human posture and behavior intuitively. Before 2015, body pose estimation methods generally aimed at regressing the exact coordinates of the body's key points. However, these methods generalize very poorly because of the flexibility of human motion.
The advantage of algorithms based on human body posture estimation is that the human body is converted into a posture skeleton diagram or contour diagram, which is concise and intuitive and suppresses background interference to a great extent. The disadvantage is that pose estimation is itself a relatively complex problem; when used as the front-end input of a posture detector, the detector's results are strongly affected by the quality of the pose estimation.
In an unmanned aerial vehicle video image processing system, the attitude estimation technology for a ground small target currently faces the following problems:
1) Complex human body images force the model to learn a highly nonlinear mapping, and learning this mapping is extremely difficult. The main reasons are: first, human body images are shot in different scenes, with different shooting angles and illumination conditions; second, interactions between people and objects, and between people, cause random occlusion; finally, different clothing and body types further increase the complexity of the mapping. Although human body posture estimation methods based on hand-crafted features can accurately locate unoccluded joints under fixed scenes, fixed viewpoints, and stable illumination, such ideal conditions do not exist in real scenes. Therefore, how to extract more robust features and learn complex mappings through representation learning is a problem that must be studied in human body posture estimation.
2) A highly nonlinear mapping requires a more complex model to learn, and a more complex model incurs a large computational overhead. How to guarantee model accuracy while accelerating the running speed of the human body posture estimation model is the key problem in making such methods practical.
Disclosure of Invention
In order to solve these problems, the technical scheme of the invention provides an embedded attitude estimation algorithm based on unmanned aerial vehicle scout images, designed around the characteristics of unmanned aerial vehicle video and the shortcomings of the domestic prior art in estimating the attitude of small ground targets. It balances algorithm performance against deployment adaptability and addresses several problems of attitude estimation in unmanned aerial vehicle video processing systems, mainly: 1) traditional attitude estimation is greatly influenced by the foreground; 2) traditional deep learning models are large and difficult to deploy on embedded equipment; 3) feature extraction is inefficient and feature fusion is poor; 4) the detection process must run in real time.
According to a first aspect of the present invention, an embedded pose estimation method based on a scout image of an unmanned aerial vehicle is provided, where the method specifically includes:
step 1, acquiring an original unmanned aerial vehicle scout image data set, and performing data enhancement processing on the original unmanned aerial vehicle scout image data set;
step 2, performing labeling processing on the original unmanned aerial vehicle reconnaissance image data set obtained in step 1 to obtain a labeled training data set;
step 3, constructing a lightweight multi-stage hourglass network, and training the lightweight multi-stage hourglass network by using the training data set;
step 4, inputting an unmanned aerial vehicle scout image to be processed, preprocessing it, inputting the preprocessed image into the trained lightweight attitude estimation network to obtain a portrait feature map, and estimating the portrait attitude according to the portrait feature map.
Further, in step 1, the data enhancement processing includes dilation, erosion, and bilateral filtering operations.
Further, in the step 2, the labeling processing of adding the Label is realized by an image labeling tool Label Img.
Further, the image annotation tool is Label Img.
Further, in step 3, the lightweight posture estimation network includes a convolutional layer, a pooling layer, a channel separation Module, a multilevel hourglass network formed by a plurality of Pyramid Residual Modules (PRMs), and a channel mixing Module.
Furthermore, the multi-stage hourglass network is a two-stage hourglass network and is composed of two pyramid residual modules.
Further, the convolutional layers are depth separable convolutional layers, including depth convolution processing and point convolution processing.
Further, the depth separable convolutional layer is specifically operative to:
for the common convolution with convolution kernel K, input channel number M and output channel number O, the method is divided into deep convolution processing and point convolution processing,
deep convolution processing: performing a K convolution operation on each input channel;
and (3) point convolution processing: performing linear fusion on the M characteristics, wherein the number of point convolutions is O,
wherein K, M, O are all positive integers.
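As a hedged illustration of the depthwise/pointwise split described above, the following pure-Python sketch runs a 1-D analogue of the operation (all names, shapes, and values are illustrative, not taken from the patent):

```python
def depthwise_separable_1d(x, depth_filters, point_weights):
    """Depthwise step: each of the M input channels is convolved with its own
    K-tap filter ('valid' 1-D convolution). Pointwise step: the M channel
    outputs are linearly fused into O output channels."""
    # depthwise: channel m convolved with filter m only (no cross-channel mixing)
    depth_out = []
    for xm, fm in zip(x, depth_filters):
        K = len(fm)
        depth_out.append([sum(fm[k] * xm[t + k] for k in range(K))
                          for t in range(len(xm) - K + 1)])
    # pointwise: output channel o is a weighted sum over the M channels
    T = len(depth_out[0])
    return [[sum(w[m] * depth_out[m][t] for m in range(len(depth_out)))
             for t in range(T)] for w in point_weights]

x = [[1, 2, 3], [4, 5, 6]]   # M = 2 input channels of length 3
depth = [[1, 1], [1, 0]]     # one K = 2 tap filter per input channel
point = [[1, 1], [0, 1]]     # O = 2 pointwise weight vectors
print(depthwise_separable_1d(x, depth, point))  # [[7, 10], [4, 5]]
```

The point of the split is visible in the structure: the depthwise loop never mixes channels, and all cross-channel fusion happens in the cheap pointwise sum.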
Further, the channel separation module includes a plurality of feature channels.
Further, the multi-stage hourglass network is an eight-stage hourglass network.
Further, each stage of the hourglass network is composed of lightweight pyramid residual modules of identical construction.
Further, the step 4 specifically includes:
step 41: inputting an unmanned aerial vehicle scout image to be processed, and intercepting the unmanned aerial vehicle scout image to obtain a reduced-size unmanned aerial vehicle scout image;
step 42: inputting the unmanned aerial vehicle scout image obtained in the step 41 into the lightweight attitude estimation network, and obtaining a first characteristic diagram after pooling and convolution;
step 43: inputting the first characteristic diagram into a multi-stage hourglass network through a channel separation module, and outputting a plurality of second characteristic diagrams;
step 44: and inputting the plurality of second feature maps into a channel mixing module, performing feature fusion on the plurality of second feature maps, and outputting a portrait feature map, thereby estimating the portrait posture according to the portrait feature map.
Further, the second feature map is a low resolution feature map.
According to a second aspect of the invention, there is provided a computer readable storage medium having a computer program stored thereon, characterized in that the program, when executed by a processor, implements the steps of the method according to any of the above aspects.
According to a third aspect of the present invention there is provided a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the steps of the method according to any aspect are implemented when the program is executed by the processor.
Compared with the prior art, the invention has the following advantages:
1) The invention has high operating efficiency: with only a GTX 1050 graphics card, it can process a 1920 × 1080 video image in real time within 20 ms.
2) By replacing ordinary convolution with depthwise separable convolution, the invention further lightens the network while preserving estimation quality.
3) In application, the invention can transmit channels downward in groups, extract features separately, and reorder the channels when the features are finally fused. This reduces the number of channels during transmission while still propagating the image features of all parts, improves the correlation of image features, and further improves the posture estimation effect.
4) The invention fuses features using concatenation, which strengthens fusion between features so that each group of output channels includes all input features, enhancing information correlation and improving the attitude estimation efficiency for small ground targets.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in their description are briefly introduced below. It is obvious that the drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of an embedded attitude estimation method based on an unmanned aerial vehicle scout image according to the technical scheme of the invention;
FIG. 2 is a schematic diagram of a network model built up from a plurality of hourglass networks according to an aspect of the present invention;
fig. 3 is a schematic view of an hourglass network of light-weight PRMs of identical construction according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a depth separable convolution according to aspects of the present invention;
FIG. 5 is a schematic diagram of channel separation and recombination according to the present invention;
FIG. 6 is a diagram illustrating an original pyramid residual block according to an embodiment of the present invention;
fig. 7a and 7b are schematic views of a light-weight PRM according to an embodiment of the present invention;
fig. 8 is a diagram illustrating the detection results of the lightweight network for 16 human key points on the MPII dataset.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The terms "first," "second," and the like in the description and in the claims of the present disclosure are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein.
Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The term "a plurality" means two or more.
The term "and/or" as used in this disclosure merely describes an association between objects and indicates that three relationships may exist. For example, "A and/or B" may represent: A exists alone, A and B exist simultaneously, or B exists alone.
The technical scheme of the invention provides an embedded attitude estimation method based on unmanned aerial vehicle scout images, built mainly on a pyramid residual module with a lightweight network design. Depthwise separable convolution replaces ordinary convolution to reduce the number of training parameters, and a channel separation module and a channel mixing module are added to change the channel dimension of the feature map and strengthen feature fusion. To ensure that the network can still extract features completely, only the identity mapping branch undergoes channel separation, and a channel mixing module is added at the final feature fusion. By adding depthwise separable convolution to the pyramid residual network and combining the channel separation and channel mixing modules, the network effectively reduces computation and storage while maintaining accuracy.
Multi-scale features are added on the basis of the pyramid residual module: features are extracted by convolution and then upsampled to the previous resolution for feature fusion.
Specifically, as shown in fig. 1, the following steps are included.
101, acquiring an original unmanned aerial vehicle scout image data set, and performing data enhancement processing on the original unmanned aerial vehicle scout image data set;
102, performing labeling processing on the original unmanned aerial vehicle reconnaissance image data set obtained in the step 1 to obtain a training data set with a label;
103, constructing a lightweight multi-stage hourglass network, and training the lightweight multi-stage hourglass network by using the training data set;
104, inputting an unmanned aerial vehicle scout image to be processed, preprocessing it, inputting the preprocessed image into the trained lightweight attitude estimation network to obtain a portrait feature map, and estimating the portrait attitude according to the portrait feature map.
The following describes key technologies related to the technical solutions of the present invention in detail with reference to the drawings.
Pyramid residual network
The hourglass network detects human body posture well, and stacking multiple hourglass networks continuously refines the detection result. Each hourglass network combines features at multiple resolutions; it is a modular network that applies a residual module for feature extraction several times at each stage. In the pyramid residual network based on the hourglass network, as shown in fig. 2, the image first passes through a 7 × 7 convolutional layer, a pooling layer, and a PRM, reducing the resolution to 64 × 64; it then passes through each hourglass network in turn, each followed by relay (intermediate) supervision to prevent vanishing gradients. The structure of each hourglass network is shown in fig. 3: pooling layers reduce the resolution step by step down to 4 × 4, after which features are extracted by pyramid residual modules, and multi-resolution features are continuously combined for effective attitude estimation. Because every module in the network is a pyramid residual module, the whole network can be improved by a lightweight redesign of this one module, changing its number of channels and convolution mode.
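The stacking-with-relay-supervision idea can be sketched in a few lines of Python. This is a toy stand-in, not the actual network: the lambda "hourglasses" are placeholders for real hourglass stages.

```python
def stacked_hourglass(x, hourglasses):
    """Pass the input through each hourglass stage in turn. After every stage
    an intermediate output is recorded; during training, relay (intermediate)
    supervision would attach a loss to each of these to fight vanishing
    gradients."""
    intermediate = []
    for hg in hourglasses:
        x = hg(x)                # each stage refines the previous estimate
        intermediate.append(x)   # a loss head attaches here during training
    return x, intermediate

# toy stand-in: each "hourglass" just increments a scalar feature
out, heads = stacked_hourglass(0, [lambda v: v + 1] * 8)
print(out, len(heads))  # final output after 8 stages, and 8 supervision heads
```

The design point is that gradients reach the early stages directly through the intermediate heads rather than only through the full stack.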
Designing a lightweight network:
depth separable convolution
Depthwise separable convolution is divided into two parts, depthwise convolution and pointwise convolution, as shown in fig. 4. With a K × K convolution kernel, M input channels, and O output channels, the depthwise step performs a K × K convolution on each channel separately; the pointwise step then linearly fuses the M features, with the number of pointwise convolutions equal to the number of output channels.
For an input image of size Y × Z × M, the amount of computation through a common convolution is:
Y×Z×M×O×K×K (1)
the amount of computation through the depth separable convolution is:
Y×Z×M×O+Y×Z×M×K×K (2)
by comparison, when the convolution kernel is 3 × 3, the computation of each convolution is reduced to roughly 1/9, and this convolution mode still combines the features of every channel effectively.
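The two cost formulas (1) and (2) can be checked numerically; the sketch below simply evaluates them (the tensor sizes chosen are illustrative):

```python
def conv_cost(Y, Z, M, O, K):
    """Multiply counts for a standard convolution (Eq. 1) versus a
    depthwise-separable convolution (Eq. 2) on a Y x Z x M input with
    O output channels and a K x K kernel."""
    standard = Y * Z * M * O * K * K   # Eq. (1): every filter sees all channels
    depthwise = Y * Z * M * K * K      # one K x K filter per input channel
    pointwise = Y * Z * M * O          # 1 x 1 convolutions fusing the M channels
    return standard, depthwise + pointwise

std, sep = conv_cost(Y=64, Z=64, M=128, O=128, K=3)
# the ratio is 1/(K*K) + 1/O, i.e. close to the text's "about 1/9" for K = 3
print(sep / std)
```

For K = 3 and a reasonably large O the ratio is just over 1/9, matching the approximation in the text.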
Channel separation recombination
Channel separation and recombination are shown in fig. 5: in application, channels can be transmitted in groups, features extracted separately, and the channels reordered at the final feature fusion. This reduces the number of channels during transmission while still propagating the image features of all parts to later layers and improving their correlation.
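The separation-then-reorder step can be illustrated with a minimal channel shuffle over channel indices (a ShuffleNet-style sketch; the group count and channel count are illustrative):

```python
def channel_shuffle(channels, groups):
    """View the flat channel list as a (groups, per_group) grid, transpose it,
    and flatten, so that features from different groups end up interleaved
    when the groups are fused."""
    n = len(channels)
    assert n % groups == 0, "channel count must divide evenly into groups"
    per_group = n // groups
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]

print(channel_shuffle(list(range(6)), groups=2))  # [0, 3, 1, 4, 2, 5]
```

After the shuffle, any subsequent grouped operation sees channels drawn from every original group, which is exactly the "each output group includes all input features" property the text describes.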
Lightweight PRM
Every module in the hourglass network is a pyramid residual module. As shown in fig. 6, multi-scale features are added on the basis of the residual module, with a customizable number of scales; after features are extracted by convolution, they are upsampled to the previous resolution for feature fusion.
Based on this analysis, the invention designs a lightweight pyramid residual module. As shown in fig. 7a and 7b, ordinary convolution is replaced with depthwise separable convolution. Experiments showed that although depthwise separable convolution reduces the parameter count and computation, its effect on actual computation speed is poor, so the invention replaces only the convolution of the original-resolution branch. A channel separation module is added at the beginning of the module; to let the network extract more features, the number of channels in the feature extraction branch is not reduced. Instead, half of the channels are selected on the identity mapping branch and the features are fused by concatenation. Since direct fusion would leave half of the channels carrying little extracted-feature information, a channel recombination module is added afterward to reorder the channels. This strengthens fusion between features, so each group of output channels includes all input features, and information correlation is enhanced.
Examples
The proposed network was trained on the MPII human pose estimation dataset, with results shown in fig. 8. The dataset contains about 25000 images and 40000 labeled samples (annotated with Label Img), of which 28000 are used for training and 11000 for testing. Training ran on Ubuntu for 250 iterations with a batch size of 6, using the Torch7 framework and two NVIDIA 1080Ti GPUs. Using the Percentage of Correct Keypoints (PCK) as the accuracy metric, the evaluation results were good.
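The PCK metric used above counts a key point as correct when its distance to the ground truth falls under a fraction of a reference scale. A minimal sketch follows (the threshold, scale, and sample points are illustrative, not from the patent's evaluation):

```python
def pck(pred, gt, threshold, scale):
    """Percentage of Correct Keypoints: a prediction counts as correct when
    its Euclidean distance to the ground truth is at most threshold * scale
    (the scale is typically a head or torso size)."""
    correct = sum(
        1 for (px, py), (gx, gy) in zip(pred, gt)
        if ((px - gx) ** 2 + (py - gy) ** 2) ** 0.5 <= threshold * scale
    )
    return correct / len(gt)

gt = [(10, 10), (20, 20), (30, 30), (40, 40)]
pred = [(11, 10), (20, 25), (30, 30), (80, 80)]
print(pck(pred, gt, threshold=0.5, scale=10))  # 3 of 4 joints within 5 px -> 0.75
```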
First, a 1080 × 1920 drone scout image is acquired, cut to size 227 × 227 by windowing, and data-enhanced by dilation, erosion, and bilateral filtering.
Here, dilation (dilate) is the operation of finding a local maximum: it expands the boundary of an object, with the exact result depending on the image and the structuring element. Erosion (erode) is the opposite operation, finding a local minimum: it causes the highlighted (bright) areas in the image to gradually shrink.
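Grayscale dilation and erosion as described here are just local max / min filters. The pure-Python sketch below shows the idea on a tiny image; in practice one would use an image processing library, and the 3 × 3 window is illustrative:

```python
def _window_op(img, k, op):
    """Apply op (max for dilation, min for erosion) over each k x k window,
    clipping the window at the image border."""
    h, w, r = len(img), len(img[0]), k // 2
    return [[op(img[y][x]
                for y in range(max(0, i - r), min(h, i + r + 1))
                for x in range(max(0, j - r), min(w, j + r + 1)))
             for j in range(w)] for i in range(h)]

def dilate(img, k=3):
    """Local maximum: expands bright regions (object boundaries grow)."""
    return _window_op(img, k, max)

def erode(img, k=3):
    """Local minimum: shrinks bright regions (highlights gradually decrease)."""
    return _window_op(img, k, min)

img = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
print(dilate(img))  # the single bright pixel spreads over its 3x3 neighbourhood
print(erode(img))   # the single bright pixel is removed
```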
Second, a first feature map is output after three rounds of pooling (Max Pool) and convolution; it is then fed through the channel separation module (Split) into the multi-stage hourglass network, which outputs a plurality of low-resolution second feature maps.
Wherein the present embodiment uses eight hourglass network stacks as the overall network framework.
Finally, the plurality of second feature maps are subjected to feature fusion through a channel mixing module (Merge), and a portrait feature map is output, so that the portrait posture is estimated according to the portrait feature map.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the above implementation method can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation method. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.
Claims (10)
1. An embedded attitude estimation method based on unmanned aerial vehicle scout images is characterized by specifically comprising the following steps:
step 1: acquiring an original unmanned aerial vehicle scout image data set, and performing data enhancement processing on the original unmanned aerial vehicle scout image data set;
step 2: performing labeling processing on the original unmanned aerial vehicle reconnaissance image data set obtained in the step 1 to obtain a training data set with a label;
and step 3: constructing a lightweight attitude estimation network, and training the lightweight attitude estimation network by using the training data set;
and 4, step 4: the method comprises the steps of obtaining an unmanned aerial vehicle scout image to be processed, preprocessing the unmanned aerial vehicle scout image, inputting the preprocessed unmanned aerial vehicle scout image into a trained lightweight attitude estimation network to obtain a portrait feature map, and estimating the portrait attitude according to the portrait feature map.
2. The embedded pose estimation method according to claim 1, wherein in step 1, the data enhancement process comprises dilation, erosion and bilateral filtering operations.
3. The embedded pose estimation method according to claim 1, wherein in the step 2, labeling processing for adding labels is realized by an image labeling tool.
4. The embedded pose estimation method of claim 1, wherein in step 3, the lightweight pose estimation network comprises a convolutional layer, a pooling layer, a channel separation module, a multi-stage hourglass network composed of a plurality of pyramid residual modules, and a channel mixing module.
5. The embedded pose estimation method of claim 4, wherein the convolutional layers are depthwise separable convolutional layers, each comprising a depthwise convolution and a pointwise convolution.
6. The embedded pose estimation method of claim 5, wherein the depthwise separable convolutional layer operates as follows:
a standard convolution with a K×K kernel, M input channels, and O output channels is decomposed into a depthwise convolution and a pointwise convolution,
depthwise convolution: applying one K×K convolution to each input channel;
pointwise convolution: linearly fusing the resulting M feature maps, the number of pointwise convolutions being O,
wherein K, M, and O are all positive integers.
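The benefit of the decomposition in claim 6 is a large reduction in parameters: a standard convolution needs K·K·M·O weights, while the depthwise part needs only K·K·M and the pointwise part M·O. A small sketch of the two counts (function names are illustrative):

```python
def standard_conv_params(K, M, O):
    # one K x K x M kernel per output channel: K*K*M weights for each of O outputs
    return K * K * M * O

def depthwise_separable_params(K, M, O):
    # depthwise: one K x K kernel per input channel (M kernels)
    # pointwise: O kernels of size 1 x 1 x M linearly fusing the M feature maps
    return K * K * M + M * O

# e.g. K=3, M=64, O=128: 73728 vs 8768 weights, roughly an 8x reduction
```

The ratio approaches 1/O + 1/K² as the channel counts grow, which is why depthwise separable layers dominate lightweight architectures of this kind.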
7. The embedded pose estimation method of claim 4, wherein the channel separation module comprises a plurality of independent feature channels.
8. The embedded pose estimation method according to claim 4, wherein step 4 specifically comprises:
Step 41: inputting an unmanned aerial vehicle scout image to be processed and cropping it to obtain a reduced-size scout image;
Step 42: inputting the scout image obtained in step 41 into the trained lightweight pose estimation network and performing pooling and convolution to obtain a first feature map;
Step 43: passing the first feature map through the channel separation module into the multi-stage hourglass network, which outputs a plurality of second feature maps;
Step 44: inputting the plurality of second feature maps into the channel mixing module, performing feature fusion on them, and outputting a portrait feature map from which the portrait pose is estimated.
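The channel separation of step 43 and the channel mixing of step 44 can be sketched as a ShuffleNet-style split followed by a channel shuffle that re-fuses the branch outputs. This is one plausible reading of the claims, not the patent's definitive implementation; all names are illustrative:

```python
import numpy as np

def channel_split(x, branches):
    # split the C channels of a (C, H, W) feature map into independent groups,
    # one per hourglass branch (the channel separation module)
    return np.split(x, branches, axis=0)

def channel_mix(feature_maps):
    # concatenate the branch outputs and interleave their channels so that
    # information flows between the formerly separate groups (channel mixing)
    x = np.concatenate(feature_maps, axis=0)
    g = len(feature_maps)
    c, h, w = x.shape
    return x.reshape(g, c // g, h, w).transpose(1, 0, 2, 3).reshape(c, h, w)
```

The shuffle is a pure reindexing (no learned weights), which is what keeps this fusion step cheap enough for an embedded deployment.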
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method according to any of claims 1 to 8 are implemented when the program is executed by the processor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110004413.7A CN112966546A (en) | 2021-01-04 | 2021-01-04 | Embedded attitude estimation method based on unmanned aerial vehicle scout image |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112966546A true CN112966546A (en) | 2021-06-15 |
Family
ID=76271221
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110004413.7A Pending CN112966546A (en) | 2021-01-04 | 2021-01-04 | Embedded attitude estimation method based on unmanned aerial vehicle scout image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112966546A (en) |
Citations (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107239728A (en) * | 2017-01-04 | 2017-10-10 | 北京深鉴智能科技有限公司 | Unmanned plane interactive device and method based on deep learning Attitude estimation |
US20180182109A1 (en) * | 2016-12-22 | 2018-06-28 | TCL Research America Inc. | System and method for enhancing target tracking via detector and tracker fusion for unmanned aerial vehicles |
CN108960211A (en) * | 2018-08-10 | 2018-12-07 | 罗普特(厦门)科技集团有限公司 | A kind of multiple target human body attitude detection method and system |
WO2019000325A1 (en) * | 2017-06-29 | 2019-01-03 | 深圳市大疆创新科技有限公司 | Augmented reality method for aerial photography of unmanned aerial vehicle, processor, and unmanned aerial vehicle |
CN109766887A (en) * | 2019-01-16 | 2019-05-17 | 中国科学院光电技术研究所 | A kind of multi-target detection method based on cascade hourglass neural network |
CN110175524A (en) * | 2019-04-26 | 2019-08-27 | 南京航空航天大学 | A kind of quick vehicle checking method of accurately taking photo by plane based on lightweight depth convolutional network |
CN110781765A (en) * | 2019-09-30 | 2020-02-11 | 腾讯科技(深圳)有限公司 | Human body posture recognition method, device, equipment and storage medium |
CN111079556A (en) * | 2019-11-25 | 2020-04-28 | 航天时代飞鸿技术有限公司 | Multi-temporal unmanned aerial vehicle video image change area detection and classification method |
CN111160085A (en) * | 2019-11-19 | 2020-05-15 | 天津中科智能识别产业技术研究院有限公司 | Human body image key point posture estimation method |
CN111192267A (en) * | 2019-12-31 | 2020-05-22 | 航天时代飞鸿技术有限公司 | Multisource perception fusion remote sensing image segmentation method based on UNET network and application |
CN111461008A (en) * | 2020-03-31 | 2020-07-28 | 华南理工大学 | Unmanned aerial vehicle aerial shooting target detection method combining scene perspective information |
WO2020164270A1 (en) * | 2019-02-15 | 2020-08-20 | 平安科技(深圳)有限公司 | Deep-learning-based pedestrian detection method, system and apparatus, and storage medium |
CN111680655A (en) * | 2020-06-15 | 2020-09-18 | 深延科技(北京)有限公司 | Video target detection method for aerial images of unmanned aerial vehicle |
CN111696033A (en) * | 2020-05-07 | 2020-09-22 | 中山大学 | Real image super-resolution model and method for learning cascaded hourglass network structure based on angular point guide |
CN111815577A (en) * | 2020-06-23 | 2020-10-23 | 深圳供电局有限公司 | Method, device, equipment and storage medium for processing safety helmet wearing detection model |
CN111860175A (en) * | 2020-06-22 | 2020-10-30 | 中国科学院空天信息创新研究院 | Unmanned aerial vehicle image vehicle detection method and device based on lightweight network |
CN112101259A (en) * | 2020-09-21 | 2020-12-18 | 中国农业大学 | Single pig body posture recognition system and method based on stacked hourglass network |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116434127A (en) * | 2023-06-14 | 2023-07-14 | 季华实验室 | Human body posture estimation method, device, equipment and storage medium |
CN116434127B (en) * | 2023-06-14 | 2023-11-07 | 季华实验室 | Human body posture estimation method, device, equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108647585B (en) | Traffic identifier detection method based on multi-scale circulation attention network | |
CN110246181B (en) | Anchor point-based attitude estimation model training method, attitude estimation method and system | |
CN112528976B (en) | Text detection model generation method and text detection method | |
CN111862126A (en) | Non-cooperative target relative pose estimation method combining deep learning and geometric algorithm | |
CN107329962B (en) | Image retrieval database generation method, and method and device for enhancing reality | |
CN107564009B (en) | Outdoor scene multi-target segmentation method based on deep convolutional neural network | |
CN111476806B (en) | Image processing method, image processing device, computer equipment and storage medium | |
CN110619638A (en) | Multi-mode fusion significance detection method based on convolution block attention module | |
CN112365511B (en) | Point cloud segmentation method based on overlapped region retrieval and alignment | |
CN110705566B (en) | Multi-mode fusion significance detection method based on spatial pyramid pool | |
CN109461177B (en) | Monocular image depth prediction method based on neural network | |
CN111462140B (en) | Real-time image instance segmentation method based on block stitching | |
CN110348531B (en) | Deep convolution neural network construction method with resolution adaptability and application | |
CN112232134A (en) | Human body posture estimation method based on hourglass network and attention mechanism | |
CN112163498A (en) | Foreground guiding and texture focusing pedestrian re-identification model establishing method and application thereof | |
CN112163447B (en) | Multi-task real-time gesture detection and recognition method based on Attention and Squeezenet | |
JP2021096850A (en) | Parallax estimation system and method, electronic apparatus, and computer readable storage medium | |
CN110807379A (en) | Semantic recognition method and device and computer storage medium | |
CN113850136A (en) | Yolov5 and BCNN-based vehicle orientation identification method and system | |
CN112528858A (en) | Training method, device, equipment, medium and product of human body posture estimation model | |
CN111914596B (en) | Lane line detection method, device, system and storage medium | |
CN112669452B (en) | Object positioning method based on convolutional neural network multi-branch structure | |
CN112966546A (en) | Embedded attitude estimation method based on unmanned aerial vehicle scout image | |
CN113298922A (en) | Human body posture estimation method and device and terminal equipment | |
CN111931793B (en) | Method and system for extracting saliency target |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||