WO2019033747A1 - Method for determining an intelligent following target of an unmanned aerial vehicle, unmanned aerial vehicle, and remote controller - Google Patents

Method for determining an intelligent following target of an unmanned aerial vehicle, unmanned aerial vehicle, and remote controller

Info

Publication number
WO2019033747A1
WO2019033747A1 (PCT/CN2018/078582)
Authority
WO
WIPO (PCT)
Prior art keywords
target
image
layer
candidate
downsampling
Prior art date
Application number
PCT/CN2018/078582
Other languages
English (en)
French (fr)
Inventor
梅江元
Original Assignee
深圳市道通智能航空技术有限公司
Priority date
Filing date
Publication date
Application filed by 深圳市道通智能航空技术有限公司
Priority to EP18717495.8A (EP3471021B1)
Priority to US15/980,051 (US10740607B2)
Publication of WO2019033747A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/255 Detecting or recognising potential candidate objects based on visual cues, e.g. shapes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V 10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V 10/451 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V 10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/778 Active pattern-learning, e.g. online learning of image or video features
    • G06V 10/7784 Active pattern-learning, e.g. online learning of image or video features based on feedback from supervisors
    • G06V 10/7788 Active pattern-learning, e.g. online learning of image or video features based on feedback from supervisors the supervisor being a human, e.g. interactive learning with a human teacher
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/10 Terrestrial scenes
    • G06V 20/13 Satellite images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/10 Terrestrial scenes
    • G06V 20/17 Terrestrial scenes taken from planes or by drones

Definitions

  • Embodiments of the present invention relate to the field of computer vision, and in particular, to a method for determining an intelligent following target of a drone, a drone, and a remote controller.
  • The selection and recognition of dynamic targets is generally accomplished by a deep-learning-based target detection algorithm. With such an algorithm, target recognition and localization can be completed in a unified deep learning framework, the positioning is accurate, and many types of targets can be identified.
  • The inventors have found that the related art has at least the following problem: the deep-learning-based target detection algorithm involves a large amount of computation, and a single image prediction requires billions or even tens of billions of floating-point operations. If a general-purpose processor is used, the computation time becomes very long and real-time requirements are difficult to meet; the requirements on the processor are therefore high.
  • An object of the embodiments of the present invention is to provide a UAV intelligent following target determination method, a drone, and a remote controller that involve a small amount of computation, a short computation time, and low requirements on hardware devices.
  • In a first aspect, an embodiment of the present invention provides a method for determining an intelligent following target of a drone, the method comprising:
  • the electronic device acquires an image returned by the drone;
  • the electronic device obtains a region image of interest according to the user's click on the image returned by the drone;
  • the electronic device loads a deep learning network model, inputs the acquired region image into the deep learning network model, and uses the deep learning network model to output a plurality of candidate bounding boxes framing targets in the region image and the probability that the target in each candidate bounding box belongs to a preset category;
  • whether a target image exists in the region image is determined according to the candidate bounding boxes and the probability that the targets in them belong to preset categories;
  • if a target image exists, a target follow command is sent to the drone.
  • Optionally, the method further includes: if no target image exists, the electronic device prompts the user that there is no target of interest in the image.
  • Optionally, the method further includes: if no target image exists, the electronic device prompts the user to click the image again to reselect the target of interest.
  • Determining whether a target image exists in the region image according to the candidate bounding boxes and the probability that the targets in them belong to preset categories includes:
  • step S1: for each preset category, obtaining the candidate bounding box corresponding to the target with the highest probability of belonging to that category, calculating the coincidence rate between each of the other candidate bounding boxes and that box, and setting to zero the probability that the target in any candidate bounding box whose coincidence rate is greater than a first preset threshold belongs to that category;
  • step S2: repeating step S1 for each of the other preset categories;
  • step S3: for each candidate bounding box remaining after step S2, obtaining the preset category with the highest probability among the probabilities that the target in the box belongs to each preset category as the category of the target in the box, and taking targets whose maximum probability is greater than a second preset threshold as possible target images;
  • step S4: calculating a distance coefficient between each possible target image and the user's click position, where the coordinates of the possible target image are (x_o, y_o) and the coordinates of the click position are (x_p, y_p);
  • step S5: obtaining the product of each possible target image's distance coefficient and the probability of its category and finding the maximum of the products; if the maximum is greater than a third preset threshold, taking the possible target image corresponding to the maximum as the target image and recording its category.
  • Optionally, the deep learning network model includes at least two convolutional layers and at least two sampling layers.
  • Optionally, the deep learning network model includes, in order:
  • a first convolutional layer, a first downsampling layer, a second convolutional layer, a second downsampling layer, a third convolutional layer, a third downsampling layer, a fourth convolutional layer, a fourth downsampling layer, a fifth convolutional layer, a fifth downsampling layer, a sixth convolutional layer, a sixth downsampling layer, a seventh convolutional layer, an eighth convolutional layer, and a region layer.
  • Among the first convolutional layer, the second convolutional layer, the third convolutional layer, the fourth convolutional layer, the fifth convolutional layer, and the sixth convolutional layer, the number of filters of each convolutional layer is twice that of the previous one, and the sixth and seventh convolutional layers have an equal number of filters;
  • the window size of the first through fifth downsampling layers is 2*2 pixels with a stride (jump interval) of 2, and the sixth downsampling layer has a window size of 2*2 pixels and a stride of 1.
  • The number of filters of the first convolutional layer is 4, and the first through sixth downsampling layers all adopt the maximum-value downsampling method.
  • each of the convolution layers uses a filter of 3*3 pixels.
  • The region image is 288*288 pixels in size, and 9*9*5 candidate bounding boxes are obtained by using the deep learning network model.
  • In a second aspect, an embodiment of the present invention provides a method for determining an intelligent following target of a drone, the method comprising:
  • the drone acquires an image;
  • the drone acquires a region image of interest according to the user's click operation;
  • the drone loads a deep learning network model, inputs the acquired region image into the deep learning network model, and uses the deep learning network model to output a plurality of candidate bounding boxes framing targets in the region image and the probability that the target in each candidate bounding box belongs to a preset category;
  • whether a target image exists in the region image is determined according to the candidate bounding boxes and the probability that the targets in them belong to preset categories;
  • if a target image exists, the target is followed.
  • Optionally, the method further includes: if no target image exists, the drone sends an instruction to the electronic device, the instruction being used to prompt the user that there is no target of interest in the image.
  • Optionally, the instruction is further used to prompt the user to click the image again to reselect the target of interest.
  • Determining whether a target image exists in the region image according to the candidate bounding boxes and the probability that the targets in them belong to preset categories includes:
  • step S1: for each preset category, obtaining the candidate bounding box corresponding to the target with the highest probability of belonging to that category, calculating the coincidence rate between each of the other candidate bounding boxes and that box, and setting to zero the probability that the target in any candidate bounding box whose coincidence rate is greater than a first preset threshold belongs to that category;
  • step S2: repeating step S1 for each of the other preset categories;
  • step S3: for each candidate bounding box remaining after step S2, obtaining the preset category with the highest probability among the probabilities that the target in the box belongs to each preset category as the category of the target in the box, and taking targets whose maximum probability is greater than a second preset threshold as possible target images;
  • step S4: calculating a distance coefficient between each possible target image and the user's click position, where the coordinates of the possible target image are (x_o, y_o) and the coordinates of the click position are (x_p, y_p);
  • step S5: obtaining the product of each possible target image's distance coefficient and the probability of its category and finding the maximum of the products; if the maximum is greater than a third preset threshold, taking the possible target image corresponding to the maximum as the target image and recording its category.
  • Optionally, the deep learning network model includes at least two convolutional layers and at least two sampling layers.
  • Optionally, the deep learning network model includes, in order:
  • a first convolutional layer, a first downsampling layer, a second convolutional layer, a second downsampling layer, a third convolutional layer, a third downsampling layer, a fourth convolutional layer, a fourth downsampling layer, a fifth convolutional layer, a fifth downsampling layer, a sixth convolutional layer, a sixth downsampling layer, a seventh convolutional layer, an eighth convolutional layer, and a region layer.
  • Among the first convolutional layer, the second convolutional layer, the third convolutional layer, the fourth convolutional layer, the fifth convolutional layer, and the sixth convolutional layer, the number of filters of each convolutional layer is twice that of the previous one, and the sixth and seventh convolutional layers have an equal number of filters;
  • the window size of the first through fifth downsampling layers is 2*2 pixels with a stride (jump interval) of 2, and the sixth downsampling layer has a window size of 2*2 pixels and a stride of 1.
  • The number of filters of the first convolutional layer is 4, and the first through sixth downsampling layers all adopt the maximum-value downsampling method.
  • each of the convolution layers uses a filter of 3*3 pixels.
  • The region image is 288*288 pixels in size, and 9*9*5 candidate bounding boxes are obtained by using the deep learning network model.
  • In a third aspect, an embodiment of the present invention provides a remote controller, comprising:
  • an operating lever;
  • a signal receiver for receiving an image returned by the drone;
  • a signal transmitter for transmitting instructions to the drone;
  • a display screen; and
  • a processor, configured to:
  • obtain a region image of interest according to the user's click on the image returned by the drone;
  • load a deep learning network model, input the acquired region image into the deep learning network model, and use the deep learning network model to output a plurality of candidate bounding boxes framing targets in the region image and the probability that the target in each candidate bounding box belongs to a preset category;
  • determine, according to the candidate bounding boxes and the probability that the targets in them belong to preset categories, whether a target image exists in the region image; and
  • if a target image exists, send a target follow command to the drone through the signal transmitter.
  • Optionally, if no target image exists, the display screen displays a prompt that there is no target of interest in the image.
  • Optionally, if no target image exists, the display screen displays a prompt to click the image again to reselect the target of interest.
  • the processor is further configured to perform the method of any one of the first aspects.
  • In a fourth aspect, an embodiment of the present invention provides a drone, comprising a body, an arm connected to the body, a power device disposed on the arm, an image sensor for acquiring images, and a processor and a signal transmitter disposed within the body, the processor being configured to:
  • acquire a region image of interest according to the user's click operation;
  • load a deep learning network model, input the acquired region image into the deep learning network model, and use the deep learning network model to output a plurality of candidate bounding boxes framing targets in the region image and the probability that the target in each candidate bounding box belongs to a preset category; and
  • determine, according to the candidate bounding boxes and the probability that the targets in them belong to preset categories, whether a target image exists in the region image, and if a target image exists, control the drone to follow the target.
  • Optionally, if no target image exists, the drone sends an instruction to the electronic device through the signal transmitter, the instruction being used to prompt the user that there is no target of interest in the image.
  • the instructions are further configured to prompt the user to click the image again to reselect the target of interest.
  • the processor is further configured to perform the method of any one of the second aspects.
  • In a fifth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium storing computer-executable instructions that, when executed by an electronic device, cause the electronic device to perform the method of any one of the first aspects.
  • In a sixth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium storing computer-executable instructions that, when executed by a drone, cause the drone to perform the method of any one of the second aspects.
  • In a seventh aspect, an embodiment of the present invention provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions that, when executed by an electronic device, cause the electronic device to perform the method of the first aspect.
  • In an eighth aspect, an embodiment of the present invention provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions that, when executed by a drone, cause the drone to perform the method of the second aspect.
  • The embodiments of the present invention acquire a region image of interest on the original image according to the user's click position and input the region image of interest into a deep learning network model for target prediction; the amount of computation is small, the computation time is short, and the requirements on hardware devices are low.
  • FIG. 1 is a schematic diagram of an application scenario of the target determination method and apparatus according to an embodiment of the present invention;
  • FIG. 2 is a schematic flowchart of an embodiment of the target determination method of the present invention performed by an electronic device;
  • FIG. 3 is a schematic diagram of the process of an embodiment of the target determination method of the present invention;
  • FIG. 4 is a schematic diagram of the deduplication processing step for candidate bounding boxes in an embodiment of the target determination method of the present invention;
  • FIG. 5 is a schematic diagram of a network structure based on a deep learning algorithm according to an embodiment of the present invention;
  • FIG. 6 is a schematic flowchart of an embodiment of the target determination method of the present invention performed by a drone;
  • FIG. 7 is a schematic structural diagram of an embodiment of the target determination apparatus of the present invention;
  • FIG. 8 is a schematic structural diagram of an embodiment of the target determination apparatus of the present invention;
  • FIG. 9 is a schematic structural diagram of hardware of a drone according to an embodiment of the present invention;
  • FIG. 10 is a schematic structural diagram of hardware of an electronic device according to an embodiment of the present invention;
  • FIG. 11 is a schematic structural diagram of hardware of a remote controller according to an embodiment of the present invention.
  • The UAV intelligent following target determination method and apparatus are applicable to the application scenario shown in FIG. 1.
  • the application scenario includes the drone 10, the electronic device 20, and the user 30.
  • The drone 10 can be any suitable type of high-altitude or low-altitude aircraft, including a typical quadcopter, a hoverable remote-controlled helicopter, or a fixed-wing aircraft with a certain movement speed.
  • the electronic device 20 can be, for example, a remote control, a smart phone, a tablet, a personal computer, a laptop, or the like.
  • User 30 can interact with electronic device 20 by any suitable type of one or more user interaction devices, such as a mouse, a button, a touch screen, and the like.
  • the drone 10 and the electronic device 20 can establish a communication connection, upload or issue data/instructions through wireless communication modules respectively disposed therein.
  • The drone 10 can track a target, such as a particular person, car, boat, or animal; in order to track the target, the drone 10 needs to first determine the target.
  • the drone 10 is provided with at least one image capturing device, such as a high-definition camera or a motion camera, for performing image capturing.
  • The drone 10 transmits the image back to the electronic device 20 over a wireless network, and the electronic device 20 displays the image on its screen.
  • the user 30 can operate on the image, such as clicking on a target of interest on the image, and the electronic device 20 determines the location of the object of interest in the image in accordance with a click operation by the user 30.
  • The targets in the captured image can be identified and confirmed based on a deep learning network model; if image recognition is performed on the entire original image, the amount of computation is large.
  • In the embodiments of the present invention, the region image of interest to the user 30 is obtained on the original image according to the click position of the user 30, and image recognition is then performed on that region image, which requires relatively little computation and is fast.
  • The region image of interest to the user 30 may be obtained from the original image by the electronic device 20 according to the click position of the user 30, and target recognition based on the deep learning algorithm may then be performed on the region image to obtain the target image.
  • In this case, the deep learning network model is loaded on the electronic device 20, and target identification and confirmation are completed on the electronic device 20.
  • The computing resources of the drone 10 are thus not occupied and the hardware cost of the drone 10 is not increased; under these conditions, new functions are added to the drone 10.
  • Alternatively, the deep learning network model may be loaded on the drone 10 side: the electronic device 20 transmits the region image of interest to the user 30 and the click position of the user 30 to the drone 10, and the drone 10 performs target recognition based on the deep learning algorithm on the region image to obtain the target image.
  • The electronic device 20 can also transmit only the click position of the user 30 to the drone 10; the drone 10 then obtains the region image of interest on the original image based on the click position and performs recognition based on the region image of interest.
  • FIG. 2 is a schematic flowchart of a method for determining an intelligent following target of a drone according to an embodiment of the present invention. The method may be performed by the electronic device 20 in FIG. 1. As shown in FIG. 2, the method includes:
  • the electronic device 20 acquires an image returned by the drone 10;
  • the UAV 10 transmits the image to the electronic device 20 after the image is taken, and after the electronic device 20 receives the image returned by the UAV 10, the image can be displayed on the screen of the electronic device 20.
  • The electronic device 20 obtains a region image of interest according to the click operation of the user 30 on the image returned by the drone 10.
  • The user 30 can click, on the screen of the electronic device 20, the image returned by the drone 10 to determine the target to be tracked, and the electronic device 20 can acquire the region image of interest to the user 30 according to the click position of the user 30. For example, according to the position coordinates (x_m, y_m) clicked by the user 30 on the screen, the corresponding coordinates (x_p, y_p) of the click position on the image are determined, and the original image is cropped according to (x_p, y_p) to obtain the region image of interest.
  • Steps (1)-(2) of FIG. 3 show the acquisition process of the region image of interest, where the "+" mark in the figure indicates the click position of the user 30 and the portion enclosed by the dotted-line box is the obtained region image.
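  • As an illustration of this cropping step, the following Python sketch cuts a 288*288 region image centred on the click position. It is a minimal sketch, not the patented implementation; the use of PIL, the clamping behaviour, and the function name are assumptions:

      from PIL import Image

      def crop_region_of_interest(original: Image.Image, x_p: int, y_p: int,
                                  size: int = 288) -> Image.Image:
          # Centre a size*size window on the click position (x_p, y_p) and
          # clamp it so it stays inside the original image where possible.
          w, h = original.size
          left = min(max(x_p - size // 2, 0), max(w - size, 0))
          top = min(max(y_p - size // 2, 0), max(h - size, 0))
          # Note: if the original is smaller than size, PIL fills the excess
          # area of the crop with black pixels.
          return original.crop((left, top, left + size, top + size))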
  • The electronic device 20 loads a deep learning network model, inputs the acquired region image into the deep learning network model, and uses the deep learning network model to output a plurality of candidate bounding boxes framing the targets in the region image and the probability that the target in each candidate bounding box belongs to a preset category.
  • Before target prediction is performed, the deep-learning-based network model can be obtained in advance, including:
  • The preset categories are: people; miniature, small, and medium-sized cars; buses, trucks, and the like; agricultural vehicles, tricycles, tractors, and the like; bicycles, motorcycles, and other riding targets; water targets such as ships; flying targets; common pets such as cats and dogs; other animals; and other salient targets.
  • the number of preset categories can be any number, for example, 10.
  • Steps (3)-(4) in FIG. 3 illustrate the process of predicting the region image of interest based on the deep learning network model.
  • The region image is input into the deep learning network model, and a plurality of candidate bounding boxes framing all the targets in the region image are obtained; the target in each candidate bounding box has a probability corresponding to each preset category.
  • For example, a 288*288-pixel region image of interest is input for prediction into the deep-learning-based network model shown in FIG. 5, and 9*9*5*15 prediction results are output:
  • 9*9*5 is the number of candidate bounding boxes;
  • "5" is obtained by mean clustering of the training samples;
  • "15" is the number of parameters of each candidate bounding box: 4 position parameters (coordinates and width/height), the probabilities of belonging to each of the 10 preset categories, and 1 probability parameter indicating whether the box contains a target.
  • The 405 candidate bounding boxes provide a sufficient number of candidate boxes from which to select the optimal target image, and setting the input image to a resolution of 288*288 improves recognition speed while preserving recognition accuracy.
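  • To make the shape bookkeeping concrete, the following Python sketch reshapes such an output tensor into its three parameter groups. The ordering of the 15 parameters within each box is an assumption; the text only lists the groups:

      import numpy as np

      def decode_predictions(raw, grid=9, anchors=5, classes=10):
          # raw: (grid, grid, anchors * (4 + classes + 1)), i.e. 9*9*5 = 405
          # candidate bounding boxes with 15 parameters each.
          raw = raw.reshape(grid, grid, anchors, 4 + classes + 1)
          boxes = raw[..., :4]                   # 4 position parameters (coords, width, height)
          class_probs = raw[..., 4:4 + classes]  # 10 preset-category probabilities
          objectness = raw[..., -1]              # 1 probability that the box holds a target
          return boxes, class_probs, objectness

      boxes, probs, obj = decode_predictions(np.zeros((9, 9, 75)))
      print(boxes.shape, probs.shape, obj.shape)  # (9, 9, 5, 4) (9, 9, 5, 10) (9, 9, 5)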
  • the deep learning-based network model includes at least two convolution layers and at least two sampling layers.
  • The deep learning network model 300 shown in FIG. 5 may be used, which has a 15-layer network structure; the 15-layer structure avoids the overfitting caused by too many layers as well as the insufficient accuracy caused by too few layers, and is an optimized deep learning network structure.
  • the deep learning network model includes:
  • a first convolutional layer, a first downsampling layer, a second convolutional layer, a second downsampling layer, a third convolutional layer, a third downsampling layer, a fourth convolutional layer, a fourth downsampling layer, a fifth convolutional layer, a fifth downsampling layer, a sixth convolutional layer, a sixth downsampling layer, a seventh convolutional layer, an eighth convolutional layer, and a region layer.
  • Among the convolutional layers, the number of filters of each subsequent convolutional layer is twice that of the previous one, and the seventh and eighth convolutional layers have an equal number of filters. Referring to FIG. 5, if the number of filters of the first convolutional layer is 4, the numbers of filters of the subsequent convolutional layers are 8, 16, 32, 64, 128, 256, and 256.
  • The first through fifth downsampling layers have a window size of 2*2 pixels and a stride of 2, and the sixth downsampling layer has a window size of 2*2 pixels and a stride of 1.
  • Each convolutional layer may use 3*3-pixel filters, which involve a small amount of computation.
  • The first through sixth downsampling layers may adopt the maximum-value downsampling method.
  • Because the number of filters of each subsequent convolutional layer is twice that of the previous one (except for the last convolutional layer), the number of features doubles each time a convolutional layer is passed.
  • Because each downsampling layer has a window size of 2*2 and a stride of 2 (except for the last downsampling layer), the feature resolution is halved each time a downsampling layer is passed. This design balances resolution against the number of features: each reduction in resolution corresponds to an increase in the number of features.
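  • The layer arithmetic above can be checked with a short PyTorch sketch. This is a minimal sketch under stated assumptions: the activation functions, the padding of the stride-1 pooling layer, and the final 1*1 projection feeding the region layer are not specified in the text and are assumed here:

      import torch
      import torch.nn as nn

      class FollowNet(nn.Module):
          # 15-layer structure: conv1-6 each followed by a downsampling layer,
          # then conv7, conv8, and the region layer (filters: 4, 8, 16, 32,
          # 64, 128, 256, 256).
          def __init__(self, num_anchors=5, num_classes=10):
              super().__init__()
              layers, c_in = [], 3
              for i, c_out in enumerate([4, 8, 16, 32, 64, 128]):
                  layers += [nn.Conv2d(c_in, c_out, 3, padding=1), nn.LeakyReLU(0.1)]
                  if i < 5:
                      layers.append(nn.MaxPool2d(2, stride=2))  # halves resolution
                  else:
                      # Sixth downsampling layer: 2*2 window, stride 1; asymmetric
                      # padding keeps the 9*9 grid (an implementation assumption).
                      layers += [nn.ZeroPad2d((0, 1, 0, 1)), nn.MaxPool2d(2, stride=1)]
                  c_in = c_out
              layers += [nn.Conv2d(128, 256, 3, padding=1), nn.LeakyReLU(0.1)]  # conv7
              layers += [nn.Conv2d(256, 256, 3, padding=1), nn.LeakyReLU(0.1)]  # conv8
              # Assumed 1*1 projection feeding the region layer:
              # 5 anchors * (4 box parameters + 10 classes + 1 objectness) = 75.
              layers.append(nn.Conv2d(256, num_anchors * (5 + num_classes), 1))
              self.net = nn.Sequential(*layers)

          def forward(self, x):
              return self.net(x)

      out = FollowNet()(torch.zeros(1, 3, 288, 288))
      print(out.shape)  # torch.Size([1, 75, 9, 9]): 9*9*5 boxes, 15 values each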
  • If there is no target image, the electronic device 20 prompts the user 30 that there is no target of interest within the image.
  • the electronic device 20 may further prompt the user to re-click on the image to reselect the target of interest.
  • Steps (5)-(6)-(7) of FIG. 3 show the process of confirming the target image. Confirming whether a target image exists in the region image includes the following steps:
  • Step 1: For each preset category, obtain the candidate bounding box corresponding to the target with the highest probability of belonging to that category, calculate the coincidence rate between each of the other candidate bounding boxes and that box, and set to zero the probability that the target in any candidate bounding box whose coincidence rate is greater than the first preset threshold belongs to that category.
  • Let the number of preset categories be J, and let P_ij denote the probability that the target in the i-th candidate bounding box belongs to the j-th preset category.
  • For each preset category j, the candidate bounding boxes are sorted by P_ij in descending order, with boxes of larger P_ij at the front and boxes of smaller P_ij at the back.
  • For the same preset category j, the coincidence rate IOU between the candidate bounding box with the largest P_ij and each candidate bounding box behind it is calculated; if the coincidence rate is greater than the first preset threshold θ1, the probability that the trailing candidate bounding box belongs to category j is set to zero.
  • The coincidence rate IOU indicates the degree of overlap of two candidate bounding boxes: the greater the coincidence rate, the more similar the two boxes. When the coincidence rate of two candidate bounding boxes is greater than the first preset threshold θ1, the two boxes are highly similar; to simplify the calculation, the box with the smaller probability is removed. The first preset threshold θ1 can be set according to the actual application.
  • The coincidence rate IOU can be calculated according to equation (1):
  • IOU = S_12 / (S_1 + S_2 - S_12)    (1)
  • where S_1 and S_2 are the areas of the two candidate bounding boxes, respectively, and S_12 is the area of their overlapping portion.
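  • A direct transcription of equation (1) into Python (a sketch; the corner format of the boxes is an assumption):

      def iou(box_a, box_b):
          # Coincidence rate per equation (1): IOU = S12 / (S1 + S2 - S12).
          # Boxes are (x_min, y_min, x_max, y_max).
          ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
          ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
          s12 = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # overlap area
          s1 = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
          s2 = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
          return s12 / (s1 + s2 - s12)

      print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, about 0.143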
  • Step 2: Repeat step 1 for each of the other preset categories.
  • That is, the deduplication processing of step 1 is performed for each of the remaining preset categories.
  • Step 3: For each candidate bounding box remaining after step 2, obtain the preset category with the highest probability among the probabilities that the target in the box belongs to each preset category as the category of the target in the box, and take targets whose maximum probability is greater than the second preset threshold θ2 as possible target images.
  • For each candidate bounding box remaining after the deduplication of steps 1 and 2, the maximum value P_i = max(P_ij) of the probabilities that the target in the box belongs to each preset category is calculated in turn, and the corresponding category is recorded.
  • The probability value P_i represents the likelihood that the target belongs to its assigned category: the larger P_i, the more likely the target belongs to that category. If P_i is greater than the second preset threshold θ2, the candidate bounding box very probably belongs to its category. To further simplify the calculation, candidate bounding boxes whose P_i is smaller than the second preset threshold θ2 are removed; the value of θ2 can be set according to the actual application.
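  • Steps 1-3 can be summarised in a few lines of Python (a sketch: the threshold values are illustrative, and iou() is the coincidence-rate helper from the sketch after equation (1)):

      import numpy as np

      def suppress_and_classify(boxes, class_probs, theta1=0.5, theta2=0.3):
          # boxes: (N, 4) corner boxes; class_probs: (N, J) per-category P_ij.
          probs = class_probs.copy()
          n_boxes, n_cats = probs.shape
          for cat in range(n_cats):                 # steps 1-2: every category
              order = np.argsort(-probs[:, cat])    # sort by P_ij, descending
              best = order[0]
              for other in order[1:]:
                  if iou(boxes[best], boxes[other]) > theta1:
                      probs[other, cat] = 0.0       # zero out overlapping boxes
          categories = probs.argmax(axis=1)         # step 3: most probable category
          p_i = probs.max(axis=1)
          keep = np.flatnonzero(p_i > theta2)       # possible target images
          return keep, categories[keep], p_i[keep]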
  • Step 4: Calculate a distance coefficient δ between each possible target image and the click position of the user 30, where the coordinates of the possible target image are (x_o, y_o) and the coordinates of the click position are (x_p, y_p).
  • The distance coefficient δ characterizes how close a candidate bounding box is to the click position of the user 30: the larger δ, the closer the candidate bounding box is to the click position; the smaller δ, the farther away it is.
  • The distance coefficient formula can distinguish each type of target, and even if the click position of the user 30 falls outside the target image, the target can still be framed accurately.
  • Step 5: Obtain the product γ_i of each possible target image's distance coefficient and the probability of its category, and find the maximum value max(γ_i) of the products; if max(γ_i) is greater than the third preset threshold θ3, take the possible target image corresponding to max(γ_i) as the target image, and record the category to which the target image belongs.
  • The decision value γ represents the joint likelihood that a possible target image is close to the click position of the user 30 and belongs to its category.
  • If the γ value is greater than the third preset threshold θ3, the possible target image is relatively close to the click position of the user 30 and very probably belongs to its category, so it can be taken as the target image; the value of the third preset threshold θ3 can be set according to the actual application.
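  • Steps 4-5 then reduce to a weighted argmax. Because the formula for δ is reproduced only as an image in the original publication, the sketch below assumes a Gaussian decay normalised by box area, which has the stated properties (larger δ means closer, and clicks slightly outside a large target still score well); the threshold value is illustrative:

      import numpy as np

      def select_target(centers, sizes, p_i, click, theta3=0.1):
          # centers: (N, 2) box centres (x_o, y_o); sizes: (N, 2) widths/heights;
          # p_i: (N,) category probabilities; click: (x_p, y_p).
          d2 = ((centers - np.asarray(click, dtype=float)) ** 2).sum(axis=1)
          delta = np.exp(-d2 / (sizes[:, 0] * sizes[:, 1]))  # assumed form of delta
          gamma = delta * p_i                                # product of step 5
          best = int(gamma.argmax())
          return best if gamma[best] > theta3 else None      # None: no target of interest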
  • the flight strategy may be adjusted according to the category of the target image.
  • For example, if the target is a fast-moving large target such as a vehicle, the drone 10 needs to increase its own flying height and flight speed to obtain a larger field of view and a higher tracking speed; if the target is a small target such as a person, the drone 10 needs to reduce its height and speed to ensure that the target is not lost in the field of view due to being too small.
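  • Such a category-to-strategy mapping might look like the following (a purely illustrative sketch; the category names and numeric values are assumptions, not values from the patent):

      # Map the recognised target category to a following strategy.
      FLIGHT_STRATEGY = {
          "vehicle": {"altitude_m": 30.0, "max_speed_mps": 20.0},  # large, fast target
          "person": {"altitude_m": 8.0, "max_speed_mps": 5.0},     # small, slow target
      }

      def adjust_flight(category: str) -> dict:
          # Fall back to a middle-of-the-road strategy for other categories.
          return FLIGHT_STRATEGY.get(category, {"altitude_m": 15.0, "max_speed_mps": 10.0})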
  • The embodiments of the present invention acquire a region image of interest on the original image according to the click position of the user 30 and take the region image of interest as the input of the deep-learning-based network model for target prediction, which involves a small amount of computation, a short computation time, and low requirements on hardware devices.
  • An embodiment of the present invention further provides another UAV intelligent following target determination method, which can be executed by the drone 10 in FIG. 1. As shown in FIG. 6, the method includes:
  • the drone 10 acquires an image.
  • the drone 10 collects images through an image capture device.
  • The drone 10 acquires a region image of interest according to the click operation of the user 30.
  • The drone 10 transmits the collected original image back to the electronic device 20 through the wireless network, and the region image of interest to the user 30 can be obtained according to the click operation of the user 30 on the original image.
  • The region image of interest to the user 30 may be acquired by the electronic device 20 according to the click operation of the user 30 and then transmitted back to the drone 10.
  • Alternatively, the electronic device 20 transmits only the click position of the user 30 to the drone 10, and the drone 10 obtains the region image of interest to the user 30 from the original image according to the click position.
  • The drone 10 loads the deep learning network model, inputs the acquired region image into the deep learning network model, and uses the deep learning network model to output a plurality of candidate bounding boxes framing targets in the region image and the probability that the target in each candidate bounding box belongs to a preset category.
  • Whether a target image exists in the region image is then determined according to the candidate bounding boxes and the probabilities; if a target image exists, the target is followed. If there is no target image, the drone 10 sends an instruction to the electronic device 20 for prompting the user 30 that there is no target of interest within the image.
  • The instruction may further be used to prompt the user to click the image again to reselect the target of interest.
  • the deep learning-based network model includes at least 2 convolution layers and at least 2 sampling layers.
  • the deep learning network model 300 shown in FIG. 5 may be used.
  • For the specific structure and technical details of the deep learning network model, refer to the above description of the deep-learning-based network model; details are not repeated here.
  • In this embodiment, the region image of interest to the user 30 is obtained and used as the input of the deep-learning-based network model for target prediction; the amount of computation is small, the computation time is short, and the requirements on hardware devices are low.
  • an embodiment of the present invention further provides a UAV intelligent following target determining apparatus, which is used in an electronic device 20.
  • the device 300 includes:
  • the image obtaining module 301 is configured to acquire an image returned by the drone 10;
  • the image processing module 302 is configured to obtain a region image of interest according to the click of the user 30 on the image returned by the drone 10;
  • the image prediction module 303 is configured to load a deep learning network model, input the acquired region image into the deep learning network model, and use the deep learning network model to output a plurality of candidate bounding boxes framing targets in the region image and the probability that the target in each candidate bounding box belongs to a preset category;
  • the target image confirmation module 304 is configured to determine, according to the candidate bounding boxes and the probability that the targets in them belong to preset categories, whether a target image exists in the region image, and if a target image exists, to send a target follow command to the drone.
  • The target image confirmation module 304 is further configured to prompt the user that there is no target of interest within the image if no target image exists.
  • The target image confirmation module 304 may further prompt the user to click the image again to reselect the target of interest.
  • The embodiments of the present invention acquire a region image of interest on the original image according to the click position of the user 30 and take the region image of interest as the input of the deep-learning-based network model for target prediction, which involves a small amount of computation, a short computation time, and low requirements on hardware devices.
  • the target image confirmation module 304 is specifically configured to:
  • step S1: for each preset category, obtain the candidate bounding box corresponding to the target with the highest probability of belonging to that category, calculate the coincidence rate between each of the other candidate bounding boxes and that box, and set to zero the probability that the target in any candidate bounding box whose coincidence rate is greater than the first preset threshold belongs to that category;
  • step S2: repeat step S1 for each of the other preset categories;
  • step S3: for each candidate bounding box remaining after step S2, obtain the preset category with the highest probability among the probabilities that the target in the box belongs to each preset category as the category of the target in the box, and take targets whose maximum probability is greater than the second preset threshold θ2 as possible target images;
  • step S4: calculate a distance coefficient between each possible target image and the user's click position, where the coordinates of the possible target image are (x_o, y_o) and the coordinates of the click position are (x_p, y_p);
  • step S5: obtain the product of each possible target image's distance coefficient and the probability of its category and find the maximum of the products; if the maximum is greater than the third preset threshold, take the possible target image corresponding to the maximum as the target image and record its category.
  • the deep learning network model includes at least 2 convolution layers and at least 2 sampling layers.
  • the deep learning network model includes:
  • a first convolutional layer, a first downsampling layer, a second convolutional layer, a second downsampling layer, a third convolutional layer, a third downsampling layer, a fourth convolutional layer, a fourth downsampling layer, a fifth convolutional layer, a fifth downsampling layer, a sixth convolutional layer, a sixth downsampling layer, a seventh convolutional layer, an eighth convolutional layer, and a region layer.
  • Among the convolutional layers, the number of filters of each subsequent convolutional layer is twice that of the previous one, and the sixth and seventh convolutional layers have an equal number of filters;
  • the first through fifth downsampling layers have a window size of 2*2 pixels and a stride of 2, and the sixth downsampling layer has a window size of 2*2 pixels and a stride of 1.
  • The number of filters of the first convolutional layer is 4, and the first through sixth downsampling layers all use the maximum-value downsampling method.
  • each of the convolution layers uses a filter of 3*3 pixels.
  • The region image is 288*288 pixels in size, and a total of 9*9*5 candidate bounding boxes are obtained by using the deep learning network model.
  • the embodiment of the present invention further provides a UAV intelligent following target determining device, which is used in the UAV 10, and the device 400 includes:
  • An image acquisition module 401 configured to acquire an image
  • a second image processing module 402 configured to acquire a region image of interest according to a click operation of the user 30;
  • an image prediction module 303, configured to load a deep learning network model, input the acquired region image into the deep learning network model, and use the deep learning network model to output a plurality of candidate bounding boxes framing targets in the region image and the probability that the target in each candidate bounding box belongs to a preset category;
  • a target image confirmation module 304, configured to determine, according to the candidate bounding boxes and the probability that the targets in them belong to preset categories, whether a target image exists in the region image, and if a target image exists, to follow the target.
  • If there is no target image, the drone 10 sends an instruction to the electronic device 20 for prompting the user 30 that there is no target of interest within the image.
  • the instructions may further be used to prompt the user to re-click the image to reselect the target of interest.
  • the deep learning based network model includes at least 2 convolution layers and at least 2 sampling layers.
  • the deep learning network model 300 shown in FIG. 5 may be used.
  • In this embodiment, the region image of interest to the user 30 is obtained and used as the input of the deep-learning-based network model for target prediction; the amount of computation is small, the computation time is short, and the requirements on hardware devices are low.
  • FIG. 9 is a schematic diagram showing the hardware structure of the unmanned aerial vehicle 10 according to the embodiment of the present invention.
  • The drone 10 includes: a fuselage 14, an arm 15 connected to the fuselage 14, a power unit 17 disposed on the arm, an image sensor 16 for acquiring images, a processor 11 and a signal transmitter 13 disposed in the fuselage 14, and a memory 12 built into or externally mounted on the drone 10 (FIG. 9 takes the memory 12 built into the drone 10 as an example).
  • the processor 11 and the memory 12 can be connected by a bus or other means.
  • The memory 12, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/units corresponding to the target determination method in the embodiments of the present invention (for example, the image acquisition module 401, the second image processing module 402, the image prediction module 303, and the target image confirmation module 304 shown in FIG. 8).
  • The processor 11 executes various functional applications and data processing of the drone 10 by running the non-volatile software programs, instructions, and units stored in the memory 12, that is, implements the target determination method of the above method embodiments.
  • The memory 12 may include a program storage area and a data storage area, where the program storage area may store an operating system and an application required by at least one function, and the data storage area may store data created according to the usage of the user terminal device, and the like. Further, the memory 12 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 12 may optionally include memory remotely located relative to the processor 11, which may be connected to the drone 10 via a network.
  • The one or more modules are stored in the memory 12 and, when executed by the one or more processors 11, perform the target determination method in any of the above method embodiments, for example, performing method steps 201 to 204 of FIG. 6 described above and implementing the functions of the image acquisition module 401, the second image processing module 402, the image prediction module 303, and the target image confirmation module 304 in FIG. 8.
  • If the drone 10 determines, by using the target determination method, that the target image exists, the target is followed. Alternatively, if the target image does not exist, the drone 10 sends an instruction to the electronic device 20 for prompting the user that there is no target of interest within the image; the instruction may further be used to prompt the user to click the image again to reselect the target of interest.
  • The above drone 10 can perform the target determination method provided by the embodiments of the present invention and has the corresponding functional modules and beneficial effects of the method. For technical details not described in detail in this embodiment, reference may be made to the target determination method provided by the embodiments of the present invention.
  • Embodiments of the present invention provide a non-transitory computer-readable storage medium storing computer-executable instructions that are executed by one or more processors, for example, to perform method steps 201 to 204 in FIG. 6 described above and implement the functions of the image acquisition module 401, the second image processing module 402, the image prediction module 303, and the target image confirmation module 304 in FIG. 8.
  • FIG. 10 is a schematic diagram showing the hardware structure of an electronic device 20 according to an embodiment of the present invention. As shown in FIG. 10, the electronic device 20 includes:
  • one or more processors 21 and a memory 22; FIG. 10 takes one processor 21 as an example.
  • The processor 21 and the memory 22 may be connected by a bus or other means; FIG. 10 takes a bus connection as an example.
  • The memory 22, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/units corresponding to the target determination method in the embodiments of the present invention (for example, the image acquisition module 301, the image processing module 302, the image prediction module 303, and the target image confirmation module 304 shown in FIG. 7).
  • The processor 21 executes various functional applications and data processing of the electronic device 20 by running the non-volatile software programs, instructions, and units stored in the memory 22, that is, implements the target determination method of the above method embodiments.
  • The memory 22 may include a program storage area and a data storage area, where the program storage area may store an operating system and an application required by at least one function, and the data storage area may store data created according to the usage of the electronic device 20, and the like. Further, the memory 22 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 22 may optionally include memory remotely located relative to the processor 21, which may be connected to the electronic device over a network; examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • The one or more units are stored in the memory 22 and, when executed by the one or more processors 21, perform the target determination method in any of the above method embodiments, for example, performing method steps 101 to 104 of FIG. 2 described above and implementing the functions of the image acquisition module 301, the image processing module 302, the image prediction module 303, and the target image confirmation module 304 shown in FIG. 7.
  • the above-mentioned electronic device 20 can perform the target determination method provided by the embodiment of the present invention, and has the corresponding functional modules and beneficial effects of the execution method.
  • For technical details not described in detail in the embodiment of the electronic device 20, reference may be made to the target determination method provided by the embodiments of the present invention.
  • the electronic device 20 of the embodiment of the present application exists in various forms including, but not limited to:
  • Mobile communication devices: these devices are characterized by mobile communication functions and are mainly aimed at providing voice and data communication. Such terminals include smart phones (such as the iPhone), multimedia phones, functional phones, and low-end phones.
  • Ultra-mobile PC devices: these devices belong to the category of personal computers, have computing and processing functions, and generally also have mobile Internet access. Such terminals include PDAs, MIDs, and UMPC devices, such as the iPad.
  • Portable entertainment devices: these devices can display and play multimedia content. Such devices include audio and video players (such as the iPod), handheld game consoles, e-book readers, smart toys, and portable car navigation devices.
  • Server: a device that provides computing services. A server consists of a processor, a hard disk, memory, a system bus, and so on; its architecture is similar to that of a general-purpose computer, but because it needs to provide highly reliable services, it has higher requirements in terms of processing capability, stability, reliability, security, scalability, and manageability.
  • the electronic device 20 may be a remote controller as shown in FIG. 11 .
  • the remote controller includes an operating lever 25 , a signal receiver 26 , a signal transmitter 23 , and a display screen 24 in addition to the processor 21 and the memory 22 described above.
  • the signal receiver 26 is configured to receive an image returned by the drone 10, and the signal transmitter 23 is configured to send an instruction to the drone 10.
  • the target follow command is transmitted to the drone 10 by the signal transmitter 23.
  • the display screen 24 displays a prompt for no target of interest within the image, and the display screen 24 may further display a prompt to re-click the image to reselect the target of interest.
  • Embodiments of the present invention also provide a non-transitory computer-readable storage medium storing computer-executable instructions that are executed by one or more processors, for example, to perform method steps 101 to 104 in FIG. 2 described above and implement the functions of the image acquisition module 301, the image processing module 302, the image prediction module 303, and the target image confirmation module 304 shown in FIG. 7.
  • The device embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiments.
  • Those skilled in the art can understand that the embodiments can be implemented by means of software plus a general hardware platform, or of course by hardware.
  • A person skilled in the art can also understand that all or part of the processes of the above embodiment methods can be completed by a computer program instructing related hardware; the program can be stored in a computer-readable storage medium, and when executed, the program may include the flows of the embodiments of the methods described above.
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).

Abstract

Embodiments of the present invention disclose a method for determining an intelligent following target of an unmanned aerial vehicle (UAV), a UAV, and an electronic device. The method includes: acquiring an image returned by the UAV; obtaining a region image of interest according to the user's click on the image returned by the UAV; loading a deep learning network model, inputting the acquired region image into the deep learning network model, and using the deep learning network model to output a plurality of candidate bounding boxes framing targets in the region image and the probability that the target in each candidate bounding box belongs to a preset category; and determining, according to the candidate bounding boxes and the probability that the targets in them belong to preset categories, whether a target image exists in the region image. By acquiring the image the user is interested in, the embodiments of the present invention can use it as the input of a deep-learning-based network model for target prediction, with a small amount of computation, a short computation time, and low requirements on hardware devices.

Description

Method for determining an intelligent following target of an unmanned aerial vehicle, unmanned aerial vehicle, and remote controller
Technical Field
Embodiments of the present invention relate to the field of computer vision, and in particular to a method for determining an intelligent following target of an unmanned aerial vehicle (UAV), a UAV, and a remote controller.
Background
With the development of UAV technology, UAVs have been widely used in both military and civilian fields. As their applications broaden, new requirements are continually placed on UAV performance, especially regarding intelligence. Vision-based intelligent following is one of the key functions of an intelligent UAV and also has important applications in industry. In the consumer UAV field, the intelligent following function is one of the hallmarks of high-end UAVs and brings users much enjoyment; in the industrial UAV field, intelligent following can be used for fugitive tracking, abnormal-target behavior analysis, and the like, which is of great significance to national security and the maintenance of public order.
In intelligent following technology, how to initialize the target captured by the UAV has always been one of the difficulties of intelligent following: if the target cannot be framed accurately, the subsequent tracking process is prone to losing the target. At present, the framing and recognition of dynamic targets is generally accomplished by a deep-learning-based target detection algorithm, which can complete the target recognition and localization process within one unified deep learning framework, with accurate localization and many recognizable categories.
In implementing the present invention, the inventors found that the related art has at least the following problem: the deep-learning-based target detection algorithm involves a large amount of computation, and a single image prediction requires billions or even tens of billions of floating-point operations; if a general-purpose processor is used, the computation time becomes very long and real-time requirements are difficult to meet, so the requirements on the processor are high.
Summary
An object of the embodiments of the present invention is to provide a method for determining an intelligent following target of a UAV, a UAV, and a remote controller whose algorithm involves a small amount of computation and a short computation time and imposes low requirements on hardware devices.
In a first aspect, an embodiment of the present invention provides a method for determining an intelligent following target of a UAV, the method comprising:
the electronic device acquiring an image returned by the UAV;
the electronic device obtaining a region image of interest according to the user's click on the image returned by the UAV;
the electronic device loading a deep learning network model, inputting the acquired region image into the deep learning network model, and using the deep learning network model to output a plurality of candidate bounding boxes framing targets in the region image and the probability that the target in each candidate bounding box belongs to a preset category;
determining, according to the candidate bounding boxes and the probability that the targets in them belong to preset categories, whether a target image exists in the region image;
if a target image exists, sending a target follow command to the UAV.
Optionally, the method further includes: if no target image exists, the electronic device prompts the user that there is no target of interest in the image.
Optionally, the method further includes: if no target image exists, the electronic device prompts the user to click the image again to reselect the target of interest.
Optionally, determining whether a target image exists in the region image according to the candidate bounding boxes and the probability that the targets in them belong to preset categories includes:
S1: for each preset category, obtaining the candidate bounding box corresponding to the target with the highest probability of belonging to that category, calculating the coincidence rate between each of the other candidate bounding boxes and that box, and setting to zero the probability that the target in any candidate bounding box whose coincidence rate is greater than a first preset threshold belongs to that category;
S2: repeating step S1 for each of the other preset categories;
S3: for each candidate bounding box remaining after step S2, obtaining the preset category with the highest probability among the probabilities that the target in the box belongs to each preset category as the category of the target in the box, and taking targets whose maximum probability is greater than a second preset threshold as possible target images;
S4: calculating a distance coefficient between each possible target image and the user's click position, the distance coefficient δ being expressed as:
[formula shown as image PCTCN2018078582-appb-000001 in the original]
where the coordinates of the possible target image are (x_o, y_o) and the coordinates of the click position are (x_p, y_p);
S5: obtaining the product of each possible target image's distance coefficient and the probability of its category and finding the maximum of the products; if the maximum is greater than a third preset threshold, taking the possible target image corresponding to the maximum as the target image and recording the category to which the target image belongs.
Optionally, the deep learning network model includes at least two convolutional layers and at least two sampling layers.
Optionally, the deep learning network model includes, in order:
a first convolutional layer, a first downsampling layer, a second convolutional layer, a second downsampling layer, a third convolutional layer, a third downsampling layer, a fourth convolutional layer, a fourth downsampling layer, a fifth convolutional layer, a fifth downsampling layer, a sixth convolutional layer, a sixth downsampling layer, a seventh convolutional layer, an eighth convolutional layer, and a region layer.
Optionally, among the first through sixth convolutional layers, the number of filters of each convolutional layer is twice that of the previous one, and the sixth and seventh convolutional layers have an equal number of filters;
the first through fifth downsampling layers have a window size of 2*2 pixels and a stride of 2, and the sixth downsampling layer has a window size of 2*2 pixels and a stride of 1.
Optionally, the number of filters of the first convolutional layer is 4, and the first through sixth downsampling layers all adopt the maximum-value downsampling method.
Optionally, each convolutional layer uses 3*3-pixel filters.
Optionally, the region image is 288*288 pixels in size, and a total of 9*9*5 candidate bounding boxes are obtained by using the deep learning network model.
In a second aspect, an embodiment of the present invention provides a method for determining a target to be intelligently followed by a UAV, the method including:

the UAV acquiring an image;

the UAV obtaining a picture of a region of interest according to a click operation of the user;

the UAV loading a deep learning network model, inputting the acquired picture of the region into the deep learning network model, and using the deep learning network model to output multiple candidate bounding boxes framing targets in the picture of the region and the probabilities that the targets within the candidate bounding boxes belong to preset categories;

determining, according to the candidate bounding boxes and the probabilities that the targets within the candidate bounding boxes belong to the preset categories, whether a target image exists in the picture of the region; and

if a target image exists, following the target.

Optionally, the method further includes: if no target image exists, the UAV sending an instruction to an electronic device, the instruction being used to prompt the user that there is no target of interest in the image.

Optionally, the instruction is further used to prompt the user to click the image again to reselect a target of interest.
Optionally, the determining, according to the candidate bounding boxes and the probabilities that the targets within the candidate bounding boxes belong to the preset categories, whether a target image exists in the picture of the region includes:

S1: for each preset category, obtaining the candidate bounding box corresponding to the target with the highest probability of belonging to that preset category, calculating the overlap ratio between each of the other candidate bounding boxes and this candidate bounding box, and setting to zero the probability that the target within any candidate bounding box whose overlap ratio is greater than a first preset threshold belongs to that preset category;

S2: repeating step S1 for each of the other preset categories;

S3: for each candidate bounding box remaining after step S2, taking the preset category with the highest probability among the probabilities that the target within the candidate bounding box belongs to the respective preset categories as the category of the target within that candidate bounding box, and taking the targets whose highest probability is greater than a second preset threshold as possible target images;

S4: calculating a distance coefficient between each possible target image and the user's click position, the distance coefficient δ being expressed as:

[Formula PCTCN2018078582-appb-000002: the distance coefficient δ, a decreasing function of the distance between (x_o, y_o) and (x_p, y_p)]

where the coordinates of the possible target image are (x_o, y_o) and the coordinates of the click position are (x_p, y_p); and

S5: obtaining, for each possible target image, the product of its distance coefficient and the probability of the category it belongs to, and finding the maximum of these products; if the maximum is greater than a third preset threshold, taking the possible target image corresponding to the maximum product as the target image and recording the category of the target image.
Optionally, the deep learning network model includes at least two convolutional layers and at least two sampling layers.

Optionally, the deep learning network model includes, in order:

a first convolutional layer, a first downsampling layer, a second convolutional layer, a second downsampling layer, a third convolutional layer, a third downsampling layer, a fourth convolutional layer, a fourth downsampling layer, a fifth convolutional layer, a fifth downsampling layer, a sixth convolutional layer, a sixth downsampling layer, a seventh convolutional layer, an eighth convolutional layer and a region layer.

Optionally, among the first to sixth convolutional layers, each convolutional layer has twice as many filters as the preceding one, and the sixth and seventh convolutional layers have the same number of filters;

the first to fifth downsampling layers have a window size of 2*2 pixels with a stride of 2, and the sixth downsampling layer has a window size of 2*2 pixels with a stride of 1.

Optionally, the first convolutional layer has 4 filters, and the first to sixth downsampling layers all use max-pooling downsampling.

Optionally, each convolutional layer uses filters of 3*3 pixels.

Optionally, the picture of the region is 288*288 pixels, and a total of 9*9*5 candidate bounding boxes are obtained using the deep learning network model.
In a third aspect, an embodiment of the present invention provides a remote control, including:

an operating stick;

a signal receiver, configured to receive an image sent back by a UAV;

a signal transmitter, configured to send instructions to the UAV;

a display screen; and

a processor,

where the processor is configured to:

obtain a picture of a region of interest according to the user's click on the image sent back by the UAV;

load a deep learning network model, input the acquired picture of the region into the deep learning network model, and use the deep learning network model to output multiple candidate bounding boxes framing targets in the picture of the region and the probabilities that the targets within the candidate bounding boxes belong to preset categories;

determine, according to the candidate bounding boxes and the probabilities that the targets within the candidate bounding boxes belong to the preset categories, whether a target image exists in the picture of the region; and

if a target image exists, send a target following command to the UAV through the signal transmitter.

Optionally, if no target image exists, the display screen displays a prompt that there is no target of interest in the image.

Optionally, if no target image exists, the display screen displays a prompt to click the image again to reselect a target of interest.

Optionally, the processor is further configured to perform any one of the methods of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a UAV, including a fuselage, arms connected to the fuselage, a power unit provided on the arms, an image sensor configured to acquire images, and a processor and a signal transmitter provided in the fuselage, the processor being configured to:

obtain a picture of a region of interest according to a click operation of the user;

load a deep learning network model, input the acquired picture of the region into the deep learning network model, and use the deep learning network model to output multiple candidate bounding boxes framing targets in the picture of the region and the probabilities that the targets within the candidate bounding boxes belong to preset categories; and

determine, according to the candidate bounding boxes and the probabilities that the targets within the candidate bounding boxes belong to the preset categories, whether a target image exists in the picture of the region, and, if a target image exists, control the UAV to follow the target.

Optionally, if no target image exists, the UAV sends an instruction to an electronic device through the signal transmitter, the instruction being used to prompt the user that there is no target of interest in the image.

Optionally, the instruction is further used to prompt the user to click the image again to reselect a target of interest.

Optionally, the processor is further configured to perform any one of the methods of the second aspect.
In a fifth aspect, an embodiment of the present invention provides a non-volatile computer-readable storage medium storing computer-executable instructions that, when executed by an electronic device, cause the electronic device to perform any one of the methods of the first aspect.

In a sixth aspect, an embodiment of the present invention provides a non-volatile computer-readable storage medium storing computer-executable instructions that, when executed by a UAV, cause the UAV to perform any one of the methods of the second aspect.

In a seventh aspect, an embodiment of the present invention provides a computer program product including a computer program stored on a non-volatile computer-readable storage medium, the computer program including program instructions that, when executed by an electronic device, cause the electronic device to perform the method of the first aspect.

In an eighth aspect, an embodiment of the present invention provides a computer program product including a computer program stored on a non-volatile computer-readable storage medium, the computer program including program instructions that, when executed by a UAV, cause the UAV to perform the method of the second aspect.
The beneficial effects of the embodiments of the present invention are as follows: the embodiments obtain a picture of the region of interest on the original image according to the user's click position and input that picture into a deep learning network model for target prediction, which requires a small amount of computation, a short computing time and imposes low requirements on hardware devices.
BRIEF DESCRIPTION OF THE DRAWINGS

One or more embodiments are exemplarily described with reference to the corresponding figures; these exemplary descriptions do not constitute limitations on the embodiments. Elements with the same reference numerals in the figures denote similar elements, and unless otherwise stated, the figures are not drawn to scale.

FIG. 1 is a schematic diagram of an application scenario of the target determining method and device provided by an embodiment of the present invention;

FIG. 2 is a schematic flowchart of an embodiment of the target determining method of the present invention performed by an electronic device;

FIG. 3 is a schematic diagram of the process of an embodiment of the target determining method of the present invention;

FIG. 4 is a schematic diagram of the step of de-duplicating candidate bounding boxes in an embodiment of the target determining method of the present invention;

FIG. 5 is a schematic diagram of a network structure based on a deep learning algorithm in an embodiment of the present invention;

FIG. 6 is a schematic flowchart of an embodiment of the target determining method of the present invention performed by a UAV;

FIG. 7 is a schematic structural diagram of an embodiment of the target determining device of the present invention;

FIG. 8 is a schematic structural diagram of another embodiment of the target determining device of the present invention;

FIG. 9 is a schematic diagram of the hardware structure of the UAV provided by an embodiment of the present invention;

FIG. 10 is a schematic diagram of the hardware structure of the electronic device provided by an embodiment of the present invention;

FIG. 11 is a schematic diagram of the hardware structure of the remote control provided by an embodiment of the present invention.
DETAILED DESCRIPTION

To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some rather than all of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of the present invention.

The method and device for determining a target to be intelligently followed by a UAV provided by the embodiments of the present invention are applicable to the application scenario shown in FIG. 1. The application scenario includes a UAV 10, an electronic device 20 and a user 30. The UAV 10 may be any suitable type of high-altitude or low-altitude aircraft, including a typical quadcopter, a remote-controlled helicopter capable of hovering, or a fixed-wing aircraft with a certain moving speed. The electronic device 20 may be, for example, a remote control, a smartphone, a tablet computer, a personal computer or a laptop. The user 30 may interact with the electronic device 20 through one or more user interaction devices of any suitable type, such as a mouse, keys or a touchscreen. The UAV 10 and the electronic device 20 may establish a communication connection through wireless communication modules provided inside them respectively, to upload or deliver data and instructions.

The UAV 10 can track a target, the target being, for example, a specific person, vehicle, boat or animal. To track the target, the UAV 10 first needs to determine the target. The UAV 10 is provided with at least one image acquisition device, such as a high-definition camera or an action camera, to take images. The UAV 10 sends the image back to the electronic device 20 through a wireless network, and the electronic device 20 displays the image on its screen. The user 30 may operate on the image, for example, by clicking a target of interest on the image, and the electronic device 20 determines the position of the target of interest in the image according to the click operation of the user 30.

The target in the captured image can be recognized and confirmed based on a deep learning network model. Performing image recognition on the whole original image involves a large amount of computation, whereas obtaining the picture of the region the user 30 is interested in on the original image according to the click position of the user 30 and then performing image recognition only on that picture requires relatively little computation and is fast. The electronic device 20 may obtain the picture of the region the user 30 is interested in from the original image according to the click position of the user 30, and then perform target recognition on that picture based on the deep learning algorithm to obtain the target image. In this embodiment, the deep learning network model is loaded on the electronic device 20 and target recognition and confirmation are completed on the electronic device 20, which does not occupy the computing resources of the UAV 10 and adds a new function to the UAV 10 without any additional hardware cost. In other possible embodiments, the deep learning network model may instead be loaded on the UAV 10: the electronic device 20 sends the picture of the region the user 30 is interested in and the click position of the user 30 to the UAV 10, and the UAV 10 performs target recognition on that picture based on the deep learning algorithm to obtain the target image. The electronic device 20 may also send only the click position of the user 30 to the UAV 10, and the UAV 10 obtains the picture of the region of interest on the original image according to the click position and performs recognition based on it.
FIG. 2 is a schematic flowchart of a method for determining a target to be intelligently followed by a UAV provided by an embodiment of the present invention. The method may be performed by the electronic device 20 in FIG. 1. As shown in FIG. 2, the method includes:

101: The electronic device 20 acquires an image sent back by the UAV 10.

After taking an image, the UAV 10 sends it to the electronic device 20. After receiving the image sent back by the UAV 10, the electronic device 20 may display it on its screen.

102: The electronic device 20 obtains a picture of a region of interest according to the click operation of the user 30 on the image sent back by the UAV 10.

The user 30 may click the image sent back by the UAV 10 on the screen of the electronic device 20 to determine the target to be tracked, and the electronic device 20 may obtain the picture of the region the user 30 is interested in according to the click position of the user 30. For example, according to the position coordinates (x_m, y_m) clicked by the user 30 on the screen, the corresponding coordinates (x_p, y_p) of the click position on the image are determined, and the original image is cropped according to the coordinates (x_p, y_p) to obtain the picture of the region of interest. Generally, the picture sent back by the UAV 10 is 1280*720 pixels, and the picture of the region of interest may be a 288*288-pixel region centered on the coordinates (x_p, y_p). Referring to FIG. 3, steps (1)-(2) show the process of obtaining the picture of the region of interest, where the "+" mark in the figure indicates the click position of the user 30, and the part enclosed by the dashed box is the obtained picture of the region of interest.
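A minimal sketch of this cropping step, in Python, might look as follows. The clamping of the window at the image borders and all names are our own assumptions; the embodiment only specifies a 288*288 window centered on the click position:

    import numpy as np

    def crop_region_of_interest(image, click_xy, size=288):
        """Crop a size*size region centered on the click position (x_p, y_p)."""
        h, w = image.shape[:2]                      # e.g. 720, 1280
        half = size // 2
        # clamp the window so it stays inside the frame (our assumption)
        x0 = min(max(int(click_xy[0]) - half, 0), w - size)
        y0 = min(max(int(click_xy[1]) - half, 0), h - size)
        return image[y0:y0 + size, x0:x0 + size], (x0, y0)

The returned offset (x0, y0) allows mapping candidate bounding boxes found in the cropped picture back onto the original image.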
103: The electronic device 20 loads a deep learning network model, inputs the acquired picture of the region into the deep learning network model, and uses the deep learning network model to output multiple candidate bounding boxes framing targets in the picture of the region and the probabilities that the targets within the candidate bounding boxes belong to preset categories.

Before target prediction, a deep-learning-based network model may be obtained in advance, specifically including:

taking multiple pictures and the preset categories corresponding to the pictures as input, and performing model training based on the deep learning algorithm to obtain the deep-learning-based network model and the weight parameters in the network model. The preset categories are, for example: persons; micro, small and medium-sized cars and the like; buses, trucks and the like; agricultural vehicles, tricycles, tractors and the like; riding targets such as bicycles and motorcycles; water targets such as boats; flying targets such as UAVs; common pets such as cats and dogs; other animals; and other salient targets. The number of preset categories may be arbitrary, for example 10.
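For illustration, the ten preset categories above can be kept as a simple lookup table; the English labels below are our own shorthand, not names taken from the original:

    PRESET_CATEGORIES = [
        "person", "car", "bus_truck", "agricultural_vehicle",
        "riding_target", "water_target", "flying_target",
        "pet", "other_animal", "other_salient_target",
    ]  # 10 preset categories; indices match the 10 class probabilities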
Referring to FIG. 3, steps (3)-(4) in FIG. 3 show the process of predicting the picture of the region of interest based on the deep learning network model.

Inputting the picture of the region into the deep learning network model yields multiple candidate bounding boxes framing all the targets in the picture, and the target within each candidate bounding box has a probability for each preset category. For example, inputting the 288*288-pixel picture of the region of interest into the deep-learning-based network model shown in FIG. 5 for prediction outputs 9*9*5*15 prediction results. Here, 9*9*5 is the number of candidate bounding boxes, "5" is obtained by mean clustering of the training samples, and "15" is the number of parameters per candidate bounding box, namely 4 position parameters (including coordinates, width and height), 10 probabilities for the preset categories and 1 probability of whether it is a target. The 405 candidate bounding boxes are numerous enough to select the optimal minimum bounding box of the target image from them. Setting the input image to a resolution of 288x288 increases the recognition speed while ensuring recognition accuracy.
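As a sketch, the raw output tensor can be split into its parts as follows. The ordering of the 15 values per box (position first, then the target probability, then the 10 category probabilities) is our own assumption, since the embodiment states only their composition:

    import numpy as np

    def split_predictions(raw):
        """Split a (9, 9, 5, 15) prediction tensor into its components."""
        boxes = raw[..., 0:4].reshape(-1, 4)          # 405 boxes: x, y, w, h
        objectness = raw[..., 4].reshape(-1)          # target / not-target
        class_probs = raw[..., 5:15].reshape(-1, 10)  # preset-category probs
        return boxes, objectness, class_probs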
Optionally, in some embodiments of the method, the deep-learning-based network model includes at least 2 convolutional layers and at least 2 sampling layers. Specifically, the deep learning network model 300 shown in FIG. 5, with a 15-layer network structure, may be used; the 15-layer structure avoids both overfitting from too many layers and insufficient accuracy from too few, optimizing the deep learning network structure. The deep learning network model includes, in order:

a first convolutional layer, a first downsampling layer, a second convolutional layer, a second downsampling layer, a third convolutional layer, a third downsampling layer, a fourth convolutional layer, a fourth downsampling layer, a fifth convolutional layer, a fifth downsampling layer, a sixth convolutional layer, a sixth downsampling layer, a seventh convolutional layer, an eighth convolutional layer and a region layer.

Optionally, except for the eighth convolutional layer, each convolutional layer has twice as many filters as the preceding one, and the seventh and eighth convolutional layers have the same number of filters. As shown in FIG. 5, if the first convolutional layer has 4 filters, the subsequent convolutional layers have 8, 16, 32, 64, 128, 256 and 256 filters in turn.

The first to fifth downsampling layers have a window size of 2*2 pixels with a stride of 2, and the sixth downsampling layer has a window size of 2*2 pixels with a stride of 1.

Optionally, each convolutional layer may use filters of 3*3 pixels, which involve a small amount of computation.

Optionally, the first to sixth downsampling layers may use max-pooling downsampling.

Starting from the 4 filters of the first convolutional layer, each convolutional layer has twice as many filters as the preceding one (except the last convolutional layer), so the number of features doubles after each convolutional layer. The downsampling layers have a 2x2 window with a stride of 2 (except the last downsampling layer), so the feature resolution halves after each downsampling layer. This arrangement couples the change in resolution with the change in the number of features: each reduction in resolution corresponds to an increase in the number of features.
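A minimal PyTorch sketch of this structure is given below, using the filter counts 4-8-16-32-64-128-256-256 stated above. The LeakyReLU activations, the asymmetric padding that keeps the 9*9 grid through the stride-1 sixth pooling layer, and the 1x1 prediction head feeding the region layer are all our own assumptions; FIG. 5 is not reproduced here and the embodiment does not specify them:

    import torch
    import torch.nn as nn

    def conv_block(c_in, c_out):
        # 3*3-pixel filters throughout, as stated above; the LeakyReLU
        # activation is an assumption (the embodiment names none)
        return nn.Sequential(
            nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
            nn.LeakyReLU(0.1),
        )

    class FollowNet(nn.Module):
        """Sketch of the 15-layer model of FIG. 5 (assumptions noted)."""

        def __init__(self, num_anchors=5, num_classes=10):
            super().__init__()
            layers, c_in = [], 3
            for k, c_out in enumerate([4, 8, 16, 32, 64, 128]):  # conv1-6
                layers.append(conv_block(c_in, c_out))
                if k < 5:
                    layers.append(nn.MaxPool2d(2, stride=2))      # pools 1-5
                else:
                    # pool 6: 2*2 window, stride 1; the padding that keeps
                    # the 9*9 grid is our assumption
                    layers += [nn.ZeroPad2d((0, 1, 0, 1)),
                               nn.MaxPool2d(2, stride=1)]
                c_in = c_out
            layers += [conv_block(128, 256), conv_block(256, 256)]  # conv7, conv8
            self.features = nn.Sequential(*layers)
            # 1x1 head producing 5*(4+1+10) = 75 values per cell for the
            # region layer -- an assumption, not stated in the embodiment
            self.head = nn.Conv2d(256, num_anchors * (5 + num_classes), 1)

        def forward(self, x):                    # x: (N, 3, 288, 288)
            return self.head(self.features(x))   # (N, 75, 9, 9)

With a 288*288 input, the five stride-2 pooling layers reduce the resolution to 9*9, giving the 9*9*5 = 405 candidate bounding boxes described above.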
104: Determine, according to the candidate bounding boxes and the probabilities that the candidate bounding boxes belong to the preset categories, whether a target image exists in the picture of the region; if a target image exists, send a target following command to the UAV 10.

Optionally, in other embodiments of the method, if no target image exists, the electronic device 20 prompts the user 30 that there is no target of interest in the image. The electronic device 20 may further prompt the user to click the image again to reselect a target of interest.

Referring to FIG. 3, steps (5)-(6)-(7) of FIG. 3 show the process of confirming the target image. Confirming whether a target image exists in the picture of the region specifically includes the following steps:

Step 1: For each preset category, obtain the candidate bounding box corresponding to the target with the highest probability of belonging to that preset category, calculate the overlap ratio between each of the other candidate bounding boxes and this highest-probability candidate bounding box, and set to zero the probability that the target within any candidate bounding box whose overlap ratio is greater than the first preset threshold belongs to that preset category.

For example, if there are i candidate bounding boxes and j preset categories, suppose the probability that the i-th candidate bounding box belongs to the j-th preset category is P_ij. Then, for each preset category j, the candidate bounding boxes are sorted by P_ij, with larger P_ij in front and smaller P_ij behind. As shown in FIG. 4, for the same preset category j, the overlap ratio IOU between each candidate bounding box behind the one with the largest P_ij and the one with the largest P_ij is calculated in turn; if the overlap ratio IOU is greater than the first preset threshold θ_1, the probability that the later candidate bounding box belongs to the preset category j is set to 0. The overlap ratio IOU characterizes the degree of overlap of two candidate bounding boxes: the larger the overlap ratio, the more similar the two boxes; when the overlap ratio of two candidate bounding boxes is greater than the first preset threshold θ_1, the two boxes are highly similar. To simplify the computation, candidate bounding boxes with smaller probabilities are removed; the first preset threshold θ_1 can be set according to the actual application. The overlap ratio IOU can be calculated with formula (1):

IOU = S_12 / (S_1 + S_2 - S_12)    (1)

where S_1 and S_2 are the areas of the two candidate bounding boxes respectively, and S_12 is the area of their overlapping part.

Step 2: Repeat step 1 for each of the other preset categories.

That is, the de-duplication of step 1 above is performed for each of the remaining preset categories.
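A minimal NumPy sketch of steps 1-2 follows; it assumes corner-format boxes (x1, y1, x2, y2) (conversion from the center/size position parameters is omitted) and illustrative names:

    import numpy as np

    def iou(a, b):
        """Overlap ratio of formula (1) for boxes (x1, y1, x2, y2)."""
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        s12 = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)   # overlap area S_12
        s1 = (a[2] - a[0]) * (a[3] - a[1])
        s2 = (b[2] - b[0]) * (b[3] - b[1])
        return s12 / (s1 + s2 - s12)

    def suppress_per_class(boxes, class_probs, theta1):
        """Steps 1-2: per preset category, zero the probability of every
        box whose IOU with the highest-probability box exceeds theta1."""
        probs = class_probs.copy()
        for j in range(probs.shape[1]):           # each preset category j
            top = int(np.argmax(probs[:, j]))     # box with the largest P_ij
            for i in range(len(boxes)):
                if i != top and iou(boxes[i], boxes[top]) > theta1:
                    probs[i, j] = 0.0
        return probs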
Step 3: For each candidate bounding box remaining after step 2, take the preset category with the highest probability among the probabilities that the target within the candidate bounding box belongs to the respective preset categories as the category of that target, and take the targets whose highest probability is greater than the second preset threshold θ_2 as possible target images.

For each candidate bounding box remaining after the de-duplication of steps 1 and 2, the maximum P_i = max(P_ij) of the probabilities that the target within the box belongs to the respective preset categories is calculated in turn, and the corresponding preset category j is recorded. The targets with P_i greater than the second preset threshold θ_2 are selected as possible target images, and their corresponding preset categories j are recorded as the categories of the possible target images.

The probability value P_i characterizes the likelihood that a target belongs to its category: the larger P_i, the more likely the target belongs to that category. If P_i is greater than the second preset threshold θ_2, the candidate bounding box is likely to belong to its category. To further simplify the computation, the candidate bounding boxes of targets with P_i smaller than the second preset threshold θ_2 are removed; the value of θ_2 can be set according to the actual application.
Step 4: Calculate the distance coefficient between each possible target image and the click position of the user 30, the distance coefficient δ being expressed as:

[Formula PCTCN2018078582-appb-000004: the distance coefficient δ, a decreasing function of the distance between (x_o, y_o) and (x_p, y_p)]

where the coordinates of the possible target image are (x_o, y_o) and the coordinates of the click position are (x_p, y_p).

The distance coefficient δ characterizes how far a candidate bounding box is from the click position of the user 30: the larger δ, the closer the candidate bounding box is to the click position; the smaller δ, the farther away it is.

This distance coefficient formula can distinguish targets of the same category from one another; even if the click position of the user 30 falls outside the target image, the target can still be framed accurately.
Step 5: Obtain, for each possible target image, the product ε_i of its distance coefficient and the probability of the category it belongs to, and find the maximum of these products max(ε_i); if the maximum max(ε_i) is greater than the third preset threshold θ_3, take the possible target image corresponding to the maximum product max(ε_i) as the target image and record the category of the target image.

That is, the decision value ε_i = P_i * δ_i of each possible target image is calculated, and the maximum of the decision values ε = max(ε_i) is obtained. If ε is greater than the third preset threshold θ_3, the possible target image corresponding to ε is taken as the target image and its category is recorded. If ε is not greater than the third preset threshold θ_3, there is no target that the user 30 wants to track near the clicked position, and a prompt can be sent via the electronic device 20 asking the user 30 to reselect the target.

The decision value ε characterizes both how close a possible target image is to the click position of the user 30 and how likely it belongs to its category: the larger ε, the closer it is to the click position and the more likely it belongs to its category. If ε is greater than the third preset threshold θ_3, the possible target image is close to the click position of the user 30 and likely belongs to its category, so it can be taken as the target image; the value of θ_3 can be set according to the actual application.
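Steps 3 to 5 can be sketched together as follows. The exact expression for the distance coefficient δ is reproduced only as an image in the published text, so the inverse-distance form below (and the normalizing constant d_norm) is our own assumption; any decreasing function of the click distance fits the description:

    import numpy as np

    def pick_target(boxes, probs, click_xy, theta2, theta3, d_norm=288.0):
        """Steps 3-5: select the target image near the click position."""
        best_class = probs.argmax(axis=1)              # step 3: best category
        best_prob = probs.max(axis=1)
        candidates = np.where(best_prob > theta2)[0]   # possible target images
        if candidates.size == 0:
            return None                                # no target of interest
        centers = boxes[candidates, :2]                # assumes (x_o, y_o) first
        d = np.hypot(centers[:, 0] - click_xy[0], centers[:, 1] - click_xy[1])
        delta = 1.0 / (1.0 + d / d_norm)               # step 4: assumed form of delta
        eps = best_prob[candidates] * delta            # step 5: decision value
        k = int(eps.argmax())
        if eps[k] <= theta3:
            return None                                # prompt user to re-click
        i = candidates[k]
        return boxes[i], int(best_class[i])            # target box and category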
Optionally, in other embodiments of the method, after the target image and its category are confirmed, the flight strategy can be adjusted according to the category of the target image. For example, if the target is a large, fast-moving target such as a vehicle, the UAV 10 needs to increase its flight altitude and speed to obtain a larger field of view and a higher tracking speed; if the target is a small target such as a person, the UAV 10 needs to reduce its altitude and speed to ensure that the target is not lost from the field of view for being too small.
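Purely as an illustration of such a category-dependent strategy (the embodiment names the behaviour but none of these numbers or labels):

    # placeholder setpoints -- higher/faster for large fast targets such as
    # vehicles, lower/slower for small targets such as people
    FLIGHT_STRATEGY = {
        "car": {"altitude_m": 40.0, "speed_mps": 15.0},
        "person": {"altitude_m": 10.0, "speed_mps": 4.0},
    }

    def adjust_following(category):
        """Return altitude/speed setpoints for the confirmed target category."""
        return FLIGHT_STRATEGY.get(category, {"altitude_m": 20.0, "speed_mps": 8.0})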
In this embodiment of the present invention, the picture of the region of interest is obtained on the original image according to the click position of the user 30 and used as the input of the deep-learning-based network model for target prediction, which requires a small amount of computation, a short computing time and imposes low requirements on hardware devices.
An embodiment of the present invention further provides another method for determining a target to be intelligently followed by a UAV, which may be performed by the UAV 10 in FIG. 1. As shown in FIG. 6, the method includes:

201: The UAV 10 acquires an image.

The UAV 10 acquires the image through an image acquisition device.

202: The UAV 10 obtains a picture of a region of interest according to the click operation of the user 30.

The UAV 10 sends the acquired original image back to the electronic device 20 through the wireless network, and the picture of the region the user 30 is interested in can be obtained according to the click operation of the user 30 on the original image. The electronic device 20 may obtain the picture of the region of interest according to the click operation of the user 30 and then send it back to the UAV 10; alternatively, the electronic device 20 sends only the click position of the user 30 to the UAV 10, and the UAV 10 obtains the picture of the region the user 30 is interested in from the original image according to the click position.

203: The UAV 10 loads a deep learning network model, inputs the acquired picture of the region into the deep learning network model, and uses the deep learning network model to output multiple candidate bounding boxes framing targets in the picture of the region and the probabilities that the targets within the candidate bounding boxes belong to preset categories.

204: Determine, according to the candidate bounding boxes and the probabilities that the targets within the candidate bounding boxes belong to the preset categories, whether a target image exists in the picture of the region; if a target image exists, follow the target.

Optionally, in other embodiments of the method, if no target image exists, the UAV 10 sends an instruction to the electronic device 20, the instruction being used to prompt the user 30 that there is no target of interest in the image. The instruction may further be used to prompt the user to click the image again to reselect a target of interest.

For the technical details of steps 203 and 204, refer to the descriptions of steps 103 and 104 respectively, which are not repeated here. Optionally, in some embodiments of the above method, the deep-learning-based network model includes at least 2 convolutional layers and at least 2 sampling layers. Specifically, the deep learning network model 300 shown in FIG. 5 may be used; for its specific structure and technical details, refer to the introduction of the deep-learning-based network model above, which is likewise not repeated here.

In this embodiment of the present invention, the picture of the region the user 30 is interested in is obtained and used as the input of the deep-learning-based network model for target prediction, which requires a small amount of computation, a short computing time and imposes low requirements on hardware devices.
Correspondingly, as shown in FIG. 7, an embodiment of the present invention further provides a device for determining a target to be intelligently followed by a UAV, used in the electronic device 20, the device 300 including:

an image acquisition module 301, configured to acquire an image sent back by the UAV 10;

an image processing module 302, configured to obtain a picture of a region of interest according to the click of the user 30 on the image sent back by the UAV 10;

an image prediction module 303, configured to load a deep learning network model, input the acquired picture of the region into the deep learning network model, and use the deep learning network model to output multiple candidate bounding boxes framing targets in the picture of the region and the probabilities that the targets within the candidate bounding boxes belong to preset categories; and

a target image confirmation module 304, configured to determine, according to the candidate bounding boxes and the probabilities that the targets within the candidate bounding boxes belong to the preset categories, whether a target image exists in the picture of the region, and, if a target image exists, send a target following command to the UAV.

Optionally, in other embodiments of the device, the target image confirmation module 304 is further configured to, if no target image exists, prompt the user that there is no target of interest in the image. The target image confirmation module 304 may further prompt the user to click the image again to reselect a target of interest.

In this embodiment of the present invention, the picture of the region of interest is obtained on the original image according to the click position of the user 30 and used as the input of the deep-learning-based network model for target prediction, which requires a small amount of computation, a short computing time and imposes low requirements on hardware devices.
Optionally, in some embodiments of the device, the target image confirmation module 304 is specifically configured to:

S1: for each preset category, obtain the candidate bounding box corresponding to the target with the highest probability of belonging to that preset category, calculate the overlap ratio between each of the other candidate bounding boxes and this candidate bounding box, and set to zero the probability that the target within any candidate bounding box whose overlap ratio is greater than the first preset threshold belongs to that preset category;

S2: repeat step S1 for each of the other preset categories;

S3: for each candidate bounding box remaining after step S2, take the preset category with the highest probability among the probabilities that the target within the candidate bounding box belongs to the respective preset categories as the category of the target within that candidate bounding box, and take the targets whose highest probability is greater than the second preset threshold θ_2 as possible target images;

S4: calculate the distance coefficient between each possible target image and the click position of the user 30, the distance coefficient δ being expressed as:

[Formula PCTCN2018078582-appb-000005: the distance coefficient δ, a decreasing function of the distance between (x_o, y_o) and (x_p, y_p)]

where the coordinates of the possible target image are (x_o, y_o) and the coordinates of the click position are (x_p, y_p); and

S5: obtain, for each possible target image, the product ε_i of its distance coefficient and the probability of the category it belongs to and find the maximum of these products max(ε_i); if the maximum max(ε_i) is greater than the third preset threshold θ_3, take the possible target image corresponding to the maximum product max(ε_i) as the target image and record the category of the target image.
Optionally, in some embodiments of the device, the deep learning network model includes at least 2 convolutional layers and at least 2 sampling layers. Specifically, the deep learning network model includes:

a first convolutional layer, a first downsampling layer, a second convolutional layer, a second downsampling layer, a third convolutional layer, a third downsampling layer, a fourth convolutional layer, a fourth downsampling layer, a fifth convolutional layer, a fifth downsampling layer, a sixth convolutional layer, a sixth downsampling layer, a seventh convolutional layer, an eighth convolutional layer and a region layer.

Optionally, among the first to sixth convolutional layers, each convolutional layer has twice as many filters as the preceding one, and the sixth and seventh convolutional layers have the same number of filters;

the first to fifth downsampling layers have a window size of 2*2 pixels with a stride of 2, and the sixth downsampling layer has a window size of 2*2 pixels with a stride of 1.

Optionally, the first convolutional layer has 4 filters, and the first to sixth downsampling layers all use max-pooling downsampling.

Optionally, each convolutional layer uses filters of 3*3 pixels.

Optionally, in some embodiments of the device, the picture of the region is 288*288 pixels, and a total of 9*9*5 candidate bounding boxes are obtained using the deep learning network model.
Correspondingly, as shown in FIG. 8, an embodiment of the present invention further provides a device for determining a target to be intelligently followed by a UAV, used in the UAV 10, the device 400 including:

an image capture module 401, configured to acquire an image;

a second image processing module 402, configured to obtain a picture of a region of interest according to the click operation of the user 30;

an image prediction module 303, configured to load a deep learning network model, input the acquired picture of the region into the deep learning network model, and use the deep learning network model to output multiple candidate bounding boxes framing targets in the picture of the region and the probabilities that the targets within the candidate bounding boxes belong to preset categories; and

a target image confirmation module 304, configured to determine, according to the candidate bounding boxes and the probabilities that the targets within the candidate bounding boxes belong to the preset categories, whether a target image exists in the picture of the region, and, if a target image exists, follow the target.

Optionally, in other embodiments of the device, if no target image exists, the UAV 10 sends an instruction to the electronic device 20, the instruction being used to prompt the user 30 that there is no target of interest in the image. The instruction may further be used to prompt the user to click the image again to reselect a target of interest.

For the technical details of the image prediction module 303 and the target image confirmation module 304, refer to the image prediction module 303 and the target image confirmation module 304 in the target determining device 300 respectively, which are not repeated here. Optionally, in some embodiments of the above device, the deep-learning-based network model includes at least 2 convolutional layers and at least 2 sampling layers. Specifically, the deep learning network model 300 shown in FIG. 5 may be used; for its specific structure and technical details, refer to the introduction of the deep-learning-based network model above, which is likewise not repeated here.

In this embodiment of the present invention, the picture of the region the user 30 is interested in is obtained and used as the input of the deep-learning-based network model for target prediction, which requires a small amount of computation, a short computing time and imposes low requirements on hardware devices.

It should be noted that the above devices can perform the methods provided by the embodiments of the present application and have the corresponding functional modules and beneficial effects for performing those methods. For technical details not described in detail in the device embodiments, refer to the methods provided by the embodiments of the present application.
FIG. 9 is a schematic diagram of the hardware structure of the UAV 10 provided by an embodiment of the present invention. As shown in FIG. 9, the UAV 10 includes: a fuselage 14, arms 15 connected to the fuselage 14, a power unit 17 provided on the arms, an image sensor 16 configured to acquire images, a processor 11 provided in the fuselage 14, a signal transmitter 13, and a memory 12 built into or external to the UAV 10 (FIG. 9 takes the memory 12 built into the UAV 10 as an example).

The processor 11 and the memory 12 may be connected by a bus or in another manner.

As a non-volatile computer-readable storage medium, the memory 12 can be used to store non-volatile software programs, non-volatile computer-executable programs and modules, such as the program instructions/units corresponding to the target determining method in the embodiments of the present invention (for example, the image capture module 401, the second image processing module 402, the image prediction module 303 and the target image confirmation module 304 shown in FIG. 8). By running the non-volatile software programs, instructions and units stored in the memory 12, the processor 11 performs the various functional applications and data processing of the UAV 10, that is, implements the target determining method of the above method embodiments.

The memory 12 may include a program storage area and a data storage area, where the program storage area may store an operating system and the application programs required by at least one function, and the data storage area may store data created according to the use of the user terminal device and the like. In addition, the memory 12 may include a high-speed random access memory and may also include a non-volatile memory, such as at least one magnetic disk storage device, flash memory device or other non-volatile solid-state storage device. In some embodiments, the memory 12 optionally includes memories remotely provided relative to the processor 11, and these remote memories may be connected to the UAV 10 through a network.

The one or more modules are stored in the memory 12 and, when executed by the one or more processors 11, perform the target determining method of any of the above method embodiments, for example, performing the method steps 201 to 204 in FIG. 6 described above and implementing the functions of the image capture module 401, the second image processing module 402, the image prediction module 303 and the target image confirmation module 304 in FIG. 8.

If the UAV 10 determines, using the target determining method, that a target image exists, it follows the target. Optionally, if no target image exists, the UAV 10 sends an instruction to the electronic device 20, the instruction being used to prompt the user that there is no target of interest in the image. The instruction may further be used to prompt the user to click the image again to reselect a target of interest.

The above UAV 10 can perform the target determining method provided by the embodiments of the present invention and has the corresponding functional modules and beneficial effects for performing the method. For technical details not described in detail in the UAV 10 embodiment, refer to the target determining method provided by the embodiments of the present invention.

An embodiment of the present invention provides a non-volatile computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being executed by one or more processors, for example, to perform the method steps 201 to 204 in FIG. 6 described above and implement the functions of the image capture module 401, the second image processing module 402, the image prediction module 303 and the target image confirmation module 304 in FIG. 8.
FIG. 10 is a schematic diagram of the hardware structure of the electronic device 20 provided by an embodiment of the present invention. As shown in FIG. 10, the electronic device 20 includes:

one or more processors 21 and a memory 22, with one processor 21 taken as an example in FIG. 10.

The processor 21 and the memory 22 may be connected by a bus or in another manner, with a bus connection taken as an example in FIG. 10.

As a non-volatile computer-readable storage medium, the memory 22 can be used to store non-volatile software programs, non-volatile computer-executable programs and modules, such as the program instructions/units corresponding to the target determining method in the embodiments of the present invention (for example, the image acquisition module 301, the image processing module 302, the image prediction module 303 and the target image confirmation module 304 shown in FIG. 7). By running the non-volatile software programs, instructions and units stored in the memory 22, the processor 21 performs the various functional applications and data processing of the electronic device 20, that is, implements the target determining method of the above method embodiments.

The memory 22 may include a program storage area and a data storage area, where the program storage area may store an operating system and the application programs required by at least one function, and the data storage area may store data created according to the use of the electronic device 20 and the like. In addition, the memory 22 may include a high-speed random access memory and may also include a non-volatile memory, such as at least one magnetic disk storage device, flash memory device or other non-volatile solid-state storage device. In some embodiments, the memory 22 optionally includes memories remotely provided relative to the processor 21, and these remote memories may be connected to the electronic device through a network. Examples of the above network include but are not limited to the Internet, an intranet, a local area network, a mobile communication network and combinations thereof.

The one or more units are stored in the memory 22 and, when executed by the one or more processors 21, perform the target determining method of any of the above method embodiments, for example, performing the method steps 101-104 in FIG. 2 described above and implementing the functions of the image acquisition module 301, the image processing module 302, the image prediction module 303 and the target image confirmation module 304 shown in FIG. 7.

The above electronic device 20 can perform the target determining method provided by the embodiments of the present invention and has the corresponding functional modules and beneficial effects for performing the method. For technical details not described in detail in the electronic device 20 embodiment, refer to the target determining method provided by the embodiments of the present invention.
The electronic device 20 of the embodiments of the present application exists in various forms, including but not limited to:

(1) Remote controls.

(2) Mobile communication devices: such devices are characterized by mobile communication functions, with voice and data communication as the main goal. Such terminals include smartphones (e.g., iPhone), multimedia phones, feature phones, low-end phones and the like.

(3) Ultra-mobile personal computer devices: such devices belong to the category of personal computers, have computing and processing functions, and generally also have mobile Internet access. Such terminals include PDA, MID and UMPC devices, e.g., iPad.

(4) Portable entertainment devices: such devices can display and play multimedia content, and include audio and video players (e.g., iPod), handheld game consoles, e-books, smart toys and portable vehicle navigation devices.

(5) Servers: devices providing computing services. A server consists of a processor, hard disk, memory, system bus and the like; its architecture is similar to that of a general-purpose computer, but since highly reliable services are required, the requirements on processing capability, stability, reliability, security, scalability, manageability and the like are high.

The electronic device 20 may be the remote control shown in FIG. 11. In addition to the processor 21 and the memory 22 described above, the remote control further includes an operating stick 25, a signal receiver 26, a signal transmitter 23 and a display screen 24, where the signal receiver 26 is configured to receive the image sent back by the UAV 10, and the signal transmitter 23 is configured to send instructions to the UAV 10.

If the remote control determines, using the target determining method, that a target image exists, it sends a target following command to the UAV 10 through the signal transmitter 23. Optionally, if no target image exists, the display screen 24 displays a prompt that there is no target of interest in the image; the display screen 24 may further display a prompt to click the image again to reselect a target of interest.
An embodiment of the present invention further provides a non-volatile computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being executed by one or more processors, for example, to perform the method steps 101-104 in FIG. 2 described above and implement the functions of the image acquisition module 301, the image processing module 302, the image prediction module 303 and the target image confirmation module 304 shown in FIG. 7.
The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

Through the description of the above embodiments, those of ordinary skill in the art can clearly understand that the embodiments can be implemented by means of software plus a general-purpose hardware platform, or, of course, by hardware alone. Those of ordinary skill in the art can understand that all or part of the processes of the above method embodiments can be completed by a computer program instructing related hardware; the program can be stored in a computer-readable storage medium and, when executed, may include the flows of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM) or the like.

Finally, it should be noted that the above embodiments are merely intended to illustrate the technical solutions of the present invention rather than to limit them. Under the idea of the present invention, the technical features in the above embodiments or in different embodiments may also be combined, the steps may be implemented in any order, and there are many other variations of the different aspects of the present invention as described above, which are not provided in detail for the sake of brevity. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions described in the foregoing embodiments or equivalently replace some of the technical features therein; these modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (30)

  1. A method for determining a target to be intelligently followed by an unmanned aerial vehicle (UAV), applied to an electronic device, wherein the method comprises:
    the electronic device acquiring an image sent back by the UAV;
    the electronic device obtaining a picture of a region of interest according to a user's click on the image sent back by the UAV;
    the electronic device loading a deep learning network model, inputting the acquired picture of the region into the deep learning network model, and using the deep learning network model to output multiple candidate bounding boxes framing targets in the picture of the region and probabilities that the targets within the candidate bounding boxes belong to preset categories;
    determining, according to the candidate bounding boxes and the probabilities that the targets within the candidate bounding boxes belong to the preset categories, whether a target image exists in the picture of the region; and
    if a target image exists, sending a target following command to the UAV.
  2. The method according to claim 1, wherein the method further comprises:
    if no target image exists, the electronic device prompting the user that there is no target of interest in the image.
  3. The method according to claim 1 or 2, wherein the method further comprises:
    if no target image exists, the electronic device prompting the user to click the image again to reselect a target of interest.
  4. The method according to any one of claims 1 to 3, wherein the determining, according to the candidate bounding boxes and the probabilities that the targets within the candidate bounding boxes belong to the preset categories, whether a target image exists in the picture of the region comprises:
    S1: for each preset category, obtaining the candidate bounding box corresponding to the target with the highest probability of belonging to the preset category, calculating an overlap ratio between each of the other candidate bounding boxes and this candidate bounding box, and setting to zero the probability that the target within any candidate bounding box whose overlap ratio is greater than a first preset threshold belongs to the preset category;
    S2: repeating step S1 for each of the other preset categories;
    S3: for each candidate bounding box remaining after step S2, taking the preset category with the highest probability among the probabilities that the target within the candidate bounding box belongs to the respective preset categories as the category of the target within the candidate bounding box, and taking the targets whose highest probability is greater than a second preset threshold as possible target images;
    S4: calculating a distance coefficient between each possible target image and the user's click position, the distance coefficient δ being expressed as:
    [Formula PCTCN2018078582-appb-100001: the distance coefficient δ, a decreasing function of the distance between (x_o, y_o) and (x_p, y_p)]
    wherein the coordinates of the possible target image are (x_o, y_o) and the coordinates of the click position are (x_p, y_p); and
    S5: obtaining, for each possible target image, the product of its distance coefficient and the probability of the category it belongs to, and finding the maximum of the products; if the maximum is greater than a third preset threshold, taking the possible target image corresponding to the maximum product as the target image and recording the category of the target image.
  5. The method according to any one of claims 1 to 4, wherein the deep learning network model comprises at least two convolutional layers and at least two sampling layers.
  6. The method according to any one of claims 1 to 5, wherein the deep learning network model comprises, in order:
    a first convolutional layer, a first downsampling layer, a second convolutional layer, a second downsampling layer, a third convolutional layer, a third downsampling layer, a fourth convolutional layer, a fourth downsampling layer, a fifth convolutional layer, a fifth downsampling layer, a sixth convolutional layer, a sixth downsampling layer, a seventh convolutional layer, an eighth convolutional layer and a region layer.
  7. The method according to claim 6, wherein among the first to sixth convolutional layers, each convolutional layer has twice as many filters as the preceding one, and the sixth and seventh convolutional layers have the same number of filters;
    the first to fifth downsampling layers have a window size of 2*2 pixels with a stride of 2, and the sixth downsampling layer has a window size of 2*2 pixels with a stride of 1.
  8. The method according to claim 6 or 7, wherein the first convolutional layer has 4 filters, and the first to sixth downsampling layers all use max-pooling downsampling.
  9. The method according to any one of claims 5 to 8, wherein each convolutional layer uses filters of 3*3 pixels.
  10. The method according to any one of claims 1 to 9, wherein the picture of the region is 288*288 pixels, and a total of 9*9*5 candidate bounding boxes are obtained using the deep learning network model.
  11. A method for determining a target to be intelligently followed by an unmanned aerial vehicle (UAV), applied to a UAV, wherein the method comprises:
    the UAV acquiring an image;
    the UAV obtaining a picture of a region of interest according to a click operation of a user;
    the UAV loading a deep learning network model, inputting the acquired picture of the region into the deep learning network model, and using the deep learning network model to output multiple candidate bounding boxes framing targets in the picture of the region and probabilities that the targets within the candidate bounding boxes belong to preset categories;
    determining, according to the candidate bounding boxes and the probabilities that the targets within the candidate bounding boxes belong to the preset categories, whether a target image exists in the picture of the region; and
    if a target image exists, following the target.
  12. The method according to claim 11, wherein the method further comprises:
    if no target image exists, the UAV sending an instruction to an electronic device, the instruction being used to prompt the user that there is no target of interest in the image.
  13. The method according to claim 11 or 12, wherein the instruction is further used to prompt the user to click the image again to reselect a target of interest.
  14. The method according to any one of claims 11 to 13, wherein the determining, according to the candidate bounding boxes and the probabilities that the targets within the candidate bounding boxes belong to the preset categories, whether a target image exists in the picture of the region comprises:
    S1: for each preset category, obtaining the candidate bounding box corresponding to the target with the highest probability of belonging to the preset category, calculating an overlap ratio between each of the other candidate bounding boxes and this candidate bounding box, and setting to zero the probability that the target within any candidate bounding box whose overlap ratio is greater than a first preset threshold belongs to the preset category;
    S2: repeating step S1 for each of the other preset categories;
    S3: for each candidate bounding box remaining after step S2, taking the preset category with the highest probability among the probabilities that the target within the candidate bounding box belongs to the respective preset categories as the category of the target within the candidate bounding box, and taking the targets whose highest probability is greater than a second preset threshold as possible target images;
    S4: calculating a distance coefficient between each possible target image and the user's click position, the distance coefficient δ being expressed as:
    [Formula PCTCN2018078582-appb-100002: the distance coefficient δ, a decreasing function of the distance between (x_o, y_o) and (x_p, y_p)]
    wherein the coordinates of the possible target image are (x_o, y_o) and the coordinates of the click position are (x_p, y_p); and
    S5: obtaining, for each possible target image, the product of its distance coefficient and the probability of the category it belongs to, and finding the maximum of the products; if the maximum is greater than a third preset threshold, taking the possible target image corresponding to the maximum product as the target image and recording the category of the target image.
  15. The method according to any one of claims 11 to 14, wherein the deep learning network model comprises at least two convolutional layers and at least two sampling layers.
  16. The method according to claim 15, wherein the deep learning network model comprises, in order:
    a first convolutional layer, a first downsampling layer, a second convolutional layer, a second downsampling layer, a third convolutional layer, a third downsampling layer, a fourth convolutional layer, a fourth downsampling layer, a fifth convolutional layer, a fifth downsampling layer, a sixth convolutional layer, a sixth downsampling layer, a seventh convolutional layer, an eighth convolutional layer and a region layer.
  17. The method according to claim 16, wherein among the first to sixth convolutional layers, each convolutional layer has twice as many filters as the preceding one, and the sixth and seventh convolutional layers have the same number of filters;
    the first to fifth downsampling layers have a window size of 2*2 pixels with a stride of 2, and the sixth downsampling layer has a window size of 2*2 pixels with a stride of 1.
  18. The method according to claim 16 or 17, wherein the first convolutional layer has 4 filters, and the first to sixth downsampling layers all use max-pooling downsampling.
  19. The method according to any one of claims 15 to 18, wherein each convolutional layer uses filters of 3*3 pixels.
  20. The method according to any one of claims 11 to 19, wherein the picture of the region is 288*288 pixels, and a total of 9*9*5 candidate bounding boxes are obtained using the deep learning network model.
  21. A remote control, comprising:
    an operating stick;
    a signal receiver, configured to receive an image sent back by an unmanned aerial vehicle (UAV);
    a signal transmitter, configured to send instructions to the UAV;
    a display screen; and
    a processor;
    wherein the processor is configured to:
    obtain a picture of a region of interest according to a user's click on the image sent back by the UAV;
    load a deep learning network model, input the acquired picture of the region into the deep learning network model, and use the deep learning network model to output multiple candidate bounding boxes framing targets in the picture of the region and probabilities that the targets within the candidate bounding boxes belong to preset categories;
    determine, according to the candidate bounding boxes and the probabilities that the targets within the candidate bounding boxes belong to the preset categories, whether a target image exists in the picture of the region; and
    if a target image exists, send a target following command to the UAV through the signal transmitter.
  22. The remote control according to claim 21, wherein if no target image exists, the display screen displays a prompt that there is no target of interest in the image.
  23. The remote control according to claim 21 or 22, wherein if no target image exists, the display screen displays a prompt to click the image again to reselect a target of interest.
  24. The remote control according to any one of claims 21 to 23, wherein the processor is further configured to perform the method according to any one of claims 3 to 10.
  25. An unmanned aerial vehicle (UAV), comprising a fuselage, arms connected to the fuselage, a power unit provided on the arms, an image sensor configured to acquire images, and a processor and a signal transmitter provided in the fuselage, wherein the processor is configured to:
    obtain a picture of a region of interest according to a click operation of a user;
    load a deep learning network model, input the acquired picture of the region into the deep learning network model, and use the deep learning network model to output multiple candidate bounding boxes framing targets in the picture of the region and probabilities that the targets within the candidate bounding boxes belong to preset categories; and
    determine, according to the candidate bounding boxes and the probabilities that the targets within the candidate bounding boxes belong to the preset categories, whether a target image exists in the picture of the region, and, if a target image exists, control the UAV to follow the target.
  26. The UAV according to claim 25, wherein the processor is further configured to:
    if no target image exists, send an instruction to an electronic device through the signal transmitter, the instruction being used to prompt the user that there is no target of interest in the image.
  27. The UAV according to claim 25 or 26, wherein the instruction is further used to prompt the user to click the image again to reselect a target of interest.
  28. The UAV according to any one of claims 25 to 27, wherein the processor is further configured to perform the method according to any one of claims 14 to 20.
  29. A non-volatile computer-readable storage medium, storing computer-executable instructions that, when executed by an electronic device, cause the electronic device to perform the method according to any one of claims 1 to 10.
  30. A non-volatile computer-readable storage medium, storing computer-executable instructions that, when executed by an unmanned aerial vehicle, cause the unmanned aerial vehicle to perform the method according to any one of claims 11 to 20.
PCT/CN2018/078582 2017-08-18 2018-03-09 Method for determining target intelligently followed by unmanned aerial vehicle, unmanned aerial vehicle and remote controller WO2019033747A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP18717495.8A EP3471021B1 (en) 2017-08-18 2018-03-09 Method for determining target intelligently followed by unmanned aerial vehicle, unmanned aerial vehicle and remote controller
US15/980,051 US10740607B2 (en) 2017-08-18 2018-05-15 Method for determining target through intelligent following of unmanned aerial vehicle, unmanned aerial vehicle and remote control

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710714275.5A CN109409354B (zh) 2017-08-18 2017-08-18 无人机智能跟随目标确定方法、无人机和遥控器
CN201710714275.5 2017-08-18

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/980,051 Continuation US10740607B2 (en) 2017-08-18 2018-05-15 Method for determining target through intelligent following of unmanned aerial vehicle, unmanned aerial vehicle and remote control

Publications (1)

Publication Number Publication Date
WO2019033747A1 true WO2019033747A1 (zh) 2019-02-21

Family

ID=63490384

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/078582 WO2019033747A1 (zh) 2017-08-18 2018-03-09 无人机智能跟随目标确定方法、无人机和遥控器

Country Status (3)

Country Link
EP (1) EP3471021B1 (zh)
CN (2) CN109409354B (zh)
WO (1) WO2019033747A1 (zh)


Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109978045A * 2019-03-20 2019-07-05 深圳市道通智能航空技术有限公司 Target tracking method and device, and unmanned aerial vehicle
CN110390261B * 2019-06-13 2022-06-17 北京汽车集团有限公司 Target detection method and apparatus, computer-readable storage medium and electronic device
KR20210009458A (ko) 2019-07-16 2021-01-27 삼성전자주식회사 Object detection method and object detection apparatus
CN110879602B * 2019-12-06 2023-04-28 安阳全丰航空植保科技股份有限公司 Deep-learning-based method and system for adjusting control law parameters of unmanned aerial vehicle
CN111401301B * 2020-04-07 2023-04-18 上海东普信息科技有限公司 Personnel clothing monitoring method, apparatus, device and storage medium
CN111737604B * 2020-06-24 2023-07-21 中国银行股份有限公司 Target object search method and apparatus
CN114355960B * 2021-10-25 2023-08-15 中国船舶重工集团公司第七0九研究所 Intelligent decision-making method and system for unmanned aerial vehicle defense, server and medium
CN114567888B * 2022-03-04 2023-12-26 国网浙江省电力有限公司台州市黄岩区供电公司 Dynamic deployment method for multiple unmanned aerial vehicles
CN115035425B * 2022-06-07 2024-02-09 北京庚图科技有限公司 Deep-learning-based target recognition method and system, electronic device and storage medium


Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8964298B2 (en) * 2010-02-28 2015-02-24 Microsoft Corporation Video display modification based on sensor input for a see-through near-to-eye display
CN103149939B * 2013-02-26 2015-10-21 北京航空航天大学 Vision-based dynamic target tracking and positioning method for unmanned aerial vehicle
US9934453B2 (en) * 2014-06-19 2018-04-03 Bae Systems Information And Electronic Systems Integration Inc. Multi-source multi-modal activity recognition in aerial video surveillance
US10127448B2 (en) * 2014-08-27 2018-11-13 Bae Systems Information And Electronic Systems Integration Inc. Method and system for dismount detection in low-resolution UAV imagery
CN104457704B * 2014-12-05 2016-05-25 北京大学 System and method for unmanned aerial vehicle ground target positioning based on enhanced geographic information
US20170102699A1 (en) * 2014-12-22 2017-04-13 Intel Corporation Drone control through imagery
CN105957077B * 2015-04-29 2019-01-15 国网河南省电力公司电力科学研究院 Foreign object detection method for power transmission lines based on visual saliency analysis
US9609288B1 (en) * 2015-12-31 2017-03-28 Unmanned Innovation, Inc. Unmanned aerial vehicle rooftop inspection system
CN105701810B * 2016-01-12 2019-05-17 湖南中航天目测控技术有限公司 Electronic delineation method for unmanned aerial vehicle aerial images based on click-type image segmentation
CA3012049A1 (en) * 2016-01-20 2017-07-27 Ez3D, Llc System and method for structural inspection and construction estimation using an unmanned aerial vehicle
CN105930791B * 2016-04-19 2019-07-16 重庆邮电大学 Road traffic sign recognition method with multi-camera fusion based on DS evidence theory
CN106981073B * 2017-03-31 2019-08-06 中南大学 Real-time ground moving target tracking method and system based on unmanned aerial vehicle

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170068246A1 (en) * 2014-07-30 2017-03-09 SZ DJI Technology Co., Ltd Systems and methods for target tracking
CN106156807A * 2015-04-02 2016-11-23 华中科技大学 Training method and apparatus for convolutional neural network model
CN106228158A * 2016-07-25 2016-12-14 北京小米移动软件有限公司 Picture detection method and device
CN106709456A * 2016-12-27 2017-05-24 成都通甲优博科技有限责任公司 Computer-vision-based initialization method for unmanned aerial vehicle target tracking box
CN106780612A * 2016-12-29 2017-05-31 浙江大华技术股份有限公司 Object detection method and device in an image

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109949381A * 2019-03-15 2019-06-28 深圳市道通智能航空技术有限公司 Image processing method and apparatus, image processing chip, camera assembly and aircraft
CN109949381B * 2019-03-15 2023-10-24 深圳市道通智能航空技术股份有限公司 Image processing method and apparatus, image processing chip, camera assembly and aircraft
CN111241905A * 2019-11-21 2020-06-05 南京工程学院 Bird nest detection method for power transmission lines based on improved SSD algorithm
CN110956157A * 2019-12-14 2020-04-03 深圳先进技术研究院 Deep learning remote sensing image target detection method and apparatus based on candidate box selection
CN111461202A * 2020-03-30 2020-07-28 上海尽星生物科技有限责任公司 Real-time recognition method and apparatus for thyroid nodule ultrasound images
CN111461202B * 2020-03-30 2023-12-05 上海深至信息科技有限公司 Real-time recognition method and apparatus for thyroid nodule ultrasound images
CN111476840B * 2020-05-14 2023-08-22 阿丘机器人科技(苏州)有限公司 Target positioning method, apparatus, device and computer-readable storage medium
CN111476840A * 2020-05-14 2020-07-31 阿丘机器人科技(苏州)有限公司 Target positioning method, apparatus, device and computer-readable storage medium
CN112683916A * 2020-12-17 2021-04-20 华能新能源股份有限公司云南分公司 Method and apparatus for identifying missing or incorrectly installed small fittings on collector line towers
CN112488066A * 2020-12-18 2021-03-12 航天时代飞鸿技术有限公司 Real-time target detection method under multi-UAV cooperative reconnaissance
CN113591748A * 2021-08-06 2021-11-02 广东电网有限责任公司 Aerial insulator target detection method and apparatus
CN113949826A * 2021-09-28 2022-01-18 航天时代飞鸿技术有限公司 Cooperative reconnaissance method and system for unmanned aerial vehicle swarms under limited communication bandwidth
CN113850209A * 2021-09-29 2021-12-28 广州文远知行科技有限公司 Dynamic object detection method and apparatus, vehicle and storage medium
CN114034882A * 2021-10-28 2022-02-11 广州大学 Intelligent ocean current detection method, apparatus, device and storage medium
CN114034882B * 2021-10-28 2023-09-26 广州大学 Intelligent ocean current detection method, apparatus, device and storage medium

Also Published As

Publication number Publication date
EP3471021A1 (en) 2019-04-17
EP3471021B1 (en) 2020-05-27
CN109409354B (zh) 2021-09-21
CN109409354A (zh) 2019-03-01
CN113762252A (zh) 2021-12-07
EP3471021A4 (en) 2019-04-17
CN113762252B (zh) 2023-10-24

Similar Documents

Publication Publication Date Title
WO2019033747A1 (zh) 无人机智能跟随目标确定方法、无人机和遥控器
US10740607B2 (en) Method for determining target through intelligent following of unmanned aerial vehicle, unmanned aerial vehicle and remote control
US11749124B2 (en) User interaction with an autonomous unmanned aerial vehicle
Alwateer et al. Drone services: issues in drones for location-based services from human-drone interaction to information processing
US20220019248A1 (en) Objective-Based Control Of An Autonomous Unmanned Aerial Vehicle
WO2021249071A1 (zh) 一种车道线的检测方法及相关设备
US10671068B1 (en) Shared sensor data across sensor processing pipelines
CN110850877A (zh) 基于虚拟环境和深度双q网络的自动驾驶小车训练方法
WO2022021027A1 (zh) 目标跟踪方法、装置、无人机、系统及可读存储介质
Nahar et al. Autonomous UAV forced graffiti detection and removal system based on machine learning
Martinez‐Alpiste et al. Smartphone‐based object recognition with embedded machine learning intelligence for unmanned aerial vehicles
CN113516227A (zh) 一种基于联邦学习的神经网络训练方法及设备
CN108881846B (zh) 信息融合方法、装置及计算机可读存储介质
US20230028196A1 (en) User-in-the-loop object detection and classification systems and methods
CN116823884A (zh) 多目标跟踪方法、系统、计算机设备及存储介质
CN116343169A (zh) 路径规划方法、目标对象运动控制方法、装置及电子设备
CN116805397A (zh) 用于使用机器学习算法检测和识别图像中的小对象的系统和方法
WO2022021028A1 (zh) 目标检测方法、装置、无人机及计算机可读存储介质
Kristiani et al. Flame and smoke recognition on smart edge using deep learning
EP4250251A1 (en) System and method for detecting and recognizing small objects in images using a machine learning algorithm
US11830247B2 (en) Identifying antenna communication issues using unmanned aerial vehicles
US20230306715A1 (en) System and method for detecting and recognizing small objects in images using a machine learning algorithm
WO2022179281A1 (zh) 场景识别方法及装置
WO2022104746A1 (zh) 返航控制方法、装置、无人机及计算机可读存储介质
Arora et al. A Compendium of Autonomous Navigation Using Object Detection and Tracking in Unmanned Aerial Vehicles

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 2018717495

Country of ref document: EP

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18717495

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE