WO2019199967A1 - Systems and methods for gamification of drone behavior using artificial intelligence - Google Patents
- Publication number
- WO2019199967A1 (PCT/US2019/026783)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- drone
- neural network
- image
- data stream
- dnn
- Prior art date
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/10—Simultaneous control of position or course in three dimensions
- G05D1/101—Simultaneous control of position or course in three dimensions specially adapted for aircraft
- G05D1/102—Simultaneous control of position or course in three dimensions specially adapted for aircraft specially adapted for vertical take-off of aircraft
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Definitions
- Drones are relatively 'old' machines, introduced decades ago under the name of Remote Piloted Aircraft (RPA). Recently, drones have experienced a surge in popularity thanks to improved hardware, compute power, and connectivity. Nevertheless, drone usage is still mostly a human-controlled activity, where the human controls the drone's flight path and interprets and analyzes the content of video or other sensor information collected by the drone.
- AI-controlled drone With the introduction of inexpensive drones able to stream high-definition (HD) data to a controlling device equipped with one or more powerful processors (e.g., a smart phone with multi-core central processor units (CPUs) and graphics processor units (GPUs)) or with onboard computing, it is possible to exploit the use of artificial intelligence (AI) for controlling the drones.
- One application of an AI-controlled drone is creating an interactive, game-like situation where the AI provides semantic understanding of the environment for the purpose of re-creating the human ability to play a game.
- an AI-controlled drone can be used to mimic a human for playing games and other physical activities.
- Examples of this use of AI-controlled drones include a method of controlling a drone.
- a sensor collects a data stream representing an object in the environment.
- a neural network running on a processor operably coupled to the sensor extracts a convolutional output from the data stream.
- This convolutional output represents features of the object and is used by a classifier operably coupled to the neural network to classify the object.
- the drone is controlled in response to classifying the object according to pre-defined logic.
- the sensor can be an image sensor on the drone, in which case the data stream includes imagery acquired by the image sensor. Controlling the drone may include following the object with the drone.
- the classifier can be trained to recognize the object on the fly or in real time. In some cases, the processor may determine that the object has disappeared from the data stream, in which case the neural network and/or the classifier may automatically recognize a reappearance of the object in the data stream.
- An AI-controlled drone can be implemented as a system that includes a drone and at least one processor.
- the drone is equipped with a sensor, such as an image sensor, lidar, radar, or acoustic sensor, that acquires a data stream that represents an object in an environment.
- the processor is operably coupled to the sensor and can be on the drone or on a smart phone that is wirelessly coupled to the drone (e.g., via a cellular, Wi-Fi, or Bluetooth link).
- the processor (1) executes a neural network that produces a convolutional output from the data stream; (2) classifies the object based on the convolutional output, which represents features of the object; and (3) controls the drone in response to the object.
- Another method includes playing a game with a drone.
- An image sensor on the drone acquires an input data stream.
- At least one processor communicatively coupled to the image sensor detects an object of interest in an input data stream.
- An artificial neural network executed by the processor identifies the object of interest.
- the processor determines an action to be taken by the drone in the context of the game based on the object of interest, and the drone performs the action. This action may be tracking the object of interest in the input data stream, following the object of interest with the drone, or blurring at least a portion of an image of the object of interest.
- the method can also include detecting another object in the input data stream and identifying the other object with the artificial neural network.
- the neural network may recognize the object by extracting features of the object from the second image to generate a convolutional output and classifying the object with a classifier coupled to the neural network based on the convolutional output.
- the classifier may be trained in real-time, without backpropagation, to recognize the object.
- the processor may transmit the second image to the smart phone.
- the smart phone executes the neural network, which recognizes the object, and transmits a command to the drone, which executes the command.
- the technology disclosed herein includes, but is not limited to:
- use of a drone, such as a low-cost consumer drone, as an interactive device rather than a flying device or a flying camera;
- FIG. 1 illustrates various components of an artificial intelligence (AI)-enabled interactive game-playing system, including a drone, a controller (optional), and AI residing in either or both the drone and the controller.
- FIG. 2 illustrates an example L-DNN architecture suitable for implementing the AI in the system shown in FIG. 1.
- FIG. 3 illustrates a VGG-16 based L-DNN classifier as one example implementation of the L-DNN architecture in FIG. 2.
- FIG. 4 illustrates non-uniform multiscale object detection with an L-DNN for recognizing and tracking objects with a drone.
- FIG. 5 illustrates a Mask R-CNN-based L-DNN for object segmentation in a video stream acquired with a camera on a drone, smart phone, or other imaging device.
- FIG. 6 shows a flowchart illustrating a method of detecting and tracking an object of interest using an AI-enabled game playing system.
- FIG. 7 illustrates an implementation of hide-and-seek with the system shown in FIG. 1.
- FIG. 8 illustrates an implementation of an 'alarm drone' with the system shown in FIG. 1.
- the present technology enables gamification, or the application of typical elements of game playing, to the interaction between humans and drones (e.g., quadcopters) to increase and encourage user engagement with drones via the use of Artificial Intelligence (AI) and AI processes.
- Suitable AI processes include Artificial Neural Networks (ANNs), Deep Neural Networks (DNNs), Lifelong Deep Neural Networks (L-DNNs), and other machine vision processes executed either at the compute Edge (the drone) and/or on a controlling device (e.g., a smart phone) and/or a server (via a cellular or other wireless connection), or a combination thereof.
- AI can provide information about people and objects surrounding the drone, and this information (semantic information about items in the environment, their location, and numerosity) can be used to script a game interaction between the drone and the humans participating in the game.
- An AI device or process can be used to enhance drone user engagement by intelligently processing data captured by a sensor on or associated with the drone and intelligently controlling the behavior of the drone so as to reproduce a typical game that would normally involve people.
- the drone 'plays' the role of a human player in the game, introducing novelty into the user engagement: the new 'player' is actually a drone-embodied imitation of a human player, where the imitation is provided by the AI executed by a processor on or connected to the drone.
- the AI can also enable the drone to learn and track objects in a robust fashion for photography, videography, etc.
- FIG. 1 provides an overview of an AI-enabled game-playing system comprising a drone 100 with or without onboard compute power.
- the drone 100 includes a sensor suite 101, which may include a red-green-blue (RGB) camera, structured light sensor, radar, lidar, and/or microphone, that acquires sensor data 103, such as a stream of image or audio data.
- the drone 100 has one or more processors 102, which may provide enough compute power to execute the AI processes or may only manage more basic drone functions, such as managing the power supply, actuating the drone’s flight control in response to external commands, and controlling the sensor suite.
- the drone 100 also includes an antenna 112, such as a Wi-Fi, Bluetooth, or cellular antenna, that can be used to establish and maintain a wireless communications link to a smart phone 120, tablet, laptop, or other controller.
- the smart phone 120 has its own processors 122, such as multi-core CPUs and GPUs, which can be used to process data from the drone 100 and send commands to the drone 100, including commands generated by AI processes in response to the data.
- the input images 103 (e.g., from an RGB camera or other sensors, such as Infrared or Structured Light sensors, in the sensor suite 110), AI module 104, game logic 105, and drone control 106 can all be hosted on board the drone 100.
- the smart phone 120 may host an AI module 124, game logic 125, and drone control 126 that communicate with the drone 100 via the wireless communications link 110.
- the sensor suite 101 on the drone 100 acquires a video data stream 103 showing objects in the drone’s environment. If the drone 100 has on-board compute power, its on-board AI module 104 recognizes objects in the data stream 103 as explained below. Game logic 105 uses information about the objects recognized by the AI module 104 (e.g., which object(s), the object position(s), etc.) to determine the drone’s next action and instructs the drone control 106 accordingly. For instance, if the AI module 104 recognizes a person moving through the scene, the game logic 105 may instruct the drone control 106 to cause the drone to follow the person. If the AI module 104 recognizes a person hiding, then the game logic 105 may instruct the drone control 106 to cause the drone to hover over the person or send an appropriate signal to the smart phone 120.
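- The sense-recognize-decide-act loop described above can be sketched in a few lines of code. The sketch below is illustrative only: the `Detection` structure, the `decide` function, and the command strings are hypothetical placeholders standing in for the AI module 104, game logic 105, and drone control 106, not part of the disclosed system.
```python
from dataclasses import dataclass
from typing import List

# Hypothetical, simplified stand-ins for the components in FIG. 1.
@dataclass
class Detection:
    label: str        # e.g. "person"
    x: float          # horizontal position in the frame, 0..1
    y: float          # vertical position in the frame, 0..1

def decide(detections: List[Detection]) -> str:
    """Toy game logic: map recognized objects to a drone action."""
    people = [d for d in detections if d.label == "person"]
    if not people:
        return "hover"          # nothing recognized: hold position
    # follow the first recognized person by steering toward them
    target = people[0]
    return f"follow dx={target.x - 0.5:+.2f} dy={target.y - 0.5:+.2f}"

# Example: the AI module reports a person left of and below center.
print(decide([Detection("person", 0.3, 0.6)]))   # -> follow dx=-0.20 dy=+0.10
```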
- the AI module 124, game logic 125, and/or drone control 126 provided by the smart phone 120 may process the image data 103 acquired by the drone’s sensor suite 101 and control the drone 100 in response to the object(s) recognized in the image data 103.
- the smart phone 120 may perform all or only some of these functions, depending on the drone’s capabilities and the speed and reliability of the wireless communications link 110 between the drone 100 and the smart phone 120.
- the smart phone 120 or another device, including another drone, may perform all of the data acquisition and processing, and the drone 100 may simply respond to commands issued by AI implemented by the smart phone 120. For instance, a person may acquire imagery of the drone 100 and/or the drone's location with a camera on the smart phone 120. The smart phone 120 may recognize objects in that image data using its own AI module 124, then determine how the drone 100 should respond to the object according to the game logic 125. The drone control 126 translates the game logic's output into commands that the smart phone 120 transmits to the drone 100 via the wireless communications link 110. The drone 100 responds to these commands, e.g., by moving in a particular way.
- the AI in the interactive drone system shown in FIG. 1 can be implemented using a neural network, such as an ANN or DNN.
- ANNs and DNNs can be trained to learn and identify objects of interest relevant to the AI-enabled game-playing system.
- a properly trained ANN or DNN can recognize an object each and every time it appears in the data stream, regardless of the object’s orientation or relative size.
- the interactive drone system in FIG. 1 can simply recognize the object each time it appears in the data stream— there is no need for the user to draw a bounding box around the object in order for the drone to track the object.
- the drone does not have to keep the object in view— the object can disappear from the data stream, then reappear in the data stream some time later, and the ANN or DNN will recognize the object automatically (without user intervention).
- a conventional neural network is pre-trained in order to recognize people and objects in data of the environment surrounding the drone. This pre-training is typically accomplished using backpropagation with a set of training data (e.g., tagged images). Unfortunately, training with backpropagation can take hours or longer, making it impractical for a user to train a conventional neural network to recognize a new object. As a result, if an AI-enabled game playing system that uses a traditional neural network encounters a new, unfamiliar object, the system may fail to recognize the object correctly.
- a Lifelong Deep Neural Network can recognize objects like a conventional neural network and learn to recognize new objects on the fly (e.g., in near real time).
- An L-DNN enables continuous, online, lifelong learning on a lightweight compute device (e.g., a drone or smart phone) without time-consuming, computationally intensive learning through backpropagation.
- An L-DNN enables real-time learning from continuous data streams, bypassing the need to store input data for multiple iterations of backpropagation learning.
- L-DNN technology combines a representation-rich, DNN-based subsystem (Module A), also called a backbone, with a fast-learning subsystem (Module B), also called a classifier, to achieve fast, yet stable learning of features that represent entities or events of interest.
- These feature sets can be pre-trained by slow learning methodologies, such as backpropagation.
- the high-level feature extraction layers of the DNN serve as inputs into the fast learning system in Module B to classify familiar entities and events and add knowledge of unfamiliar entities and events on the fly.
- Module B is able to learn important information and capture descriptive and highly predictive features of the environment without the drawback of slow learning.
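- As a rough illustration of the Module A/Module B split, the following sketch pairs a frozen "backbone" feature extractor with a fast-learning classifier that stores one prototype per label. It is a deliberate simplification: a fixed random projection stands in for the pre-trained DNN, and a nearest-prototype rule stands in for Module B's classifier (an ART network in the implementations described below).
```python
import numpy as np

class FastClassifier:
    """Toy Module B: one prototype feature vector per label (one-shot learning),
    classifying new feature vectors by nearest prototype."""
    def __init__(self):
        self.prototypes = {}                      # label -> feature vector

    def learn(self, features, label):
        self.prototypes[label] = np.asarray(features, dtype=float)

    def classify(self, features):
        if not self.prototypes:
            return None
        f = np.asarray(features, dtype=float)
        return min(self.prototypes,
                   key=lambda k: np.linalg.norm(self.prototypes[k] - f))

def backbone(image):
    """Stand-in for Module A: a frozen, pre-trained DNN would map an image to a
    feature vector here. A fixed random projection is used for the sketch."""
    rng = np.random.default_rng(0)                # fixed weights = "factory pre-trained"
    w = rng.standard_normal((8, image.size))
    return w @ image.ravel()

module_b = FastClassifier()
img_a, img_b = np.ones((4, 4)), np.zeros((4, 4))
module_b.learn(backbone(img_a), "ball")           # learned from a single example
module_b.learn(backbone(img_b), "tree")
print(module_b.classify(backbone(np.ones((4, 4)) * 0.9)))   # -> "ball"
```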
- L-DNN techniques can be applied to visual, structured light, LIDAR, SONAR, RADAR, or audio data, among other modalities.
- L-DNN techniques can be applied to visual processing, such as enabling whole-image classification (e.g., scene detection), bounding box-based object recognition, pixel-wise segmentation, and other visual recognition tasks. They can also perform non-visual recognition tasks, such as classification of non-visual signals, and other tasks, such as updating Simultaneous Localization and Mapping (SLAM) generated maps by incrementally adding knowledge as the drone is navigating the environment.
- An L-DNN enables an AI-enabled game playing system to learn on the fly at the edge without the necessity of learning on a central server or cloud. This eliminates network latency, increases real-time performance, and ensures privacy when desired.
- AI-enabled game playing systems can be updated for specific tasks in the field using an L-DNN.
- For example, inspection drones can learn how to identify problems at the top of cell towers or solar panel arrays.
- AI-enabled game playing systems can be personalized based on user preferences without worry about privacy issues, since data is not shared outside the local device. Smart phones can share knowledge learned at the edge (peer to peer or globally with all devices) without shipping information to a central server for lengthy learning.
- An L-DNN also enables learning new knowledge without forgetting old knowledge, thereby mitigating or eliminating catastrophic forgetting.
- the present technology enables AI-enabled game playing systems to continually and optimally adjust behavior at the edge based on user input without a) needing to send or store input images, b) time-consuming training, or c) large computing resources.
- Learning after deployment with an L-DNN allows an AI-enabled game playing system to adapt to changes in its environment and to user interactions, handle imperfections in the original data set, and provide a customized experience for a user.
- An L-DNN implements a heterogeneous Neural Network architecture characterized by two modules:
- Slow learning Module A which includes a neural network (e.g., a Deep Neural Network) that is either factory pre-trained and fixed or configured to learn via backpropagation or other learning algorithms based on sequences of data inputs; and
- Module B which provides an incremental classifier able to change synaptic weights and representations instantaneously, with very few training samples.
- Example instantiations of this incremental classifier include, for example, an Adaptive Resonance Theory (ART) network or Restricted Boltzmann Machine (RBM) with contrastive divergence training neural networks, as well as non-neural methods, such as Support Vector Machines (SVMs) or other fast-learning supervised classification processes.
- FIG. 2 illustrates an example L-DNN architecture used by an AI-enabled game playing system.
- the L-DNN 226 uses two subsystems, slow learning Module A 222 and fast learning Module B 224.
- Module A includes a pre-trained DNN
- Module B is based on a fast-learning Adaptive Resonance Theory (ART) paradigm, where the DNN feeds the ART the output of one of the later feature layers (typically, the last or penultimate layer before the DNN's own classifying fully connected layers).
- Other configurations are possible, where multiple DNN layers can provide inputs to one or more Modules B (e.g., in a multiscale, voting, or hierarchical form).
- An input source 103, such as a digital camera, detector array, or microphone, acquires information/data from the environment (e.g., video data, structured light data, audio data, a combination thereof, and/or the like). If the input source 103 includes a camera system, it can acquire a video stream of the environment surrounding the AI-enabled game playing system or drone.
- the input data from the input source 103 is processed in real-time by Module A 222, which provides a compressed feature signal as input to Module B 224.
- the video stream can be processed as a series of image frames in real-time by Modules A and B.
- Module A and Module B can be implemented in suitable computer processors, such as graphics processor units, field-programmable gate arrays, or application-specific integrated circuits, with appropriate volatile and non-volatile memory and appropriate input/output interfaces.
- the input data is fed to a pre-trained Deep Neural Network (DNN) 200 in Module A.
- the DNN 200 includes a stack 202 of convolutional layers 204 used to extract features that can be employed to represent an input information/data as detailed in the example implementation section.
- the DNN 200 can be factory pre-trained before deployment to achieve the desired level of data representation. It can be completely defined by a configuration file that determines its architecture and by a corresponding set of weights that represents the knowledge acquired during training.
- the L-DNN system 226 takes advantage of the fact that weights in the DNN are excellent feature extractors.
- To provide input to Module B 224, which includes one or more fast-learning neural network classifiers, some of the DNN's upper layers that are engaged only in classification by the original DNN (e.g., layers 206 and 208 in FIG. 2) are ignored or even stripped from the system altogether.
- Instead, a desired raw convolutional output of a high-level feature extraction layer 204 is accessed to serve as input to Module B 224.
- the original DNN 200 usually includes a number of fully connected, averaging, and pooling layers 206 plus a cost layer 208 that is used to enable the gradient descent technique to optimize its weights during training.
- These layers are used during DNN training or for getting direct predictions from the DNN 200, but they aren't necessary for generating an input for Module B 224 (the shading in FIG. 2 indicates that layers 206 and 208 are unnecessary). Instead, the input for the neural network classifier in Module B 224 is taken from a subset of the convolutional layers 204 of the DNN. Different layers, or multiple layers, can be used to provide input to Module B 224.
- Each convolutional layer on the DNN 200 contains filters that use local receptive fields to gather information from a small region in the previous layer. These filters maintain spatial information through the convolutional layers in the DNN.
- the output from one or more late-stage convolutional layers 204 in the feature extractor (represented pictorially as a tensor 210) is fed to input neural layers 212 of a neural network classifier (e.g., an ART classifier) in Module B 224.
- the initial Module B neural network classifier can be pre-trained with arbitrary initial knowledge or with a trained classification of Module A 222 to facilitate learning on-the-fly after deployment.
- the neural network classifier continuously processes data (e.g., tensor 210) from the DNN 200 as the input source 103 provides data relating to the environment to the L-DNN 106.
- the Module B neural network classifier uses fast, preferably one-shot learning.
- An ART classifier uses bottom-up (input) and top-down (feedback) associative projections between neuron-like elements to implement match-based pattern learning as well as horizontal projections to implement competition between categories.
- The ART-based Module B 224 puts the features as an input vector in the F1 layer 212 and computes a distance operation between this input vector and existing weight vectors 214 to determine the activations of all category nodes in the F2 layer 216.
- the distance is computed either as a fuzzy AND (in the default version of ART), dot product, or Euclidean distance between vector ends.
- the category nodes are then sorted from highest activation to lowest to implement competition between them and considered in this order as winning candidates.
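- The choice step described above can be illustrated with a simplified fuzzy-ART-style computation. This sketch covers only the activation (choice) function and the sorting of category nodes; complement coding, the vigilance/match test, and the weight update that a full ART network performs are omitted.
```python
import numpy as np

def fuzzy_and(a, b):
    return np.minimum(a, b)

def art_candidates(input_vec, weights, alpha=0.001):
    """Simplified fuzzy-ART choice step: compute an activation for every
    category node (F2) from the F1 input vector, then sort the candidates."""
    acts = []
    for j, w in enumerate(weights):
        overlap = fuzzy_and(input_vec, w).sum()        # fuzzy AND with weight vector
        acts.append((overlap / (alpha + w.sum()), j))  # choice function T_j
    return sorted(acts, reverse=True)                  # highest activation first

# Two existing category weight vectors and one new input.
W = [np.array([0.9, 0.1, 0.8]), np.array([0.1, 0.9, 0.2])]
I = np.array([0.8, 0.2, 0.7])
print(art_candidates(I, W))   # category 0 wins; category 1 is the next candidate
```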
- Module B 224 serves as an output of L-DNN 226 either by itself or as a combination with an output from a specific DNN layer from Module A 222, depending on the task that the L-DNN 226 is solving.
- For whole-image classification, the Module B output may be sufficient by itself, as it classifies the whole image.
- For object detection, Module B 224 provides class labels that are superimposed on bounding boxes determined from Module A activity, so that each object is located correctly by Module A 222 and labeled correctly by Module B 224.
- For segmentation, the bounding boxes from Module A 222 may be replaced by pixel-wise masks, with Module B 224 providing labels for these masks.
- FIG. 3 represents an example L-DNN implementation for whole-image classification using a modified VGG-16 DNN as the core of Module A. The softmax layer and the last two fully connected layers are removed from the original VGG-16 DNN, and an ART-based Module B is connected to the first fully connected layer of the VGG-16 DNN.
- a similar but much simpler L-DNN can be created using AlexNet instead of VGG-16. This is a very simple and computationally cheap system that runs on any modern smart phone or drone, does not require a GPU or any other specialized processor, and can learn any set of objects from a few frames of input provided by the smart phone camera or a camera attached to a drone.
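- A sketch of the Module A trunk for the FIG. 3 configuration is shown below using the standard torchvision VGG-16 layout (assuming torchvision 0.13 or later for the `weights` argument). It keeps the convolutional stack and the first fully connected layer, drops the last two fully connected layers and the softmax, and produces the 4096-dimensional vector that would feed an ART-based Module B (not shown here).
```python
import torch
import torchvision.models as models

# Keep the convolutional feature pathway plus the first fully connected layer
# (and its ReLU); the last two fully connected layers and the softmax are dropped.
vgg = models.vgg16(weights=None)          # a weights argument would load pre-trained filters
trunk = torch.nn.Sequential(
    vgg.features,                         # convolutional layers (Module A backbone)
    vgg.avgpool,
    torch.nn.Flatten(),
    *list(vgg.classifier.children())[:2], # first FC layer + ReLU only
)
trunk.eval()

with torch.no_grad():
    x = torch.randn(1, 3, 224, 224)       # stand-in for one RGB frame from the drone camera
    features = trunk(x)                   # 4096-dim vector fed to Module B (e.g., ART)
print(features.shape)                     # torch.Size([1, 4096])
```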
- One way to detect objects of interest in an image is to divide the image into a grid and run classification on each grid cell.
- In convolutional neural networks (CNNs), each layer processes data while maintaining a topographic organization. This means that, irrespective of how deep in the network the layer is, or of the kernel, stride, or pad sizes, features corresponding to a particular area of interest on an image can be found on every layer, at various resolutions, in a similar area of the layer. For example, when an object is in the upper left corner of an image, the features representing it appear near the upper left corner of every layer.
- Only one Module B must be created per DNN layer (or scale) used as input, because the same feature vector represents the same object irrespective of its position in the image. Learning one object in the upper right corner thus allows Module B to recognize it anywhere in the image.
- Using multiple DNN layers of different sizes (scales) as inputs to separate Modules B allows detection on multiple scales. This can be used to fine tune the position of the object in the image without processing the whole image at finer scale as in the following process.
- Module A provides the coarsest scale (for example, 7 × 7 in the publicly available ExtractionNet) image to Module B for classification. If Module B says that an object is located in the cell that is second from the left edge and fourth from the top edge, only the corresponding part of the finer DNN input (for example, 14 × 14 in the same ExtractionNet) should be analyzed to further refine the location of the object.
- Another application of multiscale detection can use a DNN design where the layer sizes are not multiples of each other. For example, if a DNN has a 30 × 30 layer, it can be reduced to layers that are 2 × 2 (compression factor of 15), 3 × 3 (compression factor of 10), and 5 × 5 (compression factor of 6). As shown in FIG. 4, attaching Modules B to each of these compressed DNNs gives coarse locations of an object (indicated as 402, 404, 406). But if the output of these Modules B is combined (indicated as 408), then the spatial resolution becomes a non-uniform 8 × 8 grid with higher resolution in the center and lower resolution towards the edges.
- the resolution in the multiscale grid in FIG. 4 for the central 36 locations is equal to or finer than the resolution in the uniform 8 × 8 grid.
- the system is able to pinpoint the location of an object (410) more precisely using only 60% of the computational resources of a comparable uniform grid. This performance difference increases for larger layers because the square of the sum (representing the number of computations for a uniform grid) grows faster than sum of squares (representing the number of computations for a non-uniform grid).
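- The roughly 60% figure can be checked directly from the grid sizes given above: the non-uniform grid evaluates the sum of squares of the scales, while a uniform grid at the combined resolution evaluates the square of their sum.
```python
# Compute cost comparison for the non-uniform multiscale grid of FIG. 4.
# A 30x30 layer is compressed to 2x2, 3x3, and 5x5 grids (factors 15, 10, 6).
scales = [2, 3, 5]

nonuniform_cells = sum(s * s for s in scales)   # sum of squares: 4 + 9 + 25 = 38
uniform_cells = sum(scales) ** 2                # square of the sum: 8 x 8 = 64

print(nonuniform_cells, uniform_cells, nonuniform_cells / uniform_cells)
# 38 64 0.59375  -> roughly 60% of the classifications of a uniform 8x8 grid
```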
- Non-uniform (multiscale) detection can be especially beneficial for AI-enabled game playing systems as the objects in the center of view are most likely to be in the path of the drone and benefit from more accurate detection than objects in the periphery that do not present a collision threat.
- object detection is commonly defined as the task of placing a bounding box around an object and labeling it with an associated class (e.g., "dog").
- object detection techniques are commonly implemented by selecting one or more regions of an image with a bounding box, and then classifying the features within that box as a particular class, while simultaneously regressing the bounding box location offsets.
- Algorithms that implement this method of object detection include Region-based CNN (R-CNN), Fast R-CNN, and Faster R-CNN, although any method that does not make the localization depend directly on classification information may be substituted as the detection module.
- Image segmentation is the task of determining a class label for all or a subset of pixels in an image. Segmentation may be split into semantic segmentation, where individual pixels from two separate objects of the same class are not disambiguated, and instance segmentation, where individual pixels from two separate objects of the same class are uniquely identified or instanced. Image segmentation is commonly implemented by taking the bounding box output of an object detection method (such as R-CNN, Fast R-CNN, or Faster R-CNN) and segmenting the most prominent object in that box. The class label that is associated with the bounding box is then associated with segmented object. If no class label can be attributed to the bounding box, the segmentation result is discarded. The resulting segmented object may or may not have instance information.
- An algorithm that implements this method of segmentation is Mask R-CNN.
- FIG. 5 shows an L-DNN design for image detection or segmentation based on the R-CNN family of networks.
- a static classification module 500 may be replaced with an L-DNN Module B 224. That is, the segmentation pathway of the network remains unchanged; region proposals are made as usual, and subsequently segmented.
- If the L-DNN Module B 224 returns no positive class predictions that pass threshold, the segmentation results are discarded.
- If the L-DNN Module B 224 returns an acceptable class prediction, the segmentation results are kept, just as with the static classification module.
- the L-DNN Module B 224 offers continual adaptation to change state from the former case to the latter (i.e., from an unrecognized object to a recognized one) via user feedback.
- User feedback may be provided directly through bounding box and class labels, such as is the case when the user selects and tags an object on a social media profile, or through indirect feedback, such as is the case when the user selects an object in a video, which may then be tracked throughout the video to provide continuous feedback to the L-DNN on the new object class.
- This feedback is used to train the L-DNN to classify novel classes over time. This process does not affect the segmentation component of the network.
- Module B 224 in this paradigm also has some flexibility.
- the input to Module B 224 should be directly linked to the output of Module A convolutional layers 202, so that class labels may be combined with the segmentation output to produce a segmented, labeled output 502. This constraint may be fulfilled by having both Modules A and B take the output of a region proposal stage. Module A should not depend on any dynamic portion of Module B.
- Because Module B adapts its network's weights while Module A is static, if Module B were to change its weights and then pass its output to Module A, Module A would likely see a performance drop due to the inability of most static neural networks to handle a sudden change in the input representation of its network.
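- The following sketch shows how a fast-learning classification head can gate the segmentation output as described for FIG. 5. All of the stage names (`propose_regions`, `roi_features`, `segment_region`, `module_b`) are hypothetical placeholders for the corresponding network components; only the control flow (classify each proposal, then keep or discard its mask) is illustrated.
```python
# Sketch of swapping a static classification head for Module B: region proposals
# and mask generation are untouched; Module B only decides whether each proposal
# gets a label or its segmentation is discarded.

def label_segments(image, propose_regions, roi_features, segment_region,
                   module_b, threshold=0.5):
    results = []
    for box in propose_regions(image):            # region proposal stage
        feats = roi_features(image, box)          # Module A features for this ROI
        label, score = module_b.classify(feats)   # fast-learning classifier
        if score < threshold:
            continue                              # no acceptable class: drop the mask
        mask = segment_region(image, box)         # segmentation pathway unchanged
        results.append((box, mask, label))        # segmented, labeled output
    return results
```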
- An AI-enabled game playing system can include a camera input or other sensory input (e.g., a sensor suite 110 as in FIG. 1) to capture information about people or objects surrounding a drone.
- the L-DNN included in the AI-enabled game playing system uses the input data to first extract features of an object or a person using Module A.
- the one-shot classifier, or Module B, in the L-DNN uses these extracted features to classify the object. In this manner, the L-DNN identifies objects on the fly.
- subsequent input data can be used to continue tracking the object. If the drone encounters new objects or people as the game progresses, it can learn these new objects/people. L-DNNs enable learning new knowledge on-the-fly without catastrophic forgetting.
- the object of interest might go out of the drone's field of view. If the object of interest returns to the drone's field of view, the L-DNN can identify the object again without user intervention. In other words, if an object of interest moves back into the drone's field of view after having moved out of the drone's field of view, the AI-enabled game playing system can identify the object and resume tracking the object automatically. If the L-DNN has been trained to recognize the object from different angles, the object's orientation upon reappearing in the data stream is irrelevant; the L-DNN should recognize it no matter what. Likewise, the time elapsed between the object's disappearance from and reappearance in the data stream is irrelevant; the L-DNN can wait indefinitely, so long as it is not reset during the wait.
- With a conventional tracking system, by contrast, the user has to identify the object before tracking begins. Typically, the user does this by positioning the drone to acquire an image of the object and drawing a bounding box around the object. The drone correlates the pixels in this bounding box with pixels in subsequent images; the correlation gives the shift of the object's geometric center from frame to frame. The drone can track the object so long as the object remains in the field of view and so long as the object's appearance doesn't change suddenly, e.g., because the object has turned sideways.
- the L-DNN-based AI-enabled game playing system can also enable improved performance accuracy by combining contextual information with current object information.
- the contextual L-DNN may learn that certain objects are likely to co-occur in the input stream. For example, camels, palm trees, sand dunes, and off-road vehicles are typical objects in a desert scene, whereas houses, sports cars, oak trees, and dogs are typical objects in a suburban scene.
- Locally ambiguous information at the pixel level and acquired as a drone input can be mapped to two object classes (e.g., camel or dog) depending on the context. In both cases, the object focus of attention has an ambiguous representation, which is often the case in low-resolution images.
- the pixelated image of the camel can be disentangled by global information about the scene and past associations learned between objects, even though 'camel' is only the fourth most likely class inferred by the L-DNN, the most probable being 'horse' based on local pixel information alone.
- Contextual objects (e.g., a sand dune, off-road vehicle, or palm tree) bias the interpretation, so the contextual classifier can overturn the 'horse' class in favor of the 'camel' class.
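- A toy version of this contextual re-ranking is sketched below. The class scores and co-occurrence weights are invented for illustration; they simply show how multiplying local evidence by a context prior can overturn 'horse' in favor of 'camel' when desert objects are present.
```python
# Local pixel evidence favors "horse", but objects co-occurring in the scene
# (sand dune, palm tree, off-road vehicle) shift the decision to "camel".
# All scores and co-occurrence weights below are made-up illustration values.

local_scores = {"horse": 0.40, "deer": 0.25, "cow": 0.20, "camel": 0.15}

context = ["sand dune", "palm tree", "off-road vehicle"]
cooccurrence = {
    "camel": {"sand dune": 0.9, "palm tree": 0.8, "off-road vehicle": 0.6},
    "horse": {"sand dune": 0.2, "palm tree": 0.2, "off-road vehicle": 0.3},
    "deer":  {"sand dune": 0.1, "palm tree": 0.1, "off-road vehicle": 0.1},
    "cow":   {"sand dune": 0.1, "palm tree": 0.1, "off-road vehicle": 0.2},
}

def rerank(local, ctx):
    scored = {}
    for cls, p in local.items():
        prior = sum(cooccurrence[cls].get(c, 0.0) for c in ctx) / len(ctx)
        scored[cls] = p * prior          # combine local evidence with context prior
    return max(scored, key=scored.get)

print(rerank(local_scores, context))     # -> "camel"
```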
- an L-DNN based AI-enabled game playing system can provide seamless identification, detection, and tracking of an object of interest.
- a drone implementing an AI-enabled game playing system disclosed herein can identify one or more objects of interest and track these objects. If the drone loses sight of the object of interest and the object of interest moves back into the line of sight of the drone, the AI-enabled game playing system can re-track the object. If the drone encounters new objects that it hasn't encountered before, the AI-enabled game playing system can learn these new objects.
- FIG. 6 shows a method 600 of detecting, tracking, and interacting with an object of interest using an AI-enabled interactive drone.
- the user may train the AI (e.g., an L-DNN) to recognize new objects (box 602), such as the people who will be playing a game using the AI-enabled interactive drone.
- the user can train the AI with previously acquired and tagged images, e.g., from the players’ social media accounts, or with images acquired in real time using a smart phone or the camera on the drone.
- the AI Once the AI has been trained, it can be loaded onto the drone or onto a smart phone that controls the drone (box 604) if it does not already reside on the drone or smart phone.
- a sensor such as a camera on the drone acquires a data stream (box 606), such as an image or video stream during the game or other interaction (e.g., a 'follow-me' interaction where the drone tracks and films a user).
- Within the field of view of a typical sensor (e.g., a camera), multiple objects can be present, and an AI module in or coupled to the drone recognizes one or more objects in that sensor stream (box 608) without the need for any bounding boxes drawn by the user.
- If the AI module includes an L-DNN, the L-DNN uses a pre-trained neural network (Module A) to extract features of the objects present in the sensor stream (box 610).
- a one-shot classifier uses these extracted features to classify the objects (box 612).
- the game logic module (e.g., game logic modules 105 and 125 in FIG. 1) can then provide guidance to the drone control module (e.g., drone control modules 106 and 126 in FIG. 1) to execute an action (box 620).
- executing the action can include avoiding obstacles that are in the path of the drone.
- the action can be to track a particular object of interest (box 622). For example, the drone may be prompted to track a ball with its camera.
- the action can be to follow an object of interest.
- For example, if the object of interest is a particular person, the drone may be prompted to follow that person.
- the action can also include blurring or replacing an object of interest in the images (box 624).
- the drone may be prompted to blur or segment a face in an image for privacy purposes before providing the image data to a display, server, or other memory.
- the drone may also take off, perform a particular maneuver (e.g., a flip or roll), hover, or land if it detects a specific object.
- the relationship between detected objects and action is contained in the game logic module (e.g., game logic 105 in FIG. 1).
- the object may disappear from the data stream (box 630). For instance, if the object is a person, and the person hides behind another object, the person may no longer be visible in the image stream captured by the drone’s camera. Similarly, as the drone flies through a given area, other objects in the area may occlude some or all of the drone camera’s field of view, potentially causing the person to disappear from the data stream, at least momentarily. If the game logic senses that the person has disappeared, it may command the drone to hover, return to a previous position, or follow a search pattern to re-acquire the person. In any event, if the person re-appears in the data stream (box 632), even from a different perspective, the AI module can recognize the person as described above (box 608).
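- The disappearance and re-acquisition behavior of boxes 630 and 632 can be sketched as a simple per-frame decision. The `classify_frame` callable and the command strings below are hypothetical placeholders; the point is that the classifier, not the user, decides when the target has re-appeared.
```python
# If the tracked person drops out of the data stream, the drone falls back to a
# search behavior and resumes tracking as soon as the classifier recognizes the
# person again, without any new bounding box from the user.

def track_with_reacquisition(frames, classify_frame, target="person_1"):
    commands = []
    for frame in frames:
        labels = classify_frame(frame)            # AI module output for this frame
        if target in labels:                      # person visible (boxes 608/632)
            commands.append("follow")
        else:                                     # person disappeared (box 630)
            commands.append("hover_and_search")   # e.g. hover or fly a search pattern
    return commands

frames = ["f0", "f1", "f2", "f3"]
seen = {"f0": {"person_1"}, "f1": set(), "f2": set(), "f3": {"person_1"}}
print(track_with_reacquisition(frames, lambda f: seen[f]))
# ['follow', 'hover_and_search', 'hover_and_search', 'follow']
```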
- a drone using AI can be used to imitate the role of a human player.
- the AI-enabled game playing system can provide enhanced drone user engagement by intelligently processing the camera input or other sensory input and intelligently controlling the behavior of the drone so as to reproduce a typical game that would normally involve people.
- Hide-and-Seek: In a traditional hide-and-seek game, which can be played indoors or outdoors, the seeker closes his eyes for a brief period while the other players hide. The seeker then opens his eyes and tries to find the hiders, with the last person to be found being the winner of the round.
- the technology described herein transforms the traditional hide-and-seek game into one where a drone plays the role of the seeker.
- AI embedded onboard a drone or on a controlling device (e.g., a smart phone) with sufficient compute power provides a new user experience where the seeker is not a human player, but an intelligent drone, via appropriate scripting of the AI in the context of the game.
- FIG. 7 illustrates 'hide-and-seek,' where the drone imitates the homonymous game.
- a drone 100 slowly spins on its axis and/or performs small, semi-random movements to seek human players partially occluded (e.g., by a tree 1100) in an outdoor or indoor space.
- When the AI detects a person 1000, the identified player may be captured in a picture and eliminated from the game. The drone may continue this behavior until a time limit is reached or all but one person has been found.
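- One possible scripting of this seeker behavior is sketched below. The drone and detector interfaces are hypothetical placeholders; only the game flow (seek, photograph found players, stop at the time limit or when one player remains) is illustrated.
```python
import time

# Toy hide-and-seek seeker logic for FIG. 7. The drone and detector objects are
# hypothetical placeholders for the drone control and AI module interfaces.

def play_hide_and_seek(drone, detector, players, time_limit_s=300):
    found = set()
    start = time.time()
    while time.time() - start < time_limit_s and len(found) < len(players) - 1:
        drone.spin_slowly()                       # small, semi-random seek motion
        for person in detector.detect_people(drone.current_frame()):
            if person in players and person not in found:
                drone.take_picture(person)        # 'capture' the found player
                found.add(person)                 # eliminated from the game
    return found                                  # the last player not found wins
```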
- the technology described herein transforms the traditional capture-the-flag game into one where a drone plays the role of a player.
- AI embedded onboard a drone or on a controlling device (e.g., a smart phone) with sufficient compute power provides a new user experience where one of the players is not a human player, but an intelligent drone.
- the drone can imitate a player by tagging enemy team’s players in the drone’s team territory.
- the drone can detect, identify, and track enemy team players in order to do so.
- the drone can also search and detect the opposing team’s flag. If the drone detects the flag, the drone may be prompted to perform an action, such as hovering over the flag, so that another player in the same team can steal the opposing team’s flag.
- the drone can imitate the player chosen to chase other players.
- the drone can identify and track the other players. When it is at a pre-defined distance from at least one other player, it can be prompted to tag that player.
- the technology described herein can be used to track a person of interest.
- the drone can therefore also be prompted to follow a person of interest.
- a drone may be prompted to follow a user when the user is performing an activity and capture pictures or videos of the user.
- a drone can be prompted to follow a user when the user is skiing in order to capture pictures and videos of the user.
- the user could be running, biking, kayaking, snowboarding, climbing, etc.
- the drone may maneuver based on the user’s trajectory, e.g., the drone may follow the user at a predetermined distance or keep the user positioned at a particular point in the sensor’s field of view.
- Because the drone uses a neural-network classifier, if the drone loses track of the person, e.g., because the person is temporarily occluded or leaves the sensor's field of view, the drone can automatically re-acquire the person as soon as the occlusion vanishes or the person returns to the sensor's field of view.
- the neural-network classifier recognizes the person in the data stream, regardless of the person's orientation or how long the person has been absent from the data stream, so long as the drone has not been restored to its factory settings. For instance, the drone can be taught to recognize a person during a first session, then turned off and on and used to recognize the same person during a subsequent session.
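- One simple way to keep the recognized person at a chosen point in the camera frame is a proportional controller on the pixel offset, as sketched below. The gains, setpoint, and command dictionary are illustrative assumptions, not part of the disclosure.
```python
# Convert the tracked person's position in the frame into yaw and climb
# commands that keep the person near a chosen setpoint in the image.

def follow_command(bbox_center, frame_size, setpoint=(0.5, 0.4),
                   k_yaw=1.0, k_climb=0.5):
    """bbox_center and frame_size are (x, y) in pixels."""
    cx = bbox_center[0] / frame_size[0]           # normalized horizontal position
    cy = bbox_center[1] / frame_size[1]           # normalized vertical position
    yaw_rate = k_yaw * (cx - setpoint[0])         # turn toward the person
    climb_rate = -k_climb * (cy - setpoint[1])    # move person toward the setpoint
    return {"yaw_rate": yaw_rate, "climb_rate": climb_rate}

print(follow_command((800, 300), (1280, 720)))
# {'yaw_rate': 0.125, 'climb_rate': -0.0083...}
```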
- Drone alarm: FIG. 8 illustrates another example game, termed 'drone alarm'.
- a drone 100 is placed in such a position as to monitor the entrance 2000 of a room (similar concepts can be applied to outdoor spaces).
- When the AI module detects an intruder (here, a person 1000, but it can also be a pet), the drone takes off.
- Other events can happen on the controlling device 120, such as a picture or a scan of the intruder being taken, or a sound alarm being played.
- Inventive embodiments are presented by way of example only and, within the scope of the appended claims and equivalents thereto, inventive embodiments may be practiced otherwise than as specifically described and claimed.
- inventive embodiments of the present disclosure are directed to each individual feature, system, article, material, kit, and/or method described herein.
- embodiments can be implemented in any of numerous ways. For example, embodiments may be implemented using hardware, software or a combination thereof. When implemented in software, the software code can be executed on any suitable processor or collection of processors, whether provided in a single computer or distributed among multiple computers.
- a computer may be embodied in any of a number of forms, such as a rack-mounted computer, a desktop computer, a laptop computer, or a tablet computer. Additionally, a computer may be embedded in a device not generally regarded as a computer but with suitable processing capabilities, including a Personal Digital Assistant (PDA), a smart phone or any other suitable portable or fixed electronic device.
- PDA Personal Digital Assistant
- a computer may have one or more input and output devices. These devices can be used, among other things, to present a user interface. Examples of output devices that can be used to provide a user interface include printers or display screens for visual presentation of output and speakers or other sound generating devices for audible presentation of output.
- Examples of input devices that can be used for a user interface include keyboards, and pointing devices, such as mice, touch pads, and digitizing tablets.
- a computer may receive input information through speech recognition or in other audible format.
- Such computers may be interconnected by one or more networks in any suitable form, including a local area network or a wide area network, such as an enterprise network, an intelligent network (IN), or the Internet.
- networks may be based on any suitable technology and may operate according to any suitable protocol and may include wireless networks, wired networks or fiber optic networks.
- the various methods or processes outlined herein may be coded as software that is executable on one or more processors that employ any one of a variety of operating systems or platforms. Additionally, such software may be written using any of a number of suitable programming languages and/or programming or scripting tools, and also may be compiled as executable machine language code or intermediate code that is executed on a framework or virtual machine.
- inventive concepts may be embodied as one or more methods, of which an example has been provided.
- the acts performed as part of the method may be ordered in any suitable way. Accordingly, embodiments may be constructed in which acts are performed in an order different than illustrated, which may include performing some acts simultaneously, even though shown as sequential acts in illustrative embodiments.
- a reference to "A and/or B", when used in conjunction with open-ended language such as "comprising" can refer, in one embodiment, to A only (optionally including elements other than B); in another embodiment, to B only (optionally including elements other than A); in yet another embodiment, to both A and B (optionally including other elements); etc.
- the phrase "at least one," in reference to a list of one or more elements, should be understood to mean at least one element selected from any one or more of the elements in the list of elements, but not necessarily including at least one of each and every element specifically listed within the list of elements and not excluding any combinations of elements in the list of elements.
- This definition also allows that elements may optionally be present other than the elements specifically identified within the list of elements to which the phrase "at least one" refers, whether related or unrelated to those elements specifically identified.
- “at least one of A and B” can refer, in one embodiment, to at least one, optionally including more than one, A, with no B present (and optionally including elements other than B); in another embodiment, to at least one, optionally including more than one, B, with no A present (and optionally including elements other than A); in yet another embodiment, to at least one, optionally including more than one, A, and at least one, optionally including more than one, B (and optionally including other elements); etc.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201862655490P | 2018-04-10 | 2018-04-10 | |
US62/655,490 | 2018-04-10 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019199967A1 true WO2019199967A1 (en) | 2019-10-17 |
Family
ID=68163805
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2019/026783 WO2019199967A1 (en) | 2018-04-10 | 2019-04-10 | Systems and methods for gamification of drone behavior using artificial intelligence |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2019199967A1 (en) |
- 2019-04-10: PCT/US2019/026783 filed as WO2019199967A1 (active, Application Filing)
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180074519A1 (en) * | 2016-09-13 | 2018-03-15 | Hangzhou Zero Zero Technology Co., Ltd. | Unmanned aerial vehicle system and method with environmental sensing |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11561540B2 (en) * | 2019-02-26 | 2023-01-24 | Intel Corporation | Augmenting autonomous driving with remote viewer recommendation |
US20230315093A1 (en) * | 2019-02-26 | 2023-10-05 | Mobileye Vision Technologies Ltd. | Augmenting autonomous driving with remote viewer recommendation |
US11899457B1 (en) * | 2019-02-26 | 2024-02-13 | Mobileye Vision Technologies Ltd. | Augmenting autonomous driving with remote viewer recommendation |
CN111027427A (en) * | 2019-11-29 | 2020-04-17 | 大连理工大学 | Target gate detection method for small unmanned aerial vehicle race |
CN111027427B (en) * | 2019-11-29 | 2023-07-18 | 大连理工大学 | Target gate detection method for small unmanned aerial vehicle racing match |
CN111257507A (en) * | 2020-01-16 | 2020-06-09 | 清华大学合肥公共安全研究院 | Gas concentration detection and accident early warning system based on unmanned aerial vehicle |
CN113724295A (en) * | 2021-09-02 | 2021-11-30 | 中南大学 | Unmanned aerial vehicle tracking system and method based on computer vision |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108961312B (en) | High-performance visual object tracking method and system for embedded visual system | |
US10846873B2 (en) | Methods and apparatus for autonomous robotic control | |
Bux et al. | Vision based human activity recognition: a review | |
US11869237B2 (en) | Modular hierarchical vision system of an autonomous personal companion | |
WO2019199967A1 (en) | Systems and methods for gamification of drone behavior using artificial intelligence | |
KR102287460B1 (en) | Artificial intelligence moving agent | |
Ahad | Motion history images for action recognition and understanding | |
CN107851191B (en) | Context-based priors for object detection in images | |
US11430124B2 (en) | Visual object instance segmentation using foreground-specialized model imitation | |
CN110914836A (en) | System and method for implementing continuous memory bounded learning in artificial intelligence and deep learning for continuously running applications across networked computing edges | |
Pons et al. | Assessing machine learning classifiers for the detection of animals’ behavior using depth-based tracking | |
CN102301311B (en) | Standard gestures | |
KR102414602B1 (en) | Data recognition model construction apparatus and method for constructing data recognition model thereof, and data recognition apparatus and method for recognizing data thereof | |
CN103608844A (en) | Fully automatic dynamic articulated model calibration | |
CN102222431A (en) | Hand language translator based on machine | |
Lakshmi et al. | Neuromorphic vision: From sensors to event‐based algorithms | |
Pavel et al. | Object class segmentation of RGB-D video using recurrent convolutional neural networks | |
US20210204785A1 (en) | Artificial intelligence moving agent | |
Majumder et al. | A review of real-time human action recognition involving vision sensing | |
Othman et al. | Challenges and Limitations in Human Action Recognition on Unmanned Aerial Vehicles: A Comprehensive Survey. | |
Bukht et al. | A review of video-based human activity recognition: Theory, methods and applications | |
Mohamed | A novice guide towards human motion analysis and understanding | |
Peng | Object recognition in videos utilizing hierarchical and temporal objectness with deep neural networks | |
Bhaidasna et al. | A Survey on Different Deep Learning Model for Human Activity Recognition Based on Application | |
EP4290478A1 (en) | Method for processing image acquired from imaging device linked with computing device, and system using same |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19784619 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 19784619 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 01.04.2021) |
|