US20160088286A1 - Method and system for an automatic sensing, analysis, composition and direction of a 3d space, scene, object, and equipment

Info

Publication number
US20160088286A1
US20160088286A1
Authority
US
United States
Prior art keywords
scene
composition
subjects
models
subject
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US14/858,901
Inventor
Hamish Forsythe
Alexander Cecil
Original Assignee
Hamish Forsythe
Alexander Cecil
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to US201462053055P
Application filed by Hamish Forsythe and Alexander Cecil
Priority to US14/858,901
Assigned to FORSYTHE, HAMISH. Assignors: CECIL, Alexander; FORSYTHE, HAMISH (assignment of assignors' interest; see document for details)
Publication of US20160088286A1
Legal status: Abandoned

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G06F16/24575 Query processing with adaptation to user needs using context
    • H04N13/026
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/00335 Recognising movements or behaviour, e.g. recognition of gestures, dynamic facial expressions; Lip-reading
    • G06K9/00342 Recognition of whole body movements, e.g. for sport training
    • G06K9/00624 Recognising scenes, i.e. recognition of a whole field of perception; recognising scene-specific objects
    • G06K9/00664 Recognising scenes such as could be captured by a camera operated by a pedestrian or robot, including objects at substantially different ranges from the camera
    • G06K9/00671 Recognising scenes such as could be captured by a camera operated by a pedestrian or robot, including objects at substantially different ranges from the camera for providing information about objects in the scene to a user, e.g. as in augmented reality applications
    • G06K9/00885 Biometric patterns not provided for under G06K9/00006, G06K9/00154, G06K9/00335, G06K9/00362, G06K9/00597; Biometric specific functions not specific to the kind of biometric
    • G06K9/00912 Interactive means for assisting the user in correctly positioning the object of interest
    • G06Q DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce, e.g. shopping or e-commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping
    • G06Q30/0623 Item investigation
    • G06Q30/08 Auctions, matching or brokerage
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/10 Geometric effects
    • G06T15/20 Perspective computation
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20 Image signal generators
    • H04N13/261 Image signal generators with monoscopic-to-stereoscopic image conversion

Abstract

Method and system for automatic composition and orchestration of a 3D space or scene using networked devices and computer vision to bring ease of use and autonomy to a range of compositions. A scene, its objects, subjects, and background are identified and classified, and relationships and behaviors are deduced through analysis. Compositional theories are applied, and context attributes (for example location, external data, camera metadata, and the relative positions of subjects and objects in the scene) are considered automatically to produce optimal composition and allow for direction of networked equipment and devices. Events inform the capture process; for example, a video recording is initiated when a rock climber waves her hand, and an autonomous camera automatically adjusts to keep her body in frame throughout the sequence of moves. Model analysis allows for direction, including audio tones to indicate proper form for the subject and instructions sent to equipment to ensure optimal scene orchestration.

Description

    CROSS REFERENCE TO RELATED APPLICATION
  • The instant application is a utility application of the previously filed U.S. Provisional Application 62/053,055, filed on 19 Sep. 2014. The pending U.S. Provisional Application 62/053,055 is hereby incorporated by reference in its entirety for all of its teachings.
  • FIELD OF INVENTION
  • A method and system for automatically sensing a 3D space, scene, subject, object, and equipment using photographic equipment, for further analysis, composition, and direction that can be used for creating visual design.
  • BACKGROUND
  • Computer device hardware and software continue to advance in sophistication. Cameras, micro controllers, computer processors (e.g., ARM), and smartphones have become more capable, as well as smaller, cheaper, and ubiquitous. In parallel, more sophisticated algorithms including computer vision, machine learning and 3D models can be computed in real-time or near real-time on a smartphone or distributed over a plurality of devices over a network.
  • At the same time, multiple cameras including front-facing cameras on smartphones have enabled the popularity of the selfie as a way for anyone to quickly capture a moment and share it with others. But the primary mechanism for composition has not advanced beyond an extended arm or a selfie stick and use of the device's screen as a visual reference for the user to achieve basic scene framing. Recently, GPS-based drone cameras such as Lily have been introduced that improve on the selfie stick, but they are not autonomous: they require the user to wear a tracking device to continually establish the focal point of the composition and to pass directional “commands” to the drone via buttons on the device. This is limiting when trying to include multiple dynamic subjects and/or objects in the frame (a “groupie”), or when the user is preoccupied or distracted (for example at a concert, or while engaged in other activities).
  • SUMMARY
  • The present invention is in the areas of sensing, analytics, direction, and composition of 3D spaces. It provides a dynamic real-time approach to sense, recognize, and analyze objects of interest in a scene; applies a composition model that automatically incorporates best practices from prior art as models, for example: photography, choreography, cinematography, art exhibition, and live sports events; and directs subjects and equipment in the scene to achieve the desired outcome.
  • In one embodiment, a high-quality professional-style recording is being composed using the method and system. Because traditional and ubiquitous image capture equipment can now be enabled with microcontrollers and/or sensor nodes in a network to synthesize numerous compositional inputs and communicate real-time directions to subjects and equipment using a combination of sensory (e.g., visual, audio, vibration) feedback and control messages, it becomes significantly easier to get a high-quality output on one's own. If there are multiple people or subjects who need to be posed precisely, each subject can receive personalized direction to ensure their optimal positioning relative to the scene around them.
  • In one embodiment, real-world scenes are captured using sensor data and translated into 2D, 2.5D, and 3D models in real-time using a method such that continuous spatial sensing, recognition, composition, and direction are possible without requiring additional human judgment or interaction with the equipment and/or scene.
  • In one embodiment, image processing, image filtering, video motion analysis, background subtraction, object tracking, pose estimation, stereo correspondence, and 3D reconstruction are run perpetually to provide optimal orchestration of subjects and equipment in the scene without a human operator.
  • In one embodiment, subjects can be tagged explicitly by a user, or determined automatically by the system. If desired, subjects can be tracked or kept in frame over time and as they move throughout a scene, without further user interaction with the system. The subject(s) can also be automatically directed through sensory feedback (e.g., audio, visual, vibration) or any other user interface.
  • In one embodiment as a method, an event begins the process of capturing the scene. The event can be an explicit hardware action, such as pressing a shutter button or activating a remote control for the camera, or it can be determined via software from a real-world event, message, or notification symbol; for example, recognizing the subject waving their arms, a hand gesture, an object, a symbol, or an identified subject or entity entering a predetermined area in the scene.
The system allows for the identification of multiple sensory event types, including physical-world events (an object entering or exiting the frame, a sunrise, a change in the lighting of the scene, the sound of someone's voice, etc.) and software-defined events (state changes, timers, sensor-based triggers). In one embodiment, a video recording is initiated when a golfer settles into her stance and aims her head down. The camera automatically adjusts to keep her moving club in the frame during her backswing, activates burst mode to best capture the moment of impact with the ball during her downswing, and pauses the recording seconds after the ball leaves the frame. Feedback can further be provided to improve her swing based on rules and constraints provided by an external golf professional, while measuring and scoring how well she complies with leading-practice motion ranges.
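The event-driven capture described above can be modeled as a small state machine that advances only on recognized events. The following Python sketch is illustrative only: the event labels ("stance_settled", "downswing", "ball_left_frame") are hypothetical names for outputs a vision pipeline might produce, not identifiers defined by this disclosure.

```python
from enum import Enum, auto

class CaptureState(Enum):
    IDLE = auto()       # waiting for the triggering event
    RECORDING = auto()  # normal video capture
    BURST = auto()      # high-speed burst around the moment of impact

def next_state(state, event):
    """Advance the capture state machine on a sensed event.

    Unrecognized events leave the state unchanged, so sensor noise
    cannot start or stop a recording by accident.
    """
    transitions = {
        (CaptureState.IDLE, "stance_settled"): CaptureState.RECORDING,
        (CaptureState.RECORDING, "downswing"): CaptureState.BURST,
        (CaptureState.BURST, "ball_left_frame"): CaptureState.IDLE,
    }
    return transitions.get((state, event), state)
```

A real implementation would emit capture commands on each transition; the table-driven form makes it easy to swap in different event sequences for different sports or activities.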
  • In another embodiment, a video or camera scan can be initiated by voice or automatically when the subject is inside the camera frame; the system can then monitor and direct the subject through a sequence of easy-to-understand body movements and steps using a combination of voice, lights, and simple mimicry of on-screen poses in a user interface or visual display. For a few examples, the subject could be practicing and precisely evaluating yoga poses, following a physical therapy program, or taking private body measurements.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1A, 1B, 1C, 1D, 1E show various methods for hands-free capture of a scene.
  • FIG. 2A, 2B, 2C, 2D, 2E, 2F, 2G, 2H, 2I, 2J illustrate a system of cameras that can be used to implement a method of sensing, analyzing, composing, and directing a scene.
  • FIG. 3 shows examples of interfaces, inputs, and outputs to direct subjects in a scene.
  • FIG. 4 shows further examples of interface when capturing close-up scenes.
  • FIG. 5 shows examples of selfies, groupies, and other typical applications.
  • FIG. 6 is a diagram of the Sensing Module, Analytics Module, Composition/Architecture Module, and Direction/Control Module.
  • FIG. 7 is a diagram of the system's algorithm and high-level process flow.
  • FIG. 8 is a detailed look at the Sensing Module from FIG. 6.
  • FIG. 9 is a high level view of the system architecture for on-premise and cloud embodiments.
  • FIG. 10 illustrates various iconic and familiar compositions and reference poses.
  • FIG. 11 shows an interface for choosing a composition model and assigning objects or subjects for direction.
  • FIG. 12 shows further examples of compositions that can be directed.
  • FIG. 13 shows an example interface for using data attached to specific geolocations, as well as an example use case.
  • FIG. 14 shows how computer vision can influence composition model selection.
  • FIGS. 15A and 15B show examples of Building Information Management (BIM) applications.
  • FIG. 16 shows how a collection of images and file types can be constructed and deconstructed into sub-components, including 3D aggregate models and hashed files, for protecting user privacy across the system from device to network and cloud service.
  • FIG. 17 shows types of inputs that inform the Models from FIG. 6.
  • FIG. 18 shows a method for virtual instruction to teach how to play music.
  • FIG. 19 is an example of how a Model can apply to Sensed data.
  • FIG. 20 shows example connections to the network and to the Processing Unit.
  • DETAILED DESCRIPTION
  • The present invention enables real-time sensing, spatial composition, and direction for objects, subjects, scenes, and equipment in 2D, 2.5D or 3D models in a 3D space. In a common embodiment, a smartphone will be used for both its ubiquity and the combination of cameras, sensors, and interface options.
  • FIG. 1A shows how such a cell phone (110) can be positioned to provide hands-free capture of a scene. This can be achieved using supplemental stands different from traditional tripods designed for non-phone cameras. FIG. 1C shows that a stand can be either foldable (101) or rigid (102), so long as it holds the sensors on the phone in a stable position. A braced style of stand (103) like the one shown in FIG. 1E can also be used. The stand can be made of any combination of materials, so long as the stand is sufficiently tall and wide to support the weight of the capturing device (110) and hold it securely in place.
  • In an embodiment, the self-assembled stand (101) can be fashioned from materials included as a branded or unbranded removable insert (105) in a magazine or other promotion (106) with labeling and tabs sufficient so that the user is able to remove the insert (105) and assemble it into a stand (101) without any tools. This shortens the time to initial use by an end-user by reducing the steps needed to position a device for proper capture of a scene.
  • As seen in FIG. 1D, the effect of the stand can also be achieved using the angle of the wall/floor and the natural surface friction of a space. In this embodiment, the angle of placement (107) is determined by the phone's (110) sensors and slippage can be detected by monitoring changes in those sensors. The angle of elevation can be extrapolated from the camera's lens (111), allowing for very wide capture of a scene when the phone is oriented in portrait mode. Combined with a known fixed position from the bottom of the phone to the lens (104), the system is now able to deliver precise measurements and calibrations of objects in a scene. This precision could be used, for example, to capture a user's measurements and position using only one capture device (110) instead of multiple.
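The angle-of-placement and slippage monitoring described above could be computed from raw accelerometer readings along the following lines. This is a minimal sketch; the function names and the 2-degree slippage tolerance are illustrative assumptions, not values specified in the disclosure.

```python
import math

def placement_angle(ax, ay, az):
    """Tilt of the device's z-axis away from the gravity vector, in degrees.

    (ax, ay, az) is an accelerometer reading in m/s^2 taken while the
    phone rests against the wall/floor angle (107).
    """
    g = math.sqrt(ax * ax + ay * ay + az * az)
    # Clamp to [-1, 1] to guard against rounding before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, az / g))))

def has_slipped(angle_history, tolerance_deg=2.0):
    """Flag slippage when the sensed angle drifts from the first
    (calibrated) reading by more than the tolerance."""
    baseline = angle_history[0]
    return any(abs(a - baseline) > tolerance_deg for a in angle_history[1:])
```

Periodic calls to `placement_angle` plus a drift check like `has_slipped` would let the system warn the user, or pause capture, if the propped phone begins to slide.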
  • When positioning the device on a door, wall, or other vertical surface (FIG. 1A), adhesive or magnets (120) can be used to secure the capture device (110) and prevent it from falling. For rented apartments or other temporary spaces, the capture device can also be placed in a case (122) such that the case can then be mounted via hooks, adhesives, magnets, or other ubiquitous fasteners (FIG. 1B). This allows for easy removal of the device (110) without compromising or disturbing the capture location (121).
  • Referring now to FIG. 2A, 2B, 2C, 2D, 2E, 2F, 2G, 2H, 2I, 2J, various devices can be orchestrated into an ensemble to capture a scene. Existing capture device types can be positioned and networked to provide an optimal awareness of the scene. Examples include: cameras (202), wearable computers such as the Apple Watch, Google Glass, or FitBit (FIG. 2B), pan/tilt cameras such as those found in webcams/security cameras (FIG. 2C), mobile devices such as smartphones or tablets (FIG. 2D) equipped with front and rear-facing cameras (including advanced models with body-tracking sensors and fast-focus systems such as in the LG G3), traditional digital cameras (FIG. 2E), laptops with integrated webcams (FIG. 2F), depth cameras or thermal sensors (FIG. 2G) like those found in the Xbox Kinect hardware, dedicated video cameras (FIG. 2H), and autonomous equipment with only cameras attached (FIG. 2I) or autonomous equipment with sensors (FIG. 2J) such as sonar sensors, or infrared, or laser, or thermal imaging technology.
  • Advances in hardware/software coupling on smartphones further extend the applicability of the system and provide opportunities for a better user experience when capturing a scene because ubiquitous smartphones and tablets (FIG. 2D) can increasingly be used instead of traditionally expensive video cameras (FIG. 2E, FIG. 2H).
  • Using the mounts described in FIG. 1A, a device (110) can be mounted on a door or wall to capture the scene. The door allows panning of the scene by incorporating the known fixed-plane movement of the door. For alternate vantage points, it is also possible to use the mounts to position a device on a table (213) or the floor using a stand (103), or to use a traditional style tripod (215). The versatility afforded by the mounts and stands allows for multiple placement options for capturing devices, which in turn allows for greater precision and flexibility when sensing, analyzing, composing, and directing a subject in a 3D space.
  • Once recognized in a scene, subjects (220) can then be directed via the system to match desired compositional models, according to various sensed orientations and positions. These include body alignment (225), arm placement (230), and head tilt angle (234). Additionally, the subject can be directed to rotate in place (235) or to change their physical location by either moving forward, backward, or laterally (240).
  • Rotation (225) in conjunction with movement along a plane (240) also allows for medical observation, such as orthopedic evaluation of a user's gait or posture. While an established procedure exists today wherein trained professionals evaluate gait, posture, and other attributes in person, access to those professionals is limited and the quality and consistency of the evaluations is irregular. The invention addresses both shortcomings through a method and system that makes use of ubiquitous smartphones (110) and the precision and modularity of models. Another instance where networked sensors and cameras can replace a human professional is precise body measurement, previously achieved by visiting a quality tailor. By creating a 3D scene and directing subjects (220) intuitively as they move within it, the system is able to ensure with high accuracy that the subjects go through the correct sequences and that the appropriate measurements are collected efficiently and with repeatable precision. Additionally, this method of dynamic and precise capture of a subject while sensing can be used to achieve the positioning required for stereographic images with, e.g., a single lens or sensor.
  • FIG. 3 provides examples of interface possibilities to communicate feedback to the subjects and users. The capturing device (110) can relay feedback that is passed to subjects through audio tones (345), voice commands (346), visually via a screen (347), or using vibration (348). An example of such a feedback loop is shown as a top view looking down on the subject (220) as they move along the rotation path directed in (225) according to audio tones heard by the subject (349).
  • The visual on-screen feedback (347) can take the form of a superimposed image of the subject's sensed position relative to the directed position in the scene (350). In one embodiment, the positions are represented as avatars, allowing human subjects to naturally mimic and achieve the desired position by aligning the two avatars (350). Real-time visual feedback is possible because the feedback-providing device (110) is networked (351) to all other sensing devices (352), allowing for synthesis and scoring of varied location and position inputs and providing a precise awareness of the scene's spatial composition (this method and system is discussed further in FIG. 8). One example of additional sensing data that can be networked is imagery of an infrared camera (360).
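One plausible way to score a subject's sensed position against the directed position (350) is to compare corresponding body keypoints. The metric below, the inverse of one plus the mean keypoint distance, is an assumption chosen for illustration, not the scoring function specified by the system.

```python
import math

def alignment_score(sensed, target):
    """Score how closely a sensed pose matches the directed pose.

    `sensed` and `target` map joint names to (x, y) positions in
    normalized screen coordinates. Returns 1.0 for a perfect match,
    falling toward 0.0 as the mean keypoint distance grows.
    """
    common = set(sensed) & set(target)
    if not common:
        return 0.0
    mean_dist = sum(math.dist(sensed[j], target[j]) for j in common) / len(common)
    return 1.0 / (1.0 + mean_dist)
```

Such a score could drive the superimposed-avatar display directly: as the two avatars converge, the score approaches 1.0 and the system can signal that the directed position has been achieved.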
  • Other devices such as Wi-Fi-enabled GoPro®-style action cameras (202) and wearable technologies such as a smart watch with a digital display screen (353) can participate in the network (351) and provide the same types of visual feedback (350). This method of networking devices for capturing and directing allows individuals to receive communications according to their preferences on any network-connected device such as, but not limited to, a desktop computer (354), laptop computer (355), phone (356), tablet (357), or other mobile computer (358).
  • FIG. 4 provides examples of an interface when the screen is not visible, for example because the capture device is too close to the subject. If the capture device is a smartphone (110) oriented to properly capture a subject's foot (465), it is unlikely that the subject will be able to interact with the phone's screen, and there may not be additional devices or screens available to display visual feedback to the user.
  • The example in (466) shows how even the bottom of a foot (471) can be captured and precise measurements can be taken using a smartphone (110). By using the phone's gyroscope, the phone's camera can be directed to begin the capture when the phone is on its back, level, and the foot is completely in frame. No visual feedback is required and the system communicates direction such as rotation (470) or orientation changes (473, 474) through spoken instructions (446) via the smartphone's speakers (472).
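The trigger described above, beginning capture only when the phone lies level on its back with the foot fully in frame, might be sketched as follows. The 0.5 m/s^2 tolerance and the function names are illustrative assumptions.

```python
def is_level_on_back(ax, ay, az, tol=0.5):
    """True when the phone lies level on its back: gravity reads
    almost entirely on the z-axis. Accelerometer readings and `tol`
    are in m/s^2 (standard gravity is about 9.81 m/s^2)."""
    return abs(ax) < tol and abs(ay) < tol and az > 9.81 - tol

def should_start_capture(ax, ay, az, foot_in_frame):
    # Begin the scan only when the device is level and the subject's
    # foot is completely inside the camera frame.
    return is_level_on_back(ax, ay, az) and foot_in_frame
```

In practice `foot_in_frame` would come from the vision pipeline, and the spoken directions (446) would be issued whenever either condition fails.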
  • Multiple sensory interface options provide ways to make the system more accessible, and allow more people to use it. In an embodiment, a user can indicate they do not want to receive visual feedback (because they are visually impaired, or because the ambient lighting is too bright, or for other reasons) and their preference can be remembered, so that they can receive feedback through audio (446) and vibration (448) instead.
  • Referring now to FIG. 5, examples of different types of scenes are shown to indicate how various compositional models can be applied. Traditionally, sensing, analytics, composition, and direction have been manual processes. The selfie shown in (501) is a photo or video typically captured by the operator at arm's length and/or reliant on a front-facing camera so that immediate visual feedback is provided, which makes it difficult to compose well. Absent extensive planning and rehearsal, an additional human photographer has previously been required to achieve well-composed scene capture as seen in (502) and (503). Compositions with small children (504) or groups (505) represent further examples of use cases that are traditionally difficult to achieve without a human camera operator, because of the number of subjects involved and the requirement that they be simultaneously directed into precise poses.
  • Additionally, sports-specific movements such as those in soccer (506) (goal keeper positioning, shoot on goal, or dribbling and juggling form) and activities like baseball (507) (batting, fielding, catching), martial arts (508), dance (509), or yoga (510) are traditionally difficult to self-capture as they require precise timing and the subject is preoccupied so visual feedback becomes impractical. Looking again at (506), the ball may only contact the athlete's foot for a short amount of time, so the window for capture is correspondingly brief. The existing state of the art to capture such images is to record high definition, high-speed video over the duration of the activity and generate stills afterward, often manually. This is inefficient and creates an additional burden to sift through potentially large amounts of undesired footage.
  • A method and system for integrating perpetual sensor inputs, real-time analytics capabilities, and layered compositional algorithms (discussed further in FIG. 6) provide a benefit to the user through the form of automatic direction and orchestration without the need for additional human operators. In one embodiment, sports teams' uniforms can contain a designated symbol for sensing specific individuals, or existing uniform numbers can be used with CV and analytics methods to identify participants using software. Once identified, the system can use these markers for both identification and editing to inform capture, as well as for direction and control of the subjects.
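Once uniform numbers or designated symbols have been recognized, mapping them to user-defined capture actions can be a simple lookup. The sketch below assumes the analytics layer already reports jersey numbers per frame; both inputs are illustrative stand-ins for the real sensing outputs.

```python
def capture_actions(detected_numbers, rules):
    """Translate recognized uniform numbers into capture actions.

    `detected_numbers` is the list of jersey numbers reported for the
    current frame; `rules` maps a number to a user-defined action
    (e.g. 22 -> "record"). Numbers without a rule are ignored.
    """
    return [rules[n] for n in detected_numbers if n in rules]
```

A user instruction such as "begin recording when #22 steps up to the plate" then reduces to installing `{22: "record"}` as the rule set and acting on whatever this function returns each frame.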
  • In another embodiment, the system can use the order of the images to infer a motion path and can direct participants in the scene according to a compositional model matched from a database. Or, the images provided can be inputted to the system as designated “capture points” (516) or moments to be marked if they occur in the scene organically. This type of system for autonomous capture is valuable because it simplifies the post-capture editing/highlighting process by reducing the amount of waste footage captured initially, as defined by the user.
  • In another embodiment, static scenes such as architectural photography (518) can also be translated from 2D to 3D. The method for recording models for interior (517) and exterior (518) landscapes by directing the human user holding the camera can standardize historically individually composed applications (for example in real estate appraisals, MLS listings, or promotional materials for hotels). Because the system is capable of self-direction and provides a method for repeatable, autonomous capture of high quality visual assets by sensing, analyzing, composing, and directing, the system allows professionals in the above-mentioned verticals to focus their efforts not on orchestrating the perfect shot but on storytelling.
  • In another embodiment, mounted cameras and sensors can provide information for Building Information Modeling (BIM) systems. Providing real-time monitoring and sensing allows events to be not only tagged but also directed and responded to, using models that provide more granularity than is traditionally available. In one embodiment, successful architectural components from existing structures can evolve into models that can inform new construction, direct building maintenance, identify how people are using the building (e.g., traffic maps), and can optimize HVAC or lighting, or adjust other environment settings.
  • As their ubiquity drives their cost down, cameras and sensors used for creating 3D building models will proliferate. Once a 3D model of a building has been captured (517), the precise measurements can be shared and made useful to other networked devices. As an example, the state of the art now is for each device to create its own silos of information. Dyson's 360 Eye robot vacuum, for example, captures multiple 360° images each second on its way to mapping a plausible vacuuming route through a building's interior, but those images remain isolated and are not synthesized into a richer understanding of the physical space. Following 3D space and markers using relative navigation of model parameters and attribute values is much more reliable and less costly, regardless of whether image sensing is involved.
  • In another embodiment, the system can pre-direct a 3D scene via a series of 2D images such as a traditional storyboard (515). This can be accomplished by sensing the content in the 2D image, transforming sensed 2D content into a 3D model of the scene, objects, and subjects, and ultimately assigning actors roles based on the subjects and objects they are to mimic. This transformation method allows for greater collaboration in film and television industries by enabling the possibility of productions where direction can be given to actors without the need for actors and directors to be in the same place at the same time, or speak a common language.
  • FIG. 6 shows the method of the invention, including the identification of foundational components including Objects, Subjects, Scene, Scape, and Equipment (601).
  • Once the capture process has been started (602), pre-sensed contexts and components (Object(s), Subject(s), Scene, Scape, Equipment) (601) are fed into the Sensing Module (603). Now both physical and virtual analytics such as computer vision (i.e., CV) can be applied in the Analytics Module (604) to make sense of scene components identified in the Sensing Module (603). And they can be mapped against composition models in the Composition/Architecture Module (605) so that in an embodiment, a subject can be scored for compliance against a known composition or pattern. Pre-existing models can be stored in a Database (600) that can hold application states and reference models, and those models can be applied at every step of this process. Once the analysis has taken place comparing sensed scenes to composed scenes, direction of the components of the scene can occur in the Direction/Control Module (606) up to and including control of robotic or computerized equipment. Other types of direction include touch-UI, voice-UI, display, control message events, sounds, vibrations, and notifications. Equipment can be similarly directed via the Direction/Control Module (606) to automatically and autonomously identify a particular subject (e.g., a baseball player) in conjunction with other pattern recognition (such as a hit, 507), allowing efficient capture of subsets in frame only. This can provide an intuitive way for a user to instruct the capture of a scene (e.g., begin recording when #22 steps up to the plate, and save all photos of his swing, if applicable).
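The module flow of FIG. 6, Sensing (603) to Analytics (604) to Composition/Architecture (605) to Direction/Control (606), can be sketched as a pipeline of pluggable stages. This is a structural illustration only; each callable stands in for a full module, and the stage signatures are assumptions.

```python
class Pipeline:
    """Minimal structural sketch of the FIG. 6 flow.

    Each stage is a plain callable: sense (Sensing Module), analyze
    (Analytics Module), compose (Composition/Architecture Module),
    and direct (Direction/Control Module), so real implementations
    can be swapped in stage by stage.
    """

    def __init__(self, sense, analyze, compose, direct):
        self.sense = sense
        self.analyze = analyze
        self.compose = compose
        self.direct = direct

    def step(self, raw_input):
        scene = self.sense(raw_input)           # raw sensor data -> scene model
        components = self.analyze(scene)        # scene -> recognized components
        score = self.compose(components)        # components scored vs. composition model
        return self.direct(components, score)   # emit direction/control output
```

Lookups against the Database (600) of stored models and application states would naturally be folded into the `compose` stage in a fuller implementation.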
  • The Sensing Module (603) can connect to the Analytics Module (604) and the Database (600), while the Composition/Architecture Module (605) and Direction/Control Module (606) can connect to the Analytics Module (604) and the Database (600), as shown in FIG. 6.
  • In another embodiment, the capability gained from pairing the system's Sensing Module (603) and Analytics Module (604) with its Composition/Architecture Module (605) and Direction/Control Module (606) allows for on-demand orchestration of potentially large numbers of people in a building, for example automatically directing occupants to safety during an emergency evacuation such as a fire. The Sensing Module (603) can make sense of inputs from sources including security cameras, proximity sensors such as those found in commercial lighting systems, and models stored in a database (600) (e.g., seating charts, blueprints, maintenance schematics) to create a 3D model of the scene and its subjects and objects. Next, the Analytics Module (604) can use layered CV algorithms such as background cancellation to deduce, for example, where motion is occurring. The Analytics Module (604) can also run facial and body recognition processes to identify human subjects in the scene, and can make use of ID badge reading hardware inputs to link sensed subjects to real-world identities. The Composition/Architecture Module (605) can provide the optimal choreography model for the evacuation, which can be captured organically during a previous fire drill at this location, or can be provided to the system in the form of an existing “best practice” for evacuation. All three modules (Sensing Module (603), Analytics Module (604), and Composition/Architecture Module (605)) can work in a feedback loop to process sensed inputs, make sense of them, and score them against the ideal compositional model for the evacuation. Additionally, the Direction/Control Module (606) can provide feedback to the evacuees using the methods and system described in FIG. 3 and FIG. 4. The Direction/Control Module (606) can also, for example, shut off the gas line to the building if it has been properly networked beforehand.
Because the Sensing Module (603) is running continuously, the system is capable of sensing if occupants are not complying with the directions being given from the Direction/Control Module (606). The benefits of automatically synthesizing disparate inputs into one cohesive scene are also evident in this example of an emergency evacuation, as infrared camera inputs allow the system to identify human subjects using a combination of CV algorithms and direct them to the correct evacuation points, even if the smoke is too thick for traditional security cameras to be effective, or the evacuation points are not visible. The Direction/Control Module (606) can also dynamically switch between different styles of feedback; for example, if high ambient noise levels are detected during the evacuation, feedback can be switched from audio to visual or haptic.
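The modality-switching behavior described above can be sketched as a simple decision rule. This is an illustrative assumption, not the patent's implementation; the thresholds and channel names are invented for the example.

```python
def choose_feedback_channel(ambient_noise_db, smoke_density, has_haptics):
    """Pick audio, visual, or haptic direction for an evacuee based on sensed conditions."""
    if ambient_noise_db < 70:          # quiet enough for spoken direction
        return "audio"
    if smoke_density < 0.5:            # displays and signage still visible
        return "visual"
    # loud and smoke-filled: fall back to a wearable's vibration motor if present
    return "haptic" if has_haptics else "visual"
```

In this sketch, a loud, smoke-filled corridor pushes direction onto a wearable's vibration motor, matching the switch from audio to visual or haptic feedback described in the text.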
  • FIG. 7 is a process flow for a process, method, and system for automatic orchestration, sensing, composition and direction of subjects, objects and equipment in a 3D space. Once started (700), any real-world event (701), from a user pushing a button on the software UI to some specific event or message received by the application, can begin the capture process and the Sensing Module (603). This sensing can be done by a single sensor, for example an infrared or sonic sensor device (702), or by a plurality of nodes in a network that could also include a combination of image sensing (or camera) nodes (703).
  • To protect subject privacy and provide high levels of trust in the system, traditional images are neither captured nor stored, and only obfuscated point clouds are recorded by the device (704). These obfuscated point clouds are less identifiable than traditional camera-captured images, and can be encrypted (704). In real-time, as this data is captured at any number of nodes and types, either by a set of local devices (e.g., smartphones) or by a cloud-based service, a dynamic set of computer vision (i.e., CV) modules (705) and machine learning (ML) algorithms is selected and reordered as it is applied to optimally identify the objects and subjects in a 3D or 2D space. A “context system” (706) external to the invention can concurrently provide additional efficiency or speed in correlating what's being sensed with prior composition and/or direction models. Depending on the results from the CV and on the specific use-case, the system can transform the space, subjects and objects into a 3D space with 2D, 2.5D or 3D object and subject models (707).
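A minimal sketch of the privacy step (704), under stated assumptions: the captured cloud is voxel-downsampled and jittered so it is less identifiable than a raw capture, then hashed so integrity can be verified without retaining the original. The voxel size, jitter amount, and choice of SHA-256 are illustrative assumptions, not from the patent.

```python
import hashlib
import random

def obfuscate_point_cloud(points, voxel=0.1, jitter=0.02, seed=42):
    """Downsample to one representative point per voxel and add small jitter."""
    rng = random.Random(seed)
    seen, out = set(), []
    for x, y, z in points:
        key = (round(x / voxel), round(y / voxel), round(z / voxel))
        if key not in seen:  # keep only the first point landing in each voxel
            seen.add(key)
            out.append(tuple(c + rng.uniform(-jitter, jitter) for c in (x, y, z)))
    return out

def fingerprint(points):
    """Stable SHA-256 digest of the obfuscated cloud, e.g. for integrity checks."""
    payload = ",".join(f"{c:.3f}" for p in points for c in p)
    return hashlib.sha256(payload.encode()).hexdigest()
```

Real encryption of the stored cloud would use an established cipher library; the hash here only illustrates tamper-evidence, not confidentiality.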
  • In some use-cases, additional machine learning and heuristic algorithms (708) can be applied across the entire system and throughout processes and methods, for example to correlate the new space being sensed with the most relevant composition and/or direction models, or to provide other applications outside of this application with analytics on this new data. The system utilizes both supervised and unsupervised machine learning in parallel, which can run in the background to provide context (706) around, for example, which CV and ML methods were implemented most successfully. Supervised and unsupervised machine learning can also identify the leading practices associated with successful outcomes, where success can be determined by criteria from the user, expert or social feedback, or publicly available success metrics. For performance, the application can cache the most relevant composition model(s) (710) in memory for faster association with models related to sensing and direction. While the newly stored sensed data is monitored and tracked (711), it can be converted and dynamically updated (712) into a new unique composition model if the pattern is unique, for example as determined automatically using statistical analysis or ML, or manually through a user/expert review interface.
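The caching and uniqueness-detection steps (710, 712) can be sketched as a nearest-model lookup: a sensed scene, reduced to a feature vector, is matched against cached composition models, and a poor best match signals that a new unique model should be created. The use of cosine similarity and the threshold value are illustrative assumptions.

```python
def most_relevant_model(sensed_vector, model_cache, new_threshold=0.6):
    """Return (name, similarity) of the closest cached model, or (None, best)
    when no cached model is similar enough and a new one should be created."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(y * y for y in b) ** 0.5
        return dot / (na * nb) if na and nb else 0.0

    best_name, best_sim = None, 0.0
    for name, vec in model_cache.items():
        sim = cosine(sensed_vector, vec)
        if sim > best_sim:
            best_name, best_sim = name, sim
    # Below the threshold, treat the sensed pattern as a new unique model.
    return (best_name, best_sim) if best_sim >= new_threshold else (None, best_sim)
```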
  • In embodiments where a user is involved in the process, the application can provide continual audio, speech, vibration or visual direction to a user (715), or periodically send an event or message to an application on the same or another device on the network (716) (e.g., a second camera to begin capturing data). Direction can be sporadic or continuous, can be specific to humans or equipment, and can be given using the approaches and interfaces detailed in FIG. 3.
  • As the application monitors the processing of the data, it utilizes a feedback loop (720) against the composition or direction model, and will adjust parameters, loop back to (710), or include additional software components, updating dynamically on a continuous basis (721). New composition models will be stored (722) whether detected by the software or defined by a user or expert through a user interface (723). New and old composition models and corresponding data are managed and version controlled (724).
  • By analyzing the output from the Sensing Module (603), the system can dynamically and automatically utilize or recommend a relevant stored composition model (725) and direct users or any and all equipment or devices from this model. But in other use cases, the user can manually select a composition model from those previously stored (726).
  • From the composition model, the direction model (727) provides events, messages, and notifications, or control values to other subjects, applications, robots or hardware devices. Users and/or experts can provide additional feedback as to the effectiveness of a direction model (728), to validate, augment or improve existing direction models. These models and data are version controlled (729).
  • In many embodiments, throughout the process the system can sporadically or continuously provide direction (730), by visual UI, audio, voice, vibration to user(s) or control values by event or message to networked devices (731) (e.g., robotic camera dolly, quadcopter drone, pan and tilt robot, Wi-Fi-enabled GoPro®, etc.).
  • Each process throughout the system can utilize a continuous feedback loop as it monitors, tracks, and reviews sensor data against training set models (732). The process can continuously compute and loop back to (710) in the process flow and can end (733) on an event or message from external or internal application or input from a user/expert through a UI.
  • FIG. 8 is a process flow for the Sensing Module (603) of the system, which can be started (800) by a user through a UI or voice command, by sensing a pattern in the frame (801), or by an event in the application. A plurality of sensors capture data into memory (802), and through a combination of machine learning and computer vision sensing and recognition processing, entities, objects, subjects and scenes can be recognized (803). The system will also identify the most strongly correlated model to help make sense of the data patterns being sensed against (804) previously sensed models stored in a Database (600), via a feedback loop (815). In one embodiment, the image sensor (804) will be dynamically adjusted to improve sensing precision, for example by separating a foreground object or subject from the background in terms of contrast. A reference object in either 2D or 3D can be loaded (805) to help constrain the CV and aid in recognition of objects in the scene. Using a reference object to constrain the CV helps the Sensing Module (603) ignore noise in the image, including shadows and non-target subjects, as well as objects that might enter or exit the frame.
  • Other sensors can be used in parallel or serially to improve the context and quality of sensing (806). For example, collecting the geolocation positions transmitted by the wearable devices or smartphones of the subjects in an imaged space can help provide richer real-time sensing data to other parts of the system, such as the Composition Module (605). Throughout the processes, the entity, object and scene capture validation step (807) continuously evaluates what in the scene is being captured and recognized, and to what level of confidence. This confidence level of recognition and tracking is enhanced as other devices and cameras are added to the network, because their inputs and sensory capabilities can be shared and reused and their various screens and interface options can be used to provide rich feedback and direction (FIG. 3).
  • The sensing process might start over or move on to a dynamically ordered plurality of computer vision algorithm components (809) and/or machine learning algorithm components (810). In various embodiments, those components can include, for example, blob detection algorithms, edge detection operators such as Canny, and edge histogram descriptors. The CV components are always in a feedback loop (808) with previously stored leading-practice models in the Database (600) and machine learning processes (811). In an embodiment, lens distortion in image sensor data (e.g., smartphone camera data) can be corrected for barrel distortion, and the gyroscope and compass can be used to understand the context of subject positions in a 3D space relative to camera angles (812). The system can generate 3D models, from the device or a networked service, or obfuscated and/or encrypted point clouds (813). These point clouds or models are also maintained in a feedback loop (814) with pre-existing leading-practice models in the Database (600).
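The dynamically ordered component set (809) and its feedback loop (808) can be sketched as a pipeline that tries the historically most successful components first. The component protocol (each returns a result and a confidence), the acceptance threshold, and the moving-average constants are all illustrative assumptions.

```python
class DynamicPipeline:
    """Runs CV/ML components in an order learned from their recent success."""

    def __init__(self, components):
        # components: list of (name, fn), where fn(frame) -> (result, confidence)
        self.components = list(components)
        self.success = {name: 0.0 for name, _ in components}

    def run(self, frame, accept=0.8):
        # Try the historically most successful components first.
        for name, fn in sorted(self.components, key=lambda c: -self.success[c[0]]):
            result, conf = fn(frame)
            # Exponential moving average of each component's confidence (the
            # feedback loop: strong performers float to the front over time).
            self.success[name] = 0.9 * self.success[name] + 0.1 * conf
            if conf >= accept:
                return name, result
        return None, None
```

In practice the component functions would wrap real detectors (blob detection, Canny edges, histogram descriptors); here they are stand-ins.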
  • A broader set of analytics and machine learning can be run against all models and data (604). The Sensing Module (603) is described earlier in FIG. 6 and a more detailed process flow is outlined here in FIG. 8. As powerful hardware is commercialized and further capabilities are unlocked via APIs, the system can correlate and analyze the increased sensor information to augment the Sensing Module (603) and provide greater precision and measurement of a scene.
  • FIG. 9 is a diagram of the architecture for the system (950) according to one or more embodiments. In an on-premise embodiment, the Processing Unit (900) comprises the Sensing Module (603), Analytics Module (604), Composition/Architecture Module (605), and Direction/Control Module (606), and can be connected to a processor (901) and a device-local database (600), or created in any other computer medium and connected to through a network (902), including being routed by a software-defined network (i.e., SDN) (911). The Processing Unit (900) can also be connected to an off-premise service for greater scale, performance and context by network SDN (912). This processing capability service cloud or data center might be connected by SDN (913) to a distributed file system (910) (e.g., HDFS with Spark or Hadoop), a plurality of server-side databases (600), or a cloud computing platform (909). In one or more embodiments, the Processing Unit can be coupled to a processor inside a host data processing system (903) (e.g., a remote server or local server) through a wired interface and/or a wireless network interface. In another embodiment, processing can be done on distributed devices for use cases requiring real-time performance (e.g., CV for capturing a subject's position) and that processing can be correlated with other processing throughout the service (e.g., other subjects' positioning in the scene).
  • FIG. 10 shows examples of iconic posing and professional compositions, including both stereotypical model poses (1000) and famous celebrity poses such as Marilyn Monroe (1001). These existing compositions can be provided to the system by the user and can be subsequently understood by the system, such that subjects can then be auto-directed to pose relative to a scene that optimally reproduces these compositions, with feedback given in real-time as the system determines all precise spatial orientation and compliance with the model.
  • In one embodiment, a solo subject can also be directed to pose in the style of professional models (1002), incorporating architectural features such as walls and with special attention given to precise hand, arm, leg placement and positioning even when no specific image is providing sole compositional guidance or reference. To achieve this, the system can synthesize multiple desirable compositions from a database (600) into one composite reference composition model. The system also provides the ability to ingest existing 2D art (1006) which is then transformed into a 3D model used to auto-direct composition and can act as a proxy for the types of scene attributes a user might be able to recognize but not articulate or manually program.
  • In another embodiment, groups of subjects can be automatically directed to pose and positioned so that hierarchy and status are conveyed (1010). This can be achieved using the same image synthesis method and system as in (1002), and by directing each subject individually while posing them relative to each other to ensure compliance with the reference model. The system's simultaneous direction of multiple subjects in frame can dramatically shorten the time required to achieve a quality composition. Whereas previously a family (1005) would have used time-delay and extensive back-and-forth positioning or enlisted a professional human photographer, now the system is able to direct them and reliably execute the ideal photograph at the right time and using ubiquitous hardware they already own (e.g., smartphones). The system is able to make use of facial recognition (1007) to deliver specific direction to each participant, in this embodiment achieving optimal positioning of the child's arm (1008, 1009). In another embodiment, the system is able to direct a kiss (1003) using the Sensing Module (603), Analytics Module (604), Composition/Architecture Module (605), and Direction/Control Module (606) and the method described in FIG. 7 to ensure both participants are in compliance with the choreography model throughout the activity. The system is also able to make use of sensed behaviors as triggers for other events, so that in one embodiment a dancer's movements can be used as inputs to direct the composition of live music, or in another embodiment specific choreography can be used to control the lighting of an event. This allows experts or professionals to create models to be emulated by others (e.g., for instruction or entertainment).
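The per-subject direction described above can be sketched as turning a positional deviation into a short spoken or on-screen instruction. The normalized coordinates, tolerance, and instruction wording are illustrative assumptions.

```python
def direct_subject(sensed, target, tolerance=0.05):
    """Convert the gap between a subject's sensed and target positions
    into a human-readable direction, or 'hold' when within tolerance."""
    sx, sy = sensed
    tx, ty = target
    steps = []
    if abs(tx - sx) > tolerance:
        steps.append("move right" if tx > sx else "move left")
    if abs(ty - sy) > tolerance:
        steps.append("raise" if ty > sy else "lower")
    return " and ".join(steps) if steps else "hold"
```

Running this once per subject per frame gives each family member individual feedback while the group converges on the reference composition.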
  • FIG. 11 is provided as an example of a consumer-facing UI for the system that would allow for assignment of models to scenes (1103), and of roles to subjects (1100) and objects. Virtual subject identifiers can be superimposed over a visual representation of the scene (1101) to provide auto-linkage from group to composition and to allow intuitive dragging and reassignment (1105). Sensed subjects, once assigned, can be linked to complex profile information (1104) including LinkedIn, Facebook, or various proprietary corporate LDAP or organizational hierarchy information. Once identified, subjects can be directed simultaneously and individually by the system, through the interfaces described in FIG. 3.
  • In scenarios where distinguishing between subjects is difficult (poor light, similar clothing, camouflage in nature), stickers or other markers can be attached to the real-world subjects and tagged in this manner. Imagine a distinguishing sticker placed on each of the five subjects (901), helping to keep them correctly identified. These stickers or markers can be any sufficiently differentiated pattern (including stripes, dots, solid colors, text) and can be any material, including simple paper and adhesive, allowing them to come packaged in the magazine insert from FIG. 1 (105) as well.
  • FIG. 12 provides further examples of compositions difficult to achieve traditionally, in this case because of objects or entities relative to the landscape of the scene. Nature photography in particular poses a challenge due to the uncontrollable lighting on natural features such as mountains in the background versus the subject in the foreground (1200). Using the interface described in FIG. 11, users are able to create rules or conditions to govern the capture process and achieve the ideal composition with minimal waste and excess. Those rules can be used to suggest alternate compositions or directions if the desired outcome is determined to be unattainable, for example because of weather. Additionally, existing photographs (1201) can be captured by the system, as a method of creating a reference model. In one embodiment, auto-sensing capabilities described in FIG. 8 combined with compositional analysis and geolocation data can deliver specific user-defined outcomes such as a self-portrait facing away from the camera, executed when no one else is in the frame and the clouds are over the trees (1202). In another embodiment, the composition model is able to direct a subject to stand in front of a less visually “busy” section of the building (1203).
  • Much of the specific location information the system makes use of to inform composition and direction decisions is embodied in a location model, as described in FIG. 13. Representing specific geolocations (1305), each pin (1306) provides composition and direction for camera settings and controls, positioning, camera angles (1302), architectural features, lighting, and traffic in the scene. This information is synthesized and can be presented to the user in such a way that the compositional process is easy to understand and highly automated, while delivering high-quality capture of a scene. For example, consider a typical tourist destination, such as the Arc de Triomphe, that can be ideally composed (1307). The system is able to synthesize a wide range of information (including lighting and shadows depending on date/time, weather, expected crowd sizes, and ratings of comparable iterations of this photo taken previously), which it uses to suggest desirable compositions and execute them with precision and reliability, resulting in a pleasant and stress-free experience for the user.
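One way to picture a pin (1306) in such a location model is as a small record of composition and direction hints for a geolocation. This is an illustrative sketch only; every field name and sample value here is an assumption, not taken from the patent.

```python
from dataclasses import dataclass, field

@dataclass
class LocationPin:
    """Hypothetical bundle of composition/direction hints for one geolocation."""
    latitude: float
    longitude: float
    camera_angle_deg: float
    best_hours: tuple                 # e.g., a golden-hour window (start, end)
    notes: str = ""
    ratings: list = field(default_factory=list)  # feedback on prior captures

    def average_rating(self):
        return sum(self.ratings) / len(self.ratings) if self.ratings else None

arc = LocationPin(48.8738, 2.2950, 35.0, (7, 9),
                  notes="hypothetical Arc de Triomphe vantage point",
                  ratings=[5, 4, 5])
```

Aggregating ratings of comparable past captures is one plausible way the system could rank pins when suggesting a composition to the user.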
  • FIG. 14 is a representation of computer vision and simple models informing composition. A building's exterior (1401) automatically invokes a perspective model (1402) to influence composition through a CV process and analytics of images during the framing of architectural photography. The lines in the model (1402) can communicate ideal perspective, directing perspective, depth, and other compositional qualities to produce emotional effects in architectural photography applications such as real estate listings.
  • Referring now to FIG. 15A, the system can make use of a smartphone or camera-equipped aerial drone (1500) to perform surveillance and visual inspection of traditionally difficult or dangerous-to-inspect structures such as bridges. Using 3D-constrained CV to navigate and control the drone autonomously and more precisely than traditional GPS waypoints, the system can make use of appropriate reference models for weld inspections, corrosion checks, and insurance estimates and damage appraisals. Relative navigation based on models of real-world structures (1502) provides greater flexibility and accuracy when directing equipment when compared to existing methods such as GPS. Because the system can make use of the Sensing Module (603) it is able to interpret nested and hierarchical instructions such as “search for corrosion on the underside of each support beam.” FIG. 15B depicts an active construction site, where a drone can provide instant inspections that are precise and available 24/7. A human inspector can monitor the video and sensory feed or take control from the system if desired, or the system is able to autonomously control the drone, recognizing and scoring the construction site's sensed environment for compliance based on existing models (e.g., local building codes). Other BIM (Building Information Management) applications include persistent monitoring and reporting as well as responsive structures that react to sensed changes in their environment, for example a window washing system that uses constrained CV to monitor only the exposed panes of glass in a building and can intelligently sense the need for cleaning in a specific location and coordinate an appropriate equipment response, autonomously and without human intervention.
  • Human subjects (1600) can be deconstructed similarly to buildings, as seen in FIG. 16. Beginning with a close and precise measurement of the subject's body (1601) which can be abstracted into, for example, a point cloud (1602), composite core rigging (1603) can then be applied such that a new composite reference core or base NURB 3D model is created (1604). This deconstruction, atomization, and reconstruction of subjects allows for precision modeling and the fusing of real and virtual worlds.
  • In one embodiment, such as a body measurement application for Body Mass Index or another health use-case, fitness application, or garment fit or virtual fitting application, a simpler representation (1605) might be created and stored at the device for the user interface, or in a social site's datacenters. This obfuscates the subject's body, masking their vivid body model to address any privacy or social “body image” concerns. Furthermore, data encryption and hash processing of these images can also be automatically applied in the application on the user's device and throughout the service to protect user privacy and security.
  • Depending on the output from the Sensing Module (603), the system can either create a new composition model for the Database (600), or select a composition model based on attributes deemed most appropriate for composition: body type, size, shape, height, arm position, face position. Further precise composition body models can be created for precise direction applications in photo, film, theater, musical performance, dance, yoga.
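Selecting a stored body composition model by sensed attributes could be sketched as follows. The use of BMI bands, the band boundaries, and the model names are illustrative assumptions layered on the attributes the text lists (body type, size, shape, height).

```python
def select_body_model(height_cm, weight_kg, models):
    """Pick the stored body model whose BMI band contains the subject,
    or return None to signal that a new model should be created."""
    bmi = weight_kg / (height_cm / 100) ** 2
    for name, (lo, hi) in models.items():
        if lo <= bmi < hi:
            return name, round(bmi, 1)
    return None, round(bmi, 1)

# Hypothetical bands; real selection would weigh many more attributes.
bands = {"slim": (0, 18.5), "average": (18.5, 25), "athletic": (25, 30)}
```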
  • FIG. 17 catalogues some of the models and/or data that can be stored centrally in a database (600) available to all methods and processes throughout the system, to facilitate a universal scoring approach for all items. In one example, models of best practices for shooting a close-up movie scene (1702) are stored, and include such items as camera angles, out-of-focus effects, aperture and exposure settings, depth of field, lighting equipment types with positions and settings, dolly and boom positions relative to the subject (i.e., actor), where “extras” should stand in the scene, entrance timing, and set composition. By sensing and understanding the subjects and contexts of a scene over time via those models, film equipment can be directed to react in context with the entire space. An example is a networked system of camera dollies, mic booms, and lighting equipment on a film set that identifies actors in a scene and automatically cooperates with other networked equipment to provide optimal composition dynamically and in real-time, freeing the human director to focus on storytelling.
  • The Database (600) can also hold 2D images of individuals and contextualized body theory models (1707), 3D models of individuals (1705), and 2D and 3D models of clothing (1704), allowing the system to score and correlate between models. In one embodiment, the system can select an appropriate suit for someone it senses is tall and thin (1705) by considering the body theory and fashion models (1707) as well as clothing attributes (1704) such as the intended fit profile or the number of buttons.
  • The system can keep these models and their individual components correlated to social feedback (1703) such as Facebook, YouTube, Instagram, or Twitter, using metrics such as views, likes, or changes in followers and subscribers. By connecting the system to a number of social applications, a number of use cases could directly provide context and social proof around identified individuals in a play or movie, from the overall composition and cinematography of a scene in a play, music recital, movie or sports event, to how well-received a personal image (501) or group image or video was (1101). This also continuously provides a method and process for tuning best-practice models of all types of compositions, from photography, painting, movies, skiing, mountain biking, surfing, competitive sports, and exercises, to yoga poses (510), dance, music, and performances.
  • All of these composition models can also be analyzed for trends in social popularity, from fashion to popular dance moves and the latest form alterations to yoga or fitness exercises. In one example use case, a camera (202) and a broad spectrum of hardware (1706), such as lights, robotic camera booms or dollies, and autonomous quadcopters, could be evaluated individually or as part of the overall composition, including such items as lights, dolly movements, and the camera with its multitude of settings and attributes.
  • Referring now to FIG. 18, in one embodiment the system can facilitate learning an instrument through the provision of real-time feedback. 3D models of an instrument, for example a guitar fretboard model, can be synthesized and used to constrain the CV algorithms so that only the fingers and relevant sections of the instrument (e.g., frets for guitars, keys for pianos, heads for drums) are being analyzed. Using the subject assignment interface from FIG. 11, each finger can be assigned a marker so that specific feedback can be provided to the user (e.g., “place 2nd finger on the A string at the 2nd fret”) in a format that is understandable and useful to them. While there are many different ways to learn guitar, no other system looks at the proper hand (1802) and body (1800) position. Because the capture device (110) can be networked with other devices, instruction can be given holistically, and complex behaviors and patterns such as rhythm and pick/strum technique (1805) can be analyzed effectively. Models can be created to inform behaviors varying from proper bow technique for violin to proper posture when playing keyboard. In one embodiment, advanced composition models and challenge models can be loaded into the database, making the system useful not just for beginners but for anyone looking to improve their practice regimen. These models can be used as part of a curriculum to instruct, test and certify music students remotely. As with FIG. 15, a human expert can monitor the process and provide input, or the sensing, analyzing, composing and directing can be completely autonomous. In another embodiment, renditions and covers of existing songs can also be scored and compared against the original and other covers, providing a video-game-like experience but with fewer hardware requirements and greater freedom.
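The finger-placement feedback described above can be sketched as comparing sensed finger markers against a chord chart. The chord data, marker scheme, and message wording are illustrative assumptions.

```python
def check_chord(sensed, chord):
    """Compare sensed finger positions to a chord chart and return
    per-finger corrections, or ['correct'] when everything matches."""
    feedback = []
    for finger, (string, fret) in chord.items():
        if sensed.get(finger) != (string, fret):
            feedback.append(
                f"place finger {finger} on the {string} string at fret {fret}")
    return feedback or ["correct"]

# A major, fretted with fingers 1-3 on the D, G and B strings at fret 2
a_major = {1: ("D", 2), 2: ("G", 2), 3: ("B", 2)}
```

Each correction string could then be spoken or displayed through the direction interfaces of FIG. 3.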
  • FIG. 19 shows an example of a golf swing (1901) to illustrate the potential of a database of models. Once the swing has been scanned with the pre-modeled club or putter, the resulting model is stored in a Database (600) and available for immediate application. A plurality of sensed movements can be synthesized into one, so that leading-practice golf swings are sufficiently documented. Once stored, the models can be converted to compositional models, so that analysis and comparison can take place between the sensed movements and the stored compositional swing, and direction and feedback can be given to the user (1902, 1903).
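The swing comparison could be sketched as a keyframe-by-keyframe deviation between the sensed swing and the stored compositional model, so feedback (1902, 1903) can target the worst deviation first. Keyframe names and angle values are illustrative assumptions.

```python
def swing_deviation(sensed_angles, model_angles):
    """Absolute per-keyframe deviation (degrees) from the model swing."""
    return {k: round(abs(sensed_angles[k] - model_angles[k]), 1)
            for k in model_angles}

model = {"address": 40.0, "backswing_top": 90.0, "impact": 42.0}
sensed = {"address": 41.0, "backswing_top": 82.5, "impact": 45.0}
deviation = swing_deviation(sensed, model)
worst = max(deviation, key=deviation.get)  # the keyframe to correct first
```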
  • FIG. 20 is a systematic view of an integrated system for Composition and Orchestration of a 3D or 2.5D space, illustrating communication between users and their devices and a server through a network (902) or SDN (911, 912, 913), according to one embodiment. In one embodiment a user or multiple users can connect to the Processing Unit (900) that hosts the composition event. In another embodiment, user hardware such as a sensor (2001), TV (2003), camera (2004), mobile device such as a tablet or smartphone (2005), wearable (2006), server (2007), laptop (2008) or desktop computer (2009), or any wireless or electronic device, can communicate directly with other devices in a network or with the devices of specific users (2002, 902). For example, in one embodiment the orchestration system might privately send unique positions and directions to four separate devices (e.g., watch, smartphone, quadcopter (1706), and an internet-connected TV) in quickly composing high-quality and repeatable photographs of actors and fans at a meet-and-greet event.

Claims (7)

What is claimed is:
1. A method, comprising:
Capturing a 2D image in a specific format of an object, subject, and scene using a device;
Sensing an object, subject, and scene automatically and continuously using the device;
Analyzing the 2D image of the object, subject, and scene captured to determine the most relevant composition and direction model;
Transforming an object, subject, and scene into a 3D model using existing reference composition/architecture model; and
Storing the 3D model of the scene in a database for use and maintaining it in a feedback loop.
2. The method of claim 1, further comprising:
Performing continuous contextual analysis of an image and its resulting 3D model to provide an update to subsequent 3D modeling processes; and
Dynamically updating and responding to contextual analytics performed.
3. The method of claim 2, further comprising:
Coordinating accurate tracking of objects and subjects in a scene by orchestrating autonomous equipment movements using a feedback loop.
4. The method of claim 3, further comprising:
Controlling the direction of a scene and its subjects via devices using a feedback loop.
5. The method of claim 4, further comprising:
Creating and dynamically modifying in real-time the 2D or 3D model for the subject, object, scene, and equipment in any spatial orientation and providing immediate feedback in a user interface.
6. The method of claim 1, wherein the device is at least one of a camera, wearable device, desktop computer, laptop computer, phone, tablet, and other mobile computer.
7. A system, comprising:
A processing unit that can exist on a user device, on-premise, or as an off-premise service to house the following modules:
A sensing module that can understand the subjects and context of a scene over time via models;
An analytics module that can analyze sensed scenes and subjects to determine the most relevant composition and direction models or create them if necessary;
A composition/architecture module that can simultaneously store the direction of multiple subjects or objects of a scene according to one or more composition models;
A direction/control module that can provide direction and control to each subject, object, and equipment individually and relative to a scene model; and
A database that can store models for use and maintain them in a feedback loop with the above modules.
US14/858,901 2014-09-19 2015-09-18 Method and system for an automatic sensing, analysis, composition and direction of a 3d space, scene, object, and equipment Abandoned US20160088286A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US201462053055P true 2014-09-19 2014-09-19
US14/858,901 US20160088286A1 (en) 2014-09-19 2015-09-18 Method and system for an automatic sensing, analysis, composition and direction of a 3d space, scene, object, and equipment

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
PCT/US2015/051041 WO2016044778A1 (en) 2014-09-19 2015-09-18 Method and system for an automatic sensing, analysis, composition and direction of a 3d space, scene, object, and equipment
US14/858,901 US20160088286A1 (en) 2014-09-19 2015-09-18 Method and system for an automatic sensing, analysis, composition and direction of a 3d space, scene, object, and equipment

Publications (1)

Publication Number Publication Date
US20160088286A1 true US20160088286A1 (en) 2016-03-24

Family

ID=55525935

Family Applications (3)

Application Number Title Priority Date Filing Date
US14/717,805 Active 2036-10-25 US10489407B2 (en) 2014-09-19 2015-05-20 Dynamic modifications of results for search interfaces
US14/858,901 Abandoned US20160088286A1 (en) 2014-09-19 2015-09-18 Method and system for an automatic sensing, analysis, composition and direction of a 3d space, scene, object, and equipment
US16/531,929 Pending US20190354532A1 (en) 2014-09-19 2019-08-05 Dynamic modifications of results for search interfaces

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US14/717,805 Active 2036-10-25 US10489407B2 (en) 2014-09-19 2015-05-20 Dynamic modifications of results for search interfaces

Family Applications After (1)

Application Number Title Priority Date Filing Date
US16/531,929 Pending US20190354532A1 (en) 2014-09-19 2019-08-05 Dynamic modifications of results for search interfaces

Country Status (2)

Country Link
US (3) US10489407B2 (en)
WO (1) WO2016044778A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9524520B2 (en) * 2013-04-30 2016-12-20 Wal-Mart Stores, Inc. Training a classification model to predict categories
US9524319B2 (en) 2013-04-30 2016-12-20 Wal-Mart Stores, Inc. Search relevance
JP5891339B1 (en) * 2015-10-09 2016-03-22 楽天株式会社 Information processing apparatus, information processing method, and information processing program
US10482146B2 (en) * 2016-05-10 2019-11-19 Massachusetts Institute Of Technology Systems and methods for automatic customization of content filtering
US10509459B2 (en) 2016-05-19 2019-12-17 Scenera, Inc. Scene-based sensor networks
US10412291B2 (en) 2016-05-19 2019-09-10 Scenera, Inc. Intelligent interface for interchangeable sensors
US10693843B2 (en) 2016-09-02 2020-06-23 Scenera, Inc. Security for scene-based sensor networks
CN107133280A (en) * 2017-04-14 2017-09-05 合信息技术(北京)有限公司 Feedback response method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120050524A1 (en) * 2010-08-25 2012-03-01 Lakeside Labs Gmbh Apparatus and method for generating an overview image of a plurality of images using an accuracy information
US20120050525A1 (en) * 2010-08-25 2012-03-01 Lakeside Labs Gmbh Apparatus and method for generating an overview image of a plurality of images using a reference plane
US20140340427A1 (en) * 2012-01-18 2014-11-20 Logos Technologies Llc Method, device, and system for computing a spherical projection image based on two-dimensional images
US9094670B1 (en) * 2012-09-25 2015-07-28 Amazon Technologies, Inc. Model generation and database

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6489968B1 (en) * 1999-11-18 2002-12-03 Amazon.Com, Inc. System and method for exposing popular categories of browse tree
US20050149458A1 (en) * 2002-02-27 2005-07-07 Digonex Technologies, Inc. Dynamic pricing system with graphical user interface
US7577665B2 (en) * 2005-09-14 2009-08-18 Jumptap, Inc. User characteristic influenced search results
US20120173365A1 (en) * 2005-09-14 2012-07-05 Adam Soroca System for retrieving mobile communication facility user data from a plurality of providers
US8078607B2 (en) * 2006-03-30 2011-12-13 Google Inc. Generating website profiles based on queries from websites and user activities on the search results
WO2012030678A2 (en) * 2010-08-30 2012-03-08 Tunipop, Inc. Techniques for facilitating on-line electronic commerce transactions relating to the sale of goods and merchandise
EP2584477A4 (en) * 2011-03-30 2015-04-08 Rakuten Inc Information provision device, information provision method, information provision program, information display device, information display method, information display program, information retrieval system, and recording medium
EP2600316A1 (en) * 2011-11-29 2013-06-05 Inria Institut National de Recherche en Informatique et en Automatique Method, system and software program for shooting and editing a film comprising at least one image of a 3D computer-generated animation
WO2014144173A1 (en) * 2013-03-15 2014-09-18 Nike, Inc. Product presentation assisted by visual search

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10410289B1 (en) 2014-09-22 2019-09-10 State Farm Mutual Automobile Insurance Company Insurance underwriting and re-underwriting implementing unmanned aerial vehicles (UAVS)
US10685404B1 (en) 2014-09-22 2020-06-16 State Farm Mutual Automobile Insurance Company Loss mitigation implementing unmanned aerial vehicles (UAVs)
US10650469B1 (en) 2014-09-22 2020-05-12 State Farm Mutual Automobile Insurance Company Insurance underwriting and re-underwriting implementing unmanned aerial vehicles (UAVs)
US10535103B1 (en) 2014-09-22 2020-01-14 State Farm Mutual Automobile Insurance Company Systems and methods of utilizing unmanned vehicles to detect insurance claim buildup
US20160282872A1 (en) * 2015-03-25 2016-09-29 Yokogawa Electric Corporation System and method of monitoring an industrial plant
US9845164B2 (en) * 2015-03-25 2017-12-19 Yokogawa Electric Corporation System and method of monitoring an industrial plant
US10353473B2 (en) * 2015-11-19 2019-07-16 International Business Machines Corporation Client device motion control via a video feed
US20160077598A1 (en) * 2015-11-19 2016-03-17 International Business Machines Corporation Client device motion control via a video feed
US10621744B1 (en) * 2015-12-11 2020-04-14 State Farm Mutual Automobile Insurance Company Structural characteristic extraction from 3D images
US10168700B2 (en) * 2016-02-11 2019-01-01 International Business Machines Corporation Control of an aerial drone using recognized gestures
US20170280130A1 (en) * 2016-03-25 2017-09-28 Microsoft Technology Licensing, Llc 2d video analysis for 3d modeling
US10635902B2 (en) * 2016-06-02 2020-04-28 Samsung Electronics Co., Ltd. Electronic apparatus and operating method thereof
US20170351900A1 (en) * 2016-06-02 2017-12-07 Samsung Electronics Co., Ltd. Electronic apparatus and operating method thereof
US10655968B2 (en) 2017-01-10 2020-05-19 Alarm.Com Incorporated Emergency drone guidance device
CN108924753A (en) * 2017-04-05 2018-11-30 意法半导体(鲁塞)公司 Method and apparatus for real-time scene detection
US10607406B2 (en) 2018-01-25 2020-03-31 General Electric Company Automated and adaptive three-dimensional robotic site surveying
US10706573B1 (en) 2019-03-13 2020-07-07 State Farm Mutual Automobile Insurance Company Structural characteristic extraction from 3D images

Also Published As

Publication number Publication date
US10489407B2 (en) 2019-11-26
US20160085813A1 (en) 2016-03-24
WO2016044778A1 (en) 2016-03-24
US20190354532A1 (en) 2019-11-21

Similar Documents

Publication Publication Date Title
US9767524B2 (en) Interaction with virtual objects causing change of legal status
US10679676B2 (en) Automatic generation of video and directional audio from spherical content
US9965237B2 (en) Methods, systems and processor-readable media for bidirectional communications and data sharing
JP2019139781A (en) System and method for augmented and virtual reality
CN105264460B (en) Hologram object is fed back
US9626103B2 (en) Systems and methods for identifying media portions of interest
JP6321150B2 (en) 3D gameplay sharing
US20180095637A1 (en) Controls and Interfaces for User Interactions in Virtual Spaces
US20160358383A1 (en) Systems and methods for augmented reality-based remote collaboration
US10334158B2 (en) Autonomous media capturing
US8768141B2 (en) Video camera band and system
RU2621633C2 (en) System and method for augmented and virtual reality
US9253440B2 (en) Augmenting a video conference
US9142062B2 (en) Selective hand occlusion over virtual projections onto physical surfaces using skeletal tracking
US10586469B2 (en) Training using virtual reality
US20180143976A1 (en) Depth Sensing Camera Glasses With Gesture Interface
CN103905809B (en) Message processing device and recording medium
KR20150099401A (en) Control of enhanced communication between remote participants using augmented and virtual reality
CN104781849B (en) Monocular vision positions the fast initialization with building figure (SLAM) simultaneously
JP6558587B2 (en) Information processing apparatus, display apparatus, information processing method, program, and information processing system
US10536683B2 (en) System and method for presenting and viewing a spherical video segment
US20170318274A9 (en) Surround video playback
US20180096507A1 (en) Controls and Interfaces for User Interactions in Virtual Spaces
CN104620522B (en) User interest is determined by detected body marker
US20140344408A1 (en) Recognition system for sharing information

Legal Events

Date Code Title Description
AS Assignment

Owner name: FORSYTHE, HAMISH, CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:FORSYTHE, HAMISH;CECIL, ALEXANDER;REEL/FRAME:036640/0606

Effective date: 20150917

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION