US20160088286A1 - Method and system for an automatic sensing, analysis, composition and direction of a 3d space, scene, object, and equipment - Google Patents
- Publication number
- US 2016/0088286 A1 (Application US 14/858,901)
- Authority
- US
- United States
- Prior art keywords
- scene
- composition
- subjects
- models
- subject
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- H04N13/026—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2457—Query processing with adaptation to user needs
- G06F16/24575—Query processing with adaptation to user needs using context
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/0601—Electronic shopping [e-shopping]
- G06Q30/0623—Item investigation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/06—Buying, selling or leasing transactions
- G06Q30/08—Auctions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T15/00—3D [Three Dimensional] image rendering
- G06T15/10—Geometric effects
- G06T15/20—Perspective computation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/20—Scenes; Scene-specific elements in augmented reality scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/23—Recognition of whole body movements, e.g. for sport training
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/60—Static or dynamic means for assisting the user to position a body part for biometric acquisition
- G06V40/67—Static or dynamic means for assisting the user to position a body part for biometric acquisition by interactive indications to the user
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/20—Image signal generators
- H04N13/261—Image signal generators with monoscopic-to-stereoscopic image conversion
Definitions
- a video or camera scan can be voice-initiated or automatically initiated when the subject is inside the camera frame, and the system can monitor and direct the subject through a sequence of easy-to-understand body movements and steps using a combination of voice, lights, and simple mimicry of on-screen poses in a user interface or visual display.
- the subject could be practicing and precisely evaluating yoga poses, following a physical therapy program, or taking private body measurements.
- FIGS. 1A, 1B, 1C, 1D, and 1E show various methods for hands-free capture of a scene.
- FIGS. 2A through 2J illustrate a system of cameras that can be used to implement a method of sensing, analyzing, composing, and directing a scene.
- FIG. 3 shows examples of interfaces, inputs, and outputs to direct subjects in a scene.
- FIG. 4 shows further examples of the interface when capturing close-up scenes.
- FIG. 5 shows examples of selfies, groupies, and other typical applications.
- FIG. 6 is a diagram of the Sensing Module, Analytics Module, Composition/Architecture Module, and Direction/Control Module.
- FIG. 7 is a diagram of the system's algorithm and high-level process flow.
- FIG. 8 is a detailed look at the Sensing Module from FIG. 6 .
- FIG. 9 is a high level view of the system architecture for on-premise and cloud embodiments.
- FIG. 10 illustrates various iconic and familiar compositions and reference poses.
- FIG. 11 shows an interface for choosing a composition model and assigning objects or subjects for direction.
- FIG. 12 shows further examples of compositions that can be directed.
- FIG. 13 shows an example interface for using data attached to specific geolocations, as well as an example use case.
- FIG. 14 shows how computer vision can influence composition model selection.
- FIGS. 15A and 15B show examples of Building Information Management (BIM) applications.
- FIG. 16 shows how a collection of images and file types can be constructed and deconstructed into sub-components including 3D aggregate models and hashed files for protecting user privacy across system from device to network and cloud service.
- FIG. 17 shows types of inputs that inform the Models from FIG. 6.
- FIG. 18 shows a method for virtual instruction to teach how to play music.
- FIG. 19 is an example of how a Model can apply to Sensed data.
- FIG. 20 shows example connections to the network and to the Processing Unit.
- the present invention enables real-time sensing, spatial composition, and direction for objects, subjects, scenes, and equipment in 2D, 2.5D or 3D models in a 3D space.
- a smartphone will be used for both its ubiquity and the combination of cameras, sensors, and interface options.
- FIG. 1A shows how such a cell phone ( 110 ) can be positioned to provide hands-free capture of a scene. This can be achieved using supplemental stands different from traditional tripods designed for non-phone cameras.
- FIG. 1C shows that a stand can be either foldable ( 101 ) or rigid ( 102 ), so long as it holds the phone's sensors in a stable position.
- a braced style of stand ( 103 ) like the one shown in FIG. 1E can also be used.
- the stand can be made of any combination of materials, so long as the stand is sufficiently tall and wide as to support the weight of the capturing device ( 110 ) and hold it securely in place.
- the self-assembled stand ( 101 ) can be fashioned from materials included as a branded or unbranded removable insert ( 105 ) in a magazine or other promotion ( 106 ) with labeling and tabs sufficient so that the user is able to remove the insert ( 105 ) and assemble it into a stand ( 101 ) without any tools. This shortens the time to initial use by an end-user by reducing the steps needed to position a device for proper capture of a scene.
- the effect of the stand can also be achieved using the angle of the wall/floor and the natural surface friction of a space.
- the angle of placement ( 107 ) is determined by the phone's ( 110 ) sensors and slippage can be detected by monitoring changes in those sensors.
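- The following is a minimal illustrative sketch (in Python, with sensor axes and thresholds assumed for illustration and not taken from this disclosure) of how a placement angle ( 107 ) could be estimated from accelerometer readings and slippage flagged when that angle drifts:

```python
# Minimal sketch: estimate placement angle from accelerometer readings and
# flag slippage when the angle drifts. Axes and thresholds are illustrative.
import math

def placement_angle_deg(ax, ay, az):
    """Angle between the phone's screen normal (z axis) and gravity."""
    g = math.sqrt(ax * ax + ay * ay + az * az)
    return math.degrees(math.acos(max(-1.0, min(1.0, az / g))))

def detect_slippage(samples, drift_threshold_deg=3.0):
    """Flag slippage if the placement angle drifts beyond a threshold."""
    baseline = placement_angle_deg(*samples[0])
    for ax, ay, az in samples[1:]:
        if abs(placement_angle_deg(ax, ay, az) - baseline) > drift_threshold_deg:
            return True
    return False

# Example: phone leaned against a wall at roughly 45 degrees, then slipping.
readings = [(0.0, 6.9, 6.9), (0.0, 6.9, 6.9), (0.0, 8.0, 5.6)]
print(detect_slippage(readings))  # True
```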
- the angle of elevation can be extrapolated from the camera's lens ( 111 ), allowing for very wide capture of a scene when the phone is oriented in portrait mode.
- the system is now able to deliver precise measurements and calibrations of objects in a scene. This precision could be used, for example, to capture a user's measurements and position using only one capture device ( 110 ) instead of multiple.
- adhesive or magnets ( 120 ) can be used to secure the capture device ( 110 ) and prevent it from falling.
- the capture device can also be placed in a case ( 122 ) such that the case can then be mounted via hooks, adhesives, magnets, or other ubiquitous fasteners ( FIG. 1B ). This allows for easy removal of the device ( 110 ) without compromising or disturbing the capture location ( 121 ).
- As shown in FIGS. 2A through 2J, various devices can be orchestrated into an ensemble to capture a scene.
- Existing capture device types can be positioned and networked to provide an optimal awareness of the scene. Examples include: cameras ( 202 ); wearable computers such as the Apple Watch, Google Glass, or FitBit ( FIG. 2B ); pan/tilt cameras such as those found in webcams and security cameras ( FIG. 2C ); mobile devices such as smartphones or tablets ( FIG. 2D ) equipped with front- and rear-facing cameras (including advanced models with body-tracking sensors and fast-focus systems such as the LG G3); traditional digital cameras ( FIG. 2E ); laptops with integrated webcams ( FIG. 2F ); depth cameras or thermal sensors ( FIG. 2G ) like those found in the Xbox Kinect hardware; dedicated video cameras ( FIG. 2H ); and autonomous equipment with only cameras attached ( FIG. 2I ) or with sensors ( FIG. 2J ) such as sonar, infrared, laser, or thermal imaging technology.
- Advances in hardware/software coupling on smartphones further extend the applicability of the system and provide opportunities for a better user experience when capturing a scene, because ubiquitous smartphones and tablets ( FIG. 2D ) can increasingly be used instead of traditionally expensive video cameras ( FIG. 2E , FIG. 2H ).
- a device ( 110 ) can be mounted on a door or wall to capture the scene.
- the door allows panning of the scene by incorporating the known fixed-plane movement of the door.
- the versatility afforded by the mounts and stands allows for multiple placement options for capturing devices, which in turn allows for greater precision and flexibility when sensing, analyzing, composing, and directing a subject in a 3D space.
- subjects ( 220 ) can then be directed via the system to match desired compositional models, according to various sensed orientations and positions. These include body alignment ( 225 ), arm placement ( 230 ), and head tilt angle ( 234 ). Additionally, the subject can be directed to rotate in place ( 235 ) or to change their physical location by either moving forward, backward, or laterally ( 240 ).
- Rotation ( 225 ) in conjunction with movement along a plane ( 240 ) also allows for medical observation, such as orthopedic evaluation of a user's gait or posture. While an established procedure exists today wherein trained professional humans evaluate gait, posture, and other attributes in-person, access to those professionals is limited and the quality and consistency of the evaluations is irregular.
- the invention addresses both shortcomings through a method and system that makes use of ubiquitous smartphones ( 110 ) and the precision and modularity of models. Another instance where networked sensors and cameras can replace a human professional is precise body measurement, previously achieved by visiting a quality tailor.
- the system is able to ensure with high accuracy that the subjects go through the correct sequences and the appropriate measurements are collected efficiently and with repeatable precision. Additionally, this method of dynamic and precise capture of a subject while sensing can be used to achieve positioning required for stereographic images with e.g., a single lens or sensor.
- FIG. 3 provides examples of interface possibilities to communicate feedback to the subjects and users.
- the capturing device ( 110 ) can relay feedback that is passed to subjects through audio tones ( 345 ), voice commands ( 346 ), visually via a screen ( 347 ), or using vibration ( 348 ).
- An example of such a feedback loop is shown as a top view looking down on the subject ( 220 ) as they move along the rotation path directed in ( 225 ) according to audio tones heard by the subject ( 349 ).
- the visual on-screen feedback ( 347 ) can take the form of a superimposed image of the subject's sensed position relative to the directed position in the scene ( 350 ).
- the positions are represented as avatars, allowing human subjects to naturally mimic and achieve the desired position by aligning the two avatars ( 350 ).
- Real-time visual feedback is possible because the feedback-providing device ( 110 ) is networked ( 351 ) to all other sensing devices ( 352 ), allowing for synthesis and scoring of varied location and position inputs and providing a precise awareness of the scene's spatial composition (this method and system is discussed further in FIG. 8 ).
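- As an illustrative sketch only (keypoint names, pixel tolerances, and the scoring formula are assumptions, not the system's actual scoring method), the following Python snippet shows how a subject's sensed position could be scored against a directed position and turned into simple direction hints, in the spirit of the avatar alignment feedback ( 350 ):

```python
# Minimal sketch: score sensed 2D keypoints against a target pose and emit hints.
def compliance_score(sensed, target):
    """Return (0..1 score, per-joint offsets) from matching keypoint dicts."""
    offsets = {k: (target[k][0] - sensed[k][0], target[k][1] - sensed[k][1])
               for k in target if k in sensed}
    mean_err = sum((dx * dx + dy * dy) ** 0.5 for dx, dy in offsets.values()) / len(offsets)
    return max(0.0, 1.0 - mean_err / 100.0), offsets  # 100 px of error ~ zero credit

def direction_for(offsets, tolerance=10.0):
    hints = []
    for joint, (dx, dy) in offsets.items():
        if abs(dx) > tolerance:
            hints.append(f"move {joint} {'right' if dx > 0 else 'left'}")
        if abs(dy) > tolerance:
            hints.append(f"move {joint} {'down' if dy > 0 else 'up'}")
    return hints or ["hold position"]

sensed = {"head": (200, 80), "left_hand": (120, 210)}
target = {"head": (210, 80), "left_hand": (160, 200)}
score, offs = compliance_score(sensed, target)
print(round(score, 2), direction_for(offs))  # 0.74 ['move left_hand right']
```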
- One example of additional sensing data that can be networked is imagery of an infrared camera ( 360 ).
- Wi-Fi-enabled GoPro®-style action cameras ( 202 ) can likewise participate in the network ( 351 ).
- wearable technologies such as a smart watch with a digital display screen ( 353 ) can participate in the network ( 351 ) and provide the same types of visual feedback ( 350 ).
- This method of networking devices for capturing and directing allows individuals to receive communications according to their preferences on any network-connected device such as, but not limited to, a desktop computer ( 354 ), laptop computer ( 355 ), phone ( 356 ), tablet ( 357 ), or other mobile computer ( 358 ).
- FIG. 4 provides examples of an interface when the screen is not visible, for example because the capture device is too close to the subject. If the capture device is a smartphone ( 110 ) oriented to properly capture a subject's foot ( 465 ), it is unlikely that the subject will be able to interact with the phone's screen, and there may not be additional devices or screens available to display visual feedback to the user.
- the example in ( 466 ) shows how even the bottom of a foot ( 471 ) can be captured and precise measurements can be taken using a smartphone ( 110 ).
- By using the phone's gyroscope, the phone's camera can be directed to begin the capture when the phone is lying level on its back and the foot is completely in frame. No visual feedback is required, and the system communicates direction such as rotation ( 470 ) or orientation changes ( 473 , 474 ) through spoken instructions ( 446 ) via the smartphone's speakers ( 472 ).
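- A minimal sketch of such a sensor-triggered capture is shown below, assuming Python; the gravity threshold and the speak() stand-in for the phone's text-to-speech output are illustrative assumptions:

```python
# Minimal sketch: start capture when the phone lies flat on its back and the
# foot is reported fully in frame, then issue spoken direction.
import math

def is_flat_on_back(ax, ay, az, max_tilt_deg=5.0):
    g = math.sqrt(ax * ax + ay * ay + az * az)
    tilt = math.degrees(math.acos(max(-1.0, min(1.0, az / g))))
    return tilt <= max_tilt_deg

def speak(text):  # placeholder for the phone's text-to-speech output (446, 472)
    print("VOICE:", text)

def maybe_start_capture(accel, foot_fully_in_frame):
    if is_flat_on_back(*accel) and foot_fully_in_frame:
        speak("Hold still, starting capture.")
        return True
    speak("Place the phone flat and center your foot in the frame.")
    return False

maybe_start_capture((0.1, 0.2, 9.8), foot_fully_in_frame=True)
```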
- Multiple sensory interface options provide ways to make the system more accessible, and allow more people to use it.
- a user can indicate they do not want to receive visual feedback (because they are visually impaired, or because the ambient lighting is too bright, or for other reasons) and their preference can be remembered, so that they can receive feedback through audio ( 446 ) and vibration ( 448 ) instead.
- compositions with small children ( 504 ) or groups ( 505 ) represent further examples of use cases that are traditionally difficult to achieve without a human camera operator, because of the number of subjects involved and the requirement that they be simultaneously directed into precise poses.
- sports-specific movements such as those in soccer ( 506 ) (goal keeper positioning, shoot on goal, or dribbling and juggling form) and activities like baseball ( 507 ) (batting, fielding, catching), martial arts ( 508 ), dance ( 509 ), or yoga ( 510 ) are traditionally difficult to self-capture as they require precise timing and the subject is preoccupied so visual feedback becomes impractical.
- the ball may only contact the athlete's foot for a short amount of time, so the window for capture is correspondingly brief.
- the existing state of the art to capture such images is to record high definition, high-speed video over the duration of the activity and generate stills afterward, often manually. This is inefficient and creates an additional burden to sift through potentially large amounts of undesired footage.
- a method and system for integrating perpetual sensor inputs, real-time analytics capabilities, and layered compositional algorithms provides a benefit to the user in the form of automatic direction and orchestration without the need for additional human operators.
- sports teams' uniforms can contain a designated symbol for sensing specific individuals, or existing uniform numbers can be used with CV and analytics methods to identify participants using software. Once identified, the system can use these markers for both identification and editing to inform capture, as well as for direction and control of the subjects.
- the system can use the order of the images to infer a motion path and can direct participants in the scene according to a compositional model matched from a database.
- the images provided can be inputted to the system as designated “capture points” ( 516 ) or moments to be marked if they occur in the scene organically. This type of system for autonomous capture is valuable because it simplifies the post-capture editing/highlighting process by reducing the amount of waste footage captured initially, as defined by the user.
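- One hedged way to mark such "capture points" ( 516 ) is to match live frames against the user-provided reference images; the sketch below assumes Python with OpenCV, and its feature counts and thresholds are illustrative only:

```python
# Minimal sketch: flag frames that resemble user-designated capture-point images
# by matching ORB features between each reference and the live frame.
import cv2

orb = cv2.ORB_create()
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def load_capture_points(paths):
    refs = []
    for p in paths:
        img = cv2.imread(p, cv2.IMREAD_GRAYSCALE)
        refs.append(orb.detectAndCompute(img, None)[1])  # keep descriptors only
    return refs

def matches_capture_point(frame_gray, ref_descriptors, min_matches=40):
    _, desc = orb.detectAndCompute(frame_gray, None)
    if desc is None:
        return False
    return any(len(matcher.match(desc, ref)) >= min_matches
               for ref in ref_descriptors if ref is not None)

# Usage (illustrative file names):
# refs = load_capture_points(["kick.jpg", "header.jpg"])
# if matches_capture_point(gray_frame, refs): save_highlight(gray_frame)
```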
- static scenes such as architectural photography ( 518 ) can also be translated from 2D to 3D.
- the method for recording models for interior ( 517 ) and exterior ( 518 ) landscapes by directing the human user holding the camera can standardize historically individually composed applications (for example in real estate appraisals, MLS listings, or promotional materials for hotels). Because the system is capable of self-direction and provides a method for repeatable, autonomous capture of high quality visual assets by sensing, analyzing, composing, and directing, the system allows professionals in the above-mentioned verticals to focus their efforts not on orchestrating the perfect shot but on storytelling.
- mounted cameras and sensors can provide information for Building Information Modeling (BIM) systems.
- Providing real-time monitoring and sensing allows events to be not only tagged but also directed and responded to, using models that provide more granularity than is traditionally available.
- successful architectural components from existing structures can evolve into models that can inform new construction, direct building maintenance, identify how people are using the building (e.g., traffic maps), and can optimize HVAC or lighting, or adjust other environment settings.
- the system can pre-direct a 3D scene via a series of 2D images such as a traditional storyboard ( 515 ). This can be accomplished by sensing the content in the 2D image, transforming sensed 2D content into a 3D model of the scene, objects, and subjects, and ultimately assigning actors roles based on the subjects and objects they are to mimic.
- This transformation method allows for greater collaboration in film and television industries by enabling the possibility of productions where direction can be given to actors without the need for actors and directors to be in the same place at the same time, or speak a common language.
- FIG. 6 shows the method of the invention, including the identification of foundational components including Objects, Subjects, Scene, Scape, and Equipment ( 601 ).
- pre-sensed contexts and components (Object(s), Subject(s), Scene, Scape, Equipment) ( 601 ) are fed into the Sensing Module ( 603 ).
- both physical and virtual analytics such as computer vision (i.e., CV) can be applied in the Analytics Module ( 604 ) to make sense of scene components identified in the Sensing Module ( 603 ).
- they can be mapped against composition models in the Composition/Architecture Module ( 605 ) so that in an embodiment, a subject can be scored for compliance against a known composition or pattern.
- Pre-existing models can be stored in a Database ( 600 ) that can hold application states and reference models, and those models can be applied at every step of this process.
- direction of the components of the scene can occur in the Direction/Control Module ( 606 ) up to and including control of robotic or computerized equipment.
- Other types of direction include touch-UI, voice-UI, display, control message events, sounds, vibrations, and notifications.
- Equipment can be similarly directed via the Direction/Control Module ( 606 ) to automatically and autonomously identify a particular subject (e.g., a baseball player) in conjunction with other pattern recognition (such as a hit, 507 ), allowing efficient capture of subsets in frame only. This can provide an intuitive way for a user to instruct the capture of a scene (e.g., begin recording when #22 steps up to the plate, and save all photos of his swing, if applicable).
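- A minimal sketch of such a user rule (e.g., "begin recording when #22 steps up to the plate and save all photos of his swing") is shown below; the event names and fields are illustrative assumptions about what the Sensing and Analytics Modules might emit:

```python
# Minimal sketch: a small state machine that turns recognition events into
# capture actions. Event types and fields are illustrative stand-ins.
class CaptureRule:
    def __init__(self, jersey="22"):
        self.jersey = jersey
        self.recording = False

    def on_event(self, event):
        if event["type"] == "subject_at_plate" and event["jersey"] == self.jersey:
            self.recording = True
            return "start_recording"
        if self.recording and event["type"] == "swing_detected":
            return "save_burst"
        if self.recording and event["type"] == "subject_left_frame":
            self.recording = False
            return "stop_recording"
        return None

rule = CaptureRule("22")
for e in [{"type": "subject_at_plate", "jersey": "22"},
          {"type": "swing_detected"},
          {"type": "subject_left_frame"}]:
    print(rule.on_event(e))
```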
- the Sensing Module ( 603 ) can connect to the Analytics Module ( 604 ) and the Database ( 600 ), while the Composition/Architecture Module ( 605 ) and Direction/Control Module ( 606 ) can connect to the Analytics Module ( 604 ) and the Database ( 600 ), as shown in FIG. 6.
- the capability gained from pairing the system's Sensing Module ( 603 ) and Analytics Module ( 604 ) with its Composition/Architecture Module ( 605 ) and Direction/Control Module ( 606 ) allows for on-demand orchestration of potentially large numbers of people in a building, for example automatically directing occupants to safety during an emergency evacuation such as a fire.
- the Sensing Module ( 603 ) can make sense of inputs from sources including security cameras, proximity sensors such as those found in commercial lighting systems, and models stored in a database ( 600 ) (e.g., seating charts, blueprints, maintenance schematics) to create a 3D model of the scene and its subjects and objects.
- the Analytics Module ( 604 ) can use layered CV algorithms such as background cancellation to deduce, for example, where motion is occurring.
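- By way of illustration only, background subtraction to localize motion could look like the following Python/OpenCV sketch; thresholds and minimum areas are assumptions:

```python
# Minimal sketch: background cancellation to deduce where motion is occurring.
import cv2

subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=True)

def motion_regions(frame, min_area=500):
    mask = subtractor.apply(frame)
    _, mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)  # drop shadow pixels
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) >= min_area]

# Usage inside a capture loop (illustrative):
# for frame in camera_frames():
#     for (x, y, w, h) in motion_regions(frame):
#         ...  # feed the region to recognition / direction logic
```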
- the Analytics Module ( 604 ) can also run facial and body recognition processes to identify human subjects in the scene, and can make use of ID badge reading hardware inputs to link sensed subjects to real-world identities.
- the Composition/Architecture Module ( 605 ) can provide the optimal choreography model for the evacuation, which can be captured organically during a previous fire drill at this location, or can be provided to the system in the form of an existing "best practice" for evacuation.
- All three modules can work in a feedback loop to process sensed inputs, make sense of them, and score them against the ideal compositional model for the evacuation. Additionally, the Direction/Control Module ( 606 ) can provide feedback to the evacuees using the methods and system described in FIG. 3 and FIG. 4 . The Direction/Control Module ( 606 ) can also, for example, shut off the gas line to the building if it has been properly networked beforehand. Because the Sensing Module ( 603 ) is running continuously, the system is capable of sensing if occupants are not complying with the directions being given from the Direction/Control Module ( 606 ).
- the benefits of automatically synthesizing disparate inputs into one cohesive scene is also evident in this example of an emergency evacuation, as infrared camera inputs allow the system to identify human subjects using a combination of CV algorithms and allow the system to direct them to the correct evacuation points, even if the smoke is too thick for traditional security cameras to be effective, or the evacuation points are not visible.
- the Direction/Control Module ( 606 ) can also dynamically switch between different styles of feedback, for example if high ambient noise levels are detected during the evacuation, feedback can be switched from audio to visual or haptic.
- FIG. 7 is a process flow of the method and system for automatic orchestration, sensing, composition and direction of subjects, objects and equipment in a 3D space.
- any real-world event ( 701 ), from a user pushing a button on the software UI to a specific event or message received by the application, can begin the capture process and start the Sensing Module ( 603 ).
- This sensing can be done by a single sensor, for example an infrared or sonic sensor device ( 702 ), or by a plurality of nodes in a network that can also include a combination of image sensing (or camera) nodes ( 703 ).
- obfuscated point clouds are recorded by the device ( 704 ). These obfuscated point clouds are less identifiable than traditional camera-captured images, and can be encrypted ( 704 ).
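- The following is a minimal sketch of one way to produce and protect such an obfuscated point cloud, assuming Python with NumPy and the cryptography package; the voxel size and serialization are illustrative and not the disclosed implementation:

```python
# Minimal sketch: quantize XYZ points to a coarse voxel grid (less identifiable
# than raw imagery) and encrypt the serialized result before storage.
import numpy as np
from cryptography.fernet import Fernet

def obfuscate_points(points, voxel=0.05):
    """Quantize XYZ points (meters) to a voxel grid and drop duplicates."""
    quantized = np.round(np.asarray(points) / voxel) * voxel
    return np.unique(quantized, axis=0)

def encrypt_cloud(points, key):
    return Fernet(key).encrypt(points.astype(np.float32).tobytes())

key = Fernet.generate_key()
cloud = obfuscate_points([[0.512, 1.031, 2.249], [0.508, 1.029, 2.251]])
token = encrypt_cloud(cloud, key)  # stored or transmitted form
print(cloud.shape, len(token))
```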
- a dynamic set of computer vision (i.e., CV) modules and machine learning (i.e., ML) algorithms can then be applied, and the composition system can concurrently provide additional efficiency or speed in correlating what is being sensed with prior composition and/or direction models.
- the system can transform the space, subjects and objects into a 3D space with 2D, 2.5D or 3D object and subject models ( 707 ).
- additional machine learning and heuristic algorithms can be applied across the entire system and throughout its processes and methods, for example to correlate the new space being sensed with the most relevant composition and/or direction models, or to provide other applications with analytics on this new data.
- the system utilizes both supervised and unsupervised machine learning in parallel, which can run in the background to provide context ( 706 ) around, for example, which CV and ML methods were implemented most successfully.
- Supervised and unsupervised machine learning can also identify the leading practices associated with successful outcomes, where success can be determined by criteria from the user, or expert or social feedback, or publicly available success metrics.
- the application can cache in memory the most relevant composition model(s) ( 710 ) for faster association with models related to sensing and direction.
- the application can provide continual audio, speech, vibration or visual direction to a user ( 715 ) or periodically send an event or message to an application on the same or other device on the network ( 716 ) (e.g., a second camera to begin capturing data).
- Direction can be sporadic or continuous, can be specific to humans or equipment, and can be given using the approaches and interfaces detailed in FIG. 3
- as the application monitors the processing of the data, it utilizes a feedback loop ( 720 ) against the composition or direction model and will adjust parameters, loop back to ( 710 ), or change the inclusion of software components, updating dynamically on a continuous basis ( 721 ).
- New composition models will be stored ( 722 ), whether detected by the software or defined by a user or expert through a user interface ( 723 ).
- New and old composition models and corresponding data are managed and version controlled ( 724 ).
- the system can dynamically and automatically utilize or recommend a relevant stored composition model ( 725 ) and direct users or any and all equipment or devices from this model. But in other use cases, the user can manually select a composition model from those previously stored ( 726 ).
- the direction model ( 727 ) provides events, messages, and notifications, or control values to other subjects, applications, robots or hardware devices. Users and/or experts can provide additional feedback as to the effectiveness of a direction model ( 728 ), to validate, augment or improve existing direction models. These models and data are version controlled ( 729 ).
- the system can sporadically or continuously provide direction ( 730 ), by visual UI, audio, voice, vibration to user(s) or control values by event or message to networked devices ( 731 ) (e.g., robotic camera dolly, quadcopter drone, pan and tilt robot, Wi-Fi-enabled GoPro®, etc.).
- networked devices e.g., robotic camera dolly, quadcopter drone, pan and tilt robot, Wi-Fi-enabled GoPro®, etc.
- Each process throughout the system can utilize a continuous feedback loop as it monitors, tracks, and reviews sensor data against training set models ( 732 ).
- the process can continuously compute and loop back to ( 710 ) in the process flow, and can end ( 733 ) on an event or message from an external or internal application or on input from a user/expert through a UI.
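- The overall loop of FIG. 7 can be illustrated with the toy Python sketch below; the functions, scores, and thresholds are stand-ins for the Sensing ( 603 ), Analytics ( 604 ), Composition ( 605 ), and Direction ( 606 ) Modules rather than their actual implementations:

```python
# Minimal, self-contained sketch of a continuous sense -> analyze -> compose ->
# direct loop with feedback. All values and module stand-ins are illustrative.
import random

def sense():                      # Sensing Module stand-in: subject x-position
    return {"subject_x": random.uniform(0.0, 1.0)}

def analyze(scene):               # Analytics Module stand-in: derive a feature
    return {"offset_from_center": scene["subject_x"] - 0.5}

def score(state, model):          # Composition Module stand-in: compliance score
    return 1.0 - min(1.0, abs(state["offset_from_center"]) / model["tolerance"])

def direct(state):                # Direction/Control Module stand-in
    print("DIRECTION:", "move left" if state["offset_from_center"] > 0 else "move right")

model = {"tolerance": 0.5}
for _ in range(5):                # perpetual loop, truncated for the example
    state = analyze(sense())
    s = score(state, model)
    if s < 0.9:
        direct(state)             # feedback step toward the cached model
    else:
        print("composed: score", round(s, 2))
```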
- FIG. 8 is a process flow for the Sensing Module ( 603 ) of the system, which can be started ( 800 ) by a user through a UI or voice command, by sensing a pattern in the frame ( 801 ), or by an event in the application.
- a plurality of sensors capture data into memory ( 802 ), and through a combination of machine learning and computer vision sensing and recognition processing, entities, objects, subjects and scenes can be recognized ( 803 ). These processes also identify the most strongly correlated model to help make sense of the data patterns being sensed against ( 804 ) previously sensed models stored in a Database ( 600 ), via a feedback loop ( 815 ).
- the image sensor ( 804 ) will be dynamically adjusted to improve the sensing precision, for example to better separate a foreground object or subject from the background in terms of contrast.
- a reference object in either 2D or 3D can be loaded ( 805 ) to help constrain the CV and aid in recognition of objects in the scene. Using a reference object to constrain the CV helps the Sensing Module ( 603 ) ignore noise in the image including shadows and non-target subjects, as well as objects that might enter or exit the frame.
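- A hedged Python/OpenCV sketch of constraining downstream CV with a located 2D reference object follows; the template-matching approach and margin are illustrative assumptions:

```python
# Minimal sketch: locate a 2D reference object by template matching and restrict
# further analysis to a region around it, ignoring the rest of the frame.
import cv2

def region_of_interest(frame_gray, reference_gray, margin=80):
    result = cv2.matchTemplate(frame_gray, reference_gray, cv2.TM_CCOEFF_NORMED)
    _, score, _, (x, y) = cv2.minMaxLoc(result)       # best-match score and location
    h, w = reference_gray.shape
    x0, y0 = max(0, x - margin), max(0, y - margin)
    roi = frame_gray[y0:y + h + margin, x0:x + w + margin]
    return score, roi                                  # downstream CV runs on roi only

# Usage (illustrative threshold):
# score, roi = region_of_interest(gray_frame, gray_reference)
# if score > 0.7: run_recognition(roi)
```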
- Other sensors can be used in parallel or serially to improve the context and quality of sensing ( 806 ). For example, collecting the geolocation positions transmitted from the wearable devices or smartphones of subjects in an imaged space can help provide richer real-time sensing data to other parts of the system, such as the Composition Module ( 605 ).
- entity, object and scene capture validation ( 807 ) continuously evaluates what in the scene is being captured and recognized, and to what level of confidence. This confidence level of recognition and tracking is enhanced as other devices and cameras are added to the network, because their inputs and sensory capabilities can be shared and reused, and their various screens and interface options can be used to provide rich feedback and direction ( FIG. 3 ).
- the sensing process might start over or move on to a dynamically ordered plurality of computer vision algorithm components ( 809 ) and/or machine learning algorithm components ( 810 ).
- those components can include, for example, blob detection algorithms, edge detection operators such as Canny, and edge histogram descriptors.
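- For illustration, two of the listed components (a Canny edge operator and a blob detector) could be invoked as in the Python/OpenCV sketch below; threshold values are assumptions:

```python
# Minimal sketch: run edge detection and blob detection over one grayscale frame.
import cv2

def edges_and_blobs(frame_gray):
    edges = cv2.Canny(frame_gray, 50, 150)              # Canny edge detection
    detector = cv2.SimpleBlobDetector_create()          # blob detection, default params
    blobs = detector.detect(frame_gray)
    return edges, [(kp.pt, kp.size) for kp in blobs]

# Usage (illustrative):
# edges, blobs = edges_and_blobs(gray_frame)
# edge histograms and blob centers can then feed the composition scoring.
```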
- the CV components are always in a feedback loop ( 808 ) provided by previously stored leading practice models in the Database ( 600 ) and machine learning processes ( 811 ).
- together with corrections for image sensing lens distortion (i.e., smartphone camera data), the gyroscope and compass can be used to understand the context of subject positions in a 3D space relative to camera angles ( 812 ).
- the system can generate 3D models, on the device or via a networked service, or obfuscated and/or encrypted point clouds ( 813 ). These point clouds or models are also maintained in a feedback loop ( 814 ) with pre-existing leading practice models in the Database ( 600 ).
- a broader set of analytics and machine learning can be run against all models and data ( 604 ).
- the Sensing Module ( 603 ) is described earlier in FIG. 6 and a more detailed process flow is outlined here in FIG. 8 .
- the system can correlate and analyze the increased sensor information to augment the Sensing Module ( 603 ) and provide greater precision and measurement of a scene.
- FIG. 9 is a diagram of the architecture for the system ( 950 ) according to one or more embodiments.
- the Processing Unit ( 900 ) comprises the Sensing Module ( 603 ), Analytics Module ( 604 ), Composition/Architecture Module ( 605 ), and Direction/Control Module ( 606 ). It can be connected to a processor ( 901 ) and a device-local database ( 600 ), or created in any other computer medium and connected to through a network ( 902 ), including being routed by a software defined network (i.e., SDN) ( 911 ).
- the Processing Unit ( 900 ) can also be connected to an off-premise service for greater scale, performance and context by network SDN ( 912 ).
- This processing-capability service, cloud, or data center might be connected by SDN ( 913 ) to a distributed file system ( 910 ) (e.g., HDFS with Spark or Hadoop), a plurality of server-side databases ( 600 ), or a cloud computing platform ( 909 ).
- the Processing Unit can be coupled to a processor inside a host data processing system ( 903 ) (e.g., a remote server or local server) through a wired interface and/or a wireless network interface.
- processing can be done on distributed devices for use cases requiring real-time performance (e.g., CV for capturing a subject's position), and that processing can be correlated with other processing throughout the service (e.g., other subjects' positioning in the scene).
- FIG. 10 shows examples of iconic posing and professional compositions, including both stereotypical model poses ( 1000 ) and famous celebrity poses such as Marilyn Monroe ( 1001 ).
- These existing compositions can be provided to the system by the user and can be subsequently understood by the system, such that subjects can then be auto-directed to pose relative to a scene that optimally reproduces these compositions, with feedback given in real-time as the system determines all precise spatial orientation and compliance with the model.
- a solo subject can also be directed to pose in the style of professional models ( 1002 ), incorporating architectural features such as walls, with special attention given to precise hand, arm, and leg placement and positioning, even when no specific image provides sole compositional guidance or reference.
- the system can synthesize multiple desirable compositions from a database ( 600 ) into one composite reference composition model.
- the system also provides the ability to ingest existing 2D art ( 1006 ) which is then transformed into a 3D model used to auto-direct composition and can act as a proxy for the types of scene attributes a user might be able to recognize but not articulate or manually program.
- groups of subjects can be automatically directed to pose and be positioned so that hierarchy and status are conveyed ( 1010 ).
- This can be achieved using the same image synthesis method and system as in ( 1002 ), and by directing each subject individually while posing them relative to each other to ensure compliance with the reference model.
- the system's simultaneous direction of multiple subjects in frame can dramatically shorten the time required to achieve a quality composition.
- For a family ( 1005 ), the system is able to direct the subjects and reliably execute the ideal photograph at the right time, using ubiquitous hardware they already own (e.g., smartphones).
- the system is able to make use of facial recognition ( 1007 ) to deliver specific direction to each participant, in this embodiment achieving optimal positioning of the child's arm ( 1008 , 1009 ).
- the system is able to direct a kiss ( 1003 ) using the Sensing Module ( 603 ), Analytics Module ( 604 ), Composition/Architecture Model ( 605 ), and Direction/Control Module ( 606 ) and the method described in FIG. 7 to ensure both participants are in compliance with the choreography model throughout the activity.
- the system is also able to make use of sensed behaviors as triggers for other events, so that in one embodiment a dancer's movements can be used as inputs to direct the composition of live music, or in another embodiment specific choreography can be used to control the lighting of an event. This allows experts or professionals to create models to be emulated by others (e.g., for instruction or entertainment).
- FIG. 11 is provided as an example of a consumer-facing UI for the system that would allow for assignment of models to scenes ( 1103 ), and of roles to subjects ( 1100 ) and objects.
- Virtual subject identifiers can be superimposed over a visual representation of the scene ( 1101 ) to provide auto-linkage from group to composition and to allow for intuitive dragging and reassignment ( 1105 ).
- Sensed subjects, once assigned, can be linked to complex profile information ( 1104 ) including LinkedIn, Facebook, or various proprietary corporate LDAP or organizational hierarchy information. Once identified, subjects can be directed simultaneously and individually by the system, through the interfaces described in FIG. 3.
- stickers or other markers can be attached to the real-world subjects and tagged in this manner.
- These stickers or markers can be any sufficiently differentiated pattern (including stripes, dots, solid colors, text) and can be any material, including simple paper and adhesive, allowing them to come packaged in the magazine insert from FIG. 1 ( 105 ) as well.
- FIG. 12 provides further examples of compositions difficult to achieve traditionally, in this case because of objects or entities relative to the landscape of the scene.
- Nature photography in particular poses a challenge due to the uncontrollable lighting on natural features such as mountains in the background versus the subject in the foreground ( 1200 ).
- users are able to create rules or conditions to govern the capture process and achieve the ideal composition with minimal waste and excess. Those rules can be used to suggest alternate compositions or directions if the desired outcome is determined to be unattainable, for example because of weather.
- existing photographs ( 1201 ) can be captured by the system, as a method of creating a reference model.
- compositional analysis and geolocation data can deliver specific user-defined outcomes such as a self-portrait facing away from the camera, executed when no one else is in the frame and the clouds are over the trees ( 1202 ).
- the composition model is able to direct a subject to stand in front of a less visually “busy” section of the building ( 1203 ).
- each pin ( 1306 ) provides composition and direction for camera settings and controls, positioning, camera angles ( 1302 ), architectural features, lighting, and traffic in the scene. This information is synthesized and can be presented to the user in such a way that the compositional process is easy to understand and highly automated, while delivering high quality capture of a scene. For example, consider a typical tourist destination, such as the Arc de Triomphe, that can be ideally composed ( 1307 ).
- the system is able to synthesize a wide range of information (including lighting and shadows depending on date/time, weather, expected crowd sizes, ratings of comparable iterations of this photo taken previously) which it uses to suggest desirable compositions and execute them with precision and reliability, resulting in a pleasant and stress-free experience for the user.
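- A minimal sketch of such a location-aware lookup is shown below in Python; the pin contents, coordinates, and distance threshold are illustrative assumptions rather than actual stored data:

```python
# Minimal sketch: find the nearest stored composition pin for the user's GPS
# position and return its suggested settings.
import math

PINS = [
    {"name": "Arc de Triomphe, west axis", "lat": 48.8738, "lon": 2.2950,
     "settings": {"focal_mm": 35, "angle_deg": 12, "best_hour": 18}},
]

def haversine_m(lat1, lon1, lat2, lon2):
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371000 * math.asin(math.sqrt(a))

def nearest_pin(lat, lon, max_m=200):
    best = min(PINS, key=lambda p: haversine_m(lat, lon, p["lat"], p["lon"]))
    return best if haversine_m(lat, lon, best["lat"], best["lon"]) <= max_m else None

print(nearest_pin(48.8740, 2.2955))
```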
- FIG. 14 is a representation of computer vision and simple models informing composition.
- a building's exterior ( 1401 ) invokes a perspective model ( 1402 ) automatically to influence composition through a CV process and analytics of images during framing of architectural photography.
- the lines in the model ( 1402 ) can communicate ideal perspective to direct perspective, depth, and other compositional qualities, to produce emotional effects in architectural photography applications such as real estate listings.
- the system can make use of a smartphone or camera-equipped aerial drone ( 1500 ) to perform surveillance and visual inspection of traditionally difficult or dangerous-to-inspect structures such as bridges.
- the system can make use of appropriate reference models for weld inspections, corrosion checks, and insurance estimates and damage appraisals.
- Relative navigation based on models of real-world structures ( 1502 ) provides greater flexibility and accuracy when directing equipment when compared to existing methods such as GPS.
- Using the Sensing Module ( 603 ), the system is able to interpret nested and hierarchical instructions such as "search for corrosion on the underside of each support beam."
- FIG. 15B depicts an active construction site, where a drone can provide instant inspections that are precise and available 24/7.
- a human inspector can monitor the video and sensory feed or take control from the system if desired, or the system is able to autonomously control the drone, recognizing and scoring the construction site's sensed environment for compliance based on existing models (e.g., local building codes).
- Other BIM (Building Information Management) applications include persistent monitoring and reporting as well as responsive structures that react to sensed changes in their environment, for example a window washing system that uses constrained CV to monitor only the exposed panes of glass in a building and can intelligently sense the need for cleaning in a specific location and coordinate an appropriate equipment response, autonomously and without human intervention.
- Human subjects can be deconstructed similarly to buildings, as seen in FIG. 16 .
- a close and precise measurement of the subject's body ( 1601 ) can be abstracted into, for example, a point cloud ( 1602 ). Composite core rigging ( 1603 ) can then be applied such that a new composite reference core or base NURB 3D model is created ( 1604 ).
- This deconstruction, atomization, and reconstruction of subjects allows for precision modeling and the fusing of real and virtual worlds.
- a simpler representation might be created and stored on the device for the user interface, or in a social site's datacenters. This obfuscates the subject's body model to protect their privacy and to address social "body image" concerns. Furthermore, data encryption and hash processing of these images can also be automatically applied in the application on the user's device and throughout the service to protect user privacy and security.
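- The privacy step could be sketched as follows in Python (the field names, bucket sizes, and salted-hash scheme are illustrative assumptions, not the disclosed method):

```python
# Minimal sketch: keep the detailed body model on the device; share only a salted
# hash (for lookup/deduplication) and a coarse, obfuscated summary.
import hashlib, json, os

def protect_body_model(measurements_cm):
    salt = os.urandom(16)
    digest = hashlib.sha256(salt + json.dumps(measurements_cm, sort_keys=True).encode()).hexdigest()
    coarse = {k: round(v / 5.0) * 5 for k, v in measurements_cm.items()}  # 5 cm buckets
    return {"salt": salt.hex(), "hash": digest, "coarse_model": coarse}

print(protect_body_model({"chest": 101.2, "waist": 84.7, "inseam": 78.3}))
```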
- the system can either create a new composition model for the Database ( 600 ), or select a composition model based on attributes deemed most appropriate for composition: body type, size, shape, height, arm position, face position. Further precise composition body models can be created for precise direction applications in photo, film, theater, musical performance, dance, yoga.
- FIG. 17 catalogues some of the models and/or data that can be stored centrally in a database ( 600 ) available to all methods and processes throughout the system, to facilitate a universal scoring approach for all items.
- models of best practices for shooting a close-up movie scene ( 1702 ) are stored and include such items as camera angles, out-of-focus effects, aperture and exposure settings, depth of field, lighting equipment types with positions and settings, dolly and boom positions relative to the subject (i.e., actor), where "extras" should stand in the scene and their entrance timing, and set composition.
- film equipment can be directed to react in context with the entire space.
- An example is a networked system of camera dollies, mic booms, and lighting equipment on a film set that identifies actors in a scene and automatically cooperates with other networked equipment to provide optimal composition dynamically and in real-time, freeing the human director to focus on storytelling.
- the Database ( 600 ) can also hold 2D images of individuals and contextualized body theory models ( 1707 ), 3D models of individuals ( 1705 ), and 2D and 3D models of clothing ( 1704 ), allowing the system to score and correlate between models.
- the system can select an appropriate suit for someone it senses is tall and thin ( 1705 ) by considering the body theory and fashion models ( 1707 ) as well as clothing attributes ( 1704 ) such as the intended fit profile or the number of buttons.
- the system can keep these models and their individual components correlated to social feedback ( 1703 ) such as Facebook, YouTube, Instagram, or Twitter using metrics such as views, likes, or changes in followers and subscribers.
- a number of use cases could directly provide context and social proof around identified individuals in a play or movie, from the overall cinematographic composition of a scene in a play, music recital, movie, or sports event, to how well-received a personal image ( 501 ) or a group image or video ( 1101 ) was.
- This also continuously provides a method and process for tuning best practice models of all types of compositions, from photography, painting, movies, skiing, mountain biking, surfing, competitive sports, exercises, and yoga poses ( 510 ) to dance, music, and performances.
- composition models can also be analyzed for trends from social popularity, from fashion, to popular dance moves and latest form alterations to yoga or fitness exercises.
- a camera ( 202 ) and a broad spectrum of hardware ( 1706 ), such as lights, robotic camera booms or dollies, and autonomous quadcopters, could be evaluated individually or as part of the overall composition, including such items as lights, dolly movements, and the camera with its multitude of settings and attributes.
- the system can facilitate learning an instrument by providing real-time feedback.
- 3D models of an instrument (for example, a guitar fretboard model) can be used, and each finger can be assigned a marker so that specific feedback can be provided to the user (e.g., "place the 2nd finger on the A string at the 2nd fret") in a format that is understandable and useful to them.
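- A minimal Python sketch of such fingering feedback follows; the target chord shape, finger-to-string mapping, and the assumption that the CV layer already reports string/fret coordinates are all illustrative:

```python
# Minimal sketch: compare sensed finger placements (already mapped to string/fret
# positions by the CV layer) against a target fingering and produce hints.
TARGET_CHORD = {2: ("D", 2), 3: ("G", 2), 4: ("B", 2)}   # finger: (string, fret), illustrative

def fingering_feedback(sensed):
    """sensed: {finger_number: (string_name, fret_number)}"""
    hints = []
    for finger, (string, fret) in TARGET_CHORD.items():
        if sensed.get(finger) != (string, fret):
            hints.append(f"place finger {finger} on the {string} string at fret {fret}")
    return hints or ["chord correct"]

print(fingering_feedback({2: ("D", 2), 3: ("G", 1), 4: ("B", 2)}))
# -> ['place finger 3 on the G string at fret 2']
```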
- a human expert can monitor the process and provide input or the sensing, analyzing, composing and directing can be completely autonomous.
- renditions and covers of existing songs can also be scored and compared against the original and other covers, providing a video-game like experience but with fewer hardware requirements and greater freedom.
- FIG. 19 shows an example of a golf swing ( 1901 ) to illustrate the potential of a database of models.
- that model is available for immediate application and stored in a Database ( 600 ).
- a plurality of sensed movements can be synthesized into one, so that leading practice golf swings are sufficiently documented.
- the models can be converted to compositional models, so that analysis and comparison can take place between the sensed movements and stored compositional swing, and direction and feedback can be given to the user ( 1902 , 1903 ).
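- As an illustrative sketch (using a single joint angle, made-up swing data, and an assumed tolerance rather than the stored models described here), the comparison and feedback could look like this in Python:

```python
# Minimal sketch: resample a sensed swing to the model's length, compare per-phase
# joint angles, and turn deviations into feedback.
def resample(seq, n):
    return [seq[int(i * (len(seq) - 1) / (n - 1))] for i in range(n)]

def swing_feedback(sensed_deg, model_deg, tolerance_deg=8.0):
    sensed = resample(sensed_deg, len(model_deg))
    hints = []
    for i, (s, m) in enumerate(zip(sensed, model_deg)):
        if abs(s - m) > tolerance_deg:
            phase = "backswing" if i < len(model_deg) // 2 else "downswing"
            hints.append(f"{phase}: shoulder turn off by {round(s - m)} deg")
    return hints or ["swing matches the model"]

model = [0, 30, 60, 90, 60, 30, 0]          # stored leading-practice swing (illustrative)
sensed = [0, 20, 45, 70, 55, 28, 2, 1]      # sensed swing with a different frame count
print(swing_feedback(sensed, model))
```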
- FIG. 20 is a systematic view of an integrated system for composition and orchestration of a 3D or 2.5D space, illustrating communication between users and their devices and a server through a network ( 902 ) or SDN ( 911 , 912 , 913 ), according to one embodiment.
- a user or multiple users can connect to the Processing Unit ( 900 ) that hosts the composition event.
- user hardware such as a sensor ( 2001 ), TV ( 2003 ), camera ( 2004 ), mobile device such as a tablet or smartphone ( 2005 ), wearable ( 2006 ), server ( 2007 ), laptop ( 2008 ), desktop computer ( 2009 ), or any other wireless or electronic device can communicate directly with other devices in a network or with the devices of specific users ( 2002 , 902 ).
- the orchestration system might privately send unique positions and directions to four separate devices (e.g., watch, smartphone, quadcopter ( 1706 ), and an internet-connected TV) in quickly composing high-quality and repeatable photographs of actors and fans at a meet-and-greet event.
Abstract
Method and system for automatic composition and orchestration of a 3D space or scene using networked devices and computer vision to bring ease of use and autonomy to a range of compositions. A scene, its objects, subjects and background are identified and classified, and relationships and behaviors are deduced through analysis. Compositional theories are applied, and context attributes (for example location, external data, camera metadata, and the relative positions of subjects and objects in the scene) are considered automatically to produce optimal composition and allow for direction of networked equipment and devices. Events inform the capture process; for example, a video recording is initiated when a rock climber waves her hand, and an autonomous camera automatically adjusts to keep her body in frame throughout the sequence of moves. Model analysis allows for direction, including audio tones to indicate proper form for the subject and instructions sent to equipment to ensure optimal scene orchestration.
Description
- The instant application is a utility application of the previously filed U.S. Provisional Application 62/053,055 filed on 19 Sep. 2014. The pending U.S. Provisional Application 62/053,055 is hereby incorporated by reference in its entirety for all of its teachings.
- A method and system for automatically sensing using photographic equipment that captures a 3D space, scene, subject, object, and equipment for further analysis, composition and direction that can be used for creating visual design.
- Computer device hardware and software continue to advance in sophistication. Cameras, micro controllers, computer processors (e.g., ARM), and smartphones have become more capable, as well as smaller, cheaper, and ubiquitous. In parallel, more sophisticated algorithms including computer vision, machine learning and 3D models can be computed in real-time or near real-time on a smartphone or distributed over a plurality of devices over a network.
- At the same time, multiple cameras including front-facing cameras on smartphones have enabled the popularity of the selfie as a way for anyone to quickly capture a moment and share it with others. But the primary mechanism for composition has not advanced beyond an extended arm or a selfie stick and use of the device's screen as a visual reference for the user to achieve basic scene framing. Recently, GPS-based drone cameras such as Lily have been introduced that improve on the selfie stick, but they are not autonomous; instead they require the user to wear a tracking device to continually establish the focal point of the composition and to pass directional "commands" to the drone via buttons on the device. This is limiting when trying to include multiple dynamic subjects and/or objects in the frame (a "groupie"), or when the user is preoccupied or distracted (for example at a concert, or while engaged in other activities).
- The present invention is in the areas of sensing, analytics, direction, and composition of 3D spaces. It provides a dynamic real-time approach to sense, recognize, and analyze objects of interest in a scene; applies a composition model that automatically incorporates best practices from prior art as models, for example: photography, choreography, cinematography, art exhibition, and live sports events; and directs subjects and equipment in the scene to achieve the desired outcome.
- In one embodiment, a high-quality professional-style recording is being composed using the method and system. Because traditional and ubiquitous image capture equipment can now be enabled with microcontrollers and/or sensor nodes in a network to synthesize numerous compositional inputs and communicate real-time directions to subjects and equipment using a combination of sensory (e.g., visual, audio, vibration) feedback and control messages, it becomes significantly easier to get a high-quality output on one's own. If there are multiple people or subjects who need to be posed precisely, each subject can receive personalized direction to ensure their optimal positioning relative to the scene around them.
- In one embodiment, real-world scenes are captured using sensor data and translated into 2D, 2.5D and 3D models in real-time using a method such that continuous spatial sensing, recognition, composition, and direction are possible without requiring additional human judgment or interaction with the equipment and/or scene.
- In one embodiment, image processing, image filtering, video analysis, motion analysis, background subtraction, object tracking, pose estimation, stereo correspondence, and 3D reconstruction are run perpetually to provide optimal orchestration of subjects and equipment in the scene without a human operator.
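- As an illustration of how such a perpetual analysis loop could be structured, the following minimal Python sketch combines background subtraction and simple blob detection using OpenCV; the camera index, blur kernel, and area threshold are assumptions rather than values from this disclosure.

```python
# Illustrative sketch only: a perpetual sensing loop combining background
# subtraction and coarse blob detection. A real system would feed these
# detections into pose estimation, stereo correspondence, and 3D reconstruction.
import cv2

def run_perpetual_analysis(camera_index=0):
    cap = cv2.VideoCapture(camera_index)
    subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        mask = subtractor.apply(frame)                      # background subtraction
        mask = cv2.medianBlur(mask, 5)                      # suppress sensor noise
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        for contour in contours:
            if cv2.contourArea(contour) < 500:              # ignore small blobs
                continue
            x, y, w, h = cv2.boundingRect(contour)
            print(f"subject/object candidate at x={x}, y={y}, w={w}, h={h}")
    cap.release()

if __name__ == "__main__":
    run_perpetual_analysis()
```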
- In one embodiment, subjects can be tagged explicitly by a user, or determined automatically by the system. If desired, subjects can be tracked or kept in frame over time and as they move throughout a scene, without further user interaction with the system. The subject(s) can also be automatically directed through sensory feedback (e.g., audio, visual, vibration) or any other user interface.
- In one embodiment as a method, an event begins the process of capturing the scene. The event can be an explicit hardware action such as pressing a shutter button or activating a remote control for the camera, or the event can be determined via software, a real world event, message or notification symbol; for example recognizing the subject waving their arms, a hand gesture or an object, a symbol, or identified subject or entity entering a predetermined area in the scene.
- The system allows for the identification of multiple sensory event types, including physical-world events (object entering/exiting the frame, a sunrise, a change in the lighting of the scene, the sound of someone's voice, etc.) and software-defined events (state changes, timers, sensor-based). In one embodiment, a video recording is initiated when a golfer settles into her stance and aims her head down, and the camera automatically adjusts to keep her moving club in the frame during her backswing before activating burst mode so as to best capture the moment of impact with the ball during her downswing before pausing the recording seconds after the ball leaves the frame. Feedback can be further provided to improve her swing based on rules and constraints provided from an external golf professional, while measuring and scoring how well she complies with leading practice motion ranges.
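- A minimal sketch of such a software-defined event loop is shown below; the event names ("stance_settled", "ball_left_frame") and the detector callables are hypothetical placeholders standing in for the golfer example.

```python
# Hedged sketch: an event loop that starts and stops capture in response to
# sensed events (gesture, object entering frame, timer). Detectors are stubs.
import time
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class EventDrivenCapture:
    detectors: List[Callable[[], str]]           # each returns an event name or ""
    recording: bool = False
    events_log: List[str] = field(default_factory=list)

    def handle(self, event: str) -> None:
        if event == "stance_settled" and not self.recording:
            self.recording = True                # e.g., golfer settles into stance
        elif event == "ball_left_frame" and self.recording:
            self.recording = False               # pause shortly after impact
        if event:
            self.events_log.append(event)

    def run_once(self) -> None:
        for detect in self.detectors:
            self.handle(detect())

# Usage example with stubbed detectors standing in for real CV modules.
if __name__ == "__main__":
    fake_events = iter(["", "stance_settled", "", "ball_left_frame"])
    capture = EventDrivenCapture(detectors=[lambda: next(fake_events, "")])
    for _ in range(4):
        capture.run_once()
        print("recording:", capture.recording)
        time.sleep(0.01)
```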
- In another embodiment, a video or camera scan can be voice-initiated or automatically initiated when the subject is inside the camera frame, and can monitor and direct them through a sequence of easy-to-understand body movements and steps using a combination of voice, lights, and simple mimicry of on-screen poses in a user interface or visual display. For a few examples, the subject could be practicing and precisely evaluating yoga poses, following a physical therapy program, or taking private body measurements.
- FIG. 1A, 1B, 1C, 1D, 1E show various methods for hands-free capture of a scene.
- FIG. 2A, 2B, 2C, 2D, 2E, 2F, 2G, 2H, 2I, 2J illustrate a system of cameras that can be used to implement a method of sensing, analyzing, composing, and directing a scene.
- FIG. 3 shows examples of interfaces, inputs, and outputs to direct subjects in a scene.
- FIG. 4 shows further examples of interfaces when capturing close-up scenes.
- FIG. 5 shows examples of selfies, groupies, and other typical applications.
- FIG. 6 is a diagram of the Sensing Module, Analytics Module, Composition/Architecture Module, and Direction/Control Module.
- FIG. 7 is a diagram of the system's algorithm and high-level process flow.
- FIG. 8 is a detailed look at the Sensing Module from FIG. 6.
- FIG. 9 is a high-level view of the system architecture for on-premise and cloud embodiments.
- FIG. 10 illustrates various iconic and familiar compositions and reference poses.
- FIG. 11 shows an interface for choosing a composition model and assigning objects or subjects for direction.
- FIG. 12 shows further examples of compositions that can be directed.
- FIG. 13 shows an example interface for using data attached to specific geolocations, as well as an example use case.
- FIG. 14 shows how computer vision can influence composition model selection.
- FIGS. 15A and 15B show examples of Building Information Management (BIM) applications.
- FIG. 16 shows how a collection of images and file types can be constructed and deconstructed into sub-components, including 3D aggregate models and hashed files, for protecting user privacy across the system from device to network and cloud service.
- FIG. 17 shows types of inputs that inform the Models from FIG. 6.
- FIG. 18 shows a method for virtual instruction to teach how to play music.
- FIG. 19 is an example of how a Model can apply to Sensed data.
- FIG. 20 shows example connections to the network and to the Processing Unit.
- The present invention enables real-time sensing, spatial composition, and direction for objects, subjects, scenes, and equipment in 2D, 2.5D or 3D models in a 3D space. In a common embodiment, a smartphone will be used for both its ubiquity and the combination of cameras, sensors, and interface options.
- FIG. 1A shows how such a cell phone (110) can be positioned to provide hands-free capture of a scene. This can be achieved using supplemental stands different from traditional tripods designed for non-phone cameras. FIG. 1C shows a stand can be either foldable (101) or rigid (102) so long as it holds the sensors on the phone in a stable position. A braced style of stand (103) like the one shown in FIG. 1E can also be used. The stand can be made of any combination of materials, so long as the stand is sufficiently tall and wide as to support the weight of the capturing device (110) and hold it securely in place.
- In an embodiment, the self-assembled stand (101) can be fashioned from materials included as a branded or unbranded removable insert (105) in a magazine or other promotion (106) with labeling and tabs sufficient so that the user is able to remove the insert (105) and assemble it into a stand (101) without any tools. This shortens the time to initial use by an end-user by reducing the steps needed to position a device for proper capture of a scene.
- As seen in FIG. 1D, the effect of the stand can also be achieved using the angle of the wall/floor and the natural surface friction of a space. In this embodiment, the angle of placement (107) is determined by the phone's (110) sensors and slippage can be detected by monitoring changes in those sensors. The angle of elevation can be extrapolated from the camera's lens (111), allowing for very wide capture of a scene when the phone is oriented in portrait mode. Combined with a known fixed position from the bottom of the phone to the lens (104), the system is now able to deliver precise measurements and calibrations of objects in a scene. This precision could be used, for example, to capture a user's measurements and position using only one capture device (110) instead of multiple.
- When positioning the device on a door, wall, or other vertical surface (FIG. 1A), adhesive or magnets (120) can be used to secure the capture device (110) and prevent it from falling. For rented apartments or other temporary spaces, the capture device can also be placed in a case (122) such that the case can then be mounted via hooks, adhesives, magnets, or other ubiquitous fasteners (FIG. 1B). This allows for easy removal of the device (110) without compromising or disturbing the capture location (121).
- Referring now to FIG. 2A, 2B, 2C, 2D, 2E, 2F, 2G, 2H, 2I, 2J, various devices can be orchestrated into an ensemble to capture a scene. Existing capture device types can be positioned and networked to provide an optimal awareness of the scene. Examples include: cameras (202), wearable computers such as the Apple Watch, Google Glass, or FitBit (FIG. 2B), pan/tilt cameras such as those found in webcams/security cameras (FIG. 2C), mobile devices such as smartphones or tablets (FIG. 2D) equipped with front and rear-facing cameras (including advanced models with body-tracking sensors and fast-focus systems such as in the LG G3), traditional digital cameras (FIG. 2E), laptops with integrated webcams (FIG. 2F), depth cameras or thermal sensors (FIG. 2G) like those found in the Xbox Kinect hardware, dedicated video cameras (FIG. 2H), and autonomous equipment with only cameras attached (FIG. 2I) or autonomous equipment with sensors (FIG. 2J) such as sonar sensors, or infrared, or laser, or thermal imaging technology.
- Advances in hardware/software coupling on smartphones further extend the applicability of the system and provide opportunities for a better user experience when capturing a scene, because ubiquitous smartphones and tablets (FIG. 2D) can increasingly be used instead of traditionally expensive video cameras (FIG. 2E, FIG. 2H).
- Using the mounts described in FIG. 1A, a device (110) can be mounted on a door or wall to capture the scene. The door allows panning of the scene by incorporating the known fixed-plane movement of the door. For alternate vantage points, it is also possible to use the mounts to position a device on a table (213) or the floor using a stand (103), or to use a traditional style tripod (215). The versatility afforded by the mounts and stands allows for multiple placement options for capturing devices, which in turn allows for greater precision and flexibility when sensing, analyzing, composing, and directing a subject in a 3D space.
- Once recognized in a scene, subjects (220) can then be directed via the system to match desired compositional models, according to various sensed orientations and positions. These include body alignment (225), arm placement (230), and head tilt angle (234). Additionally, the subject can be directed to rotate in place (235) or to change their physical location by either moving forward, backward, or laterally (240).
- Rotation (225) in conjunction with movement along a plane (240) also allows for medical observation, such as orthopedic evaluation of a user's gait or posture. While an established procedure exists today wherein trained professional humans evaluate gait, posture, and other attributes in-person, access to those professionals is limited and the quality and consistency of the evaluations is irregular. The invention addresses both shortcomings through a method and system that makes use of ubiquitous smartphones (110) and the precision and modularity of models. Another instance where networked sensors and cameras can replace a human professional is precise body measurement, previously achieved by visiting a quality tailor. By creating a 3D scene and directing subjects (220) intuitively as they move within it, the system is able to ensure with high accuracy that the subjects go through the correct sequences and the appropriate measurements are collected efficiently and with repeatable precision. Additionally, this method of dynamic and precise capture of a subject while sensing can be used to achieve positioning required for stereographic images with e.g., a single lens or sensor.
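- One plausible way to sequence such a guided rotation for measurement is sketched below; the checkpoint angles, the tolerance, and the sensed_heading/measure_width callables are illustrative assumptions rather than the patented method.

```python
# Sketch under assumptions: guiding a subject through a measurement rotation
# (0°, 90°, 180°, 270°) and recording a width estimate at each checkpoint.
from typing import Callable, Dict

def guided_rotation_capture(sensed_heading: Callable[[], float],
                            measure_width: Callable[[], float],
                            tolerance_deg: float = 10.0) -> Dict[int, float]:
    checkpoints = [0, 90, 180, 270]
    captured: Dict[int, float] = {}
    for target in checkpoints:
        # A real deployment would loop on live sensor data and emit audio or
        # vibration cues until the subject reaches each checkpoint.
        error = abs(((sensed_heading() - target) + 180) % 360 - 180)
        if error <= tolerance_deg:
            captured[target] = measure_width()
    return captured

# Usage with stubbed sensors: the subject happens to be facing 88 degrees.
print(guided_rotation_capture(lambda: 88.0, lambda: 41.5))
```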
-
FIG. 3 provides examples of interface possibilities to communicate feedback to the subjects and users. The capturing device (110) can relay feedback that is passed to subjects through audio tones (345), voice commands (346), visually via a screen (347), or using vibration (348). An example of such a feedback loop is shown as a top view looking down on the subject (220) as they move along the rotation path directed in (225) according to audio tones heard by the subject (349).
- The visual on-screen feedback (347) can take the form of a superimposed image of the subject's sensed position relative to the directed position in the scene (350). In one embodiment, the positions are represented as avatars, allowing human subjects to naturally mimic and achieve the desired position by aligning the two avatars (350). Real-time visual feedback is possible because the feedback-providing device (110) is networked (351) to all other sensing devices (352), allowing for synthesis and scoring of varied location and position inputs and providing a precise awareness of the scene's spatial composition (this method and system is discussed further in FIG. 8). One example of additional sensing data that can be networked is imagery of an infrared camera (360).
- Other devices such as Wi-Fi-enabled GoPro®-style action cameras (202) and wearable technologies such as a smart watch with a digital display screen (353) can participate in the network (351) and provide the same types of visual feedback (350). This method of networking devices for capturing and directing allows individuals to receive communications according to their preferences on any network-connected device such as, but not limited to, a desktop computer (354), laptop computer (355), phone (356), tablet (357), or other mobile computer (358).
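- The scoring of sensed versus directed positions could, for example, be reduced to a keypoint-distance comparison like the following sketch; the keypoint names, the 100-pixel normalization, and the 0.8 threshold are assumptions made for illustration.

```python
# Minimal sketch, assuming 2D keypoints: score how closely a sensed pose
# matches the directed (avatar) pose and choose a feedback channel.
import math
from typing import Dict, Tuple

Point = Tuple[float, float]

def alignment_score(sensed: Dict[str, Point], directed: Dict[str, Point]) -> float:
    """Return a score in [0, 1]; 1.0 means the two avatars are aligned."""
    shared = set(sensed) & set(directed)
    if not shared:
        return 0.0
    mean_error = sum(math.dist(sensed[k], directed[k]) for k in shared) / len(shared)
    return max(0.0, 1.0 - mean_error / 100.0)    # 100 px of mean error -> score 0

def choose_feedback(score: float, screen_visible: bool) -> str:
    if score >= 0.8:
        return "tone:success"
    return "overlay:move_closer" if screen_visible else "vibration:adjust"

sensed = {"head": (52.0, 18.0), "left_hand": (20.0, 80.0)}
directed = {"head": (50.0, 20.0), "left_hand": (25.0, 78.0)}
score = alignment_score(sensed, directed)
print(score, choose_feedback(score, screen_visible=True))
```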
-
FIG. 4 provides examples of an interface when the screen is not visible, for example because the capture device is in too close proximity to the subject. If the capture device is a smartphone (110) oriented to properly capture a subject's foot (465), it is unlikely that the subject will be able to interact with the phone's screen, and there may not be additional devices or screens available to display visual feedback to the user.
- The example in (466) shows how even the bottom of a foot (471) can be captured and precise measurements can be taken using a smartphone (110). By using the phone's gyroscope, the phone's camera can be directed to begin the capture when the phone is on its back, level, and the foot is completely in frame. No visual feedback is required and the system communicates direction such as rotation (470) or orientation changes (473, 474) through spoken instructions (446) via the smartphone's speakers (472).
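- A simple level-and-framing check of this kind might look like the following sketch, where the accelerometer reading and the foot detector are hypothetical callables rather than a specific device API.

```python
# Hedged sketch: begin a capture only when the phone is lying flat on its back
# and a foot is fully inside the frame. Tolerances are illustrative.
from typing import Callable, Tuple

def should_start_capture(read_accelerometer: Callable[[], Tuple[float, float, float]],
                         foot_fully_in_frame: Callable[[], bool],
                         level_tolerance: float = 0.5) -> bool:
    ax, ay, az = read_accelerometer()            # m/s^2, device axes
    # Lying flat on its back: gravity falls almost entirely on the z axis.
    is_level = abs(ax) < level_tolerance and abs(ay) < level_tolerance and az > 9.0
    return is_level and foot_fully_in_frame()

# Usage with stubbed sensors: flat on the floor, foot detected in frame.
print(should_start_capture(lambda: (0.1, -0.2, 9.78), lambda: True))
```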
- Multiple sensory interface options provide ways to make the system more accessible, and allow more people to use it. In an embodiment, a user can indicate they do not want to receive visual feedback (because they are visually impaired, or because the ambient lighting is too bright, or for other reasons) and their preference can be remembered, so that they can receive feedback through audio (446) and vibration (448) instead.
- Referring now to FIG. 5, examples of different types of scenes are shown to indicate how various compositional models can be applied. Traditionally, sensing, analytics, composition, and direction have been manual processes. The selfie shown in (501) is a photo or video that is typically difficult to capture, because the operator must work at arm's length and/or rely on a front-facing camera so that immediate visual feedback is provided. Absent extensive planning and rehearsal, an additional human photographer has previously been required to achieve well-composed scene capture as seen in (502) and (503). Compositions with small children (504) or groups (505) represent further examples of use cases that are traditionally difficult to achieve without a human camera operator, because of the number of subjects involved and the requirement that they be simultaneously directed into precise poses.
- A method and system for integrating perpetual sensor inputs, real-time analytics capabilities, and layered compositional algorithms (discussed further in
FIG. 6 ) provide a benefit to the user through the form of automatic direction and orchestration without the need for additional human operators. In one embodiment, sports teams' uniforms can contain a designated symbol for sensing specific individuals, or existing uniform numbers can be used with CV and analytics methods to identify participants using software. Once identified, the system can use these markers for both identification and editing to inform capture, as well as for direction and control of the subjects. - In another embodiment, the system can use the order of the images to infer a motion path and can direct participants in the scene according to a compositional model matched from a database. Or, the images provided can be inputted to the system as designated “capture points” (516) or moments to be marked if they occur in the scene organically. This type of system for autonomous capture is valuable because it simplifies the post-capture editing/highlighting process by reducing the amount of waste footage captured initially, as defined by the user.
- In another embodiment, static scenes such as architectural photography (518) can also be translated from 2D to 3D. The method for recording models for interior (517) and exterior (518) landscapes by directing the human user holding the camera can standardize historically individually composed applications (for example in real estate appraisals, MLS listings, or promotional materials for hotels). Because the system is capable of self-direction and provides a method for repeatable, autonomous capture of high quality visual assets by sensing, analyzing, composing, and directing, the system allows professionals in the above-mentioned verticals to focus their efforts not on orchestrating the perfect shot but on storytelling.
- In another embodiment, mounted cameras and sensors can provide information for Building Information Modeling (BIM) systems. Providing real-time monitoring and sensing allows events to be not only tagged but also directed and responded to, using models that provide more granularity than is traditionally available. In one embodiment, successful architectural components from existing structures can evolve into models that can inform new construction, direct building maintenance, identify how people are using the building (e.g., traffic maps), and can optimize HVAC or lighting, or adjust other environment settings.
- As their ubiquity drives their cost down, cameras and sensors used for creating 3D building models will proliferate. Once a 3D model of a building has been captured (517), the precise measurements can be shared and made useful to other networked devices. As an example, the state of the art now is for each device to create its own siloes of information. Dyson's vacuum cleaner The Eye, for example, captures multiple 360 images each second on its way to mapping a plausible vacuuming route through a building's interior, but those images remain isolated and aren't synthesized into a richer understanding of the physical space. Following 3D space and markers using relative navigation of model parameters and attribute values is much more reliable and less costly, regardless of whether image sensing is involved.
- In another embodiment, the system can pre-direct a 3D scene via a series of 2D images such as a traditional storyboard (515). This can be accomplished by sensing the content in the 2D image, transforming sensed 2D content into a 3D model of the scene, objects, and subjects, and ultimately assigning actors roles based on the subjects and objects they are to mimic. This transformation method allows for greater collaboration in film and television industries by enabling the possibility of productions where direction can be given to actors without the need for actors and directors to be in the same place at the same time, or speak a common language.
-
FIG. 6 shows the method of the invention, including the identification of foundational components including Objects, Subjects, Scene, Scape, and Equipment (601). - Once the capture process has been started (602), pre-sensed contexts and components (Object(s), Subject(s), Scene, Scape, Equipment) (601) are fed into the Sensing Module (603). Now both physical and virtual analytics such as computer vision (i.e., CV) can be applied in the Analytics Module (604) to make sense of scene components identified in the Sensing Module (603). And they can be mapped against composition models in the Composition/Architecture Module (605) so that in an embodiment, a subject can be scored for compliance against a known composition or pattern. Pre-existing models can be stored in a Database (600) that can hold application states and reference models, and those models can be applied at every step of this process. Once the analysis has taken place comparing sensed scenes to composed scenes, direction of the components of the scene can occur in the Direction/Control Module (606) up to and including control of robotic or computerized equipment. Other types of direction include touch-UI, voice-UI, display, control message events, sounds, vibrations, and notifications. Equipment can be similarly directed via the Direction/Control Module (606) to automatically and autonomously identify a particular subject (e.g., a baseball player) in conjunction with other pattern recognition (such as a hit, 507), allowing efficient capture of subsets in frame only. This can provide an intuitive way for a user to instruct the capture of a scene (e.g., begin recording when #22 steps up to the plate, and save all photos of his swing, if applicable).
- The Sensing Module (603) can connect to the Analytics Module (604) and the Database (600), however the Composition/Architecture Module (605) and Direction/Control Module (606) can connect to the Analytics Module (604) and the Database (600) as shown in
FIG. 6 . - In another embodiment, the capability gained from pairing the system's Sensing Module (603) and Analytics Module (604) with its Composition/Architecture Module (605) and Direction/Control Module (606) allows for on-demand orchestration of potentially large numbers of people in a building, for example automatically directing occupants to safety during an emergency evacuation such as a fire. The Sensing Module (603) can make sense of inputs from sources including security cameras, proximity sensors such as those found in commercial lighting systems, and models stored in a database (600) (e.g., seating charts, blueprints, maintenance schematics) to create a 3D model of the scene and its subjects and objects. Next, the Analytics Module (604) can use layered CV algorithms such as background cancellation to deduce, for example, where motion is occurring. The Analytics Module (604) can also run facial and body recognition processes to identify human subjects in the scene, and can make use of ID badge reading hardware inputs to link sensed subjects to real-world identities. The Composition/Architecture Module (605) can provide the optimal choreography model for the evacuation, which can be captured organically during a previous during a fire drill at this location, or can be provided to the system in the form of an existing “best practice” for evacuation. All three modules (Sensing Module (603), Analytics Module (604), and Composition/Architecture Module (605)) can work in a feedback loop to process sensed inputs, make sense of them, and score them against the ideal compositional model for the evacuation. Additionally, the Direction/Control Module (606) can provide feedback to the evacuees using the methods and system described in
FIG. 3 andFIG. 4 . The Direction/Control Module (606) can also, for example, shut off the gas line to the building if it has been properly networked beforehand. Because the Sensing Module (603) is running continuously, the system is capable of sensing if occupants are not complying with the directions being given from the Direction/Control Module (606). The benefits of automatically synthesizing disparate inputs into one cohesive scene is also evident in this example of an emergency evacuation, as infrared camera inputs allow the system to identify human subjects using a combination of CV algorithms and allow the system to direct them to the correct evacuation points, even if the smoke is too thick for traditional security cameras to be effective, or the evacuation points are not visible. The Direction/Control Module (606) can also dynamically switch between different styles of feedback, for example if high ambient noise levels are detected during the evacuation, feedback can be switched from audio to visual or haptic. -
FIG. 7 is a process flow for a process, method, and system for automatic orchestration, sensing, composition and direction of subjects, objects and equipment in a 3D space. Once started (700), any real world event (701) from a user pushing a button on the software UI to some specific event or message received by the application can begin the capture process and the Sensing Module (603). This sensing can be done by a single sensor for example infrared or sonic sensor device (702) or from a plurality of nodes in a network that could also include a combination of image sensing (or camera) nodes (703). - To protect subject privacy and provide high levels of trust in the system, traditional images are neither captured nor stored, and only obfuscated points clouds are recorded by the device (704). These obfuscated points clouds are less identifiable than traditional camera-captured images, and can be encrypted (704). In real-time as this data is captured at any number of nodes and types, either by a set of device local (e.g., smartphone) or by a cloud based service, a dynamic set of computer vision modules (i.e., CV) (705) and machine learning algorithms (ML) are included and reordered as they are applied to optimally identify the objects and subjects in a 3D or 2D space. An external to the invention “context system” (706) can concurrently provide additional efficiency or speed in correlating what's being sensed with prior composition and/or direction models. Depending on the results from the CV and on the specific use-case, the system can transform the space, subjects and objects into a 3D space with 2D, 2.5 D or 3D object and subject models (707).
- In some use-cases, additional machine learning and heuristic algorithms (708) can be applied across the entire system and throughout processes and methods, for example to correlate the new space being sensed with most relevant composition and or direction models or to provide other applications outside of this application with analytics on this new data. The system utilizes both supervised and unsupervised machine learning in parallel and can run in the background to provide context (706) around, for example, what CV and ML methods were implemented most successfully. Supervised and unsupervised machine learning can also identify the leading practices associated with successful outcomes, where success can be determined by criteria from the user, or expert or social feedback, or publicly available success metrics. For performance, the application can cache in memory most relevant composition model(s) (710) for faster association with models related to sensing and direction. While monitoring and tracking the new stored sensed data (711), this can be converted and dynamically updated (712) into a new unique composition model if the pattern is unique, for example as determined automatically using statistical analysis, ML, or manually through a user/expert review interface.
- In embodiments where a user is involved in the process, the application can provide continual audio, speech, vibration or visual direction to a user (715) or periodically send an event or message to an application on the same or other device on the network (716) (e.g., a second camera to begin capturing data). Direction can be sporadic or continuous, can be specific to humans or equipment, and can be given using the approaches and interfaces detailed in
FIG. 3 - As the application monitors the processing of the data, it utilizes a feedback loop (720) against the composition or direction model and will adjust parameters and loop back to (710) or inclusion of software components and update dynamically on a continuous basis (721). New composition models will be stored (722) whether detected by the software or defined by user or expert through a user interface (723). New and old composition models and corresponding data are managed and version controlled (724).
- By analyzing the output from the Sensing Module (603), the system can dynamically and automatically utilize or recommend a relevant stored composition model (725) and direct users or any and all equipment or devices from this model. But in other use cases, the user can manually select a composition model from those previously stored (726).
- From the composition model, the direction model (727) provides events, messages, and notifications, or control values to other subjects, applications, robots or hardware devices. Users and/or experts can provide additional feedback as to the effectiveness of a direction model (728), to validate, augment or improve existing direction models. These models and data are version controlled (729).
- In many embodiments, throughout the process the system can sporadically or continuously provide direction (730), by visual UI, audio, voice, vibration to user(s) or control values by event or message to networked devices (731) (e.g., robotic camera dolly, quadcopter drone, pan and tilt robot, Wi-Fi-enabled GoPro®, etc.).
- Each process throughout the system can utilize a continuous feedback loop as it monitors, tracks, and reviews sensor data against training set models (732). The process can continuously compute and loop back to (710) in the process flow and can end (733) on an event or message from external or internal application or input from a user/expert through a UI.
-
FIG. 8 is a process flow for the Sensing Module (603) of the system, which can be started (800) by a user through UI or voice command, or by sensing a pattern in the frame (801) or by an event in the application. A plurality of sensors capture data into memory (802) and through a combination of machine learning and computer vision sensing and recognition processing, entities, objects, subjects and scenes can be recognized (803). They also will identify most strongly correlated model to help make sense of the data patterns being sensed against (804) previously sensed models stored in a Database (600), via a feedback loop (815). In one embodiment, the image sensor (804) will be dynamically adjusted to improve the sensing precision, for example, separating a foreground object or subject from the background in terms of contrast. A reference object in either 2D or 3D can be loaded (805) to help constrain the CV and aid in recognition of objects in the scene. Using a reference object to constrain the CV helps the Sensing Module (603) ignore noise in the image including shadows and non-target subjects, as well as objects that might enter or exit the frame. - Other sensors can be used in parallel or serially to improve the context and quality of sensing (806). For example, collecting the transmitted geolocation positions from their wearable devices or smartphones of the subjects in an imaged space can help provide richer real-time sensing data to other parts of the system, such as the Composition Module (605). Throughout the processes, the entity, object and scene capture validation (807) is continuously evaluating what, and to what level of confidence, in the scene is being captured and what is recognized. This confidence level of recognition and tracking is enhanced as other devices and cameras are added to the network because their inputs and sensory capabilities can be shared and reused and their various screens and interface options can be used to provide rich feedback and direction (
FIG. 3 ). - The sensing process might start over or move onto a plurality and dynamically ordered set of computer vision algorithm components (809) and/or machine learning algorithms components (810). In various embodiments, those components can include, for example, blob detection algorithms, edge detection operators such as Canny, and edge histogram descriptors. The CV components are always in a feedback loop (808) provided by previously stored leading practice models in the Database (600) and machine learning processes (811). In an embodiment, image sensing lens distortion (i.e., smartphone camera data) can be error corrected for barrel distortion and the gyroscope and compass can be used to understand the context of subject positions to a 3D space relative to camera angles (812). The system can generate 3D models from the device or networked service or obfuscated and/or encrypted point clouds (813). These point clouds or models also maintained in a feedback loop (814) with pre-existing leading practice models in the Database (600).
- A broader set of analytics and machine learning can be run against all models and data (604). The Sensing Module (603) is described earlier in
FIG. 6 and a more detailed process flow is outlined here inFIG. 8 . As powerful hardware is commercialized and further capabilities are unlocked via APIs, the system can correlate and analyze the increased sensor information to augment the Sensing Module (603) and provide greater precision and measurement of a scene. -
FIG. 9 is a diagram of the architecture for the system (950) according to one or more embodiments. In an on-premise embodiment, the Processing Unit (900) is comprised of the Sensing Module (603), Analytics Module (604), Composition/Architecture Module (605), and Direction/Control Module (606), can be connected to a processor (901) and a device local database (600) or created in any other computer medium and connected to through a network (902) including being routed by a software defined network (i.e., SDN) (911). The Processing Unit (900) can also be connected to an off-premise service for greater scale, performance and context by network SDN (912). This processing capability service cloud or data center might be connected by SDN (913) to a distributed file system (910) (e.g., HDFS with Spark or Hadoop), a plurality of service side databases (600) or cloud computing platform (909). In one or more embodiments, the Processing Unit can be coupled to a processor inside a host data processing system (903) (e.g., a remote server or local server) through a wired interface and/or a wireless network interface. In another embodiment, processing can be done on distributed devices for use cases requiring real-time performance (e.g., CV for capturing a subject's position) and that processing can be correlated with other processing throughout the service (e.g., other subject's positioning in the scene). -
FIG. 10 shows examples of iconic posing and professional compositions, including both stereotypical model poses (1000) and famous celebrity poses such as Marilyn Monroe (1001). These existing compositions can be provided to the system by the user and can be subsequently understood by the system, such that subjects can then be auto-directed to pose relative to a scene that optimally reproduces these compositions, with feedback given in real-time as the system determines all precise spatial orientation and compliance with the model. - In one embodiment, a solo subject can also be directed to pose in the style of professional models (1002), incorporating architectural features such as walls and with special attention given to precise hand, arm, leg placement and positioning even when no specific image is providing sole compositional guidance or reference. To achieve this, the system can synthesize multiple desirable compositions from a database (600) into one composite reference composition model. The system also provides the ability to ingest existing 2D art (1006) which is then transformed into a 3D model used to auto-direct composition and can act as a proxy for the types of scene attributes a user might be able to recognize but not articulate or manually program.
- In another embodiment, groups of subjects can be automatically directed to pose and positioned so that hierarchy and status are conveyed (1010). This can be achieved using the same image synthesis method and system as in (1002), and by directing each subject individually and while posing them relative to each other to ensure compliance with the reference model. The system's simultaneous direction of multiple subjects in frame can dramatically shorten the time required to achieve a quality composition. Whereas previously a family (1005) would have used time-delay and extensive back-and-forth positioning or enlisted a professional human photographer, now the system is able to direct them and reliably execute the ideal photograph at the right time and using ubiquitous hardware they already own (e.g., smartphones). The system is able to make use of facial recognition (1007) to deliver specific direction to each participant, in this embodiment achieving optimal positioning of the child's arm (1008,1009). In another embodiment, the system is able to direct a kiss (1003) using the Sensing Module (603), Analytics Module (604), Composition/Architecture Model (605), and Direction/Control Module (606) and the method described in
FIG. 7 to ensure both participants are in compliance with the choreography model throughout the activity. The system is also able to make use of sensed behaviors as triggers for other events, so that in one embodiment a dancer's movements can be used as inputs to direct the composition of live music, or in another embodiment specific choreography can be used to control the lighting of an event. This allows experts or professionals to create models to be emulated by others (e.g., for instruction or entertainment). -
FIG. 11 is provided as an example of a consumer-facing UI for the system that would all for assignment of models to scenes (1103), and of roles to subjects (1100) and objects. Virtual subject identifiers can be superimposed over a visual representation of the scene (1101) to provide auto-linkage from group to composition, and allows for intuitive dragging and reassignments (1105). Sensed subjects, once assigned, can be linked to complex profile information (1104) including LinkedIn, Facebook, or various proprietary corporate LDAP or organizational hierarchy information. Once identified, subjects can be directed simultaneously and individually by the system, through the interfaces described inFIG. 3 . - In scenarios where distinguishing between subjects is difficult (poor light, similar clothing, camouflage in nature) stickers or other markers can be attached to the real-world subjects and tagged in this manner. Imagine a distinguishing sticker placed on each of the five subjects (901) and helping to keep them correctly identified. These stickers or markers can be any sufficiently differentiated pattern (including stripes, dots, solid colors, text) and can be any material, including simple paper and adhesive, allowing them to come packaged in the magazine insert from
FIG. 1 (105) as well. -
FIG. 12 provides further examples of compositions difficult to achieve traditionally, in this case because of objects or entities relative to the landscape of the scene. Nature photography in particular poses a challenge due to the uncontrollable lighting on natural features such as mountains in the background versus the subject in the foreground (1200). Using the interface described inFIG. 11 , users are able to create rules or conditions to govern the capture process and achieve the ideal composition with minimal waste and excess. Those rules can be used to suggest alternate compositions or directions if the desired outcome is determined to be unattainable, for example because of weather. Additionally, existing photographs (1201) can be captured by the system, as a method of creating a reference model. In one embodiment, auto-sensing capabilities described inFIG. 8 combined with compositional analysis and geolocation data can deliver specific user-defined outcomes such as a self-portrait facing away from the camera, executed when no one else is in the frame and the clouds are over the trees (1202). In another embodiment, the composition model is able to direct a subject to stand in front of a less visually “busy” section of the building (1203). - Much of the specific location information the system makes use of to inform the composition and direction decisions is embodied in a location model, as described in
FIG. 13 . Representing specific geolocations (1305), each pin (1306) provides composition and direction for camera settings and controls, positioning, camera angles (1302), architectural features, lighting, and traffic in the scene. This information is synthesized and can be presented to the user in such a way that the compositional process is easy to understand and highly automated, while delivering high quality capture of a scene. For example, consider a typical tourist destination that can be ideally composed (1307) involving the Arc de Triomphe. The system is able to synthesize a wide range of information (including lighting and shadows depending on date/time, weather, expected crowd sizes, ratings of comparable iterations of this photo taken previously) which it uses to suggest desirable compositions and execute them with precision and reliability, resulting in a pleasant and stress-free experience for the user. -
FIG. 14 is a representation of computer vision and simple models informing composition. A building's exterior (1401) invokes a perspective model (1402) automatically to influence composition through a CV process and analytics of images during framing of architectural photography. The lines in the model (1402) can communicate ideal perspective to direct perspective, depth, and other compositional qualities, to produce emotional effects in architectural photography applications such as real estate listings. - Referring now to
FIG. 15A , the system can make use of a smartphone or camera-equipped aerial drone (1500) to perform surveillance and visual inspection of traditionally difficult or dangerous-to-inspect structures such as bridges. Using 3D-constrained CV to navigate and control the drone autonomously and more precisely than traditional GPS waypoints, the system can make use of appropriate reference models for weld inspections, corrosion checks, and insurance estimates and damage appraisals. Relative navigation based on models of real-world structures (1502) provides greater flexibility and accuracy when directing equipment when compared to existing methods such as GPS. Because the system can make use of the Sensing Module (603) it is able to interpret nested and hierarchical instructions such as “search for corrosion on the underside of each support beam.”FIG. 15B depicts an active construction site, where a drone can provide instant inspections that are precise and available 24/7. A human inspector can monitor the video and sensory feed or take control from the system if desired, or the system is able to autonomously control the drone, recognizing and scoring the construction site's sensed environment for compliance based on existing models (e.g., local building codes). Other BIM (Building Information Management) applications include persistent monitoring and reporting as well as responsive structures that react to sensed changes in their environment, for example a window washing system that uses constrained CV to monitor only the exposed panes of glass in a building and can intelligently sense the need for cleaning in a specific location and coordinate an appropriate equipment response, autonomously and without human intervention. - Human subjects (1600) can be deconstructed similarly to buildings, as seen in
FIG. 16 . Beginning with a close and precise measurement of the subject's body (1601) which can be abstracted into, for example, a point cloud (1602), composite core rigging (1603) can then be applied such that a new composite reference core orbase NURB 3D model is created (1604). This deconstruction, atomization, and reconstruction of subjects allows for precision modeling and the fusing of real and virtual worlds. - In one embodiment, such as a body measurement application for Body Mass Index or other health use-case, fitness application, garment fit or virtual fitting application, a simpler representation (1605) might be created and stored at the device for user interface or in a social site's datacenters. This obfuscates the subject's body to protect their privacy or mask their vivid body model to protect any privacy or social “body image” concerns. Furthermore, data encryption and hash processing of these images can also be automatically applied in the application on the user's device and throughout the service to protect user privacy and security.
- Depending on the output from the Sensing Module (603), the system can either create a new composition model for the Database (600), or select a composition model based on attributes deemed most appropriate for composition: body type, size, shape, height, arm position, face position. Further precise composition body models can be created for precise direction applications in photo, film, theater, musical performance, dance, yoga.
-
FIG. 17 catalogues some of the models and/or data that can be stored centrally in a database (600) available to all methods and processes throughout the system, to facilitate a universal scoring approach for all items. In one example, models for best practices for shooting a close up movie scene (1702) are stored and include such items as camera angles, out of focus affects, aperture and exposure settings, depth of field, lighting equipment types with positions and settings, dolly and boom positions relative to the subject (i.e., actor), where “extras” should stand in the scene and entrance timing, set composition. By sensing and understanding the subjects and contexts of a scene over time via those models, film equipment can be directed to react in context with the entire space. An example is a networked system of camera dollies, mic booms, and lighting equipment on a film set that identifies actors in a scene and automatically cooperates with other networked equipment to provide optimal composition dynamically and in real-time, freeing the human director to focus on storytelling. - The Database (600) can also hold 2D images of individuals and contextualized body theory models (1707), 3D models of individuals (1705), and 2D and 3D models of clothing (1704), allowing the system to score and correlate between models. In one embodiment, the system can select an appropriate suit for someone it senses is tall and thin (1705) by considering the body theory and fashion models (1707) as well as clothing attributes (1704) such as the intended fit profile or the number of buttons.
- The system can keep these models and their individual components correlated to social feedback (1703) such as Facebook, YouTube, Instagram, or Twitter using metrics such as views, likes, or changes in followers and subscribers. By connecting the system to a number of social applications, a number of use cases could directly provide context and social proof around identified individuals in a play or movie, from the overall composition cinematography of a scene in a play, music recital, movie or sports event to how well-received a personal image (501) or group image or video was (1101). This also continuously provides a method and process for tuning best practice models of all types of compositions from photography, painting, movies, skiing, mountain biking, surfing, competitive sports, exercises, yoga poses (510), dance, music, performances.
- All of these composition models can also be analyzed for trends from social popularity, from fashion, to popular dance moves and latest form alterations to yoga or fitness exercises. In one example use case, a camera (202) and broad spectrum of hardware (1706), such as lights, robotic camera booms or dollies, autonomous quadcopters, could be evaluated individually, or as part of the overall composition including such items as lights, dolly movements, camera with its multitude of settings and attributes.
- Referring now to
FIG. 18 , in one embodiment the system can facilitate the learning an instrument through the provision of real-time feedback. 3D models of an instrument, for example a guitar fretboard model, can be synthesized and used to constrain the CV algorithms so that only the fingers and relevant sections of the instrument (e.g., frets for guitars, keys for pianos, heads for drums) are being analyzed. Using the subject assignment interface fromFIG. 11 , each finger can be assigned a marker so that specific feedback can be provided to the user (e.g., “place 2nd finger on the A string on the 2nd fret.”) in a format that is understandable and useful to them. While there are many different ways to learn guitar, no other system looks at the proper hand (1802) and body (1800) position. Because the capture device (110) can be networked with other devices, instruction can be given holistically and complex behaviors and patterns such as rhythm and pick/strum technique (1805) can be analyzed effectively. Models can be created to inform behaviors varying from proper bow technique for violin to proper posture when playing keyboard. In one embodiment, advanced composition models and challenge models can be loaded into the database, making the system useful not just for beginners but anyone looking to improve their practice regimens. These models can be used as part of a curriculum to instruct, test and certify music students remotely. As withFIG. 15 , a human expert can monitor the process and provide input or the sensing, analyzing, composing and directing can be completely autonomous. In another embodiment, renditions and covers of existing songs can also be scored and compared against the original and other covers, providing a video-game like experience but with fewer hardware requirements and greater freedom. -
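- Mapping detected fingertips onto a fretboard model and producing per-finger feedback could be sketched as follows; the chord shape and the pixel-to-fret geometry are invented for illustration and do not come from the instrument models described above.

```python
# Hedged sketch: compare detected fingertip positions against a target chord
# shape and emit per-finger direction messages.
from typing import Dict, Tuple

# Target chord: finger index -> (string, fret), an A-major-like shape (assumed).
TARGET = {1: ("D", 2), 2: ("G", 2), 3: ("B", 2)}

def to_string_and_fret(x: float, y: float) -> Tuple[str, int]:
    strings = ["E", "A", "D", "G", "B", "e"]
    string_idx = min(int(y // 20), len(strings) - 1)   # 20 px between strings (assumed)
    fret = max(1, int(x // 60) + 1)                    # 60 px per fret (assumed)
    return strings[string_idx], fret

def feedback(fingertips: Dict[int, Tuple[float, float]]) -> list:
    messages = []
    for finger, target in TARGET.items():
        if finger not in fingertips:
            messages.append(f"finger {finger}: not detected")
            continue
        actual = to_string_and_fret(*fingertips[finger])
        if actual != target:
            messages.append(f"finger {finger}: move to the {target[0]} string, fret {target[1]}")
    return messages or ["chord looks right"]

print(feedback({1: (70.0, 45.0), 2: (70.0, 65.0), 3: (130.0, 85.0)}))
```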
FIG. 19 shows an example of a golf swing (1901) to illustrate the potential of a database of models. Once the swing has been scanned, with that pre-modeled club or putter, that model is available for immediate application and stored in a Database (600). And a plurality of sensed movements can be synthesized into one, so that leading practice golf swings are sufficiently documented. Once stored, the models can be converted to compositional models, so that analysis and comparison can take place between the sensed movements and stored compositional swing, and direction and feedback can be given to the user (1902, 1903). -
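- A comparison between a sensed swing and a stored compositional model could be reduced to per-phase angle differences, as in this sketch; the reference angles and tolerance are invented placeholders, not coaching data from the disclosure.

```python
# Minimal sketch: compare a sensed swing (club-shaft angles per phase) against
# a stored model and return per-phase feedback.
from typing import Dict, List

REFERENCE_SWING = {"address": 45.0, "top_of_backswing": 270.0, "impact": 40.0}

def compare_swing(sensed: Dict[str, float], tolerance_deg: float = 12.0) -> List[str]:
    notes = []
    for phase, target in REFERENCE_SWING.items():
        if phase not in sensed:
            notes.append(f"{phase}: not captured")
            continue
        delta = sensed[phase] - target
        if abs(delta) > tolerance_deg:
            notes.append(f"{phase}: off by {delta:+.1f} degrees")
    return notes or ["swing matches the stored model within tolerance"]

print(compare_swing({"address": 47.0, "top_of_backswing": 300.0, "impact": 38.0}))
```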
FIG. 20 is a systematic view of an integrated system for Composition and Orchestration of a 3D or 2.5D space illustrating communication between user and their devices to server through a network (902) or SDN (911, 912, 913), according to one embodiment. In one embodiment a user or multiple users can connect to the Processing Unit (900) that hosts the composition event. In another embodiment, the user hardware such as a sensor (2001), TV (2003), camera (2004), mobile device such as a tablet or smartphone etc. (2005), wearable (2006), server (2007), laptop (2008) or desktop computer (2009) or any wireless device, or any electronic device can communicate directly with other devices in a network or to the devices of specific users (2002, 902). For example, in one embodiment the orchestration system might privately send unique positions and directions to four separate devices (e.g., watch, smartphone, quadcopter (1706), and an internet-connected TV) in quickly composing high-quality and repeatable photographs of actors and fans at a meet-and-greet event.
Claims (7)
1. A method, comprising:
Capturing a 2D image in a specific format of an object, subject, and scene using a device;
Sensing an object, subject, and scene automatically and continuously using the device;
Analyzing the 2D image of the object, subject, and scene captured to determine the most relevant composition and direction model;
Transforming an object, subject, and scene into a 3D model using existing reference composition/architecture model; and
Storing the 3D model of the scene in a database for use and maintaining it in a feedback loop.
2. The method of claim 1 , further comprising:
Performing continuous contextual analysis of an image and its resulting 3D model to provide an update to subsequent 3D modeling processes; and
Dynamically updating and responding to contextual analytics performed.
3. The method of claim 2 , further comprising:
Coordinating accurate tracking of objects and subjects in a scene by orchestrating autonomous equipment movements using a feedback loop.
4. The method of claim 3 , further comprising:
Controlling the direction of a scene and its subjects via devices using a feedback loop.
5. The method of claim 4 , further comprising:
Creating and dynamically modifying in real-time the 2D or 3D model for the subject, object, scene, and equipment in any spatial orientation, and providing immediate feedback in a user interface.
6. The method of claim 1 , wherein the device is at least one of a camera, wearable device, desktop computer, laptop computer, phone, tablet, and other mobile computer.
7. A system, comprising:
A processing unit that can exist on a user device, on-premise, or as an off-premise service to house the following modules;
A sensing module that can understand the subjects and context of a scene over time via models;
An analytics module that can analyze sensed scenes and subjects to determine the most relevant composition and direction models or create them if necessary;
A composition/architecture module that can simultaneously store the direction of multiple subjects or objects of a scene according to one or more composition models;
A direction/control module that can provide direction and control to each subject, object, and equipment individually and relative to a scene model; and
A database that can store models for use and maintain them in a feedback loop with the above modules.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/858,901 US20160088286A1 (en) | 2014-09-19 | 2015-09-18 | Method and system for an automatic sensing, analysis, composition and direction of a 3d space, scene, object, and equipment |
PCT/US2015/051041 WO2016044778A1 (en) | 2014-09-19 | 2015-09-18 | Method and system for an automatic sensing, analysis, composition and direction of a 3d space, scene, object, and equipment |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201462053055P | 2014-09-19 | 2014-09-19 | |
US14/858,901 US20160088286A1 (en) | 2014-09-19 | 2015-09-18 | Method and system for an automatic sensing, analysis, composition and direction of a 3d space, scene, object, and equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
US20160088286A1 true US20160088286A1 (en) | 2016-03-24 |
Family
ID=55525935
Family Applications (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/717,805 Active 2036-10-25 US10489407B2 (en) | 2014-09-19 | 2015-05-20 | Dynamic modifications of results for search interfaces |
US14/858,901 Abandoned US20160088286A1 (en) | 2014-09-19 | 2015-09-18 | Method and system for an automatic sensing, analysis, composition and direction of a 3d space, scene, object, and equipment |
US16/531,929 Active 2035-06-14 US11275746B2 (en) | 2014-09-19 | 2019-08-05 | Dynamic modifications of results for search interfaces |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/717,805 Active 2036-10-25 US10489407B2 (en) | 2014-09-19 | 2015-05-20 | Dynamic modifications of results for search interfaces |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/531,929 Active 2035-06-14 US11275746B2 (en) | 2014-09-19 | 2019-08-05 | Dynamic modifications of results for search interfaces |
Country Status (2)
Country | Link |
---|---|
US (3) | US10489407B2 (en) |
WO (1) | WO2016044778A1 (en) |
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160077598A1 (en) * | 2015-11-19 | 2016-03-17 | International Business Machines Corporation | Client device motion control via a video feed |
US20160282872A1 (en) * | 2015-03-25 | 2016-09-29 | Yokogawa Electric Corporation | System and method of monitoring an industrial plant |
US20170280130A1 (en) * | 2016-03-25 | 2017-09-28 | Microsoft Technology Licensing, Llc | 2d video analysis for 3d modeling |
US20170351900A1 (en) * | 2016-06-02 | 2017-12-07 | Samsung Electronics Co., Ltd. | Electronic apparatus and operating method thereof |
CN108492121A (en) * | 2018-04-18 | 2018-09-04 | 景德镇止语堂陶瓷有限公司 | System and method for identifying the authenticity of hand-painted tea sets based on VR technology |
CN108924753A (en) * | 2017-04-05 | 2018-11-30 | 意法半导体(鲁塞)公司 | Method and apparatus for real-time detection of a scene |
US10168700B2 (en) * | 2016-02-11 | 2019-01-01 | International Business Machines Corporation | Control of an aerial drone using recognized gestures |
US10410289B1 (en) | 2014-09-22 | 2019-09-10 | State Farm Mutual Automobile Insurance Company | Insurance underwriting and re-underwriting implementing unmanned aerial vehicles (UAVS) |
US10607406B2 (en) | 2018-01-25 | 2020-03-31 | General Electric Company | Automated and adaptive three-dimensional robotic site surveying |
US10621744B1 (en) * | 2015-12-11 | 2020-04-14 | State Farm Mutual Automobile Insurance Company | Structural characteristic extraction from 3D images |
US20200128193A1 (en) * | 2018-10-22 | 2020-04-23 | At&T Intellectual Property I, L.P. | Camera Array Orchestration |
US10655968B2 (en) | 2017-01-10 | 2020-05-19 | Alarm.Com Incorporated | Emergency drone guidance device |
US10837782B1 (en) | 2017-01-10 | 2020-11-17 | Alarm.Com Incorporated | Drone-guided property navigation techniques |
US10853955B1 (en) * | 2020-05-29 | 2020-12-01 | Illuscio, Inc. | Systems and methods for point cloud encryption |
US10896496B2 (en) | 2018-08-01 | 2021-01-19 | International Business Machines Corporation | Determination of high risk images using peer-based review and machine learning |
US11102420B2 (en) * | 2015-10-08 | 2021-08-24 | Gopro, Inc. | Smart shutter in low light |
US20210286912A1 (en) * | 2020-03-11 | 2021-09-16 | Siemens Schweiz Ag | Method and Arrangement for Creating a Digital Building Model |
US11468583B1 (en) | 2022-05-26 | 2022-10-11 | Illuscio, Inc. | Systems and methods for detecting and correcting data density during point cloud generation |
US11527017B1 (en) | 2022-05-03 | 2022-12-13 | Illuscio, Inc. | Systems and methods for dynamic decimation of point clouds and data points in a three-dimensional space |
US11586774B1 (en) | 2021-11-12 | 2023-02-21 | Illuscio, Inc. | Systems and methods for dynamic checksum generation and validation with customizable levels of integrity verification |
US20230376636A1 (en) * | 2019-07-24 | 2023-11-23 | Faro Technologies, Inc. | Tracking data acquired by coordinate measurement devices through a workflow |
US11856938B1 (en) | 2017-03-31 | 2024-01-02 | Alarm.Com Incorporated | Robotic rover |
Families Citing this family (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9524520B2 (en) * | 2013-04-30 | 2016-12-20 | Wal-Mart Stores, Inc. | Training a classification model to predict categories |
US9524319B2 (en) | 2013-04-30 | 2016-12-20 | Wal-Mart Stores, Inc. | Search relevance |
US10489407B2 (en) | 2014-09-19 | 2019-11-26 | Ebay Inc. | Dynamic modifications of results for search interfaces |
WO2017061038A1 (en) * | 2015-10-09 | 2017-04-13 | 楽天株式会社 | Information processing device, information processing method, and information processing program |
US10482146B2 (en) * | 2016-05-10 | 2019-11-19 | Massachusetts Institute Of Technology | Systems and methods for automatic customization of content filtering |
US10412291B2 (en) | 2016-05-19 | 2019-09-10 | Scenera, Inc. | Intelligent interface for interchangeable sensors |
US10509459B2 (en) | 2016-05-19 | 2019-12-17 | Scenera, Inc. | Scene-based sensor networks |
US10693843B2 (en) | 2016-09-02 | 2020-06-23 | Scenera, Inc. | Security for scene-based sensor networks |
US11093511B2 (en) | 2016-10-12 | 2021-08-17 | Salesforce.Com, Inc. | Ranking search results using hierarchically organized coefficients for determining relevance |
CN107133280A (en) * | 2017-04-14 | 2017-09-05 | 合信息技术(北京)有限公司 | Feedback response method and device |
US11157745B2 (en) | 2018-02-20 | 2021-10-26 | Scenera, Inc. | Automated proximity discovery of networked cameras |
US11232163B2 (en) * | 2018-08-23 | 2022-01-25 | Walmart Apollo, Llc | Method and apparatus for ecommerce search ranking |
US11127064B2 (en) | 2018-08-23 | 2021-09-21 | Walmart Apollo, Llc | Method and apparatus for ecommerce search ranking |
US10990840B2 (en) | 2019-03-15 | 2021-04-27 | Scenera, Inc. | Configuring data pipelines with image understanding |
CN110175579A (en) * | 2019-05-29 | 2019-08-27 | 北京百度网讯科技有限公司 | Attitude determination method, scene image display method, apparatus, device, and medium |
CN112016440B (en) * | 2020-08-26 | 2024-02-20 | 杭州云栖智慧视通科技有限公司 | Target pushing method based on multi-target tracking |
GB2598748A (en) * | 2020-09-10 | 2022-03-16 | Advanced Commerce Ltd | Scheduling displays on a terminal device |
WO2022147746A1 (en) * | 2021-01-08 | 2022-07-14 | Ebay Inc. | Intelligent computer search engine removal of search results |
KR102396287B1 (en) * | 2021-03-31 | 2022-05-10 | 쿠팡 주식회사 | Electronic apparatus and information providing method thereof |
US11847680B1 (en) * | 2021-06-28 | 2023-12-19 | Amazon Technologies, Inc. | Computer-implemented method, a computing device, and a non-transitory computer readable storage medium for presenting attribute variations |
CN114168632A (en) * | 2021-12-07 | 2022-03-11 | 泰康保险集团股份有限公司 | Abnormal data identification method and device, electronic equipment and storage medium |
CN115114690B (en) * | 2022-07-20 | 2024-06-21 | 上海航空工业(集团)有限公司 | Flight engineering algorithm arranging method, flight engineering algorithm arranging device and electronic equipment |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120050524A1 (en) * | 2010-08-25 | 2012-03-01 | Lakeside Labs Gmbh | Apparatus and method for generating an overview image of a plurality of images using an accuracy information |
US20120050525A1 (en) * | 2010-08-25 | 2012-03-01 | Lakeside Labs Gmbh | Apparatus and method for generating an overview image of a plurality of images using a reference plane |
US20140340427A1 (en) * | 2012-01-18 | 2014-11-20 | Logos Technologies Llc | Method, device, and system for computing a spherical projection image based on two-dimensional images |
US9094670B1 (en) * | 2012-09-25 | 2015-07-28 | Amazon Technologies, Inc. | Model generation and database |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6489968B1 (en) * | 1999-11-18 | 2002-12-03 | Amazon.Com, Inc. | System and method for exposing popular categories of browse tree |
US20050149458A1 (en) * | 2002-02-27 | 2005-07-07 | Digonex Technologies, Inc. | Dynamic pricing system with graphical user interface |
US8078607B2 (en) * | 2006-03-30 | 2011-12-13 | Google Inc. | Generating website profiles based on queries from websites and user activities on the search results |
US7577665B2 (en) * | 2005-09-14 | 2009-08-18 | Jumptap, Inc. | User characteristic influenced search results |
US20120179566A1 (en) * | 2005-09-14 | 2012-07-12 | Adam Soroca | System for retrieving mobile communication facility user data from a plurality of providers |
WO2012030678A2 (en) * | 2010-08-30 | 2012-03-08 | Tunipop, Inc. | Techniques for facilitating on-line electronic commerce transactions relating to the sale of goods and merchandise |
BR112013002095A2 (en) * | 2011-03-30 | 2016-05-24 | Rakuten Inc | device and method of providing information, recording medium, device and method of displaying information, and information search system |
EP2600316A1 (en) * | 2011-11-29 | 2013-06-05 | Inria Institut National de Recherche en Informatique et en Automatique | Method, system and software program for shooting and editing a film comprising at least one image of a 3D computer-generated animation |
EP2973040A1 (en) * | 2013-03-15 | 2016-01-20 | NIKE Innovate C.V. | Product presentation assisted by visual search |
US10489407B2 (en) | 2014-09-19 | 2019-11-26 | Ebay Inc. | Dynamic modifications of results for search interfaces |
2015
- 2015-05-20 US US14/717,805 patent/US10489407B2/en active Active
- 2015-09-18 US US14/858,901 patent/US20160088286A1/en not_active Abandoned
- 2015-09-18 WO PCT/US2015/051041 patent/WO2016044778A1/en active Application Filing

2019
- 2019-08-05 US US16/531,929 patent/US11275746B2/en active Active
Cited By (68)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10535103B1 (en) | 2014-09-22 | 2020-01-14 | State Farm Mutual Automobile Insurance Company | Systems and methods of utilizing unmanned vehicles to detect insurance claim buildup |
US10685404B1 (en) | 2014-09-22 | 2020-06-16 | State Farm Mutual Automobile Insurance Company | Loss mitigation implementing unmanned aerial vehicles (UAVs) |
US12062097B1 (en) | 2014-09-22 | 2024-08-13 | State Farm Mutual Automobile Insurance Company | Disaster damage analysis and loss mitigation implementing unmanned aerial vehicles (UAVs) |
US12033221B2 (en) | 2014-09-22 | 2024-07-09 | State Farm Mutual Automobile Insurance Company | Insurance underwriting and re-underwriting implementing unmanned aerial vehicles (UAVs) |
US10949930B1 (en) | 2014-09-22 | 2021-03-16 | State Farm Mutual Automobile Insurance Company | Insurance underwriting and re-underwriting implementing unmanned aerial vehicles (UAVS) |
US12020330B2 (en) | 2014-09-22 | 2024-06-25 | State Farm Mutual Automobile Insurance Company | Accident reconstruction implementing unmanned aerial vehicles (UAVs) |
US11816736B2 (en) | 2014-09-22 | 2023-11-14 | State Farm Mutual Automobile Insurance Company | Insurance underwriting and re-underwriting implementing unmanned aerial vehicles (UAVs) |
US11002540B1 (en) | 2014-09-22 | 2021-05-11 | State Farm Mutual Automobile Insurance Company | Accident reconstruction implementing unmanned aerial vehicles (UAVs) |
US10963968B1 (en) | 2014-09-22 | 2021-03-30 | State Farm Mutual Automobile Insurance Company | Unmanned aerial vehicle (UAV) data collection and claim pre-generation for insured approval |
US10949929B1 (en) | 2014-09-22 | 2021-03-16 | State Farm Mutual Automobile Insurance Company | Loss mitigation implementing unmanned aerial vehicles (UAVS) |
US11710191B2 (en) | 2014-09-22 | 2023-07-25 | State Farm Mutual Automobile Insurance Company | Insurance underwriting and re-underwriting implementing unmanned aerial vehicles (UAVs) |
US11704738B2 (en) | 2014-09-22 | 2023-07-18 | State Farm Mutual Automobile Insurance Company | Unmanned aerial vehicle (UAV) data collection and claim pre-generation for insured approval |
US10909628B1 (en) | 2014-09-22 | 2021-02-02 | State Farm Mutual Automobile Insurance Company | Accident fault determination implementing unmanned aerial vehicles (UAVS) |
US11334940B1 (en) | 2014-09-22 | 2022-05-17 | State Farm Mutual Automobile Insurance Company | Accident reconstruction implementing unmanned aerial vehicles (UAVs) |
US11334953B1 (en) | 2014-09-22 | 2022-05-17 | State Farm Mutual Automobile Insurance Company | Insurance underwriting and re-underwriting implementing unmanned aerial vehicles (UAVS) |
US10650469B1 (en) | 2014-09-22 | 2020-05-12 | State Farm Mutual Automobile Insurance Company | Insurance underwriting and re-underwriting implementing unmanned aerial vehicles (UAVs) |
US11195234B1 (en) | 2014-09-22 | 2021-12-07 | State Farm Mutual Automobile Insurance Company | Systems and methods of utilizing unmanned vehicles to detect insurance claim buildup |
US10410289B1 (en) | 2014-09-22 | 2019-09-10 | State Farm Mutual Automobile Insurance Company | Insurance underwriting and re-underwriting implementing unmanned aerial vehicles (UAVS) |
US20160282872A1 (en) * | 2015-03-25 | 2016-09-29 | Yokogawa Electric Corporation | System and method of monitoring an industrial plant |
US9845164B2 (en) * | 2015-03-25 | 2017-12-19 | Yokogawa Electric Corporation | System and method of monitoring an industrial plant |
US11588980B2 (en) | 2015-10-08 | 2023-02-21 | Gopro, Inc. | Smart shutter in low light |
US11102420B2 (en) * | 2015-10-08 | 2021-08-24 | Gopro, Inc. | Smart shutter in low light |
US20160077598A1 (en) * | 2015-11-19 | 2016-03-17 | International Business Machines Corporation | Client device motion control via a video feed |
US10712827B2 (en) * | 2015-11-19 | 2020-07-14 | International Business Machines Corporation | Client device motion control via a video feed |
US11163369B2 (en) * | 2015-11-19 | 2021-11-02 | International Business Machines Corporation | Client device motion control via a video feed |
US10353473B2 (en) * | 2015-11-19 | 2019-07-16 | International Business Machines Corporation | Client device motion control via a video feed |
US11042944B1 (en) | 2015-12-11 | 2021-06-22 | State Farm Mutual Automobile Insurance Company | Structural characteristic extraction and insurance quote generating using 3D images |
US10832332B1 (en) | 2015-12-11 | 2020-11-10 | State Farm Mutual Automobile Insurance Company | Structural characteristic extraction using drone-generated 3D image data |
US10621744B1 (en) * | 2015-12-11 | 2020-04-14 | State Farm Mutual Automobile Insurance Company | Structural characteristic extraction from 3D images |
US11682080B1 (en) | 2015-12-11 | 2023-06-20 | State Farm Mutual Automobile Insurance Company | Structural characteristic extraction using drone-generated 3D image data |
US12062100B2 (en) | 2015-12-11 | 2024-08-13 | State Farm Mutual Automobile Insurance Company | Structural characteristic extraction using drone-generated 3D image data |
US11151655B1 (en) | 2015-12-11 | 2021-10-19 | State Farm Mutual Automobile Insurance Company | Structural characteristic extraction and claims processing using 3D images |
US11704737B1 (en) | 2015-12-11 | 2023-07-18 | State Farm Mutual Automobile Insurance Company | Structural characteristic extraction using drone-generated 3D image data |
US10706573B1 (en) | 2015-12-11 | 2020-07-07 | State Farm Mutual Automobile Insurance Company | Structural characteristic extraction from 3D images |
US10832333B1 (en) * | 2015-12-11 | 2020-11-10 | State Farm Mutual Automobile Insurance Company | Structural characteristic extraction using drone-generated 3D image data |
US11508014B1 (en) | 2015-12-11 | 2022-11-22 | State Farm Mutual Automobile Insurance Company | Structural characteristic extraction using drone-generated 3D image data |
US12039611B2 (en) | 2015-12-11 | 2024-07-16 | State Farm Mutual Automobile Insurance Company | Structural characteristic extraction using drone-generated 3D image data |
US11599950B2 (en) | 2015-12-11 | 2023-03-07 | State Farm Mutual Automobile Insurance Company | Structural characteristic extraction from 3D images |
US10168700B2 (en) * | 2016-02-11 | 2019-01-01 | International Business Machines Corporation | Control of an aerial drone using recognized gestures |
US20170280130A1 (en) * | 2016-03-25 | 2017-09-28 | Microsoft Technology Licensing, Llc | 2d video analysis for 3d modeling |
US10635902B2 (en) * | 2016-06-02 | 2020-04-28 | Samsung Electronics Co., Ltd. | Electronic apparatus and operating method thereof |
US20170351900A1 (en) * | 2016-06-02 | 2017-12-07 | Samsung Electronics Co., Ltd. | Electronic apparatus and operating method thereof |
US10655968B2 (en) | 2017-01-10 | 2020-05-19 | Alarm.Com Incorporated | Emergency drone guidance device |
US11788844B2 (en) * | 2017-01-10 | 2023-10-17 | Alarm.Com Incorporated | Drone-guided property navigation techniques |
US11060873B2 (en) | 2017-01-10 | 2021-07-13 | Alarm.Com Incorporated | Emergency drone guidance device |
US10837782B1 (en) | 2017-01-10 | 2020-11-17 | Alarm.Com Incorporated | Drone-guided property navigation techniques |
US11698260B2 (en) | 2017-01-10 | 2023-07-11 | Alarm.Com Incorporated | Emergency drone guidance device |
US12066292B2 (en) | 2017-01-10 | 2024-08-20 | Alarm.Com Incorporated | Emergency drone guidance device |
US20210063164A1 (en) * | 2017-01-10 | 2021-03-04 | Alarm.Com Incorporated | Drone-guided property navigation techniques |
US11856938B1 (en) | 2017-03-31 | 2024-01-02 | Alarm.Com Incorporated | Robotic rover |
CN108924753A (en) * | 2017-04-05 | 2018-11-30 | 意法半导体(鲁塞)公司 | The method and apparatus of real-time detection for scene |
US10789477B2 (en) | 2017-04-05 | 2020-09-29 | Stmicroelectronics (Rousset) Sas | Method and apparatus for real-time detection of a scene |
US10607406B2 (en) | 2018-01-25 | 2020-03-31 | General Electric Company | Automated and adaptive three-dimensional robotic site surveying |
CN108492121A (en) * | 2018-04-18 | 2018-09-04 | 景德镇止语堂陶瓷有限公司 | A kind of system and method based on the VR technical identification Freehandhand-drawing tea set true and falses |
US10896496B2 (en) | 2018-08-01 | 2021-01-19 | International Business Machines Corporation | Determination of high risk images using peer-based review and machine learning |
US10841509B2 (en) * | 2018-10-22 | 2020-11-17 | At&T Intellectual Property I, L.P. | Camera array orchestration |
US20200128193A1 (en) * | 2018-10-22 | 2020-04-23 | At&T Intellectual Property I, L.P. | Camera Array Orchestration |
US20230376636A1 (en) * | 2019-07-24 | 2023-11-23 | Faro Technologies, Inc. | Tracking data acquired by coordinate measurement devices through a workflow |
US12112099B2 (en) * | 2020-03-11 | 2024-10-08 | Siemens Schweiz Ag | Method and arrangement for creating a digital building model |
US20210286912A1 (en) * | 2020-03-11 | 2021-09-16 | Siemens Schweiz Ag | Method and Arrangement for Creating a Digital Building Model |
US11074703B1 (en) | 2020-05-29 | 2021-07-27 | Illuscio, Inc. | Systems and methods for generating point clouds with different resolutions using encryption |
US10964035B1 (en) | 2020-05-29 | 2021-03-30 | Illuscio, Inc. | Systems and methods for point cloud decryption |
US10853955B1 (en) * | 2020-05-29 | 2020-12-01 | Illuscio, Inc. | Systems and methods for point cloud encryption |
US11586774B1 (en) | 2021-11-12 | 2023-02-21 | Illuscio, Inc. | Systems and methods for dynamic checksum generation and validation with customizable levels of integrity verification |
US11881002B2 (en) | 2022-05-03 | 2024-01-23 | Illuscio, Inc. | Systems and methods for dynamic decimation of point clouds and data points in a three-dimensional space |
US11527017B1 (en) | 2022-05-03 | 2022-12-13 | Illuscio, Inc. | Systems and methods for dynamic decimation of point clouds and data points in a three-dimensional space |
US11468583B1 (en) | 2022-05-26 | 2022-10-11 | Illuscio, Inc. | Systems and methods for detecting and correcting data density during point cloud generation |
US11580656B1 (en) | 2022-05-26 | 2023-02-14 | Illuscio, Inc. | Systems and methods for detecting and correcting data density during point cloud generation |
Also Published As
Publication number | Publication date |
---|---|
WO2016044778A1 (en) | 2016-03-24 |
US11275746B2 (en) | 2022-03-15 |
US10489407B2 (en) | 2019-11-26 |
US20190354532A1 (en) | 2019-11-21 |
US20160085813A1 (en) | 2016-03-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20160088286A1 (en) | Method and system for an automatic sensing, analysis, composition and direction of a 3d space, scene, object, and equipment | |
US10721439B1 (en) | Systems and methods for directing content generation using a first-person point-of-view device | |
US9460340B2 (en) | Self-initiated change of appearance for subjects in video and images | |
CN104866101B (en) | The real-time interactive control method and device of virtual objects | |
JP6470356B2 (en) | Program and method executed by computer for providing virtual space, and information processing apparatus for executing the program | |
JP2020174345A (en) | System and camera device for capturing image | |
WO2019205284A1 (en) | Ar imaging method and apparatus | |
US11688079B2 (en) | Digital representation of multi-sensor data stream | |
EP3268096A1 (en) | Avatar control system | |
US11682157B2 (en) | Motion-based online interactive platform | |
CN114638918B (en) | Real-time performance capturing virtual live broadcast and recording system | |
US20180124374A1 (en) | System and Method for Reducing System Requirements for a Virtual Reality 360 Display | |
CN105933637A (en) | Video communication method and system | |
US20230168737A1 (en) | Augmented reality object manipulation | |
KR102200239B1 (en) | Real-time computer graphics video broadcasting service system | |
CN116940966A (en) | Real world beacons indicating virtual locations | |
US20160316249A1 (en) | System for providing a view of an event from a distance | |
US20230218984A1 (en) | Methods and systems for interactive gaming platform scene generation utilizing captured visual data and artificial intelligence-generated environment | |
CN108320331A (en) | A kind of method and apparatus for the augmented reality video information generating user's scene | |
JP2023140922A (en) | Display terminal, information processing system, communication system, display method, information processing method, communication method, and program | |
JP2017228146A (en) | Image processing device, image processing method, and computer program | |
TWI836582B (en) | Virtual reality system and object detection method applicable to virtual reality system | |
CN108833741A (en) | The virtual film studio system and method combined are caught with dynamic in real time for AR | |
DeHart | Directing audience attention: cinematic composition in 360 natural history films | |
Su et al. | Multistage: Acting across distance |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | AS | Assignment | Owner name: FORSYTHE, HAMISH, CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: FORSYTHE, HAMISH; CECIL, ALEXANDER. REEL/FRAME: 036640/0606. Effective date: 20150917 |
| | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |