US20220222772A1 - True Positive Transplant - Google Patents

True Positive Transplant

Info

Publication number
US20220222772A1
Authority
US
United States
Prior art keywords
images
computing system
image
computer
augmented
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
US17/657,464
Inventor
Ignacio Pablo Mellado Bataller
Patrick Christopher Leger
Alexa Greenberg
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Google LLC
Original Assignee
X Development LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by X Development LLC filed Critical X Development LLC
Priority to US17/657,464
Assigned to X DEVELOPMENT LLC: ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: GREENBERG, ALEXA; BATALLER, IGNACIO PABLO MELLADO; LEGER, CHRIS
Publication of US20220222772A1
Assigned to GOOGLE LLC: ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: X DEVELOPMENT LLC

Classifications

    • G06F 3/041 Digitisers, e.g. for touch screens or touch pads, characterised by the transducing means
    • G06T 3/0006
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/2411 Classification techniques relating to the classification model based on the proximity to a decision surface, e.g. support vector machines
    • G06F 3/0484 Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G06K 9/6256
    • G06K 9/6269
    • G06T 11/00 2D [Two Dimensional] image generation
    • G06T 3/02 Affine transformations
    • G06T 7/11 Region-based segmentation
    • G06T 7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06V 20/10 Terrestrial scenes
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]

Definitions

  • overfitting is when a predictive model makes inferences that correspond too closely or exactly to a particular data set.
  • the predictive model often contains superfluous parameters that capture idiosyncrasies of the particular data set. Because of these parameters, the predictive model generally performs well on the particular data set, but performs poorly on new, previously unseen data sets.
  • Example embodiments involve a data augmentation system.
  • the system may include a segmentation module operable to segment a foreground object in an image from a background of the image.
  • the system may also include a transformation module operable to transform one or more object properties of an object. Using these two modules, the system may generate augmented images that contain variations of the foreground object.
  • a computer-implemented method includes locating, by a computing system, a foreground object disposed within a seed image, where the computing system includes an initial set of images for training a predictive model.
  • the method also includes identifying, by the computing system, an object class corresponding to the foreground object.
  • the method further includes, based on the identified object class, determining, by the computing system, a target value for an object property of the foreground object.
  • the method also includes applying, by the computing system, a transformation function to transform the foreground object into a transformed object, where the transformation function modifies the object property of the foreground object from having an initial value to having the target value.
  • the method additionally includes transplanting, by the computing system, the transformed object into a background image so as to produce an augmented image.
  • the method even further includes augmenting, by the computing system, the initial set of images with the augmented image so as to produce an augmented set of images for training the predictive model.
  • a computing system may include an initial set of images for training a predictive model.
  • the computing system may also include one or more processors configured to cause the computing system to carry out operations.
  • the operations may include locating a foreground object disposed within a seed image.
  • the operations may also include identifying an object class corresponding to the foreground object.
  • the operations may further include, based on the identified object class, determining a target value for an object property of the foreground object.
  • the operations may additionally include applying a transformation function to transform the foreground object into a transformed object, where the transformation function modifies the object property of the foreground object from having an initial value to having the target value.
  • the operations may further include transplanting the transformed object into a background image so as to produce an augmented image.
  • the operations may even further include augmenting the initial set of images with the augmented image so as to produce an augmented set of images for training the predictive model.
  • an article of manufacture may include a non-transitory computer-readable medium having stored thereon program instructions that, upon execution by one or more processors of a computing system that contains an initial set of images for training a predictive model, cause the computing system to carry out operations.
  • the operations may include locating a foreground object disposed within a seed image.
  • the operations may also include identifying an object class corresponding to the foreground object.
  • the operations may further include, based on the identified object class, determining a target value for an object property of the foreground object.
  • the operations may additionally include applying a transformation function to transform the foreground object into a transformed object, where the transformation function modifies the object property of the foreground object from having an initial value to having the target value.
  • the operations may further include transplanting the transformed object into a background image so as to produce an augmented image.
  • the operations may even further include augmenting the initial set of images with the augmented image so as to produce an augmented set of images for training the predictive model.
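  • As an illustration only (not part of the disclosure), the following Python sketch walks through the claimed operations end to end: locate a labeled foreground object, determine a target value for an object property of its class, transform the object, transplant it into a background image, and augment the training set. The helper names, the pixel-space ground-truth ranges, and the nearest-neighbor resize are assumptions made for the sketch, and the background is assumed to be larger than the transformed object.

```python
import random
import numpy as np

def nn_resize(img, new_h, new_w):
    """Nearest-neighbor resize; minimal and dependency-free, good enough for a sketch."""
    h, w = img.shape[:2]
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    return img[rows][:, cols]

def augment_once(seed_rgb, seed_labels, object_class, ground_truth, backgrounds, images):
    # 1. Locate the foreground object from per-pixel categorical labels.
    mask = (seed_labels == object_class)
    ys, xs = np.where(mask)
    crop = seed_rgb[ys.min():ys.max() + 1, xs.min():xs.max() + 1].copy()
    crop_mask = mask[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    crop[~crop_mask] = 0                                  # keep only the object's pixels

    # 2./3. Determine a target value for an object property (here, height in pixels)
    #       from the class's assumed ground-truth range, then transform the object.
    lo, hi = ground_truth[object_class]["height_px"]      # e.g., {"chair": {"height_px": (120, 240)}}
    target_h = random.randint(lo, hi)
    new_w = max(1, int(crop.shape[1] * target_h / crop.shape[0]))
    transformed = nn_resize(crop, target_h, new_w)
    tmask = nn_resize(crop_mask.astype(np.uint8), target_h, new_w).astype(bool)

    # 4. Transplant the transformed object into a background image (centered here).
    bg = random.choice(backgrounds).copy()
    y0 = (bg.shape[0] - target_h) // 2
    x0 = (bg.shape[1] - new_w) // 2
    region = bg[y0:y0 + target_h, x0:x0 + new_w]
    region[tmask] = transformed[tmask]

    # 5. Augment the initial set of images with the augmented image.
    return images + [bg]
```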
  • FIG. 1 illustrates a computing system, in accordance with example embodiments.
  • FIG. 2 illustrates operations of a segmentation module, in accordance with example embodiments.
  • FIG. 3 illustrates operations of a transformation module, in accordance with example embodiments.
  • FIG. 4A depicts a ground truth object property table, in accordance with example embodiments.
  • FIG. 4B depicts a frequency distribution, in accordance with example embodiments.
  • FIG. 5 illustrates an example system, in accordance with example embodiments.
  • FIG. 6 depicts a message flow, in accordance with example embodiments.
  • FIG. 7 illustrates a method, in accordance with example embodiments.
  • Example methods, devices, and systems are described herein. It should be understood that the words “example” and “exemplary” are used herein to mean “serving as an example, instance, or illustration.” Any embodiment or feature described herein as being an “example” or “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments or features unless indicated as such. Other embodiments can be utilized, and other changes can be made, without departing from the scope of the subject matter presented herein.
  • the use of ordinal numbers such as “first,” “second,” “third” and so on is to distinguish respective elements rather than to denote a particular order of those elements.
  • the terms “multiple” and “a plurality of” refer to “two or more” or “more than one.”
  • any enumeration of elements, blocks, or steps in this specification or the claims is for purposes of clarity. Thus, such enumeration should not be interpreted to require or imply that these elements, blocks, or steps adhere to a particular arrangement or are carried out in a particular order.
  • a computing system could be configured to augment an initial set of training images with one or more “augmented images”.
  • augmented images could include variations of the objects contained in the initial set of training images. For instance, if the initial set of training images contained an image with a chair, then the augmented images could include images with variations of that chair. As one example, the augmented images could include images that depict the chair rotated at different angles (e.g., the chair rotated at 90° from its original orientation, the chair rotated at 180° from its original orientation, etc.).
  • the augmented images could include images with the chair disposed in different background environments (e.g., the chair disposed in a kitchen environment, the chair disposed in a living room environment, the chair disposed in a bedroom environment, etc.).
  • the augmented images could include images with the chair scaled to different sizes (e.g., the height of the chair scaled down 50% from its original size, the height of the chair scaled up 50% from its original size, etc.). Other variations could also exist.
  • the computing system may include a segmentation module operable to receive an image and responsively segment object(s) disposed within the image from the background of the image.
  • the images provided to the segmentation module could be considered to be “seed images” because the objects within these images may be used as a basis to generate hundreds, if not thousands of augmented images.
  • a human operator could provide any or all of the seed images, including the locations of object(s) within the seed images and object classes of the object(s) within the seed images.
  • the computing system may receive pre-segmented object(s) from a client device, in which case the operations of the segmentation module may be optional.
  • the computing system may also include a transformation module operable to receive object(s) segmented by the segmentation module and responsively apply a transformation function to transform the segmented object(s) into one or more transformed objects. Then, the transformation module could transplant the transformed object(s) into one or more background images so as to produce one or more augmented images. The transformation module could add the augmented image(s) to an initial set of training images to produce an augmented set of images for training a predictive model.
  • the transformation module could utilize the ground truth property values of the object(s) being transformed as a basis to perform intelligent and representative object transformations. For example, if images with chairs generally depict chairs having heights between 50-100 centimeters (cm), then a representative transformation of a chair would transform the chair to have a height between 50-100 cm. As another example, if images with fire hydrants generally depict fire hydrants being colored either red or yellow, then a representative transformation of a fire hydrant would transform the fire hydrant to be colored either red or yellow.
  • a technical advantage of this approach is that the object transformations are based on actual object properties exhibited in real-world images. This can improve the performance of the predictive model when it makes inferences on real-world images.
  • the transformation module described herein could also contain background images taken/captured from a wide variety of environments.
  • the transformation module may contain background images taken/captured from parks, offices, streets, playgrounds, beaches, homes, and so on.
  • the transformation module could transplant transformed object(s) into these background images to create augmented image(s) for training the predictive model.
  • the variability of the background images helps to further increase the diversity of the augmented image(s) generated by the transformation module.
  • Examples described herein also provide for a system that automatically detects class imbalances in training data and responsively generates augmented image(s) that can balance the disproportional training data.
  • upon detecting a class imbalance, the described system could pause or otherwise halt the training process of a predictive model.
  • the system could segment poorly represented object(s) disposed in the training data.
  • the described system could apply the aforementioned transformation module to generate augmented image(s) using the segmented object(s).
  • These augmented image(s) could be added to the training data to create augmented training data.
  • the described system could later resume the training process with the augmented training data.
  • the example computing systems described herein may be part of or may take the form of a robotic system.
  • a robotic system may include sensors for capturing information of the environment in which the robotic system is operating.
  • the sensors may monitor the environment in real time, and detect obstacles, elements of the terrain, weather conditions, temperature, or other aspects of the environment.
  • the sensors may capture data corresponding to one or more characteristics of objects in the environment, such as a size, shape, profile, structure, or orientation of the objects.
  • the robotic system may use the captured sensor information as input into the aforementioned predictive models, which may assist the robotic system with classifying/identifying objects in its environment.
  • the robotic system, when navigating through an environment, may capture images of the environment and may store the captured images for later use. Then, in order to train the aforementioned predictive models, the robotic system may use the methods described herein to add augmented images to the previously captured images representing the robotic system's environment. Because the robotic system may operate in a limited set of environments—and thus only captures images from the limited set of environments—the augmented images can help the robotic system identify objects and otherwise operate in previously unseen environments.
  • a central computing system may receive images from multiple robotic devices, and may use the images to develop augmented training image sets for use by any or all of the robotic devices.
  • FIG. 1 illustrates computing system 100 , in accordance with example embodiments.
  • Computing system 100 may be an example system that could automatically augment an initial set of training images with one or more augmented images.
  • Computing system 100 may be implemented in various forms, such as a server device, mobile device, a robotic device, an autonomous vehicle, or some other arrangement. Some example implementations involve a computing system 100 engineered to be low cost at scale and designed to support a variety of tasks. Computing system 100 may also be optimized for machine learning.
  • computing system 100 may include processor(s) 102 , data storage 104 , and controller(s) 108 , which together may be part of control system 110 .
  • Computing system 100 may also include network interface 112 , power source 114 , sensors 116 , robotic subsystem 120 , segmentation module 130 , transformation module 140 , and prediction module 150 . Nonetheless, computing system 100 is shown for illustrative purposes, and may include more or fewer components.
  • the various components of computing system 100 may be connected in any manner, including wired or wireless connections. Further, in some examples, components of computing system 100 may be distributed among multiple physical entities rather than a single physical entity. Other example illustrations of computing system 100 may exist as well.
  • Processor(s) 102 may operate as one or more general-purpose hardware processors and/or one or more special purpose processors (e.g., digital signal processors (DSPs), tensor processing units (TPUs), graphics processing units (GPUs), application specific integrated circuits (ASICs), etc.).
  • Processor(s) 102 may be configured to execute computer-readable program instructions 106 , and manipulate data 107 , both of which are stored in data storage 104 .
  • Processor(s) 102 may also directly or indirectly interact with other components of computing system 100 , such as network interface 112 , power source 114 , sensors 116 , robotic subsystem 120 , segmentation module 130 , transformation module 140 , and prediction module 150 .
  • processor(s) 102 may be configured to execute instructions stored in data storage 104 so as to carry out one or more operations, for example, the operations described herein.
  • Data storage 104 may be one or more types of hardware memory.
  • data storage 104 may include or take the form of one or more computer-readable storage media that can be read or accessed by processor(s) 102 .
  • the one or more computer-readable storage media can include volatile or non-volatile storage components, such as optical, magnetic, organic, or another type of memory or storage, which can be integrated in whole or in part with processor(s) 102 .
  • data storage 104 can be a single physical device.
  • data storage 104 can be implemented using two or more physical devices, which may communicate with one another via wired or wireless communication.
  • data storage 104 may include the computer-readable program instructions 106 and data 107 .
  • Data 107 may be any type of data, such as configuration data, executable data, or diagnostic data, among other possibilities.
  • Controller(s) 108 may include one or more electrical circuits, units of digital logic, computer chips, or microprocessors that are configured to (perhaps among other tasks) interface between any combination of control system 110 , network interface 112 , power source 114 , sensors 116 , robotic subsystem 120 , segmentation module 130 , transformation module 140 , and prediction module 150 , or a user of computing system 100 .
  • controller(s) 108 may be a purpose-built embedded device for performing specific operations with one or more subsystems of computing system 100 .
  • Control system 110 may monitor and physically change the operating conditions of computing system 100 . In doing so, control system 110 may serve as a link between portions of computing system 100 , such as between network interface 112 , power source 114 , sensors 116 , robotic subsystem 120 , segmentation module 130 , transformation module 140 , and prediction module 150 . Further, control system 110 may serve as an interface between computing system 100 and a user. In some embodiments, control system 110 may include various components for communicating with computing system 100 , including buttons, keyboards, etc.
  • control system 110 may communicate with other systems of computing system 100 via wired or wireless connections. Operations of control system 110 may be carried out by processor(s) 102 . Alternatively, these operations may be carried out by controller(s) 108 , or a combination of processor(s) 102 and controller(s) 108 .
  • Network interface 112 may serve as an interface between computing system 100 and another computing device.
  • Network interface 112 can include one or more wireless interfaces and/or wireline interfaces that are configurable to communicate via a network.
  • Wireless interfaces can include one or more wireless transmitters, receivers, and/or transceivers, such as a Bluetooth™ transceiver, a Zigbee® transceiver, a Wi-Fi™ transceiver, a WiMAX™ transceiver, and/or other similar types of wireless transceivers configurable to communicate via a wireless network.
  • Wireline interfaces can include one or more wireline transmitters, receivers, and/or transceivers, such as an Ethernet transceiver, a Universal Serial Bus (USB) transceiver, or similar transceiver configurable to communicate via a twisted pair wire, a coaxial cable, a fiber-optic link, or a similar physical connection to a wireline network.
  • network interface 112 can be configured to provide reliable, secured, and/or authenticated communications.
  • for each communication, information for facilitating reliable communications (e.g., guaranteed message delivery) can be provided, perhaps as part of a message header and/or footer (e.g., packet/message sequencing information, encapsulation headers and/or footers, size/time information, and transmission verification information such as cyclic redundancy check (CRC) and/or parity check values).
  • Communications can be made secure (e.g., be encoded or encrypted) and/or decrypted/decoded using one or more cryptographic protocols and/or algorithms, such as, but not limited to, Data Encryption Standard (DES), Advanced Encryption Standard (AES), a Rivest-Shamir-Adelman (RSA) algorithm, a Diffie-Hellman algorithm, a secure sockets protocol such as Secure Sockets Layer (SSL) or Transport Layer Security (TLS), and/or Digital Signature Algorithm (DSA).
  • Other cryptographic protocols and/or algorithms can be used as well or in addition to those listed herein to secure (and then decrypt/decode) communications.
  • Power source(s) 114 may be configured to supply power to various components of computing system 100 .
  • computing system 100 may include a hydraulic system, electrical system, batteries, or other types of power systems.
  • computing system 100 may include one or more batteries configured to provide charge to components of computing system 100 .
  • Some of mechanical components 122 and electrical components 124 may each connect to a different power source, may be powered by the same power source, or may be powered by multiple power sources.
  • computing system 100 may include a hydraulic system configured to provide power to mechanical components 122 using fluid power. Components of computing system 100 may operate based on hydraulic fluid being transmitted throughout the hydraulic system to various hydraulic motors and hydraulic cylinders, for example. The hydraulic system may transfer hydraulic power by way of pressurized hydraulic fluid through tubes, flexible hoses, or other links between components of computing system 100 . Power source(s) 114 may charge using various types of charging, such as wired connections to an outside power source, wireless charging, combustion, or other examples.
  • Sensor(s) 116 may be arranged to sense aspects of computing system 100 .
  • Sensor(s) 116 may include one or more force sensors, torque sensors, velocity sensors, acceleration sensors, position sensors, proximity sensors, motion sensors, location sensors, load sensors, temperature sensors, touch sensors, depth sensors, ultrasonic range sensors, infrared sensors, object sensors, or cameras, among other possibilities.
  • computing system 100 may be configured to receive sensor data from sensors that are physically separated from the computing system (e.g., sensors that are positioned on other computing systems or located within the environment in which the computing system 100 is operating).
  • Sensor(s) 116 may provide sensor data to processor(s) 102 (perhaps by way of data 107 ) to allow for interaction of computing system 100 with its environment, as well as monitoring of the operation of computing system 100 .
  • the sensor data may be used in evaluation of various factors for activation, movement, and deactivation of mechanical components 122 and electrical components 124 by control system 110 .
  • sensor(s) 116 may capture data corresponding to the terrain of the environment or location of nearby objects, which may assist with environment recognition and navigation.
  • the information captured by sensor(s) 116 may be provided to segmentation module 130 , transformation module 140 , and prediction module 150 to augment a set of training data for the computing system 100 .
  • sensor(s) 116 may include RADAR (e.g., for long-range object detection, distance determination, or speed determination), LIDAR (e.g., for short-range object detection, distance determination, or speed determination), SONAR (e.g., for underwater object detection, distance determination, or speed determination), VICON® (e.g., for motion capture), one or more cameras (e.g., stereoscopic cameras for 3D vision), a global positioning system (GPS) transceiver, or other sensors for capturing information of the environment in which computing system 100 is operating. Sensor(s) 116 may monitor the environment in real time, and detect obstacles, elements of the terrain, weather conditions, temperature, or other aspects of the environment. In another example, sensor(s) 116 may capture data corresponding to one or more characteristics of a target or identified object, such as a size, shape, profile, structure, or orientation of the object.
  • computing system 100 may include sensor(s) 116 configured to receive information indicative of the state of computing system 100 , including sensor(s) 116 that may monitor the state of the various components of computing system 100 .
  • Sensor(s) 116 may measure activity of systems of computing system 100 and receive information based on the operation of the various features of computing system 100 , such as the operation of an extendable arm, an end effector, or other mechanical or electrical features of computing system 100 .
  • the data provided by sensor(s) 116 may enable control system 110 to determine errors in operation as well as monitor overall operation of components of computing system 100 .
  • computing system 100 may use force/torque sensors to measure load on various components of computing system 100 .
  • computing system 100 may include one or more force/torque sensors on an arm or end effector to measure the load on the actuators that move one or more members of the arm or end effector.
  • the computing system 100 may include a force/torque sensor at or near the wrist or end effector, but not at or near other joints of a robotic arm.
  • computing system 100 may use one or more position sensors to sense the position of the actuators of the computing system. For instance, such position sensors may sense states of extension, retraction, positioning, or rotation of the actuators on an arm or end effector.
  • sensor(s) 116 may include one or more velocity or acceleration sensors.
  • sensor(s) 116 may include an inertial measurement unit (IMU).
  • the IMU may sense velocity and acceleration in the world frame, with respect to the gravity vector. The velocity and acceleration sensed by the IMU may then be translated to that of computing system 100 based on the location of the IMU in computing system 100 and the kinematics of computing system 100 .
  • Computing system 100 may include other types of sensors not explicitly discussed herein. Additionally or alternatively, the computing system may use particular sensors for purposes not enumerated herein.
  • the mechanical components 122 in robotic subsystem 120 represent hardware of computing system 100 that may enable computing system 100 to perform physical operations.
  • computing system 100 may include one or more physical members, such as an arm, an end effector, a head, a neck, a torso, a base, and wheels.
  • the physical members or other parts of computing system 100 may further include actuators arranged to move the physical members in relation to one another.
  • Computing system 100 may also include one or more structured bodies for housing control system 110 or other components, and may further include other types of mechanical components.
  • the particular mechanical components 122 used may vary based on the design of the computing system, and may also be based on the operations or tasks the computing system may be configured to perform.
  • mechanical components 122 may include one or more removable components.
  • Computing system 100 may be configured to add or remove such removable components, which may involve assistance from a user.
  • computing system 100 may be configured with removable end effectors or digits that can be replaced or changed as needed or desired.
  • computing system 100 may include one or more removable or replaceable battery units, control systems, power systems, bumpers, or sensors. Other types of removable components may be included within some implementations.
  • the electrical components 124 in robotic subsystem 120 may include various mechanisms capable of processing, transferring, or providing electrical charge or electric signals.
  • electrical components 124 may include electrical wires, circuitry, or wireless communication transmitters and receivers to enable operations of computing system 100 .
  • Electrical components 124 may interwork with mechanical components 122 to enable computing system 100 to perform various operations.
  • Electrical components 124 may be configured to provide power from power source(s) 114 to the various mechanical components 122 , for example.
  • computing system 100 may include electric motors.
  • Other examples of electrical components 124 may exist as well.
  • computing system 100 may include a body, which may connect to or house appendages and components of a robotic system.
  • the structure of the body may vary within examples and may further depend on particular operations that a given robot may have been designed to perform. For example, a robot developed to carry heavy loads may have a wide body that enables placement of the load. Similarly, a robot designed to operate in tight spaces may have a relatively tall, narrow body.
  • the body or the other components may be developed using various types of materials, such as metals or plastics.
  • a robot may have a body with a different structure or made of various types of materials.
  • the body or the other components may include or carry sensor(s) 116 . These sensors may be positioned in various locations on the robotic system, such as on a body, a head, a neck, a base, a torso, an arm, or an end effector, among other examples.
  • the robotic system may be configured to carry a load, such as a type of cargo that is to be transported. In some examples, the load may be placed by the robotic system into a bin or other container attached to the robotic system.
  • the load may also represent external batteries or other types of power sources (e.g., solar panels) that the robotic system may utilize. Carrying the load represents one example use for which the robotic system may be configured, but the robotic system may be configured to perform other operations as well.
  • Segmentation module 130 may be a software application, computing device, or subsystem within computing system 100 that is operable to receive seed image(s) and responsively segment object(s) disposed in the seed image(s) from the backgrounds of the seed image(s). In some implementations, segmentation module 130 may receive a single image and may segment a single object from that single image. In other implementations, segmentation module 130 may receive multiple images and may segment different objects from each of the multiple images. After segmenting the object(s), segmentation module 130 could transmit the segmented object(s) to transformation module 140 .
  • FIG. 2 shows how segmentation module 130 receives image 210 and then responsively analyzes image 210 to segment object 220 from background environment 230 . Then segmentation module 130 may transmit the segmented version of object 220 (perhaps along with other parameters) to transformation module 140 .
  • image 210 could be a labeled image containing categorical labels for each of its pixels. These categorical labels could help identify the object classes for objects in image 210 . For example, pixels in image 210 that correspond to object 220 may be labeled with the categorical label “fire hydrant” whereas the pixels in image 210 that correspond to background environment 230 may be labeled as “background” or “street”. Segmentation module 130 can utilize these categorical labels to segment object 220 from background environment 230 . Specifically, segmentation module 130 could determine each pixel in image 210 that has a categorical label corresponding to an object of interest (e.g., “fire hydrant”) and may extract those determined pixels from image 210 . Then, segmentation module 130 could transmit the extracted pixels to transformation module 140 .
  • an image provided to segmentation module 130 could contain multiple objects of interest.
  • image 210 is shown to contain auxiliary object 222 A and auxiliary object 222 B, both of which may be labeled with the categorical label “car”.
  • segmentation module 130 could be configured to extract all objects of interest (e.g., extract object 220 along with auxiliary object 222 A and auxiliary object 222 B) or may be configured to only extract certain objects of interest (e.g., only extract object 220 but not auxiliary object 222 A and auxiliary object 222 B).
  • a user, for example via a graphical user interface, can provide segmentation module 130 with information on which objects of interest to segment from an input image.
  • transformation module 140 and/or prediction module 150 could provide segmentation module 130 with information on which objects of interest to segment from an input image.
  • image 210 may be an unlabeled image.
  • segmentation module 130 may include an object detection module that may detect various objects in image 210 but may be unable to identify the categorical labels/object classes associated with the detected objects. To obtain these object classes, segmentation module 130 may present the detected objects to a user, perhaps through a graphical user interface, who may assign categorical labels to each of the detected objects. Using the assigned labels, segmentation module 130 may proceed with determining each pixel in image 210 that has a categorical label corresponding to an object of interest and may extract those determined pixels from image 210 .
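  • A minimal sketch of the segmentation step described above, assuming a labeled seed image is represented as an RGB array plus a same-sized array of per-pixel categorical label strings; the detect_objects and ask_user_for_label callables in the unlabeled-image fallback are hypothetical stand-ins for whatever detector and user interface segmentation module 130 might use.

```python
import numpy as np

def segment_objects(rgb, labels, classes_of_interest):
    """Return {class_name: (cropped_pixels, cropped_mask)} for a labeled seed image."""
    segments = {}
    for cls in classes_of_interest:
        mask = (labels == cls)
        if not mask.any():
            continue                                   # class not present in this seed image
        ys, xs = np.where(mask)
        y0, y1, x0, x1 = ys.min(), ys.max() + 1, xs.min(), xs.max() + 1
        crop = rgb[y0:y1, x0:x1].copy()
        crop_mask = mask[y0:y1, x0:x1]
        crop[~crop_mask] = 0                           # drop background pixels inside the crop
        segments[cls] = (crop, crop_mask)
    return segments

def segment_unlabeled(rgb, detect_objects, ask_user_for_label):
    """Fallback for unlabeled seed images: detect objects, then ask a user for labels."""
    segments = {}
    for obj_mask in detect_objects(rgb):               # detector proposes binary masks
        cls = ask_user_for_label(rgb, obj_mask)        # e.g., via a graphical user interface
        segments[cls] = (np.where(obj_mask[..., None], rgb, 0), obj_mask)
    return segments
```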
  • Transformation module 140 may be a software application, computing device, or subsystem within computing system 100 that is operable to receive object(s) segmented by segmentation module 130 and responsively apply a transformation function to transform the object(s) into one or more transformed objects. Then, transformation module 140 could transplant the transformed object(s) into one or more background images so as to produce one or more augmented images.
  • FIG. 3 shows how transformation module 140 receives object 310 and transforms object 310 into transformed object 320 and transformed object 330 . After this, transformation module 140 transplants transformed object 320 onto background 322 to produce augmented image 324 and transplants transformed object 330 onto background 332 to produce augmented image 334 . Both augmented image 324 and augmented image 334 could then be used to augment a training data set used by prediction module 150 .
  • the transformation applied by transformation module 140 could map each pixel in object 310 to one or more output pixels in transformed object 320 (or transformed object 330 ).
  • the mapping could take the form of an affine transformation, a linear transformation, or another type of image processing transformation.
  • the mapping modifies one or more object properties of object 310 .
  • object properties may include, but are not limited to: the height or width of object 310 , the relative size of object 310 (e.g., the amount that object 310 is sized up or sized down from its initial size), the relative rotation of object 310 (e.g., the amount that object 310 is rotated clockwise or counterclockwise from its initial orientation), or the color of object 310 , among other possibilities.
  • transformation module 140 transforms the relative rotation of object 310 approximately 45° clockwise to generate transformed object 320 and transforms the relative rotation of object 310 approximately 180° clockwise to generate transformed object 330 .
  • transformation module 140 utilizes randomly generated object property values to transform object 310 . For instance, transformation module 140 could randomly generate a first rotation value between 0° and 90° and could use the first rotation value as a basis to transform the relative rotation of object 310 and generate a transformed object. Then, transformation module 140 could randomly generate a second rotation value between 0° and 90° and could use the second rotation value as a basis to transform the relative rotation of object 310 and generate a second transformed object.
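  • For example, a randomly parameterised rotation could be sketched as below; scipy.ndimage.rotate is used here as one possible implementation of the transformation function, not one prescribed by the disclosure.

```python
import random
from scipy import ndimage

def random_rotations(obj_pixels, n_variants=2, low_deg=0.0, high_deg=90.0):
    """Produce n_variants copies of obj_pixels, each rotated by a freshly drawn angle."""
    variants = []
    for _ in range(n_variants):
        angle = random.uniform(low_deg, high_deg)      # e.g., the first and second rotation values
        rotated = ndimage.rotate(obj_pixels, angle, reshape=True, order=1, cval=0)
        variants.append((angle, rotated))
    return variants
```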
  • transformation module 140 utilizes the ground truth property values of the object being transformed as a basis to perform intelligent and representative object transformations.
  • ground truth property values may refer to property values that an object frequently exhibits in real-world images. For example, if real-world images of chairs generally depict chairs having a height of 75 cm, then a ground truth height value for a chair may be 75 cm. Basing transformations on how objects actually appear in real-world images could improve the performance of predictive model 150 when it makes inferences on real-world images.
  • ground truth table 400 includes record 402 and record 404 , both of which have corresponding entries for height property 412 , width property 414 , and rotation property 416 .
  • Height property 412 could provide transformation module 140 with the necessary details for transforming the height of an object. As shown in FIG. 4A , the height property 412 entry for record 402 indicates that the ground truth height for “fork” objects is between 10 cm and 30 cm. Similarly, the height property 412 entry for record 404 indicates that the ground truth height for “fire hydrant” objects is between 100 cm and 200 cm.
  • Width property 414 could provide transformation module 140 with the necessary details for transforming the width of an object. As shown in FIG. 4A , the width property 414 entry for record 402 indicates that the ground truth width for “fork” objects is between 2 cm and 9 cm. Similarly, the width property 414 entry for record 404 indicates that the ground truth width for “fire hydrant” objects is between 50 cm and 100 cm.
  • Rotation property 416 could provide transformation module 140 with the necessary details for transforming the relative rotation of an object.
  • the rotation property 416 entry for record 402 indicates that the ground truth rotation for “fork” objects is between −90° and 90° (e.g., where 0° corresponds to completely vertical, a negative degree corresponds to a counterclockwise rotation from 0°, and a positive degree corresponds to a clockwise rotation from 0°).
  • the rotation property 416 entry for record 404 indicates that the ground truth rotation for “fire hydrant” objects is between −10° and 10°.
  • the layout and entries in ground truth table 400 are provided as an example and are not intended to be limiting with respect to the embodiments herein.
  • the ground truth property values in ground truth table 400 could be sets of discrete values.
  • the ground truth property values in ground truth table 400 could be ranges of values.
  • transformation module 140 could use ground truth table 400 to identify the ground truth property values for an object being transformed. From these ground truth property values, transformation module 140 could select a target property value to use for an object transformation. For example, if transformation module 140 were to transform a “fork” object, transformation module 140 may refer to entry 402 in ground truth table 400 and could select a target height value somewhere between 10 cm and 30 cm (e.g., 25 cm).
  • Transformation module 140 would then transform the “fork” object to have a resulting height of 25 cm.
  • the selection of the target property value from the ground truth property values could be performed randomly or could be performed based on a statistical metric of the ground truth property values (e.g., a median value is always selected, values within one standard deviation of a mean are always selected, etc.).
  • the ground truth property values in ground truth table 400 are in the form of a frequency distribution.
  • FIG. 4B illustrates frequency distribution 440 for the ground truth height property of a “fork” object, where the x-axis corresponds to height and the y-axis corresponds to the number of the “fork” objects in real-world images that exhibit a specific height. Transformation module 140 could randomly sample a value from frequency distribution 440 to use as the target property value in an object transformation.
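  • A short sketch of that sampling step follows; the bin values and counts are illustrative, not figures taken from the disclosure.

```python
import random

# Assumed frequency distribution for the "fork" height property:
# height bin (cm) -> number of "fork" objects observed at that height in real-world images
fork_height_freq = {15: 4, 18: 11, 20: 25, 22: 17, 25: 8, 28: 3}

def sample_target_value(freq):
    """Sample a target property value with probability proportional to observed frequency."""
    values, weights = zip(*freq.items())
    return random.choices(values, weights=weights, k=1)[0]

target_height_cm = sample_target_value(fork_height_freq)
scale_factor = target_height_cm / 20.0   # e.g., if the segmented fork's initial height is 20 cm
```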
  • Prediction module 150 may contain one or more predictive models including, but not limited to: an artificial neural network, a Bayesian network, a hidden Markov model, a Markov decision process, a logistic regression function, a support vector machine, a statistical machine learning algorithm, and/or a heuristic machine learning system.
  • the predictive models of prediction module 150 may be trained on an input set of training data to recognize patterns in the training data and provide output inferences and/or predictions about patterns in the training data.
  • the resulting trained models can be referred to as trained predictive models.
  • the predictive models can be trained by providing the initial set of images or the augmented set of images generated by transformation module 140 as training input.
  • the predictive models may use various training techniques, such as unsupervised learning, supervised learning, semi-supervised learning, reinforcement learning, transfer learning, incremental learning, or curriculum learning, among other possibilities.
  • the predictive models can be trained using one or more computer processors and/or on-device coprocessors.
  • the on-device coprocessor(s) can include, but are not limited to one or more graphic processing units (GPUs), one or more tensor processing units (TPUs), one or more digital signal processors (DSPs), and/or one or more application specific integrated circuits (ASICs).
  • the trained predictive models of prediction module 150 can receive input data and generate corresponding inference(s) and/or prediction(s) about the input data.
  • the input data can include a collection of images provided by one or more sources.
  • the collection of images can include images of objects that are similar to the objects used to train the predictive models.
  • the inference(s) and/or prediction(s) made by the trained predictive models can include output images, segmentation masks, numerical values, and/or other output data.
  • FIG. 5 illustrates system 500 , in accordance with example embodiments.
  • System 500 is provided to illustrate the operational relationship between segmentation module 130 , transformation module 140 , and prediction module 150 .
  • system 500 includes ground truth object database 510 , background database 520 , and training database 540 .
  • system 500 may have fewer, more, or alternative elements.
  • segmentation module 130 receives seed image(s) 502 and responsively segments object(s) disposed in seed image(s) 502 from the backgrounds of seed image(s) 502 to generate segmented object(s) 530 .
  • seed image(s) 502 can be provided by a user. This may be accomplished by way of a web page or series of web pages hosted by system 500 and provided to the user upon request.
  • seed image(s) 502 may be provided by prediction module 150 .
  • prediction module 150 could detect a class imbalance in training database 540 or augmented image(s) 532 and could transmit images corresponding to underrepresented classes to segmentation module 130 .
  • transformation module 140 could receive segmented object(s) 530 from segmentation module 130 .
  • transformation module 140 could receive (i) ground truth object properties 512 from ground truth object database 510 and (ii) background image(s) 522 from background database 520 .
  • Ground truth object database 510 may include one or more ground truth tables, such as ground truth table 400 , each containing ground truth property values for objects.
  • a user can provide the ground truth property values to populate ground truth object database 510 .
  • an image analysis system can provide the ground truth property values to populate ground truth object database 510 .
  • Such an image analysis system may be operable to receive a set of labeled images and responsively analyze objects in the set of labeled images to determine ground truth property values for each object. Other ways of populating ground truth object database 510 also exist.
  • Transformation module 140 may utilize the object classes of segmented object(s) 530 to request ground truth property values 512 for segmented object(s) 530 .
  • the object classes of segmented object(s) 530 may be identified via categorical labels associated with the pixels in seed image(s) 502 or may be provided by a user.
  • Background database 520 could include background images taken/captured from a wide variety of environments.
  • background database 520 may contain background images taken/captured from parks, offices, streets, playgrounds, beaches, homes, and so on.
  • the variability of the images in background database 520 helps to further increase the diversity of augmented image(s) 532 generated by transformation module 140 .
  • Transformation module 140 may request and receive background image(s) 522 from background database 520 .
  • transformation module 140 may randomly request background image(s) 522 from background database 520 . That is, in response to a request from transformation module 140 , background database 520 may transmit any random background image to transformation module 140 .
  • transformation module 140 uses the object classes of segmented object(s) 530 to request specific background image(s) 522 from background database 520 .
  • if segmented object(s) 530 only include “cup” objects and “bowl” objects, then it may be more suitable for transformation module 140 to request backgrounds that cups and bowls would likely be found in (e.g., living room environments, dining room environments, restaurant environments) rather than backgrounds that cups and bowls would not likely be found in (e.g., the bottom of the ocean, the top of a volcano, etc.).
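  • One way to sketch this class-aware selection is a mapping from object classes to suitable environment tags on the background images; the tags and the mapping below are illustrative assumptions.

```python
import random

backgrounds_by_environment = {
    "kitchen": ["kitchen_01.png", "kitchen_02.png"],
    "dining room": ["dining_01.png"],
    "street": ["street_01.png", "street_02.png"],
}

suitable_environments = {
    "cup": ["kitchen", "dining room"],
    "bowl": ["kitchen", "dining room"],
    "fire hydrant": ["street"],
}

def pick_background(object_class):
    """Prefer environments the class is likely to appear in; otherwise fall back to any."""
    envs = suitable_environments.get(object_class) or list(backgrounds_by_environment)
    environment = random.choice(envs)
    return random.choice(backgrounds_by_environment[environment])
```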
  • transformation module 140 could use ground truth property values 512 to transform segmented object(s) 530 into transformed object(s) and then could transplant the transformed object(s) into background image(s) 522 to create augmented image(s) 532 .
  • transformation module 140 may determine a target pixel position on the particular background image at which to place the particular transformed object. In some embodiments, transformation module 140 randomly determines the target pixel position. In other embodiments, transformation module 140 uses the object class of the particular transformed object to determine the target pixel position. For instance, the object class of the particular transformed object could be associated with ground truth position values. Similar to ground truth property values, ground truth position values may be pixel positions at which an object is frequently located in real-world images. For example, if images of chairs generally depict chairs being positioned near the centermost pixels of the images, then the ground truth position value for a chair in a background image may be the centermost pixels of the background image.
  • the ground truth position values for the particular object could be based on other objects in the particular background image.
  • the particular background image may contain categorical labels for each of its pixels. Transformation module 140 could use these categorical labels to identify whether the particular background image contains secondary objects of interest.
  • the secondary objects of interest could be based on the object class of the particular object. For example, if the particular object is a “fork” object, then secondary objects of interest may include “table” objects or “counter-top” objects. If transformation module 140 determines that the particular background image contains secondary objects of interest, then transformation module 140 could use the object classes of the secondary objects of interest to determine the ground truth position values for the particular object.
  • a secondary “table” object may specify that all “fork” objects should be positioned near the top of the “table” object.
  • a technical advantage of this approach is that the ground truth position values are based on actual object positions in real-world images. This can further improve the performance of predictive model 150 when it makes inferences on real-world images.
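  • A sketch of the transplant step with class-aware placement follows; the data structures (per-pixel background labels, a ground-truth position table, and a mapping from object classes to secondary objects of interest) are assumed representations, and the background is assumed to be larger than the transformed object.

```python
import numpy as np

def transplant(obj, obj_mask, background, bg_labels, object_class,
               ground_truth_positions, secondary_interest):
    """Paste obj (with boolean obj_mask) into background at a class-appropriate position."""
    bg = background.copy()
    sec_cls = secondary_interest.get(object_class)          # e.g., {"fork": "table"}
    if sec_cls is not None and (bg_labels == sec_cls).any():
        ys, xs = np.where(bg_labels == sec_cls)
        cy, cx = int(ys.min()), int(xs.mean())               # near the top of the "table" object
    else:
        cy, cx = ground_truth_positions.get(
            object_class, (bg.shape[0] // 2, bg.shape[1] // 2))
    h, w = obj.shape[:2]
    y0 = int(np.clip(cy - h // 2, 0, bg.shape[0] - h))       # keep the object inside the frame
    x0 = int(np.clip(cx - w // 2, 0, bg.shape[1] - w))
    region = bg[y0:y0 + h, x0:x0 + w]
    region[obj_mask] = obj[obj_mask]                          # copy only foreground pixels
    return bg
```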
  • the augmented image(s) 532 generated by transformation module 140 could be added to existing images contained in training database 540 . Together, the existing images and augmented image(s) 532 could be used to train predictive models in prediction module 150 .
  • prediction module 150 may determine class imbalances in augmented image(s) 532 . To do this, a user may first provide prediction module 150 with a set of object classes that they believe should be evenly represented in augmented image(s) 532 . Prediction module 150 could then determine the frequency at which each of the provided object classes appears in augmented image(s) 532 . For example, if the user instructs prediction module 150 to determine whether augmented image(s) 532 contains a class balance between “fork” objects and “spoon” objects, then prediction module 150 may determine the frequency at which “fork” objects appear in augmented image(s) 532 and may determine the frequency at which “spoon” objects appear in augmented image(s) 532 .
  • prediction module 150 could determine whether the frequency of any object class is below a threshold.
  • the threshold could be based on the frequency at which each of the provided object classes appears in augmented image(s) 532 .
  • the threshold could be based on the median or mean frequency that each of the provided object classes appears in augmented image(s) 532 .
  • the threshold could be based on a percentage value (e.g., whether an object class is represented in at least 35% or 45% of the images in augmented image(s) 532 ).
  • if the frequency of a given object class is below the threshold, prediction module 150 may select from augmented image(s) 532 an image containing that object class and may transmit that image to segmentation module 130 to be used as a seed image.
  • the determination of class imbalances could occur before or after augmented image(s) 532 are added to training database 540 .
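  • The frequency check and seed-image selection described above might be sketched as follows; the dictionary-of-label-sets input format, the relative threshold on the mean frequency, and the helper names are assumptions made for illustration rather than part of the disclosure.

```python
from collections import Counter
import random

def find_underrepresented_classes(image_labels, classes_of_interest, rel_threshold=0.5):
    """image_labels maps an image id to the set of object classes it contains.

    A class is flagged when its frequency falls below rel_threshold times the
    mean frequency of the classes of interest (a percentage-of-images
    threshold could be substituted here).
    """
    classes_of_interest = set(classes_of_interest)
    counts = Counter()
    for labels in image_labels.values():
        counts.update(classes_of_interest & set(labels))
    mean_freq = sum(counts.get(c, 0) for c in classes_of_interest) / len(classes_of_interest)
    return [c for c in classes_of_interest if counts.get(c, 0) < rel_threshold * mean_freq]

def pick_seed_images(image_labels, underrepresented_classes):
    """For each underrepresented class, pick one image containing that class
    to be used as a seed image for further augmentation."""
    seeds = {}
    for cls in underrepresented_classes:
        candidates = [img for img, labels in image_labels.items() if cls in labels]
        if candidates:
            seeds[cls] = random.choice(candidates)
    return seeds
```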
  • FIG. 6 depicts message flow 600 , in accordance with example embodiments.
  • Message flow 600 illustrates a process in which prediction module 150 identifies imbalances in its training data, temporarily suspends its training, and requests additional augmented images to balance its training data.
  • message flow 600 may utilize segmentation module 130 , transformation module 140 , and prediction module 150 during operations.
  • additional components, steps, or blocks may be added to message flow 600 without departing from the scope of this disclosure.
  • prediction module 150 begins training one or more predictive models using an initial set of training data.
  • the initial set of training data could be contained within prediction module 150 or may be requested by prediction module 150 from a training database, such as training database 540 .
  • prediction module 150 determines a class imbalance in the initial set of training data. As described above, this may involve a user providing prediction module 150 with a set of object classes that they believe should be evenly represented in the initial set of training data, and then prediction module 150 determining the frequency at which each of the provided object classes appears in the initial set of training data. After determining an underrepresented object class, at block 606 prediction module 150 transmits one or more images of the underrepresented object class to segmentation module 130 .
  • segmentation module 130 receives the image(s) of the underrepresented object class and responsively segments underrepresented object(s) from the image(s). Then at block 610 , segmentation module 130 transmits the segmented object(s) to transformation module 140 .
  • transformation module 140 may receive the segmented object(s) from segmentation module 130 and may responsively transform the segmented object(s) into one or more transformed objects.
  • the transformation at block 612 may utilize the ground truth property values for the segmented object(s).
  • the ground truth property values are stored in transformation module 140 .
  • block 612 involves transformation module 140 requesting and receiving the ground truth property values from a ground truth object database, such as ground truth object database 510 .
  • transformation module 140 may transplant the transformed object(s) onto one or more background images to generate one or more augmented images.
  • the background images used at block 614 may be based on the ground truth property values for the segmented object(s).
  • the background image(s) are stored in transformation module 140 .
  • block 614 involves transformation module 140 requesting and receiving the background image(s) from a background database, such as background database 520 .
  • transformation module 140 transmits the augmented image(s) generated at block 614 to prediction module 150 .
  • block 616 may additionally and/or alternatively involve transformation module 140 transmitting the augmented image(s) to a training database containing data for training prediction module 150 , such as training database 540 .
  • prediction module 150 may resume training using the initial images from block 602 in addition to the augmented image(s) received at block 616 .
  • prediction module 150 may apply the trained predictive models onto a validation data set. If the trained predictive models perform poorly on a particular class of objects (e.g., an area under the ROC curve below 0.5 or an accuracy below 0.5), prediction module 150 may request from segmentation module 130 and transformation module 140 additional augmented image(s) for the poorly performing class. Prediction module 150 may retrain the predictive models with these additional augmented image(s) to increase the overall performance of the predictive models.
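  • A hedged sketch of this validation check: compute a one-vs-rest area under the ROC curve for each object class and flag classes below the floor so that additional augmented images can be requested for them. The function name and input format are illustrative, and scikit-learn is assumed to be available.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def poorly_performing_classes(y_true, y_score, class_names, auc_floor=0.5):
    """Flag object classes whose one-vs-rest AUC on a validation set falls
    below auc_floor; augmented images would then be requested for them.

    y_true holds integer class ids and y_score holds per-class predicted
    scores with shape (n_samples, n_classes)."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    weak = []
    for idx, name in enumerate(class_names):
        binary_truth = (y_true == idx).astype(int)
        if binary_truth.min() == binary_truth.max():
            continue  # class missing (or the only class) in the validation set
        if roc_auc_score(binary_truth, y_score[:, idx]) < auc_floor:
            weak.append(name)
    return weak
```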
  • FIG. 7 illustrates a method 700 , in accordance with example embodiments.
  • Method 700 may include various blocks or steps. The blocks or steps may be carried out individually or in combination. The blocks or steps may be carried out in any order and/or in series or in parallel. Further, blocks or steps may be omitted or added to method 700 . The blocks of method 700 may be carried out by various elements of computing system 100 as illustrated and described in reference to FIG. 1 .
  • Block 710 may involve locating, by a computing system, a foreground object disposed within a seed image.
  • the computing system may include an initial set of images for training a predictive model.
  • Block 720 may involve identifying, by the computing system, an object class corresponding to the foreground object.
  • Block 730 may involve, based on the identified object class, determining, by the computing system, a target value for an object property of the foreground object.
  • Block 740 may involve applying, by the computing system, a transformation function to transform the foreground object into a transformed object, where the transformation function modifies the object property of the foreground object from having an initial value to having the target value.
  • Block 750 may involve transplanting, by the computing system, the transformed object into a background image so as to produce an augmented image.
  • Block 760 may involve augmenting, by the computing system, the initial set of images with the augmented image so as to produce an augmented set of images for training the predictive model.
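  • Blocks 710 through 760 could be exercised end to end with something like the following Pillow-based sketch; the ground-truth rotation ranges, function name, bounding box, and paste position are illustrative assumptions rather than part of the method itself.

```python
import random
from PIL import Image

# Illustrative ground-truth rotation ranges per object class (degrees).
GROUND_TRUTH_ROTATION_DEG = {"fork": (-90, 90), "fire hydrant": (-10, 10)}

def augment_once(seed_rgba, object_box, object_class, background_rgba, paste_at):
    """Walk through blocks 710-760 for a single object: crop the located
    foreground object out of the seed image, pick a target rotation from the
    class's ground-truth range, transform the object, and transplant it into
    the background image to produce one augmented image."""
    obj = seed_rgba.crop(object_box)                         # block 710: located object
    low, high = GROUND_TRUTH_ROTATION_DEG[object_class]      # blocks 720-730: class -> target value
    target_rotation = random.uniform(low, high)
    transformed = obj.rotate(target_rotation, expand=True)   # block 740: apply transformation
    augmented = background_rgba.copy()
    augmented.paste(transformed, paste_at, transformed)      # block 750: transplant (alpha as mask)
    return augmented                                         # block 760: add to the training set

# Usage sketch (file names, box, and position are placeholders):
# seed = Image.open("seed.png").convert("RGBA")
# background = Image.open("kitchen.png").convert("RGBA")
# augmented_image = augment_once(seed, (40, 60, 200, 320), "fork", background, (100, 150))
```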
  • the object property includes a relative size or a relative rotation of the foreground object. In other embodiments, the object property includes a height, a width, or a color of the foreground object.
  • the identified object class includes a set of possible target values for the object property and determining the target value for the object property comprises selecting the target value from the set of possible target values.
  • the target value could be selected randomly from the set of possible target values or could be selected based on statistical properties of the set of possible target values.
  • the set of possible target values includes a probability distribution for the object property and selecting the target value from the set of possible target values comprises taking a random sample from the probability distribution.
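  • One way to realize these selection strategies (uniform random selection, a statistical property of the set such as the median, or a draw from a probability distribution over the possible values) is sketched below; the function signature and the weights-based representation of the distribution are assumptions for the example.

```python
import random
import statistics

def select_target_value(possible_values, weights=None, strategy="random"):
    """Select a target property value from a class's set of possible values:
    uniformly at random, via a statistical property of the set (the median,
    as one example), or by sampling a probability distribution over the
    values when per-value weights are supplied."""
    values = list(possible_values)
    if weights is not None:
        return random.choices(values, weights=weights, k=1)[0]
    if strategy == "median":
        return statistics.median(values)
    return random.choice(values)

# select_target_value([10, 15, 20, 25, 30])                     -> any of the values
# select_target_value([10, 15, 20, 25, 30], strategy="median")  -> 20
# select_target_value([10, 15, 20], weights=[0.2, 0.5, 0.3])    -> weighted draw
```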
  • transplanting the transformed object into the background image comprises determining a target position value for the foreground object and placing the transformed object in the background image in accordance with the target position value.
  • the identified object class includes a set of possible target position values for the foreground object and determining the target position value for the foreground object comprises selecting the target position value from the set of possible target position values.
  • Some embodiments involve, based on the identified object class, establishing, by the computing system, secondary objects of interest and determining, by the computing system, that the background image contains at least one of the secondary objects of interest, where placing the transformed object in the background image in accordance with the target position value comprises placing the transformed object to be adjacent to at least one of the secondary objects of interest.
  • Some embodiments involve, after augmenting the initial set of images, determining, by the computing system and for each object class of a plurality of object classes, a frequency at which the object class appears in the augmented set of images. These embodiments may further involve, based on the frequency, determining, by the computing system, a second seed image.
  • determining the second seed image comprises making a determination, for the object class, that the frequency at which the object class appears in the augmented set of images is below a threshold and, based on the determination, selecting, from the augmented set of images, an image that is associated with the object class to be the second seed image.
  • Some embodiments involve locating, by the computing system, a second foreground object disposed within the second seed image and identifying, by the computing system, a second object class corresponding to the second foreground object. Such embodiments may also involve, based on the identified second object class, determining, by the computing system, a target value for an object property of the second foreground object and applying, by the computing system, a transformation function to transform the second foreground object into a second transformed object, where the transformation function modifies the object property of the second foreground object from having an initial value to having the target value.
  • Such embodiments may further involve transplanting, by the computing system, the second transformed object into the background image so as to produce a second augmented image and augmenting, by the computing system, the augmented set of images with the second augmented image so as to produce a second augmented set of images for training the predictive model.
  • the transformation function is an affine image transformation.
  • the transformation function could map each pixel in the foreground object to one or more output pixels in the transformed object.
  • the transformation function is a linear transformation.
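  • For concreteness, an affine mapping of pixel coordinates can be written as y = A x + b, where A combines scaling and rotation and b is a translation; with b = 0 the mapping reduces to the linear case mentioned above. The helper below is an illustrative sketch under those assumptions, not the claimed transformation function.

```python
import numpy as np

def affine_transform_points(points, scale=1.0, rotation_deg=0.0, translate=(0.0, 0.0)):
    """Map each (x, y) pixel coordinate of the foreground object to an output
    coordinate via y = A x + b, where A combines scaling and rotation and b
    is a translation; with translate=(0, 0) this is a linear transformation."""
    theta = np.deg2rad(rotation_deg)
    A = scale * np.array([[np.cos(theta), -np.sin(theta)],
                          [np.sin(theta),  np.cos(theta)]])
    return np.asarray(points, dtype=float) @ A.T + np.asarray(translate, dtype=float)

# affine_transform_points([[10.0, 20.0]], scale=0.5, rotation_deg=90.0)
# -> approximately [[-10.0, 5.0]]
```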
  • both the foreground object and the transformed object are associated with the object class.
  • Some embodiments involve training, by the computing system, the predictive model to determine a respective object class associated with each image in the augmented set of images.
  • Some embodiments involve selecting, from the initial set of images, a candidate image to be the seed image. Some embodiments involve receiving, from a client device, the seed image.
  • the computing system is a robotic system that operates in a plurality of environments and the initial set of images are images previously captured by the robotic system as the robotic system operated in the plurality of environments.
  • identifying the object class corresponding to the foreground object comprises generating one or more graphical user interfaces that contain data fields for inputting the object class; transmitting, to a client device, the one or more graphical user interfaces; and receiving, from the client device, the object class by way of the data fields.
  • Some embodiments involve, based on the identified object class, determining, by the computing system, a second target value for an object property of the foreground object. Such embodiments also involve applying, by the computing system, a second transformation function to transform the foreground object into a second transformed object, where the second transformation function modifies the object property of the foreground object from having an initial value to having the second target value. Such embodiments further involve transplanting, by the computing system, the second transformed object into the background image so as to produce a second augmented image and augmenting, by the computing system, the augmented set of images with the second augmented image so as to produce a second augmented set of images for training the predictive model.
  • Some embodiments involve, based on the identified object class, determining, by the computing system, a target value for a second object property of the foreground object, where the transformation function further modifies the second object property of the foreground object from having an initial value to having the target value.
  • a step or block that represents a processing of information can correspond to circuitry that can be configured to perform the specific logical functions of a herein-described method or technique.
  • a step or block that represents a processing of information can correspond to a module, a segment, or a portion of program code (including related data).
  • the program code can include one or more instructions executable by a processor for implementing specific logical functions or actions in the method or technique.
  • the program code and/or related data can be stored on any type of computer readable medium such as a storage device including a disk, hard drive, or other storage medium.
  • the computer readable medium can also include non-transitory computer readable media such as computer-readable media that store data for short periods of time like register memory, processor cache, and random access memory (RAM).
  • the computer readable media can also include non-transitory computer readable media that store program code and/or data for longer periods of time.
  • the computer readable media may include secondary or persistent long-term storage, like read only memory (ROM), optical or magnetic disks, or compact-disc read only memory (CD-ROM), for example.
  • the computer readable media can also be any other volatile or non-volatile storage systems.
  • a computer readable medium can be considered a computer readable storage medium, for example, or a tangible storage device.

Abstract

Systems and methods for augmenting a data set are provided. An example method may include locating a foreground object disposed within a seed image, identifying an object class corresponding to the foreground object, and, based on the identified object class, determining a target value for an object property of the foreground object. The example method may also include applying a transformation function to transform the foreground object into a transformed object, where the transformation function modifies the object property of the foreground object from having an initial value to having the target value. The example method may further include transplanting the transformed object into a background image so as to produce an augmented image and augmenting an initial set of images with the augmented image so as to produce an augmented set of images for training a predictive model.

Description

    CROSS REFERENCE TO RELATED APPLICATIONS
  • The present application is a continuation of and claims priority to U.S. patent application Ser. No. 17/124,103, filed Dec. 16, 2020, which is a continuation of and claims priority to U.S. patent application Ser. No. 16/717,013, filed Dec. 17, 2019, the content of which is hereby incorporated by reference.
  • BACKGROUND
  • In statistics or machine learning, “overfitting” is when a predictive model makes inferences that correspond too closely or exactly to a particular data set. When overfitting occurs, the predictive model often contains superfluous parameters that capture idiosyncrasies of the particular data set. Because of these parameters, the predictive model generally performs well on the particular data set, but performs poorly on new, previously unseen data sets.
  • SUMMARY
  • Example embodiments involve a data augmentation system. The system may include a segmentation module operable to segment a foreground object in an image from a background of the image. The system may also include a transformation module operable to transform one or more object properties of an object. Using these two modules, the system may generate augmented images that contain variations of the foreground object.
  • In a first aspect, a computer-implemented method is provided. The method includes locating, by a computing system, a foreground object disposed within a seed image, where the computing system includes an initial set of images for training a predictive model. The method also includes identifying, by the computing system, an object class corresponding to the foreground object. The method further includes, based on the identified object class, determining, by the computing system, a target value for an object property of the foreground object. The method also includes applying, by the computing system, a transformation function to transform the foreground object into a transformed object, where the transformation function modifies the object property of the foreground object from having an initial value to having the target value. The method additionally includes transplanting, by the computing system, the transformed object into a background image so as to produce an augmented image. The method even further includes augmenting, by the computing system, the initial set of images with the augmented image so as to produce an augmented set of images for training the predictive model.
  • In a second aspect, a computing system is provided. The computing system may include an initial set of images for training a predictive model. The computing system may also include one or more processors configured to cause the computing system to carry out operations. The operations may include locating a foreground object disposed within a seed image. The operations may also include identifying an object class corresponding to the foreground object. The operations may further include, based on the identified object class, determining a target value for an object property of the foreground object. The operations may additionally include applying a transformation function to transform the foreground object into a transformed object, where the transformation function modifies the object property of the foreground object from having an initial value to having the target value. The operations may further include transplanting the transformed object into a background image so as to produce an augmented image. The operations may even further include augmenting the initial set of images with the augmented image so as to produce an augmented set of images for training the predictive model.
  • In a third aspect, an article of manufacture is provided. The article of manufacture may include a non-transitory computer-readable medium having stored thereon program instructions that, upon execution by one or more processors of a computing system that contains an initial set of images for training a predictive model, cause the computing system to carry out operations. The operations may include locating a foreground object disposed within a seed image. The operations may also include identifying an object class corresponding to the foreground object. The operations may further include, based on the identified object class, determining a target value for an object property of the foreground object. The operations may additionally include applying a transformation function to transform the foreground object into a transformed object, where the transformation function modifies the object property of the foreground object from having an initial value to having the target value. The operations may further include transplanting the transformed object into a background image so as to produce an augmented image. The operations may even further include augmenting the initial set of images with the augmented image so as to produce an augmented set of images for training the predictive model.
  • Other aspects, embodiments, and implementations will become apparent to those of ordinary skill in the art by reading the following detailed description, with reference where appropriate to the accompanying drawings.
  • BRIEF DESCRIPTION OF THE FIGURES
  • FIG. 1 illustrates a computing system, in accordance with example embodiments.
  • FIG. 2 illustrates operations of a segmentation module, in accordance with example embodiments.
  • FIG. 3 illustrates operations of a transformation module, in accordance with example embodiments.
  • FIG. 4A depicts a ground truth object property table, in accordance with example embodiments.
  • FIG. 4B depicts a frequency distribution, in accordance with example embodiments.
  • FIG. 5 illustrates an example system, in accordance with example embodiments.
  • FIG. 6 depicts a message flow, in accordance with example embodiments.
  • FIG. 7 illustrates a method, in accordance with example embodiments.
  • DETAILED DESCRIPTION
  • Example methods, devices, and systems are described herein. It should be understood that the words “example” and “exemplary” are used herein to mean “serving as an example, instance, or illustration.” Any embodiment or feature described herein as being an “example” or “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments or features unless indicated as such. Other embodiments can be utilized, and other changes can be made, without departing from the scope of the subject matter presented herein.
  • Thus, the example embodiments described herein are not meant to be limiting. It will be readily understood that the aspects of the present disclosure, as generally described herein, and illustrated in the figures, can be arranged, substituted, combined, separated, and designed in a wide variety of different configurations.
  • Throughout this description, the articles “a” or “an” are used to introduce elements of the example embodiments. Any reference to “a” or “an” refers to “at least one,” and any reference to “the” refers to “the at least one,” unless otherwise specified, or unless the context clearly dictates otherwise. The intent of using the conjunction “or” within a described list of at least two terms is to indicate any of the listed terms or any combination of the listed terms.
  • The use of ordinal numbers such as “first,” “second,” “third” and so on is to distinguish respective elements rather than to denote a particular order of those elements. For purpose of this description, the terms “multiple” and “a plurality of” refer to “two or more” or “more than one.”
  • Further, unless context suggests otherwise, the features illustrated in each of the figures may be used in combination with one another. Thus, the figures should be generally viewed as component aspects of one or more overall embodiments, with the understanding that not all illustrated features are necessary for each embodiment. In the figures, similar symbols typically identify similar components, unless context dictates otherwise. Further, unless otherwise noted, figures are not drawn to scale and are used for illustrative purposes only. Moreover, the figures are representational only and not all components are shown. For example, additional structural or restraining components might not be shown.
  • Additionally, any enumeration of elements, blocks, or steps in this specification or the claims is for purposes of clarity. Thus, such enumeration should not be interpreted to require or imply that these elements, blocks, or steps adhere to a particular arrangement or are carried out in a particular order.
  • I. Overview
  • Lack of diversity in training data is known to cause overfitting. For example, if a predictive model was being trained to classify objects in images but only had training data containing images of chairs disposed in an outdoor environment, then the trained version of the predictive model could produce stellar results when applied to images containing chairs disposed in an outdoor environment, but could produce poor results when applied to images containing chairs disposed in other environments (e.g., chairs disposed in a living room environment).
  • One solution to address this lack of diversity is to generate more varied training data. In the field of object classification and detection, this would typically involve collecting images from a wide variety of environments (e.g., living rooms, dining rooms, outdoors, office spaces, conference rooms, etc.) and then having a human manually label the objects contained within the images. However, as the need for large amounts of training data increases, for instance to train robotic or autonomous vehicle object detection modules, the process of collecting images and manually labeling objects therein becomes unduly time consuming and inefficient.
  • Disclosed herein are systems and methods that may help address this technical problem. In some examples, a computing system could be configured to augment an initial set of training images with one or more “augmented images”. Such augmented images could include variations of the objects contained in the initial set of training images. For instance, if the initial set of training images contained an image with a chair, then the augmented images could include images with variations of that chair. As one example, the augmented images could include images that depict the chair rotated at different angles (e.g., the chair rotated at 90° from its original orientation, the chair rotated at 180° from its original orientation, etc.). As another example, the augmented images could include images with the chair disposed in different background environments (e.g., the chair disposed in a kitchen environment, the chair disposed in a living room environment, the chair disposed in a bedroom environment, etc.). As yet another example, the augmented images could include images with the chair scaled to different sizes (e.g., the height of the chair scaled down 50% from its original size, the height of the chair scaled up 50% from its original size, etc.). Other variations could also exist.
  • To facilitate this process, the computing system may include a segmentation module operable to receive an image and responsively segment object(s) disposed within the image from the background of the image. The images provided to the segmentation module could be considered to be “seed images” because the objects within these images may be used as a basis to generate hundreds, if not thousands of augmented images. In some embodiments, a human operator could provide any or all of the seed images, including the locations of object(s) within the seed images and object classes of the object(s) within the seed images. In further embodiments, the computing system may receive pre-segmented object(s) from a client device, in which case the operations of the segmentation module may be optional.
  • The computing system may also include a transformation module operable to receive object(s) segmented by the segmentation module and responsively apply a transformation function to transform the segmented object(s) into one or more transformed objects. Then, the transformation module could transplant the transformed object(s) into one or more background images so as to produce one or more augmented images. The transformation module could add the augmented image(s) to an initial set of training images to produce an augmented set of images for training a predictive model.
  • In example embodiments, the transformation module could utilize the ground truth property values of the object(s) being transformed as a basis to perform intelligent and representative object transformations. For example, if images with chairs generally depict chairs having heights between 50-100 centimeters (cm), then a representative transformation of a chair would transform the chair to have a height between 50-100 cm. As another example, if images with fire hydrants generally depict fire hydrants being colored either red or yellow, then a representative transformation of a fire hydrant would transform the fire hydrant to be colored either red or yellow. A technical advantage of this approach is that the object transformations are based on actual object properties exhibited in real-world images. This can improve the performance of the predictive model when it makes inferences on real-world images.
  • The transformation module described herein could also contain background images taken/captured from a wide variety of environments. For instance, the transformation module may contain background images taken/captured from parks, offices, streets, playgrounds, beaches, homes, and so on. The transformation module could transplant transformed object(s) into these background images to create augmented image(s) for training the predictive model. Advantageously, the variability of the background images helps to further increase the diversity of the augmented image(s) generated by the transformation module.
  • Examples described herein also provide for a system that automatically detects class imbalances in training data and responsively generates augmented image(s) that can balance the disproportional training data. In an example process, upon detecting a class imbalance, the described system could pause or otherwise halt the training process of a predictive model. Using the aforementioned segmentation module, the system could segment poorly represented object(s) disposed in the training data. Afterwards, the described system could apply the aforementioned transformation module to generate augmented image(s) using the segmented object(s). These augmented image(s) could be added to the training data to create augmented training data. The described system could later resume the training process with the augmented training data.
  • The example computing systems described herein may be part of or may take the form of a robotic system. Such a robotic system may include sensors for capturing information of the environment in which the robotic system is operating. For example, the sensors may monitor the environment in real time, and detect obstacles, elements of the terrain, weather conditions, temperature, or other aspects of the environment. The sensors may capture data corresponding to one or more characteristics of objects in the environment, such as a size, shape, profile, structure, or orientation of the objects. The robotic system may use the captured sensor information as input into the aforementioned predictive models, which may assist the robotic system with classifying/identifying objects in its environment.
  • In further examples, when navigating through an environment, the robotic system may capture images of the environment and may store the captured images for later use. Then, in order to train the aforementioned predictive models, the robotic system may use the methods described herein to add augmented images to the images previously captured representing the robotic system's environment. Because the robotic system may operate in a limited set of environments—and thus only captures images from the limited set of environments—the augmented images can help the robotic system identify objects and otherwise operate in previously unseen environments. In further examples, a central computing system may receive images from multiple robotic devices, and may use the images to develop augmented training image sets for use by any or all of the robotic devices.
  • These as well as other aspects, advantages, and alternatives will become apparent to those reading the following description, with reference where appropriate to the accompanying drawings. Further, it should be understood that the discussion in this overview and elsewhere in this document is provided by way of example only and that numerous variations are possible.
  • II. Example Computing Systems
  • FIG. 1 illustrates computing system 100, in accordance with example embodiments. Computing system 100 may be an example system that could automatically augment an initial set of training images with one or more augmented images. Computing system 100 may be implemented in various forms, such as a server device, mobile device, a robotic device, an autonomous vehicle, or some other arrangement. Some example implementations involve a computing system 100 engineered to be low cost at scale and designed to support a variety of tasks. Computing system 100 may also be optimized for machine learning.
  • As shown in FIG. 1, computing system 100 may include processor(s) 102, data storage 104, and controller(s) 108, which together may be part of control system 110. Computing system 100 may also include network interface 112, power source 114, sensors 116, robotic subsystem 120, segmentation module 130, transformation module 140, and prediction module 150. Nonetheless, computing system 100 is shown for illustrative purposes, and may include more or fewer components. The various components of computing system 100 may be connected in any manner, including wired or wireless connections. Further, in some examples, components of computing system 100 may be distributed among multiple physical entities rather than a single physical entity. Other example illustrations of computing system 100 may exist as well.
  • Processor(s) 102 may operate as one or more general-purpose hardware processors and/or one or more special purpose processors (e.g., digital signal processors (DSPs), tensor processing units (TPUs), graphics processing units (GPUs), application specific integrated circuits (ASICs), etc.). Processor(s) 102 may be configured to execute computer-readable program instructions 106, and manipulate data 107, both of which are stored in data storage 104. Processor(s) 102 may also directly or indirectly interact with other components of computing system 100, such as network interface 112, power source 114, sensors 116, robotic subsystem 120, segmentation module 130, transformation module 140, and prediction module 150. In example embodiments, processor(s) 102 may be configured to execute instructions stored in data storage 104 so as to carry out one or more operations, for example, the operations of message flow 600 or method 700 as described below.
  • Data storage 104 may be one or more types of hardware memory. For example, data storage 104 may include or take the form of one or more computer-readable storage media that can be read or accessed by processor(s) 102. The one or more computer-readable storage media can include volatile or non-volatile storage components, such as optical, magnetic, organic, or another type of memory or storage, which can be integrated in whole or in part with processor(s) 102. In some embodiments, data storage 104 can be a single physical device. In other embodiments, data storage 104 can be implemented using two or more physical devices, which may communicate with one another via wired or wireless communication. As noted previously, data storage 104 may include the computer-readable program instructions 106 and data 107. Data 107 may be any type of data, such as configuration data, executable data, or diagnostic data, among other possibilities.
  • Controller(s) 108 may include one or more electrical circuits, units of digital logic, computer chips, or microprocessors that are configured to (perhaps among other tasks) interface between any combination of control system 110, network interface 112, power source 114, sensors 116, robotic subsystem 120, segmentation module 130, transformation module 140, and prediction module 150, or a user of computing system 100. In some implementations, controller(s) 108 may be a purpose-built embedded device for performing specific operations with one or more subsystems of computing system 100.
  • Control system 110 may monitor and physically change the operating conditions of computing system 100. In doing so, control system 110 may serve as a link between portions of computing system 100, such as between network interface 112, power source 114, sensors 116, robotic subsystem 120, segmentation module 130, transformation module 140, and prediction module 150. Further, control system 110 may serve as an interface between computing system 100 and a user. In some embodiments, control system 110 may include various components for communicating with computing system 100, including buttons, keyboards, etc.
  • During operation, control system 110 may communicate with other systems of computing system 100 via wired or wireless connections. Operations of control system 110 may be carried out by processor(s) 102. Alternatively, these operations may be carried out by controller(s) 108, or a combination of processor(s) 102 and controller(s) 108.
  • Network interface 112 may serve as an interface between computing system 100 and another computing device. Network interface 112 can include one or more wireless interfaces and/or wireline interfaces that are configurable to communicate via a network. Wireless interfaces can include one or more wireless transmitters, receivers, and/or transceivers, such as a Bluetooth™ transceiver, a Zigbee® transceiver, a Wi-Fi™ transceiver, a WiMAX™ transceiver, and/or other similar types of wireless transceivers configurable to communicate via a wireless network. Wireline interfaces can include one or more wireline transmitters, receivers, and/or transceivers, such as an Ethernet transceiver, a Universal Serial Bus (USB) transceiver, or similar transceiver configurable to communicate via a twisted pair wire, a coaxial cable, a fiber-optic link, or a similar physical connection to a wireline network.
  • In some embodiments, network interface 112 can be configured to provide reliable, secured, and/or authenticated communications. For each communication described herein, information for facilitating reliable communications (e.g., guaranteed message delivery) can be provided, perhaps as part of a message header and/or footer (e.g., packet/message sequencing information, encapsulation headers and/or footers, size/time information, and transmission verification information such as cyclic redundancy check (CRC) and/or parity check values). Communications can be made secure (e.g., be encoded or encrypted) and/or decrypted/decoded using one or more cryptographic protocols and/or algorithms, such as, but not limited to, Data Encryption Standard (DES), Advanced Encryption Standard (AES), a Rivest-Shamir-Adelman (RSA) algorithm, a Diffie-Hellman algorithm, a secure sockets protocol such as Secure Sockets Layer (SSL) or Transport Layer Security (TLS), and/or Digital Signature Algorithm (DSA). Other cryptographic protocols and/or algorithms can be used as well or in addition to those listed herein to secure (and then decrypt/decode) communications.
  • Power source(s) 114 may be configured to supply power to various components of computing system 100. Among other possible power systems, computing system 100 may include a hydraulic system, electrical system, batteries, or other types of power systems. As an example illustration, computing system 100 may include one or more batteries configured to provide charge to components of computing system 100. Some of mechanical components 122 or electrical components 124 may each connect to a different power source, may be powered by the same power source, or be powered by multiple power sources.
  • Any type of power source may be used to power computing system 100, such as electrical power or a gasoline engine. Additionally or alternatively, computing system 100 may include a hydraulic system configured to provide power to mechanical components 122 using fluid power. Components of computing system 100 may operate based on hydraulic fluid being transmitted throughout the hydraulic system to various hydraulic motors and hydraulic cylinders, for example. The hydraulic system may transfer hydraulic power by way of pressurized hydraulic fluid through tubes, flexible hoses, or other links between components of computing system 100. Power source(s) 114 may charge using various types of charging, such as wired connections to an outside power source, wireless charging, combustion, or other examples.
  • Sensor(s) 116 may be arranged to sense aspects of computing system 100. Sensor(s) 116 may include one or more force sensors, torque sensors, velocity sensors, acceleration sensors, position sensors, proximity sensors, motion sensors, location sensors, load sensors, temperature sensors, touch sensors, depth sensors, ultrasonic range sensors, infrared sensors, object sensors, or cameras, among other possibilities. Within some examples, computing system 100 may be configured to receive sensor data from sensors that are physically separated from the computing system (e.g., sensors that are positioned on other computing systems or located within the environment in which the computing system 100 is operating).
  • Sensor(s) 116 may provide sensor data to processor(s) 102 (perhaps by way of data 107) to allow for interaction of computing system 100 with its environment, as well as monitoring of the operation of computing system 100. The sensor data may be used in evaluation of various factors for activation, movement, and deactivation of mechanical components 122 and electrical components 124 by control system 110. For example, sensor(s) 116 may capture data corresponding to the terrain of the environment or location of nearby objects, which may assist with environment recognition and navigation. The information captured by sensor(s) 116 may be provided to segmentation module 130, transformation module 140, and prediction module 150 to augment a set of training data for the computing system 100.
  • In some examples, sensor(s) 116 may include RADAR (e.g., for long-range object detection, distance determination, or speed determination), LIDAR (e.g., for short-range object detection, distance determination, or speed determination), SONAR (e.g., for underwater object detection, distance determination, or speed determination), VICON® (e.g., for motion capture), one or more cameras (e.g., stereoscopic cameras for 3D vision), a global positioning system (GPS) transceiver, or other sensors for capturing information of the environment in which computing system 100 is operating. Sensor(s) 116 may monitor the environment in real time, and detect obstacles, elements of the terrain, weather conditions, temperature, or other aspects of the environment. In another example, sensor(s) 116 may capture data corresponding to one or more characteristics of a target or identified object, such as a size, shape, profile, structure, or orientation of the object.
  • Further, computing system 100 may include sensor(s) 116 configured to receive information indicative of the state of computing system 100, including sensor(s) 116 that may monitor the state of the various components of computing system 100. Sensor(s) 116 may measure activity of systems of computing system 100 and receive information based on the operation of the various features of computing system 100, such as the operation of an extendable arm, an end effector, or other mechanical or electrical features of computing system 100. The data provided by sensor(s) 116 may enable control system 110 to determine errors in operation as well as monitor overall operation of components of computing system 100.
  • As an example, computing system 100 may use force/torque sensors to measure load on various components of computing system 100. In some implementations, computing system 100 may include one or more force/torque sensors on an arm or end effector to measure the load on the actuators that move one or more members of the arm or end effector. In some examples, the computing system 100 may include a force/torque sensor at or near the wrist or end effector, but not at or near other joints of a robotic arm. In further examples, computing system 100 may use one or more position sensors to sense the position of the actuators of the computing system. For instance, such position sensors may sense states of extension, retraction, positioning, or rotation of the actuators on an arm or end effector.
  • As another example, sensor(s) 116 may include one or more velocity or acceleration sensors. For instance, sensor(s) 116 may include an inertial measurement unit (IMU). The IMU may sense velocity and acceleration in the world frame, with respect to the gravity vector. The velocity and acceleration sensed by the IMU may then be translated to that of computing system 100 based on the location of the IMU in computing system 100 and the kinematics of computing system 100.
  • Computing system 100 may include other types of sensors not explicitly discussed herein. Additionally or alternatively, the computing system may use particular sensors for purposes not enumerated herein.
  • The mechanical components 122 in robotic subsystem 120 represent hardware of computing system 100 that may enable computing system 100 to perform physical operations. As a few examples, computing system 100 may include one or more physical members, such as an arm, an end effector, a head, a neck, a torso, a base, and wheels. The physical members or other parts of computing system 100 may further include actuators arranged to move the physical members in relation to one another. Computing system 100 may also include one or more structured bodies for housing control system 110 or other components, and may further include other types of mechanical components. The particular mechanical components 122 used may vary based on the design of the computing system, and may also be based on the operations or tasks the computing system may be configured to perform.
  • In some examples, mechanical components 122 may include one or more removable components. Computing system 100 may be configured to add or remove such removable components, which may involve assistance from a user. For example, computing system 100 may be configured with removable end effectors or digits that can be replaced or changed as needed or desired. In some implementations, computing system 100 may include one or more removable or replaceable battery units, control systems, power systems, bumpers, or sensors. Other types of removable components may be included within some implementations.
  • The electrical components 124 in robotic subsystem 120 may include various mechanisms capable of processing, transferring, or providing electrical charge or electric signals. Among possible examples, electrical components 124 may include electrical wires, circuitry, or wireless communication transmitters and receivers to enable operations of computing system 100. Electrical components 124 may interwork with mechanical components 122 to enable computing system 100 to perform various operations. Electrical components 124 may be configured to provide power from power source(s) 114 to the various mechanical components 122, for example. Further, computing system 100 may include electric motors. Other examples of electrical components 124 may exist as well.
  • In some embodiments, computing system 100 may include a body, which may connect to or house appendages and components of a robotic system. As such, the structure of the body may vary within examples and may further depend on particular operations that a given robot may have been designed to perform. For example, a robot developed to carry heavy loads may have a wide body that enables placement of the load. Similarly, a robot designed to operate in tight spaces may have a relatively tall, narrow body. Further, the body or the other components may be developed using various types of materials, such as metals or plastics. Within other examples, a robot may have a body with a different structure or made of various types of materials.
  • The body or the other components may include or carry sensor(s) 116. These sensors may be positioned in various locations on the robotic system, such as on a body, a head, a neck, a base, a torso, an arm, or an end effector, among other examples. The robotic system may be configured to carry a load, such as a type of cargo that is to be transported. In some examples, the load may be placed by the robotic system into a bin or other container attached to the robotic system. The load may also represent external batteries or other types of power sources (e.g., solar panels) that the robotic system may utilize. Carrying the load represents one example use for which the robotic system may be configured, but the robotic system may be configured to perform other operations as well.
  • Segmentation module 130 may be a software application, computing device, or subsystem within computing system 100 that is operable to receive seed image(s) and responsively segment object(s) disposed in the seed image(s) from the backgrounds of the seed image(s). In some implementations, segmentation module 130 may receive a single image and may segment a single object from that single image. In other implementations, segmentation module 130 may receive multiple images and may segment different objects from each of the multiple images. After segmenting the object(s), segmentation module 130 could transmit the segmented object(s) to transformation module 140.
  • To conceptually illustrate the operations of segmentation module 130, FIG. 2 is provided. Specifically, FIG. 2 shows how segmentation module 130 receives image 210 and then responsively analyzes image 210 to segment object 220 from background environment 230. Then segmentation module 130 may transmit the segmented version of object 220 (perhaps along with other parameters) to transformation module 140.
  • In example embodiments, image 210 could be a labeled image containing categorical labels for each of its pixels. These categorical labels could help identify the object classes for objects in image 210. For example, pixels in image 210 that correspond to object 220 may be labeled with the categorical label “fire hydrant” whereas the pixels in image 210 that correspond to background environment 230 may be labeled as “background” or “street”. Segmentation module 130 can utilize these categorical labels to segment object 220 from background environment 230. Specifically, segmentation module 130 could determine each pixel in image 210 that has a categorical label corresponding to an object of interest (e.g., “fire hydrant”) and may extract those determined pixels from image 210. Then, segmentation module 130 could transmit the extracted pixels to transformation module 140.
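  • The label-driven extraction performed by segmentation module 130 might look like the following sketch, which keeps the pixels whose categorical label matches the object of interest and makes every other pixel transparent; the array shapes and function name are assumptions for illustration.

```python
import numpy as np

def segment_by_label(image_rgb, label_map, object_label):
    """Cut a labeled object out of an image: keep the RGB values of pixels
    whose categorical label matches object_label and make every other pixel
    transparent. image_rgb is assumed to have shape (H, W, 3) and label_map
    shape (H, W); the result is an (H, W, 4) RGBA array."""
    mask = (label_map == object_label)
    rgba = np.zeros(image_rgb.shape[:2] + (4,), dtype=np.uint8)
    rgba[..., :3] = image_rgb
    rgba[..., 3] = mask.astype(np.uint8) * 255   # opaque inside the object, transparent outside
    return rgba
```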
  • In some embodiments, an image provided to segmentation module 130 could contain multiple objects of interest. For example, image 210 is shown to contain auxiliary object 222A and auxiliary object 222B, both of which may be labeled with the categorical label “car”. In these situations, segmentation module 130 could be configured to extract all objects of interest (e.g., extract object 220 along with auxiliary object 222A and auxiliary object 222B) or may be configured to only extract certain objects of interest (e.g., only extract object 220 but not auxiliary object 222A and auxiliary object 222B). In some implementations, a user, for example via a graphical user interface, can provide segmentation module 130 with information on which objects of interest to segment from an input image. In other implementations and as further described below, transformation module 140 and/or prediction module 150 could provide segmentation module 130 with information on which objects of interest to segment from an input image.
  • In some embodiments, image 210 may be an unlabeled image. In these situations, segmentation module 130 may include an object detection module that may detect various objects in image 210 but may be unable to identify the categorical labels/object classes associated with the detected objects. To obtain these object classes, segmentation module 130 may present the detected objects to a user, perhaps through a graphical user interface, who may assign categorical labels to each of the detected objects. Using the assigned labels, segmentation module 130 may proceed with determining each pixel in image 210 that has a categorical label corresponding to an object of interest and may extract those determined pixels from image 210.
  • Transformation module 140 may be a software application, computing device, or subsystem within computing system 100 that is operable to receive object(s) segmented by segmentation module 130 and responsively apply a transformation function to transform the object(s) into one or more transformed objects. Then, transformation module 140 could transplant the transformed object(s) into one or more background images so as to produce one or more augmented images.
  • To conceptually illustrate the operations of transformation module 140, FIG. 3 is provided. Specifically, FIG. 3 shows how transformation module 140 receives object 310 and transforms object 310 into transformed object 320 and transformed object 330. After this, transformation module 140 transplants transformed object 320 onto background 322 to produce augmented image 324 and transplants transformed object 330 onto background 332 to produce augmented image 334. Both augmented image 324 and augmented image 334 could then be used to augment a training data set used by prediction module 150.
  • The transformation applied by transformation module 140 could map each pixel in object 310 to one or more output pixels in transformed object 320 (or transformed object 330). The mapping could take the form of an affine transformation, a linear transformation, or another type of image processing transformation. In some cases, the mapping modifies one or more object properties of object 310. These object properties may include, but are not limited to: the height or width of object 310, the relative size of object 310 (e.g., the amount that object 310 is sized up or sized down from its initial size), the relative rotation of object 310 (e.g., the amount that object 310 is rotated clockwise or counterclockwise from its initial orientation), or the color of object 310, among other possibilities. For example, as shown in FIG. 3, transformation module 140 transforms the relative rotation of object 310 approximately 45° clockwise to generate transformed object 320 and transforms the relative rotation of object 310 approximately 180° clockwise to generate transformed object 330.
  • In some embodiments, transformation module 140 utilizes randomly generated object property values to transform object 310. For instance, transformation module 140 could randomly generate a first rotation value between 0° and 90° and could use the first rotation value as a basis to transform the relative rotation of object 310 and generate a transformed object. Then, transformation module 140 could randomly generate a second rotation value between 0° and 90° and could use the second rotation value as a basis to transform the relative rotation of object 310 and generate a second transformed object.
  • In some embodiments, transformation module 140 utilizes the ground truth property values of the object being transformed as a basis to perform intelligent and representative object transformations. As described herein, ground truth property values may refer to property values that an object frequently exhibits in real-world images. For example, if real-world images of chairs generally depict chairs having a height of 75 cm, then a ground truth height value for a chair may be 75 cm. Basing transformations on how objects actually appear in real-world images could improve the performance of the predictive models in prediction module 150 when they make inferences on real-world images.
  • Examples of ground truth property values are depicted in FIG. 4A. As shown, ground truth table 400 includes record 402 and record 404, both of which have corresponding entries for height property 412, width property 414, and rotation property 416.
  • Height property 412 could provide transformation module 140 with the necessary details for transforming the height of an object. As shown in FIG. 4A, the height property 412 entry for record 402 indicates that the ground truth height for “fork” objects is between 10 cm and 30 cm. Similarly, the height property 412 entry for record 404 indicates that the ground truth height for “fire hydrant” objects is between 100 cm and 200 cm.
  • Width property 414 could provide transformation module 140 with the necessary details for transforming the width of an object. As shown in FIG. 4A, the width property 414 entry for record 402 indicates that the ground truth width for “fork” objects is between 2 cm and 9 cm. Similarly, the width property 414 entry for record 404 indicates that the ground truth width for “fire hydrant” objects is between 50 cm and 100 cm.
  • Rotation property 416 could provide transformation module 140 with the necessary details for transforming the relative rotation of an object. As shown in FIG. 4A, the rotation property 416 entry for record 402 indicates that the ground truth rotation for “fork” objects is between −90° and 90° (e.g., where 0° corresponds to completely vertical, a negative degree corresponds to a counterclockwise rotation from 0°, and a positive degree corresponds to a clockwise rotation from 0°). Similarly, the rotation property 416 entry for record 404 indicates that the ground truth rotation for “fire hydrant” objects is between −10° and 10°.
  • The layout and entries in ground truth table 400 are provided as an example and are not intended to be limiting with respect to the embodiments herein. In some implementations, the ground truth property values in ground truth table 400 could be sets of discrete values. In other implementations, the ground truth property values in ground truth table 400 could be ranges of values. Regardless of the implementation, transformation module 140 could use ground truth table 400 to identify the ground truth property values for an object being transformed. From these ground truth property values, transformation module 140 could select a target property value to use for an object transformation. For example, if transformation module 140 were to transform a “fork” object, transformation module 140 may refer to record 402 in ground truth table 400 and could select a target height value somewhere between 10 cm and 30 cm (e.g., 25 cm). Transformation module 140 would then transform the “fork” object to have a resulting height of 25 cm. The selection of the target property value from the ground truth property values could be performed randomly or could be performed based on a statistical metric of the ground truth property values (e.g., a median value is always selected, values within one standard deviation of a mean are always selected, etc.).
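  • The following sketch shows one way such a lookup-and-select step could work; the dictionary schema and function name are illustrative assumptions that mirror ground truth table 400 rather than an actual implementation.

```python
import random
import statistics

# Ranges taken from ground truth table 400 (FIG. 4A); units are cm and degrees.
GROUND_TRUTH_TABLE = {
    "fork":         {"height": (10, 30),   "width": (2, 9),    "rotation": (-90, 90)},
    "fire hydrant": {"height": (100, 200), "width": (50, 100), "rotation": (-10, 10)},
}

def select_target_value(object_class: str, prop: str, strategy: str = "random") -> float:
    low, high = GROUND_TRUTH_TABLE[object_class][prop]
    if strategy == "median":
        # Statistical-metric selection: always take the midpoint of the range.
        return statistics.median((low, high))
    # Default: pick any value within the ground truth range at random.
    return random.uniform(low, high)

target_height = select_target_value("fork", "height")  # e.g., roughly 25
```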
  • In some embodiments, the ground truth property values in ground truth table 400 are in the form of a frequency distribution. For example, FIG. 4B illustrates frequency distribution 440 for the ground truth height property of a “fork” object, where the x-axis corresponds to height and the y-axis corresponds to the number of the “fork” objects in real-world images that exhibit a specific height. Transformation module 140 could randomly sample a value from frequency distribution 440 to use as the target property value in an object transformation.
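  • A short sketch of sampling from such a frequency distribution; the bin centers and counts below are placeholder values standing in for frequency distribution 440.

```python
import numpy as np

heights_cm = np.array([10, 15, 20, 25, 30])   # fork-height bins (x-axis of FIG. 4B)
counts     = np.array([ 4, 12, 30, 18,  6])   # number of real-world forks per bin (y-axis)

probabilities = counts / counts.sum()          # convert counts into sampling probabilities
target_height = np.random.choice(heights_cm, p=probabilities)
```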
  • Prediction module 150 may contain one or more predictive models including, but not limited to: an artificial neural network, a Bayesian network, a hidden Markov model, a Markov decision process, a logistic regression function, a support vector machine, a statistical machine learning algorithm, and/or a heuristic machine learning system.
  • During a training phase, the predictive models of prediction module 150 may be trained on an input set of training data to recognize patterns in the training data and provide output inferences and/or predictions about those patterns. The resulting models are referred to herein as trained predictive models.
  • In example embodiments, the predictive models can be trained by providing the initial set of images or the augmented set of images generated by transformation module 140 as training input. The predictive models may use various training techniques, such as unsupervised learning, supervised learning, semi-supervised learning, reinforcement learning, transfer learning, incremental learning, or curriculum learning, among other possibilities. The predictive models can be trained using one or more computer processors and/or on-device coprocessors. The on-device coprocessor(s) can include, but are not limited to, one or more graphics processing units (GPUs), one or more tensor processing units (TPUs), one or more digital signal processors (DSPs), and/or one or more application specific integrated circuits (ASICs). Such on-device coprocessors can speed up training of the predictive models.
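  • As a minimal, hedged sketch of the training phase, the snippet below fits one of the listed model types (a support vector machine) to flattened image features; augmented_images and labels are assumed to be same-sized RGB arrays and their object-class labels, and a production system would more likely train a neural network on the coprocessors noted above.

```python
import numpy as np
from sklearn.svm import SVC

# Flatten each image into a feature vector; assumes all images share the same shape.
X = np.stack([img.reshape(-1) for img in augmented_images])
y = np.array(labels)

model = SVC(kernel="rbf")
model.fit(X, y)                     # training phase
predictions = model.predict(X[:5])  # inference phase on example inputs
```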
  • During an inference phase, the trained predictive models of prediction module 150 can receive input data and generate corresponding inference(s) and/or prediction(s) about the input data. In examples, the input data can include a collection of images provided by one or more sources. The collection of images can include images of objects that are similar to the objects used to train the predictive models. The inference(s) and/or prediction(s) made by the trained predictive models can include output images, segmentation masks, numerical values, and/or other output data.
  • FIG. 5 illustrates system 500, in accordance with example embodiments. System 500 is provided to illustrate the operational relationship between segmentation module 130, transformation module 140, and prediction module 150. In addition to these modules, system 500 includes ground truth object database 510, background database 520, and training database 540. In other embodiments, system 500 may have fewer, more, or alternative elements.
  • In system 500, segmentation module 130 receives seed image(s) 502 and responsively segments object(s) disposed in seed image(s) 502 from the backgrounds of seed image(s) 502 to generate segmented object(s) 530. In some examples, seed image(s) 502 can be provided by a user. This may be accomplished by way of a web page or series of web pages hosted by system 500 and provided to the user upon request. In other examples, seed image(s) 502 may be provided by prediction module 150. For example, prediction module 150 could detect a class imbalance in training database 540 or augmented image(s) 532 and could transmit images corresponding to underrepresented classes to segmentation module 130.
  • After the segmenting, transformation module 140 could receive segmented object(s) 530 from segmentation module 130. In addition, transformation module 140 could receive (i) ground truth object properties 512 from ground truth object database 510 and (ii) background image(s) 522 from background database 520.
  • Ground truth object database 510 may include one or more ground truth tables, such as ground truth table 400, each containing ground truth property values for objects. In some embodiments, a user can provide the ground truth property values to populate ground truth object database 510. In other embodiments, an image analysis system can provide the ground truth property values to populate ground truth object database 510. Such an image analysis system may be operable to receive a set of labeled images and responsively analyze objects in the set of labeled images to determine ground truth property values for each object. Other ways of populating ground truth object database 510 also exist.
  • Transformation module 140 may utilize the object classes of segmented object(s) 530 to request ground truth property values 512 for segmented object(s) 530. As described above, the object classes of segmented object(s) 530 may be identified via categorical labels associated with the pixels in seed image(s) 502 or may be provided by a user.
  • Background database 520 could include background images taken/captured from a wide variety of environments. For instance, background database 520 may contain background images taken/captured from parks, offices, streets, playgrounds, beaches, homes, and so on. The variability of the images in background database 520 helps to further increase the diversity of augmented image(s) 532 generated by transformation module 140.
  • Transformation module 140 may request and receive background image(s) 522 from background database 520. In some embodiments, transformation module 140 may randomly request background image(s) 522 from background database 520. That is, in response to a request from transformation module 140, background database 520 may transmit any random background image to transformation module 140. In other embodiments, transformation module 140 uses the object classes of segmented object(s) 530 to request specific background image(s) 522 from background database 520. For example, if segmented object(s) 530 only include “cup” objects and “bowl” objects, then it may be more suitable for transformation module 140 to request backgrounds that cups and bowls would likely be found in (e.g., living room environments, dining room environments, restaurant environments) rather than backgrounds that cups and bowls would not likely be found in (e.g., the bottom of the ocean, the top of a volcano, etc.).
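  • One hypothetical way to bias background selection by object class is a simple class-to-environment mapping, as sketched below; the environment tags and the background_db structure (a mapping from environment tag to a list of background images) are assumptions for illustration.

```python
import random

CLASS_TO_ENVIRONMENTS = {
    "cup":  ["living_room", "dining_room", "restaurant"],
    "bowl": ["dining_room", "kitchen", "restaurant"],
}

def pick_background(object_class, background_db):
    # Fall back to any environment if the class has no curated list.
    environments = CLASS_TO_ENVIRONMENTS.get(object_class, list(background_db))
    env = random.choice(environments)
    return random.choice(background_db[env])
```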
  • In line with the discussion above, transformation module 140 could use ground truth property values 512 to transform segmented object(s) 530 into transformed object(s) and then could transplant the transformed object(s) into background image(s) 522 to create augmented image(s) 532.
  • When transplanting a particular object onto a particular background image, transformation module 140 may determine a target pixel position on the particular background image at which to place the particular transformed object. In some embodiments, transformation module 140 randomly determines the target pixel position. In other embodiments, transformation module 140 uses the object class of the particular transformed object to determine the target pixel position. For instance, the object class of the particular transformed object could be associated with ground truth position values. Similar to ground truth property values, ground truth position values may be pixel positions that an object is frequently located at in real-world images. For example, if images of chairs generally depict chairs being positioned near the centermost pixels of the images, then the ground truth position value for a chair in a background image may be the centermost pixels of the background image.
  • In some embodiments, the ground truth position values for the particular object could be based on other objects in the particular background image. For instance, the particular background image may contain categorical labels for each of its pixels. Transformation module 140 could use these categorical labels to identify whether the particular background image contains secondary objects of interest. The secondary objects of interest could be based on the object class of the particular object. For example, if the particular object is a “fork” object, then secondary objects of interest may include “table” objects or “counter-top” objects. If transformation module 140 determines that the particular background image contains secondary objects of interest, then transformation module 140 could use the object classes of the secondary objects of interest to determine the ground truth position values for the particular object. For example, a secondary “table” object may specify that all “fork” objects should be positioned near the top of the “table” object. A technical advantage of this approach is that the ground truth position values are based on actual object positions in real-world images. This can further improve the performance of prediction module 150 when it makes inferences on real-world images.
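  • The sketch below shows one way to transplant an RGBA transformed object onto a background at a target pixel position, with the position derived (hypothetically) from the top edge of a secondary “table” object; background_image and fork_rgba are assumed inputs, and the object is assumed to fit entirely within the background.

```python
import numpy as np

def transplant(background: np.ndarray, obj_rgba: np.ndarray, top_left: tuple) -> np.ndarray:
    """Alpha-blend the transformed object onto the background at (row, col) top_left."""
    out = background.copy()
    y, x = top_left
    h, w = obj_rgba.shape[:2]
    alpha = obj_rgba[..., 3:4].astype(np.float32) / 255.0   # segmentation mask as blend weights
    region = out[y:y + h, x:x + w].astype(np.float32)
    blended = alpha * obj_rgba[..., :3].astype(np.float32) + (1.0 - alpha) * region
    out[y:y + h, x:x + w] = blended.astype(np.uint8)
    return out

# Hypothetical ground truth position: just below the top edge of a detected table.
table_top_row, table_left_col = 120, 200
augmented = transplant(background_image, fork_rgba, (table_top_row, table_left_col))
```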
  • The augmented image(s) 532 generated by transformation module 140 could be added to existing images contained in training database 540. Together, the existing images and augmented image(s) 532 could be used to train predictive models in prediction module 150.
  • In some embodiments, prediction module 150 may determine class imbalances in augmented image(s) 532. To do this, a user may first provide prediction module 150 with a set of object classes that they believe should be evenly represented in augmented image(s) 532. Prediction module 150 could then determine the frequency at which each of the provided object classes appears in augmented image(s) 532. For example, if the user instructs prediction module 150 to determine whether augmented image(s) 532 contains a class balance between “fork” objects and “spoon” objects, then prediction module 150 may determine the frequency at which “fork” objects appear in augmented image(s) 532 and may determine the frequency at which “spoon” objects appear in augmented image(s) 532. After this, prediction module 150 could determine whether the frequency for any object class is below a threshold. In some implementations, the threshold could be based on the frequency at which each of the provided object classes appears in augmented image(s) 532. For example, the threshold could be based on the median or mean frequency at which each of the provided object classes appears in augmented image(s) 532. In other implementations, the threshold could be based on a percentage value (e.g., whether an object class is represented in at least 35% or 45% of the images in augmented image(s) 532). If the frequency for a given object class is below the threshold, prediction module 150 may select from augmented image(s) 532 an image containing the given object class and may transmit that image to segmentation module 130 to be used as a seed image. The determination of class imbalances could occur before or after augmented image(s) 532 are added to training database 540.
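  • A compact sketch of this frequency check, assuming image_classes is a list giving the object classes present in each augmented image and using the mean frequency as the threshold:

```python
from collections import Counter

requested_classes = ["fork", "spoon"]
counts = Counter(
    cls for classes in image_classes for cls in classes if cls in requested_classes
)

# Mean frequency across the requested classes serves as the imbalance threshold here.
threshold = sum(counts[cls] for cls in requested_classes) / len(requested_classes)
underrepresented = [cls for cls in requested_classes if counts[cls] < threshold]
# Images containing underrepresented classes would then be fed back as seed images.
```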
  • III. Example Methods
  • FIG. 6 depicts message flow 600, in accordance with example embodiments. Message flow 600 illustrates a process in which prediction module 150 identifies imbalances in its training data, temporarily suspends its training, and requests additional augmented images to balance its training data. By way of example, message flow 600 may utilize segmentation module 130, transformation module 140, and prediction module 150 during operations. However, additional components, steps, or blocks may be added to message flow 600 without departing from the scope of this disclosure.
  • At block 602, prediction module 150 begins training one or more predictive models using an initial set of training data. The initial set of training data could be contained within prediction module 150 or may be requested by prediction module 150 from a training database, such as training database 540.
  • At block 604, prediction module 150 determines a class imbalance in the initial set of training data. As described above, this may involve a user providing prediction module 150 with a set of object classes that they believe should be evenly represented in the initial set of training data and prediction module 150 then determining the frequency at which each of the provided object classes appears in that set. After determining an underrepresented object class, at block 606 prediction module 150 transmits one or more images of the underrepresented object class to segmentation module 130.
  • At block 608, segmentation module 130 receives the image(s) of the underrepresented object class and responsively segments underrepresented object(s) from the image(s). Then at block 610, segmentation module 130 transmits the segmented object(s) to transformation module 140.
  • At block 612, transformation module 140 may receive the segmented object(s) from segmentation module 130 and may responsively transform the segmented object(s) into one or more transformed objects. The transformation at block 612 may utilize the ground truth property values for the segmented object(s). In some embodiments, the ground truth property values are stored in transformation module 140. In other embodiments, block 612 involves transformation module 140 requesting and receiving the ground truth property values from a ground truth object database, such as ground truth object database 510.
  • At block 614, transformation module 140 may transplant the transformed object(s) onto one or more background images to generate one or more augmented images. The background images used at block 614 may be based on the ground truth property values for the segmented object(s). In some embodiments, the background image(s) are stored in transformation module 140. In other embodiments, block 614 involves transformation module 140 requesting and receiving the background image(s) from a background database, such as background database 520.
  • At block 616, transformation module 140 transmits the augmented image(s) generated at block 614 to prediction module 150. In some embodiments, block 616 may additionally or alternatively involve transformation module 140 transmitting the augmented image(s) to a training database containing data for training prediction module 150, such as training database 540.
  • At block 618, prediction module 150 may resume training using the initial images from block 602 in addition to the augmented image(s) received at block 616. In some embodiments, after training is complete prediction module 150 may apply the trained predictive models to a validation data set. If the trained predictive models perform poorly on a particular class of objects (e.g., an area under the ROC curve below 0.5 or an accuracy below 0.5), prediction module 150 may request from segmentation module 130/transformation module 140 additional augmented image(s) for the poorly performing class. Prediction module 150 may retrain the predictive models with these additional augmented image(s) to increase the overall performance of the predictive models.
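  • A sketch of the validation check at the end of this flow, assuming the model and flattened-feature convention from the earlier training sketch along with assumed validation_images and validation_labels collections:

```python
from collections import defaultdict

correct = defaultdict(int)
total = defaultdict(int)
for image, label in zip(validation_images, validation_labels):
    prediction = model.predict(image.reshape(1, -1))[0]
    total[label] += 1
    correct[label] += int(prediction == label)

# Classes with per-class accuracy below 0.5 would trigger a request for more augmented images.
poorly_performing = [cls for cls in total if correct[cls] / total[cls] < 0.5]
```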
  • IV. Example Operations
  • FIG. 7 illustrates a method 700, in accordance with example embodiments. Method 700 may include various blocks or steps. The blocks or steps may be carried out individually or in combination. The blocks or steps may be carried out in any order and/or in series or in parallel. Further, blocks or steps may be omitted from or added to method 700. The blocks of method 700 may be carried out by various elements of computing system 100 as illustrated and described in reference to FIG. 1.
  • Block 710 may involve locating, by a computing system, a foreground object disposed within a seed image. The computing system may include an initial set of images for training a predictive model.
  • Block 720 may involve identifying, by the computing system, an object class corresponding to the foreground object.
  • Block 730 may involve, based on the identified object class, determining, by the computing system, a target value for an object property of the foreground object.
  • Block 740 may involve applying, by the computing system, a transformation function to transform the foreground object into a transformed object, where the transformation function modifies the object property of the foreground object from having an initial value to having the target value.
  • Block 750 may involve transplanting, by the computing system, the transformed object into a background image so as to produce an augmented image.
  • Block 760 may involve augmenting, by the computing system, the initial set of images with the augmented image so as to produce an augmented set of images for training the predictive model.
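  • Stitching the earlier hypothetical helpers together, the following sketch mirrors blocks 710 through 760 end to end; segment_foreground, background_db, and the other names are assumptions rather than elements of the disclosure.

```python
def augment_once(seed_image, object_class, initial_images, background_db):
    obj_rgba = segment_foreground(seed_image)                     # block 710 (assumed segmentation helper)
    angle = select_target_value(object_class, "rotation")         # blocks 720-730: class-based target value
    transformed = affine_transform_object(obj_rgba, angle, 1.0)   # block 740: apply transformation function
    background = pick_background(object_class, background_db)
    augmented = transplant(background, transformed, (0, 0))       # block 750: transplant into background
    return initial_images + [augmented]                           # block 760: augment the training set
```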
  • In some embodiments, the object property includes a relative size or a relative rotation of the foreground object. In other embodiments, the object property includes a height, a width, or a color of the foreground object.
  • In some embodiments, the identified object class includes a set of possible target values for the object property and determining the target value for the object property comprises selecting the target value from the set of possible target values. The target value could be selected randomly from the set of possible target values or could be selected based on statistical properties of the set of possible target values.
  • In some embodiments, the set of possible target values includes a probability distribution for the object property and selecting the target value from the set of possible target values comprises taking a random sample from the probability distribution.
  • In some embodiments, transplanting the transformed object into the background image comprises determining a target position value for the foreground object and placing the transformed object in the background image in accordance with the target position value.
  • In some embodiments, the identified object class includes a set of possible target position values for the foreground object and determining the target position value for the foreground object comprises selecting the target position value from the set of possible target position values.
  • Some embodiments involve, based on the identified object class, establishing, by the computing system, secondary objects of interest, and determining, by the computing system, that the background image contains at least one of the secondary objects of interest, where placing the transformed object in the background image in accordance with the target position value comprises placing the transformed object to be adjacent to at least one of the secondary objects of interest.
  • Some embodiments involve, after augmenting the initial set of images, determining, by the computing system and for each object class of a plurality of object classes, a frequency at which the object class appears in the augmented set of images. These embodiments may further involve, based on the frequency, determining, by the computing system, a second seed image.
  • In some embodiments, determining the second seed image comprises making a determination, for the object class, that the frequency at which the object class appears in the augmented set of images is below a threshold and, based on the determination, selecting, from the augmented set of images, an image that is associated with the object class to be the second seed image.
  • Some embodiments involve locating, by the computing system, a second foreground object disposed within the second seed image and identifying, by the computing system, a second object class corresponding to the second foreground object. Such embodiments may also involve, based on the identified second object class, determining, by the computing system, a target value for an object property of the second foreground object and applying, by the computing system, a transformation function to transform the second foreground object into a second transformed object, where the transformation function modifies the object property of the second foreground object from having an initial value to having the target value. Such embodiments may further involve transplanting, by the computing system, the second transformed object into the background image so as to produce a second augmented image and augmenting, by the computing system, the augmented set of images with the second augmented image so as to produce a second augmented set of images for training the predictive model.
  • In some embodiments, the transformation function is an affine image transformation. For example, the transformation function could map each pixel in the foreground object to one or more output pixels in the transformed object. In other embodiments, the transformation function is a linear transformation. In some embodiments, both the foreground object and the transformed object are associated with the object class.
  • Some embodiments involve training, by the computing system, the predictive model to determine a respective object class associated with each image in the augmented set of images.
  • Some embodiments involve selecting, from the initial set of images, a candidate image to be the seed image. Some embodiments involve receiving, from a client device, the seed image.
  • In some embodiments, the computing system is a robotic system that operates in a plurality of environments and the initial set of images are images previously captured by the robotic system as the robotic system operated in the plurality of environments.
  • In some embodiments, identifying the object class corresponding to the foreground object comprises generating one or more graphical user interfaces that contain data fields for inputting the object class; transmitting, to a client device, the one or more graphical user interfaces; and receiving, from the client device, the object class by way of the data fields.
  • Some embodiments involve, based on the identified object class, determining, by the computing system, a second target value for an object property of the foreground object. Such embodiments also involve applying, by the computing system, a second transformation function to transform the foreground object into a second transformed object, where the second transformation function modifies the object property of the foreground object from having an initial value to having the second target value. Such embodiments further involve transplanting, by the computing system, the second transformed object into the background image so as to produce a second augmented image and augmenting, by the computing system, the augmented set of images with the second augmented image so as to produce a second augmented set of images for training the predictive model.
  • Some embodiments involve, based on the identified object class, determining, by the computing system, a target value for a second object property of the foreground object, where the transformation function further modifies the second object property of the foreground object from having an initial value to having the target value.
  • The particular arrangements shown in the Figures should not be viewed as limiting. It should be understood that other embodiments may include more or fewer of each element shown in a given Figure. Further, some of the illustrated elements may be combined or omitted. Yet further, an illustrative embodiment may include elements that are not illustrated in the Figures.
  • A step or block that represents a processing of information can correspond to circuitry that can be configured to perform the specific logical functions of a herein-described method or technique. Alternatively or additionally, a step or block that represents a processing of information can correspond to a module, a segment, or a portion of program code (including related data). The program code can include one or more instructions executable by a processor for implementing specific logical functions or actions in the method or technique. The program code and/or related data can be stored on any type of computer readable medium such as a storage device including a disk, hard drive, or other storage medium.
  • The computer readable medium can also include non-transitory computer readable media such as computer-readable media that store data for short periods of time like register memory, processor cache, and random access memory (RAM). The computer readable media can also include non-transitory computer readable media that store program code and/or data for longer periods of time. Thus, the computer readable media may include secondary or persistent long-term storage, such as read only memory (ROM), optical or magnetic disks, or compact-disc read only memory (CD-ROM). The computer readable media can also be any other volatile or non-volatile storage systems. A computer readable medium can be considered a computer readable storage medium, for example, or a tangible storage device.
  • While various examples and embodiments have been disclosed, other examples and embodiments will be apparent to those skilled in the art. The various disclosed examples and embodiments are for purposes of illustration and are not intended to be limiting, with the true scope being indicated by the following claims.

Claims (20)

We claim:
1. A computer-implemented method comprising:
receiving, by a computing system, an indication of a pre-segmented object for transformation to augment a set of images for training a predictive model;
applying, by the computing system, a transformation function to transform the pre-segmented object into a transformed object, wherein the transformation function modifies an object property of the pre-segmented object from having an initial value to having a target value;
transplanting, by the computing system, the transformed object into a background image so as to produce an augmented image; and
augmenting, by the computing system, the set of images with the augmented image so as to produce an augmented set of images for training the predictive model.
2. The computer-implemented method of claim 1, further comprising identifying, by the computing system, an object class corresponding to the pre-segmented object, wherein the target value is determined based on properties of real world objects of the identified object class.
3. The computer-implemented method of claim 2, wherein the target value is determined based on a statistical metric of ground truth property values associated with real world objects of the identified object class.
4. The computer-implemented method of claim 3, wherein the statistical metric comprises a median value.
5. The computer-implemented method of claim 3, wherein the statistical metric comprises values within one standard deviation of a mean.
6. The computer-implemented method of claim 1, wherein the indication of the pre-segmented object is received by the computing system from a client device.
7. The computer-implemented method of claim 1, further comprising determining to augment the set of images based on the predictive model identifying an imbalance in the set of images.
8. The computer-implemented method of claim 1, further comprising:
training the predictive model using the augmented set of images; and
subsequently applying the trained predictive model on a validation data set.
9. The computer-implemented method of claim 8, further comprising:
based on results of applying the trained predictive model on the validation data set, determining to augment one or more additional images from the set of images.
10. The computer-implemented method of claim 1, wherein the computing system is a robotic system, and wherein the set of images are images previously captured by the robotic system as the robotic system operated in an environment.
11. The computer-implemented method of claim 1, wherein the computing system is a robotic system, and wherein the method further comprises:
training the predictive model using the augmented set of images; and
subsequently applying the trained predictive model on one or more images captured by the robotic system as the robotic system operates in an environment.
12. The computer-implemented method of claim 1, wherein the object property comprises a relative size of the pre-segmented object.
13. The computer-implemented method of claim 1, wherein the object property comprises a relative rotation of the pre-segmented object.
14. The computer-implemented method of claim 1, wherein transplanting the transformed object into the background image comprises:
determining a target position value for the pre-segmented object; and
placing the transformed object in the background image in accordance with the target position value.
15. The computer-implemented method of claim 1, wherein the transformation function is an affine image transformation.
16. The computer-implemented method of claim 1, further comprising:
training, by the computing system, the predictive model to determine a respective object class associated with each image in the augmented set of images.
17. The computer-implemented method of claim 1, further comprising identifying an object class corresponding to the pre-segmented object, wherein applying the transformation function is based on the object class.
18. The computer-implemented method of claim 17, wherein identifying the object class corresponding to the pre-segmented object comprises:
generating one or more graphical user interfaces that contain data fields for inputting the object class;
transmitting, to a client device, the one or more graphical user interfaces; and
receiving, from the client device, the object class by way of the data fields.
19. A computing system comprising:
one or more processors configured to cause the computing system to carry out operations comprising:
receiving, by the computing system, an indication of a pre-segmented object for transformation to augment a set of images for training a predictive model;
applying, by the computing system, a transformation function to transform the pre-segmented object into a transformed object, wherein the transformation function modifies an object property of the pre-segmented object from having an initial value to having a target value;
transplanting, by the computing system, the transformed object into a background image so as to produce an augmented image; and
augmenting, by the computing system, the set of images with the augmented image so as to produce an augmented set of images for training the predictive model.
20. A non-transitory computer-readable medium, having stored thereon program instructions that, upon execution by one or more processors of a computing system, cause the computing system to carry out operations comprising:
receiving, by the computing system, an indication of a pre-segmented object for transformation to augment a set of images for training a predictive model;
applying, by the computing system, a transformation function to transform the pre-segmented object into a transformed object, wherein the transformation function modifies an object property of the pre-segmented object from having an initial value to having a target value;
transplanting, by the computing system, the transformed object into a background image so as to produce an augmented image; and
augmenting, by the computing system, the set of images with the augmented image so as to produce an augmented set of images for training the predictive model.
US17/657,464 2019-12-17 2022-03-31 True Positive Transplant Pending US20220222772A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/657,464 US20220222772A1 (en) 2019-12-17 2022-03-31 True Positive Transplant

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US16/717,013 US10902551B1 (en) 2019-12-17 2019-12-17 True positive transplant
US17/124,103 US11321809B2 (en) 2019-12-17 2020-12-16 True positive transplant
US17/657,464 US20220222772A1 (en) 2019-12-17 2022-03-31 True Positive Transplant

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US17/124,103 Continuation US11321809B2 (en) 2019-12-17 2020-12-16 True positive transplant

Publications (1)

Publication Number Publication Date
US20220222772A1 true US20220222772A1 (en) 2022-07-14

Family

ID=73789904

Family Applications (3)

Application Number Title Priority Date Filing Date
US16/717,013 Active US10902551B1 (en) 2019-12-17 2019-12-17 True positive transplant
US17/124,103 Active US11321809B2 (en) 2019-12-17 2020-12-16 True positive transplant
US17/657,464 Pending US20220222772A1 (en) 2019-12-17 2022-03-31 True Positive Transplant


Country Status (2)

Country Link
US (3) US10902551B1 (en)
EP (1) EP3839892A1 (en)

Also Published As

Publication number Publication date
US20210183008A1 (en) 2021-06-17
EP3839892A1 (en) 2021-06-23
US11321809B2 (en) 2022-05-03
US10902551B1 (en) 2021-01-26
