CN116894799A - Data enhancement for domain generalization

Info

Publication number
CN116894799A
CN116894799A
Authority
CN
China
Prior art keywords
image
target image
foreground
source
background
Prior art date
Legal status
Pending
Application number
CN202310376448.2A
Other languages
Chinese (zh)
Inventor
L·贝格尔
F·J·卡布里塔孔德萨
R·胡特马彻
J·科尔特
N·T·P·吴
F·谢赫勒斯拉米
D·T·威尔莫特
Current Assignee
Robert Bosch GmbH
Original Assignee
Robert Bosch GmbH
Priority date
Filing date
Publication date
Application filed by Robert Bosch GmbH
Publication of CN116894799A

Classifications

    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T7/0004 Industrial image inspection
    • G06N20/00 Machine learning
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/09 Supervised learning
    • G06T7/11 Region-based segmentation
    • G06T7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G06T7/90 Determination of colour characteristics
    • G06T2207/10024 Color image
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/30108 Industrial image inspection
    • G06T2207/30164 Workpiece; Machine component


Abstract

Data enhancement for domain generalization is provided. Methods and systems are disclosed for generating training data for a machine learning model so as to obtain better performance of the model. A source image and a target image are selected from an image database. A source image segmentation mask having a foreground region and a background region is generated from the source image using an image segmenter. The same operation is performed on the target image to generate a target image segmentation mask having a foreground region and a background region. The foregrounds and backgrounds of the source and target images are determined based on the masks. The target image foreground is removed from the target image, and the source image foreground is inserted into the target image to create an enhanced image having the source image foreground and the target image background. The training data of the machine learning model is updated to include the enhanced image.

Description

Data enhancement for domain generalization
Technical Field
The present disclosure relates to data enhancement for domain generalization. In an embodiment, the present disclosure relates to creating new enhanced images, using image segmentation, for machine learning applications.
Background
Machine learning has achieved significant success in various areas such as computer vision, natural language processing, healthcare, autonomous driving, and the like. The goal of machine learning is to design a model that can learn general and predictive knowledge from training data and then apply the model to new (test) data. Such machine learning models may rely on domain generalization, also known as domain shifting or domain regularization.
Domain generalization utilizes a machine learning model to address tasks where data may come from various domains. For example, the model may be an object detection model or an image recognition model that should perform adequately to properly detect objects under various conditions (e.g., background, illumination, blur, interference, contrast, etc.). These different conditions constitute different data domains of the model. Model performance will typically vary depending on the domain and, in particular, depending on the amount of data from each domain in the dataset used to train the model. The task of domain generalization is to maintain model performance in domains that are not adequately represented in the training set or that are not represented at all (i.e., the trained model has never seen these domains before). In short, domain generalization presents a challenging setting in which one or several different but related domains are given, and the goal is to learn a model that can generalize to unseen test domains.
Disclosure of Invention
According to an embodiment, a system for performing at least one task on a component under autonomous control comprises: an image sensor configured to output an image of the component; an actuator configured to sort the component based on a detected defect in the component; a processor; and a memory comprising instructions that, when executed by the processor, cause the processor to: retrieve a source image stored in the memory; retrieve a target image stored in the memory; generate a source image segmentation mask having a foreground region and a background region from the source image using an image segmenter; generate a target image segmentation mask having a foreground region and a background region from the target image using an image segmenter; determine a source image foreground and a source image background of the source image based on the source image segmentation mask; determine a target image foreground and a target image background of the target image based on the target image segmentation mask; remove the target image foreground from the target image; insert the source image foreground into the target image with the target image foreground removed to create an enhanced image having the source image foreground and the target image background; update training data of a machine learning model with the enhanced image; determine a defect in the component using the machine learning model having the updated training data; and actuate the actuator to sort the component based on the determined defect in the component.
According to an embodiment, a computer-implemented method for generating training data for a machine learning model includes: selecting a source image from an image database; selecting a target image from the image database; generating a source image segmentation mask having a foreground region and a background region from the source image using an image segmenter; generating a target image segmentation mask having a foreground region and a background region from the target image using an image segmenter; determining a source image foreground and a source image background of the source image based on the source image segmentation mask; determining a target image foreground and a target image background of the target image based on the target image segmentation mask; removing the target image foreground from the target image; inserting the source image foreground into the target image with the target image foreground removed to create an enhanced image having the source image foreground and the target image background; and updating training data of the machine learning model with the enhanced image.
According to another embodiment, a system for training a machine learning model includes: a computer-readable storage medium configured to store computer-executable instructions; and one or more processors configured to execute the computer-executable instructions, the computer-executable instructions comprising: generating a source image segmentation mask having a foreground region and a background region from a source image using an image segmenter; generating a target image segmentation mask having a foreground region and a background region from a target image using an image segmenter; determining a source image foreground and a source image background of the source image based on the source image segmentation mask; determining a target image foreground and a target image background of the target image based on the target image segmentation mask; removing the target image foreground from the target image; inserting the source image foreground into the target image with the target image foreground removed to create an enhanced image having the source image foreground and the target image background; and updating training data of the machine learning model with the enhanced image.
According to another embodiment, a non-transitory computer-readable medium includes a plurality of instructions that, when executed by a processor, cause the processor to: generate a source image segmentation mask having a foreground region and a background region from a source image using an image segmenter; generate a target image segmentation mask having a foreground region and a background region from a target image using an image segmenter; determine a source image foreground and a source image background of the source image based on the source image segmentation mask; determine a target image foreground and a target image background of the target image based on the target image segmentation mask; remove the target image foreground from the target image; insert the source image foreground into the target image with the target image foreground removed to create an enhanced image having the source image foreground and the target image background; and update training data of the machine learning model with the enhanced image.
Drawings
Fig. 1 illustrates a system for training a neural network, according to an embodiment.
FIG. 2 illustrates a computer-implemented method for training and utilizing a neural network, according to an embodiment.
FIG. 3A illustrates a system flow diagram for generating training data for a machine learning model according to an embodiment, and FIG. 3B illustrates a schematic representation of generating training data according to an embodiment.
Fig. 4 shows a schematic diagram of a deep neural network with nodes in an input layer, multiple hidden layers, and an output layer, according to an embodiment.
FIG. 5 depicts a schematic diagram of interactions between a computer controlled machine and a control system according to an embodiment.
Fig. 6 depicts a schematic diagram of the control system of fig. 5 configured to control a vehicle, which may be a partially autonomous vehicle, a fully autonomous vehicle, a partially autonomous robot, or a fully autonomous robot, according to an embodiment.
Fig. 7 depicts a schematic diagram of the control system of fig. 5 configured to control a manufacturing machine, such as a punch cutter, a cutter, or a gun drill, of a manufacturing system (e.g., part of a production line).
Fig. 8 depicts a schematic of the control system of fig. 5 configured to control a power tool, such as a power drill or driver, having an at least partially autonomous mode.
Fig. 9 depicts a schematic diagram of the control system of fig. 5 configured to control an automated personal assistant.
Fig. 10 depicts a schematic diagram of the control system of fig. 5 configured to control a monitoring system, such as a control access system or a supervisory system.
Fig. 11 depicts a schematic diagram of the control system of fig. 5 configured to control an imaging system, such as an MRI apparatus, an x-ray imaging apparatus, or an ultrasound apparatus.
FIG. 12 is a flowchart of an algorithm for generating training data for a machine learning model and updating a database of machine learning models, according to an embodiment.
Detailed Description
Embodiments of the present disclosure are described herein. However, it is to be understood that the disclosed embodiments are merely examples and that other embodiments may take various and alternative forms. The figures are not necessarily to scale; some of the figures may be exaggerated or minimized to show details of particular components. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a representative basis for teaching one skilled in the art to variously employ the various embodiments. As will be appreciated by those of ordinary skill in the art, the various features illustrated and described with reference to any one of the figures may be combined with features illustrated in one or more other figures to produce embodiments that are not explicitly illustrated or described. The combination of features shown provides representative embodiments for typical applications. However, various combinations and modifications of the features consistent with the teachings of the present disclosure may be desirable for particular applications or implementations.
Embodiments of the present disclosure relate to domain generalization, also known as domain shifting or domain regularization. In domain generalization, a model F is provided and is intended to address tasks where data may come from various domains. For example, F may be an object detection model in an autonomous vehicle that one wishes to perform well under various weather conditions, such as sunny days, cloudy days, rainy days, nighttime, etc. Different weather conditions constitute different data domains. Model performance will typically vary depending on the domain and, in particular, depending on the amount of data from each domain in the dataset used to train the model. One task of domain generalization is to maintain model performance in domains that are not adequately represented in the training set or that are not represented at all (e.g., the trained model has never seen these domains before test time).
As will be described more fully below, the training data of the model may include images (e.g., photographs, lidar, radar, etc.), where the model is trained to determine what a detected object is based on a comparison with the stored training data. The accuracy of the model may be limited when there is not much training data for a particular domain. For example, the training data may not have a sufficient number of images for a particular type of object, and thus the corresponding accuracy of the object detection model applied to that type of object may lag.
Image enhancement can help create more images for the machine learning model database. Image enhancement involves changing the image training data to create a new image, and storing the new image in the training data to expand the number of images stored in the training database.
CutMix is an image enhancement tool. In CutMix, a new training image is created by combining images from the original training set. Two images (e.g., a "source" image and a "target" image) are selected and a new image (e.g., an "enhanced" image) is created. The enhanced image is created by selecting a rectangular patch from the source image and pasting it onto the target image. The label of the enhanced image is taken as a weighted average of the source image class and the target image class, where the weights are given by the proportions of pixels coming from the source image and the target image. For example, if the rectangular patch from the source image covers 1/4 of the image area, the label of the enhanced image will be 0.25·[source image class] + 0.75·[target image class].
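For reference, the CutMix operation described above can be sketched in a few lines of Python. The snippet below is an illustrative sketch only, not part of the disclosure: the array shapes, the random patch placement, the `patch_frac` parameter, and the one-hot label encoding are assumptions made for the example.

```python
import numpy as np

def cutmix(source_img, source_label, target_img, target_label, patch_frac=0.25):
    """Illustrative CutMix: paste a rectangular patch of the source image onto
    the target image and mix the (one-hot) labels by pixel proportion."""
    h, w = target_img.shape[:2]
    ph, pw = int(h * np.sqrt(patch_frac)), int(w * np.sqrt(patch_frac))
    top = np.random.randint(0, h - ph + 1)
    left = np.random.randint(0, w - pw + 1)

    augmented = target_img.copy()
    augmented[top:top + ph, left:left + pw] = source_img[top:top + ph, left:left + pw]

    lam = (ph * pw) / (h * w)  # fraction of pixels coming from the source image
    mixed_label = lam * source_label + (1.0 - lam) * target_label
    return augmented, mixed_label
```

With `patch_frac=0.25`, `lam` is approximately 0.25, which reproduces the 0.25/0.75 label weighting in the example above.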
In contrast, the invention described in accordance with the various embodiments herein relates to determining which portions of each image are combined, and how to determine the labels of the resulting images. Rather than selecting random rectangular patches of the source image, the methods and systems described herein use an image segmenter (image segmentation machine learning model) to divide pixels in the source image into foreground and background regions and to divide pixels in the target image into foreground and background regions. In other words, the foreground and background of each of the source image and the target image are determined. The foreground of the source image is then combined with the background of the target image to create a new enhanced image or stitched image for addition to the training data of the machine learning model. Unlike CutMix, the region cropped from the source image is the foreground of the source image determined by the image segmenter. This may make the enhanced image look more natural, thereby allowing better performance of the machine learning model that relies on the trained data.
The disclosed systems and methods rely on machine learning models, such as neural networks (e.g., Deep Neural Networks (DNNs), Graph Neural Networks (GNNs), Deep Convolutional Networks (DCNs), Convolutional Neural Networks (CNNs), etc.), and the like. Fig. 1 illustrates a system 100 for training a neural network (e.g., a deep neural network). The neural networks or deep neural networks shown and described are merely examples of the types of machine learning networks or neural networks that may be used. The system 100 may include an input interface for accessing training data 102 of a neural network. For example, as shown in FIG. 1, the input interface may be constituted by a data storage interface 104 that may access the training data 102 from a data store 106. For example, the data storage interface 104 may be a memory interface or a persistent storage interface, such as a hard disk or SSD interface, but may also be a personal, local or wide area network interface, such as a Bluetooth, Zigbee or Wi-Fi interface, or an Ethernet or fiber optic interface. The data store 106 may be an internal data store of the system 100, such as a hard disk drive or SSD, but may also be an external data store, e.g., a network-accessible data store.
In some embodiments, the data store 106 may also include a data representation 108 of an untrained version of the neural network that the system 100 may access from the data store 106. However, it will be appreciated that the training data 102 and the data representation 108 of the untrained neural network may also each be accessed from a different data store, e.g., through a different subsystem of the data storage interface 104. Each subsystem may be of the type of data storage interface 104 described above. In other embodiments, the data representation 108 of the untrained neural network may be generated internally by the system 100 based on design parameters of the neural network, and thus may not be stored explicitly on the data store 106. The system 100 may also include a processor subsystem 110, which may be configured to provide, during operation of the system 100, an iterative function as a substitute for a stack of neural network layers to be trained. Here, the individual layers of the layer stack being substituted may have weights that are shared with each other and may receive as input the output of the previous layer or, for the first layer of the layer stack, an initial activation and a portion of the input of the layer stack. The processor subsystem 110 may be further configured to iteratively train the neural network using the training data 102. Here, a training iteration of the processor subsystem 110 may include a forward propagation portion and a backward propagation portion. The processor subsystem 110 may be configured to perform the forward propagation portion by, in addition to other operations that may define the forward propagation portion, determining an equilibrium point (balance point) of the iterative function at which the iterative function converges to a fixed point, wherein determining the equilibrium point includes using a numerical root search algorithm to find a root solution of the iterative function minus its input, and providing the equilibrium point as a substitute for the output of the layer stack in the neural network. The system 100 may also include an output interface for outputting a data representation 112 of the trained neural network; this data may also be referred to as trained model data 112. For example, as also shown in FIG. 1, the output interface may be constituted by the data storage interface 104, which in these embodiments is an input/output ("IO") interface through which the trained model data 112 may be stored in the data store 106. For example, the data representation 108 defining the "untrained" neural network may, during or after training, be replaced at least in part by the data representation 112 of the trained neural network, in that parameters of the neural network, such as weights, hyperparameters, and other types of parameters of the neural network, may be adapted to reflect the training on the training data 102. In FIG. 1, this is also illustrated by reference numerals 108, 112 referring to the same data record on the data store 106. In other embodiments, the data representation 112 may be stored separately from the data representation 108 defining the "untrained" neural network. In some embodiments, the output interface may be separate from the data storage interface 104, but may generally be of the type of data storage interface 104 described above.
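The equilibrium-point computation described for the processor subsystem 110 can be illustrated with a small fixed-point sketch. This is a hedged example under assumptions not stated in the text: the iterative function f is chosen here as a single weight-tied tanh layer, the root search uses `scipy.optimize.fsolve`, and all dimensions and weight scales are arbitrary.

```python
import numpy as np
from scipy.optimize import fsolve

def equilibrium_layer(f, x, z_dim):
    """Find z* with f(z*, x) = z* by root-finding on g(z) = f(z, x) - z,
    i.e. the iterative function minus its input, as described above."""
    g = lambda z: f(z, x) - z
    z0 = np.zeros(z_dim)          # initial activation for the root search
    return fsolve(g, z0)          # numerical root search algorithm

# Example iterative function (an assumption): a weight-tied layer z -> tanh(W z + U x + b)
rng = np.random.default_rng(0)
W = 0.5 * rng.standard_normal((8, 8))   # scaled down to help the root search converge
U = rng.standard_normal((8, 4))
b = rng.standard_normal(8)
f = lambda z, x: np.tanh(W @ z + U @ x + b)

x = rng.standard_normal(4)
z_star = equilibrium_layer(f, x, z_dim=8)   # substitute for the layer-stack output
```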
The architecture of system 100 is one example of a system that may be used to train the graphical and deep neural networks described herein. Additional structure for operating and training the machine learning model is shown in fig. 2.
Fig. 2 depicts a system 200 for implementing a machine learning model described herein, such as a deep neural network described herein. Other types of machine learning models may be used, and the DNNs described herein are not the only type of machine learning model that can be used with the systems of the present disclosure. For example, if the input image contains an ordered sequence of pixels, CNN may be utilized. The system 200 may be implemented to perform one or more of the stages of image recognition described herein. The system 200 may include at least one computing system 202. The computing system 202 may include at least one processor 204, the processor 204 being operatively connected to a memory unit 208. The processor 204 may include one or more integrated circuits that implement the functionality of a Central Processing Unit (CPU) 206. CPU 206 may be a commercially available processing unit that implements an instruction set such as one of the x86, ARM, power, or MIPS instruction set families. During operation, CPU 206 may execute stored program instructions retrieved from memory unit 208. The stored program instructions may include software that controls the operation of the CPU 206 to perform the operations described herein. In some examples, processor 204 may be a system-on-chip (SoC) that integrates the functionality of CPU 206, memory unit 208, network interfaces, and input/output interfaces into a single integrated device. Computing system 202 may implement an operating system for managing various aspects of operation. Although one processor 204, one CPU 206, and one memory 208 are shown in fig. 2, of course, more than one each of the foregoing may be utilized in the overall system.
Memory unit 208 may include volatile memory and nonvolatile memory for storing instructions and data. The non-volatile memory may include solid state memory, such as NAND flash memory, magnetic and optical storage media, or any other suitable data storage device that retains data when the computing system 202 is deactivated or loses power. Volatile memory can include static and dynamic Random Access Memory (RAM), which stores program instructions and data. For example, the memory unit 208 may store a machine learning model 210 or algorithm, a training data set 212 for the machine learning model 210, and a raw source data set 216.
The computing system 202 may include a network interface device 222 configured to provide communication with external systems and devices. For example, the network interface device 222 may include a wired and/or wireless Ethernet interface defined by an Institute of Electrical and Electronics Engineers (IEEE) 802.11 family of standards. The network interface device 222 may include a cellular communication interface for communicating with a cellular network (e.g., 3G, 4G, 5G). The network interface device 222 may be further configured to provide a communication interface to an external network 224 or cloud.
The external network 224 may be referred to as the world wide web or the internet. External network 224 may establish a standard communication protocol between computing devices. External network 224 may allow for easy exchange of information and data between computing devices and the network. One or more servers 230 may communicate with external network 224.
Computing system 202 may include an input/output (I/O) interface 220, which may be configured to provide digital and/or analog inputs and outputs. The I/O interface 220 is used to transfer information between internal storage and external input and/or output devices (e.g., HMI devices). The I/O interface 220 may include associated circuitry or a bus network to transfer information to or between the processor(s) and the storage. For example, the I/O interface 220 may include: digital I/O logic lines that may be read or set by the processor(s); a handshake line for supervising data transfer through the I/O lines; a timing and counting facility; as well as other known structures that provide such functionality. Examples of input devices include a keyboard, mouse, sensor, and the like. Examples of output devices include monitors, printers, speakers, and the like. The I/O interface 220 may include additional serial interfaces (e.g., Universal Serial Bus (USB) interfaces) for communicating with external devices. The I/O interface 220 may be referred to as an input interface (in that it communicates data from an external input, such as a sensor) or an output interface (in that it communicates data to an external output, such as a display).
The computing system 202 may include a human-machine interface (HMI) device 218, which device 218 may include any device that enables the system 200 to receive control inputs. Examples of input devices may include human interface inputs such as keyboards, mice, touch screens, voice input devices, and other similar devices. The computing system 202 may include a display device 232. Computing system 202 may include hardware and software for outputting graphical and textual information to display device 232. Display device 232 may include an electronic display screen, projector, printer, or other suitable device for displaying information to a user or operator. The computing system 202 may be further configured to allow interaction with a remote HMI and a remote display device through the network interface device 222.
System 200 may be implemented using one or more computing systems. While this example depicts a single computing system 202 implementing all of the described features, it is intended that the various features and functions be separated and implemented by multiple computing units in communication with each other. The particular system architecture selected may depend on a variety of factors.
The system 200 may implement a machine learning algorithm 210 configured to analyze a raw source data set 216. The raw source data set 216 may include raw or unprocessed sensor data, which may represent an input data set for a machine learning system. The raw source data set 216 may include video, video clips, images, text-based information, audio or human speech, time series data (e.g., a time-varying pressure sensor signal), and raw or partially processed sensor data (e.g., a radar map of objects). Further, the raw source data set 216 may be input data from associated sensors such as cameras, lidar, radar, ultrasonic sensors, motion sensors, thermal imaging cameras, or any other type of sensor that produces associated data having spatial dimensions within which there is some concept of a "foreground" and a "background". The input or input "image" referred to herein need not be from a camera, but may be from any of the sensors listed above. Several different input examples are shown and described with reference to figs. 6-12. In some examples, the machine learning algorithm 210 may be a neural network algorithm (e.g., a deep neural network) designed to perform a predetermined function. For example, the neural network algorithm may be configured to identify defects (e.g., cracks, stresses, bumps, etc.) in a component after the component is manufactured but before it leaves the factory. In another embodiment, a neural network algorithm may be configured in an automotive application to identify obstacles or pedestrians in an image, as well as their respective poses, directions of travel, and the like.
The computer system 200 may store a training data set 212 for the machine learning algorithm 210. The training data set 212 may represent a set of previously constructed data used to train the machine learning algorithm 210. The machine learning algorithm 210 may use the training data set 212 to learn weighting factors associated with the neural network algorithm. The training data set 212 may include a set of source data having corresponding achievements or results that the machine learning algorithm 210 attempts to replicate through the learning process. In one example, the training data set 212 may include an input image including an object (e.g., a pedestrian). The input image may include various scenes in which the object is identified.
The machine learning algorithm 210 may operate in a learning mode using the training data set 212 as input. The machine learning algorithm 210 may be performed in multiple iterations using data from the training data set 212. For each iteration, the machine learning algorithm 210 may update the internal weighting factors based on the results of the implementation. For example, the machine learning algorithm 210 may compare the output results (e.g., reconstructed or supplemental images where the image data is input) with the results included in the training data set 212. Because the training data set 212 includes expected results, the machine learning algorithm 210 can determine when performance is acceptable. After the machine learning algorithm 210 reaches a predetermined level of performance (e.g., 100% agreement with the outcome associated with the training data set 212) or converges, the machine learning algorithm 210 may be performed using data not in the training data set 212. It should be understood that in this disclosure, "convergence" may mean that a set (e.g., predetermined) number of iterations has occurred, or that the residual is sufficiently small (e.g., the approximate probability in an iteration is changing by less than a threshold), or other convergence condition. The trained machine learning algorithm 210 may be applied to the new data set to generate annotated data.
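As a concrete illustration of the stopping criteria mentioned above (a set number of iterations, or a sufficiently small residual), the following sketch trains a simple model by gradient descent. The linear model, mean-squared-error loss, learning rate, and tolerance are assumptions made for the example and are not part of the disclosed training procedure.

```python
import numpy as np

def train_until_converged(X, y, lr=0.1, max_iters=1000, tol=1e-8):
    """Illustrative loop: stop after a set number of iterations or when the
    change in loss (the residual) between iterations falls below a threshold."""
    w = np.zeros(X.shape[1])
    prev_loss = np.inf
    for _ in range(max_iters):
        y_hat = X @ w
        loss = np.mean((y_hat - y) ** 2)
        if abs(prev_loss - loss) < tol:       # residual sufficiently small
            break
        prev_loss = loss
        grad = 2.0 * X.T @ (y_hat - y) / len(y)
        w -= lr * grad                        # update internal weighting factors
    return w

rng = np.random.default_rng(1)
X = rng.standard_normal((64, 3))
y = X @ np.array([1.0, -2.0, 0.5])
w = train_until_converged(X, y)               # recovers the true weights closely
```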
The machine learning algorithm 210 may be configured to identify specific features in the original source data 216. The raw source data 216 may include multiple instances or input data sets that require supplemental results. For example, the machine learning algorithm 210 may be configured to identify the presence of pedestrians and annotate occurrences in the video image. In another example, the machine learning algorithm 210 may be configured to identify the presence of a defect in a manufactured part by capturing an image of the part. The machine learning algorithm 210 may be programmed to process the raw source data 216 to identify the presence of a particular feature. The machine learning algorithm 210 may be configured to identify features in the raw source data 216 as predetermined features (e.g., obstacles, pedestrians, landmarks, etc.). The raw source data 216 may be derived from various sources. For example, the raw source data 216 may be actual input data collected by a machine learning system. The raw source data 216 may be machine generated for testing the system. As an example, the raw source data 216 may include raw video images from a camera.
Given the above description of a machine learning model, and the structural examples of figs. 1-2 configured to implement the model, fig. 3A shows a flowchart of a system 300 for generating training data for a machine learning model (e.g., the models described above), and fig. 3B shows a schematic representation of training data generation according to an embodiment. A first image or source image 302 and a second image or target image 304 are received. For example, the source image 302 and the target image 304 may be stored in the database 106 or as training data 212. The system 300 operates to create an enhanced image 306 (or stitched image) from the source image 302 and the target image 304. The enhanced image 306 may then also be stored in the database 106 or in the memory 208 as training data 212 for improving the machine learning model 210.
The source image 302 and the target image 304 pass through an image segmenter 308. The image segmenter 308 may be executed by one or more processors described herein. The image segmenter 308 is configured to create image segmentation masks corresponding to the source image 302 and the target image 304. The image segmenter 308 may be configured to transform an image by converting the detected foreground region to a first color and converting the remainder (i.e., the background region) to a second color. The image segmenter 308 may be an image segmentation machine learning model, a semantic segmentation model, or the like, configured to determine foreground and background regions of the respective source and target images 302, 304. In such embodiments, the image segmenter 308 may rely on a pre-trained machine learning model (e.g., CNN, DCN, etc., such as described above). The model may be trained based on the color, tint, orientation, and size of objects in the image to determine which objects are in the foreground and which objects are in the background. The image segmenter need not necessarily be a machine learning model; in other embodiments, the image segmenter 308 may be programmable software configured to change the pixel colors of the source image 302 and the target image 304 to create a segmentation mask. For example, the programmable software may be programmed to operate by thresholding in the following manner: image pixels at or above a certain brightness threshold are converted to a first color (e.g., white) and image pixels below the threshold are converted to a second color (e.g., black), and one color is designated as foreground and the other color is designated as background to create an image segmentation mask. In other embodiments, the image itself contains data representing the depth of objects in the image; for example, the image source may be a lidar or radar device, which may include depth information for each pixel or location in the generated image. The final determination of foreground and background may be based on the depth information (e.g., objects below a certain depth threshold are foreground and the rest are background).
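A minimal sketch of the two non-learned segmenter variants mentioned above (brightness thresholding and depth thresholding) is given below. The threshold values, grayscale conversion, and mask encoding (255 for one region, 0 for the other) are assumptions made for the example; the disclosed image segmenter 308 may equally be a trained segmentation model.

```python
import numpy as np

def brightness_segmentation_mask(image, threshold=128):
    """Pixels at or above the brightness threshold become one color (white, 255),
    pixels below become the other (black, 0); one color is then designated as
    foreground and the other as background."""
    gray = image.mean(axis=-1) if image.ndim == 3 else image   # naive grayscale
    return np.where(gray >= threshold, 255, 0).astype(np.uint8)

def depth_segmentation_mask(depth_map, depth_threshold):
    """Depth-based variant for sensors that provide per-pixel depth (e.g. lidar):
    objects closer than the threshold are labeled foreground (True)."""
    return depth_map < depth_threshold
```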
The image segmenters 308 may be the same segmenter or, alternatively, one image segmenter may be used for the source image 302 and another image segmenter may be used for the target image 304.
The output of the image segmenter 308 is an image segmentation mask for each of the source image 302 and the target image 304. In other words, the image segmenter 308 converts the source image 302 into a source image segmentation mask 310 and converts the target image 304 into a target image segmentation mask 312. The source image segmentation mask 310 has one set of pixels converted to a single color representing the foreground of the source image and another set of pixels converted to another single color representing the background of the source image, as shown in fig. 3B. Two separate masks may be output, one of which shows the pixels in the background as black (e.g., the left image) and the other of which shows the pixels in the foreground as black (e.g., the right image), as shown in fig. 3B. In other embodiments, only one mask is provided, such that the foreground is colored one color and the background is colored another color. Taking the example of fig. 3B, the pixels inside the star are converted to black and labeled as foreground, and the remaining pixels outside the star are converted to white and labeled as background. For the target image 304, the same process is repeated, with the image segmenter 308 converting the detected foreground (e.g., the ellipse) to one color and converting the remaining pixels outside the ellipse to another color.
At 314, the system is configured to remove the foreground of the target image. Removing the foreground of the target image may include inpainting over the region of the target image 304 corresponding to the region determined to be foreground by the image segmenter 308. In other words, the pixels of the target image 304 that correspond to the locations where the target image segmentation mask 312 is colored to represent the foreground are changed. These pixels may be recolored, painted over, darkened, lightened, etc. In one embodiment, at 314, all pixels in the foreground of the target image are painted with a color corresponding to the average color of the background pixels. Additional description of this is provided below with respect to the algorithm shown in fig. 12. At 316, the resulting image is the target image with the foreground removed, i.e., set to a single color. In other embodiments, pixels in the foreground of the target image are colored with various colors corresponding to the colors in the background of the target image.
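Steps 314/316 (painting the target-image foreground with the average background color) can be sketched as follows. The boolean-mask representation and the NumPy operations are assumptions made for this illustration; the reference numerals in the comments refer to the steps described above.

```python
import numpy as np

def remove_foreground(target_img, target_fg_mask):
    """Step 314: paint every foreground pixel of the target image with the
    average color of the target image's background pixels.
    target_fg_mask: boolean (H, W) array, True where the segmenter found foreground."""
    out = target_img.astype(np.float32).copy()
    mean_background_color = out[~target_fg_mask].mean(axis=0)   # average over background pixels
    out[target_fg_mask] = mean_background_color                 # result corresponds to 316
    return out.astype(target_img.dtype)
```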
At 318, the system pastes the source image foreground to the location on the target image where the target image foreground was removed. To this end, the pixels of the target image from 316 at the locations where the source image foreground is to be inserted are changed to match the pixel colors of the source image foreground. This gives the appearance that the source image foreground has been inserted onto or into the target image background. The result is the enhanced image 306. The enhanced image 306 may then be stored or saved to memory (e.g., in the database 106 or in the memory 208 as training data 212 for improving the machine learning model 210).
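The paste operation at 318 then reduces to overwriting the masked pixel locations, yielding the enhanced image 306. As before, this is an illustrative sketch that assumes the source and target images share the same resolution and that the masks are boolean arrays.

```python
def paste_source_foreground(source_img, source_fg_mask, inpainted_target):
    """Step 318: copy the source-image foreground pixels onto the target image
    whose own foreground was removed at 314/316."""
    enhanced = inpainted_target.copy()
    enhanced[source_fg_mask] = source_img[source_fg_mask]
    return enhanced

# Putting the sketched steps together:
# enhanced_306 = paste_source_foreground(
#     source_img, source_fg_mask, remove_foreground(target_img, target_fg_mask))
```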
For illustrative purposes, FIG. 4 shows an example schematic diagram of a machine learning model 400, such as one implemented as an image segmenter, or a machine learning model utilizing the training data 212 that now includes the enhanced image 306. As discussed above, the machine learning model 400 may be a neural network (e.g., in some cases, although not necessarily, a deep neural network). If the machine learning model 400 is used as an image segmenter, the machine learning model 400 may be configured as a data-driven image processing model that uses a data-driven approach to determine a new color for each pixel of the target image. If the machine learning model 400 is used as an object detection model (e.g., determining that a defect exists in a component, determining that an object exists outside of a vehicle, etc.), the machine learning model may be configured accordingly. The machine learning model 400 may include an input layer (having a plurality of input nodes) and an output layer (having a plurality of output nodes). In some examples, the machine learning model 400 may include multiple hidden layers. The nodes of the input layer, the output layer, and the hidden layers may be coupled to the nodes of the subsequent layer or the previous layer. Each node of the output layer may perform an activation function, e.g., a function that helps determine whether the corresponding node should be activated to provide the output of the machine learning model 400. The number of nodes shown in the input, hidden, and output layers is merely exemplary, and any suitable number may be used.
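For concreteness, the layer structure of FIG. 4 (input layer, hidden layers, output layer, activation functions) can be sketched as a small forward pass. The layer sizes, ReLU activation, and random weights below are purely illustrative assumptions, not the architecture of the disclosed model.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def mlp_forward(x, weights, biases):
    """Input layer -> hidden layers (with activation) -> output layer."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(W @ h + b)                    # hidden-layer nodes with activation
    return weights[-1] @ h + biases[-1]        # output-layer nodes (e.g. class scores)

rng = np.random.default_rng(0)
sizes = [16, 32, 32, 4]                        # node counts are merely exemplary
weights = [0.1 * rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]
scores = mlp_forward(rng.standard_normal(16), weights, biases)
```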
As noted above, the machine learning model described herein may be used in many different applications where data enhancement for domain generalization may benefit the model, e.g., pedestrian or road sign detection for vehicles, component defect detection in manufacturing plants, etc. In the example of component defect detection, for instance, a machine learning model with updated, enhanced training data may be better equipped to determine whether a defect exists in a component by comparing an image of the component (e.g., an image acquired from an image sensor such as a camera, lidar, radar, thermal sensor, etc., as explained herein) with the trained data, and to sort the component based on the detection. Sorting the parts may include marking, sorting, changing the flow direction in an assembly line, laser trimming, or other methods of marking the affected parts or separating them from unaffected parts.
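A hedged sketch of the defect-detection use described above is shown below: a model trained on the updated (enhanced) data scores an image of the component, and an actuator sorts the part accordingly. The `defect_model` callable, the `actuator.divert` interface, and the decision threshold are hypothetical names introduced only for this illustration.

```python
def inspect_and_sort(component_image, defect_model, actuator, threshold=0.5):
    """Score the component image with a model trained on the enhanced data and
    actuate the sorting mechanism based on the detection."""
    defect_probability = defect_model(component_image)   # e.g. output of model 210/400
    if defect_probability >= threshold:
        actuator.divert("defective")    # mark / re-route / separate the part
    else:
        actuator.divert("pass")         # continue down the normal assembly flow
    return defect_probability
```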
Additional applications in which data enhancement for domain generalization may be used are shown in figs. 6-11. In these embodiments, additional training data generated by enhancing images and stored for use by the model ultimately improves the performance of the model. The structure for training and using the machine learning model for these applications (and other applications) is illustrated in fig. 5. FIG. 5 depicts a schematic diagram of interactions between a computer-controlled machine 500 and a control system 502. The computer-controlled machine 500 includes an actuator 504 and a sensor 506. The actuator 504 may include one or more actuators and the sensor 506 may include one or more sensors. The sensor 506 is configured to sense a condition of the computer-controlled machine 500. The sensor 506 may be configured to encode the sensed condition into a sensor signal 508 and transmit the sensor signal 508 to the control system 502. Non-limiting examples of sensors 506 include video, radar, lidar, ultrasound, and motion sensors, as described above with reference to figs. 1-2. In one embodiment, the sensor 506 is an optical sensor configured to sense an optical image of the environment in the vicinity of the computer-controlled machine 500.
The control system 502 is configured to receive the sensor signals 508 from the computer-controlled machine 500. As described below, the control system 502 may be further configured to compute actuator control commands 510 from the sensor signals and to transmit the actuator control commands 510 to the actuator 504 of the computer-controlled machine 500.
As shown in fig. 5, the control system 502 includes a receiving unit 512. The receiving unit 512 may be configured to receive the sensor signal 508 from the sensor 506 and to transform the sensor signal 508 into the input signal x. In an alternative embodiment, the sensor signal 508 is received directly as the input signal x without the receiving unit 512. Each input signal x may be a portion of each sensor signal 508. The receiving unit 512 may be configured to process each sensor signal 508 to generate each input signal x. The input signal x may include data corresponding to an image recorded by the sensor 506.
The control system 502 includes a classifier 514. The classifier 514 may be configured to classify the input signal x into one or more labels using a Machine Learning (ML) algorithm, such as the neural network described above. Classifier 514 is configured to be parameterized by parameters such as those described above (e.g., parameter θ). The parameter θ may be stored in and provided by the nonvolatile memory 516. Classifier 514 is configured to determine output signal y from input signal x. Each output signal y includes information that assigns one or more tags to each input signal x. The classifier 514 may transmit the output signal y to the conversion unit 518. The conversion unit 518 is configured to convert the output signal y into an actuator control command 510. The control system 502 is configured to transmit actuator control commands 510 to the actuator 504, the actuator 504 being configured to actuate the computer controlled machine 500 in response to the actuator control commands 510. In another embodiment, the actuator 504 is configured to actuate the computer controlled machine 500 directly based on the output signal y.
Upon receipt of the actuator control commands 510 by the actuator 504, the actuator 504 is configured to perform actions corresponding to the associated actuator control commands 510. The actuator 504 may include control logic configured to transform the actuator control command 510 into a second actuator control command for controlling the actuator 504. In one or more embodiments, the actuator control commands 510 may be used to control the display instead of or in addition to the actuators.
In another embodiment, the control system 502 includes the sensor 506 instead of, or in addition to, the computer-controlled machine 500 including the sensor 506. The control system 502 may also include the actuator 504 in lieu of, or in addition to, the computer-controlled machine 500 including the actuator 504.
As shown in fig. 5, the control system 502 also includes a processor 520 and a memory 522. Processor 520 may include one or more processors. Memory 522 may include one or more memory devices. Classifier 514 of one or more embodiments (e.g., a machine learning algorithm, such as those described above for pre-trained classifier 306) may be implemented by control system 502, which control system 502 includes non-volatile storage 516, processor 520, and memory 522.
Nonvolatile storage 516 may include one or more persistent data storage devices, such as hard disk drives, optical drives, tape drives, nonvolatile solid state devices, cloud storage, or any other device capable of persistently storing information. Processor 520 may include one or more devices selected from High Performance Computing (HPC) systems, including high performance cores, microprocessors, microcontrollers, digital signal processors, microcomputers, central processing units, field programmable gate arrays, programmable logic devices, state machines, logic circuits, analog circuits, digital circuits, or any other devices that manipulate signals (analog or digital) based on computer executable instructions residing in memory 522. Memory 522 may include a single memory device or multiple memory devices including, but not limited to, random Access Memory (RAM), volatile memory, non-volatile memory, static Random Access Memory (SRAM), dynamic Random Access Memory (DRAM), flash memory, cache memory, or any other device capable of storing information.
Processor 520 may be configured to read into memory 522 and execute computer-executable instructions that reside in the non-volatile storage 516 and that embody one or more ML algorithms and/or methods of one or more embodiments. The non-volatile storage 516 may include one or more operating systems and application programs. The non-volatile storage 516 may store compiled and/or interpreted computer programs created using a variety of programming languages and/or techniques, including, but not limited to, and either alone or in combination, Java, C, C++, C#, Objective C, Fortran, Pascal, JavaScript, Python, Perl, and PL/SQL.
The computer-executable instructions of the non-volatile storage 516, when executed by the processor 520, may cause the control system 502 to implement one or more of the ML algorithms and/or methods disclosed herein. The non-volatile storage 516 may also include ML data (including data parameters) that supports the functions, features, and processes of one or more embodiments described herein.
Program code that embodies the algorithms and/or methods described herein can be distributed singly or in any combination as a program product in a variety of different forms. The program code may be distributed using a computer readable storage medium having computer readable program instructions thereon for causing a processor to perform aspects of one or more embodiments. Inherently non-transitory computer-readable storage media may include volatile and nonvolatile, as well as removable and non-removable tangible media implemented in any method or technology for storage of information, such as computer-readable instructions, data structures, program modules, or other data. Computer-readable storage media may also include RAM, ROM, erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other solid state memory technology, portable compact disc read-only memory (CD-ROM) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be read by a computer. The computer readable program instructions may be downloaded over a network from a computer readable storage medium to a computer, another type of programmable data processing apparatus, or another device, or to an external computer or external storage device.
Computer readable program instructions stored in a computer readable medium may be used to direct a computer, other types of programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function, act, and/or operation specified in the flowchart or diagram block or blocks. In some alternative embodiments, the functions, acts and/or operations specified in the flowchart and diagram block or blocks may be reordered, serially processed and/or concurrently processed in accordance with one or more embodiments. Moreover, any of the flowcharts and/or diagrams may include more or fewer nodes or blocks than those shown in accordance with one or more embodiments.
The processes, methods, or algorithms may be embodied in whole or in part using suitable hardware components (e.g., application Specific Integrated Circuits (ASICs), field Programmable Gate Arrays (FPGAs), state machines, controllers, or other hardware components or devices), or combinations of hardware, software, and firmware components.
Fig. 6 depicts a schematic diagram of the control system 502 configured to control a vehicle 600, which vehicle 600 may be an at least partially autonomous vehicle or an at least partially autonomous robot. The vehicle 600 includes actuators 504 and sensors 506. The sensors 506 may include one or more video sensors, cameras, radar sensors, ultrasonic sensors, lidar sensors, and/or position sensors (e.g., GPS). One or more of these specific sensors may be integrated into the vehicle 600. In the context of the sign recognition and processing described herein, the sensor 506 is a camera mounted on or integrated into the vehicle 600. In lieu of, or in addition to, the one or more specific sensors identified above, the sensor 506 may include a software module configured to determine a state of the actuator 504 when executed. One non-limiting example of a software module includes a weather information software module configured to determine a current or future weather state in the vicinity of the vehicle 600 or another location.
The classifier 514 of the control system 502 of the vehicle 600 may be configured to detect objects in the vicinity of the vehicle 600 from the input signal x. In such an embodiment, the output signal y may include information characterizing objects in the vicinity of the vehicle 600. The actuator control command 510 may be determined from this information. The actuator control command 510 may be used to avoid collisions with the detected objects.
In embodiments where the vehicle 600 is at least partially autonomous, the actuator 504 may be embodied in a brake, propulsion system, engine, driveline, or steering of the vehicle 600. The actuator control commands 510 may be determined to control the actuator 504 such that the vehicle 600 avoids collision with a detected object. The detected objects may also be classified according to what the classifier 514 considers them most likely to be, such as pedestrians or trees. Based on the classification, an actuator control command 510 may be determined. In scenarios where an adversarial attack may occur, the above-described system may be further trained to better detect objects or signs despite changes in lighting conditions or in the angles of the sensors or cameras on the vehicle 600.
In other embodiments where the vehicle 600 is an at least partially autonomous robot, the vehicle 600 may be a mobile robot configured to perform one or more functions, such as flying, swimming, diving, and stepping. The mobile robot may be an at least partially autonomous mower or an at least partially autonomous cleaning robot. In these embodiments, the actuator control commands 510 may be determined such that the propulsion unit, steering unit, and/or braking unit of the mobile robot may be controlled such that the mobile robot may avoid collisions with the identified objects.
In another embodiment, the vehicle 600 is an at least partially autonomous robot in the form of a horticultural robot. In such an embodiment, the vehicle 600 may use an optical sensor as the sensor 506 to determine a state of plants in the environment in the vicinity of the vehicle 600. The actuator 504 may be a nozzle configured to spray a chemical. Based on the identified species and/or identified status of the plants, the actuator control command 510 may be determined to cause the actuator 504 to spray the plants with the appropriate amount of the appropriate chemical.
The vehicle 600 may be an at least partially autonomous robot in the form of a household appliance. Non-limiting examples of household appliances include washing machines, ovens, microwave ovens, or dishwashers. In such a vehicle 600, the sensor 506 may be an optical sensor configured to detect a state of an object to be processed by the household appliance. For example, where the household appliance is a washing machine, the sensor 506 may detect a state of laundry in the washing machine. The actuator control command 510 may be determined based on the detected state of the laundry.
Fig. 7 depicts a schematic diagram of a control system 502 configured to control a system 700 (e.g., a manufacturing machine, such as a punch cutter, a cutter, or a gun drill) of a manufacturing system 702 (e.g., a portion of a production line). The control system 502 may be configured to control an actuator 504, the actuator 504 being configured to control the system 700 (e.g., the manufacturing machine).
The sensors 506 of the system 700 (e.g., manufacturing machine) may be optical sensors (e.g., those described above) configured to capture one or more properties of the manufactured product 704. Classifier 514 may be configured to determine a status of manufactured product 704 based on the captured one or more attributes. The actuator 504 may be configured to control the system 700 (e.g., a manufacturing machine) based on a determined state of the manufactured product 704, for subsequent manufacturing steps of the manufactured product 704, or for sorting the manufactured product 704 (e.g., discarding, sorting, marking, trimming, or repairing) if the manufactured product 704 has a detected defect. The actuator 504 may be configured to control a function of the system 700 (e.g., a manufacturing machine) on a subsequent manufactured product 706 of the system 700 (e.g., a manufacturing machine) based on the determined state of the manufactured product 704.
Fig. 8 depicts a schematic diagram of a control system 502 configured to control a power tool 800 (e.g., a power drill or driver) having an at least partially autonomous mode. The control system 502 may be configured to control an actuator 504, the actuator 504 being configured to control the power tool 800.
The sensor 506 of the power tool 800 may be an optical sensor configured to capture one or more properties of a work surface 802 and/or a fastener 804 being driven into the work surface 802. The classifier 514 may be configured to determine a state of the work surface 802 and/or of the fastener 804 relative to the work surface 802 based on the captured one or more properties. The state may be that the fastener 804 is flush with the work surface 802. Alternatively, the state may be the hardness of the work surface 802. The actuator 504 may be configured to control the power tool 800 such that the driving function of the power tool 800 is adjusted according to the determined state of the fastener 804 relative to the work surface 802 or the one or more captured properties of the work surface 802. For example, if the state of the fastener 804 is flush with the work surface 802, the actuator 504 may discontinue the driving function. As another non-limiting example, the actuator 504 may apply additional torque or reduced torque depending on the hardness of the work surface 802.
Fig. 9 depicts a schematic diagram of a control system 502 configured to control an automated personal assistant 900. The control system 502 may be configured to control an actuator 504, the actuator 504 being configured to control the automated personal assistant 900. The automated personal assistant 900 may be configured to control a household appliance, such as a washing machine, a stove, an oven, a microwave oven, or a dishwasher.
The sensor 506 may be an optical sensor and/or an audio sensor. The optical sensor may be configured to receive a video image of a gesture 904 of the user 902. The audio sensor may be configured to receive voice commands from the user 902.
The control system 502 of the automated personal assistant 900 may be configured to determine actuator control commands 510 configured to control the automated personal assistant 900. The control system 502 may be configured to determine the actuator control commands 510 from the sensor signals 508 of the sensor 506. The automated personal assistant 900 is configured to transmit the sensor signals 508 to the control system 502. The classifier 514 of the control system 502 may be configured to execute a gesture recognition algorithm to identify the gesture 904 made by the user 902, determine the actuator control commands 510, and transmit the actuator control commands 510 to the actuator 504. The classifier 514 may be configured to retrieve information from non-volatile storage in response to the gesture 904 and output the retrieved information in a form suitable for receipt by the user 902.
Fig. 10 depicts a schematic diagram of a control system 502 configured to control a monitoring system 1000. The monitoring system 1000 may be configured to physically control access through the gate 1002. The sensor 506 may be configured to detect a scenario associated with deciding whether to grant access. The sensor 506 may be an optical sensor configured to generate and transmit image and/or video data. Control system 502 may use such data to detect a person's face.
The classifier 514 of the control system 502 of the monitoring system 1000 may be configured to interpret the image and/or video data by matching it against identities of known persons stored in the non-volatile storage 516, thereby determining the identity of a person. The classifier 514 may be configured to generate the actuator control commands 510 in response to the interpretation of the image and/or video data. The control system 502 is configured to transmit the actuator control commands 510 to the actuator 504. In this embodiment, the actuator 504 may be configured to lock or unlock the gate 1002 in response to the actuator control commands 510. In other embodiments, non-physical, logical access control may also be performed.
The monitoring system 1000 may also be a supervisory system. In such an embodiment, the sensor 506 may be an optical sensor configured to detect a scene under supervision, and the control system 502 is configured to control the display 1004. Classifier 514 is configured to determine a classification of the scene, e.g., whether the scene detected by sensor 506 is suspicious. The control system 502 is configured to transmit actuator control commands 510 to the display 1004 in response to the classification. The display 1004 may be configured to adjust the content displayed in response to the actuator control commands 510. For example, the display 1004 may highlight objects that the classifier 514 deems suspicious. With embodiments of the disclosed system, the supervisory system can predict that an object will appear at some time in the future.
Fig. 11 depicts a schematic diagram of a control system 502 configured to control an imaging system 1100, such as an MRI apparatus, an x-ray imaging apparatus, or an ultrasound apparatus. The sensor 506 may, for example, be an imaging sensor. The classifier 514 may be configured to determine a classification of all or part of the sensed image. The classifier 514 may be configured to determine or select the actuator control commands 510 in response to a classification obtained through the trained neural network. For example, the classifier 514 may interpret a region of the sensed image as a potential anomaly. In this case, the actuator control commands 510 may be determined or selected to cause the display 1102 to display the image and highlight the potentially anomalous region.
The processor(s) described herein may be configured to execute an algorithm for generating training data for a machine learning model. One embodiment of the algorithm will now be described. In the following description, superscripts denote different images and subscripts denote pixels within an image, so that x^i is an image and x^i_(a,b) is the pixel of image x^i at location (a, b).
The algorithm may receive a number of inputs, such as:
A dataset D = {(x^1, y^1), ..., (x^n, y^n)} of image-label pairs, where each image x^i ∈ [0, 1]^(W×H×3) and each label y^i ∈ {1, ..., K}
Data domains D_1, ..., D_m, where each image-label pair in the dataset belongs to exactly one domain
An image segmentation model or method (e.g., image segmenter 308), e.g., G : [0, 1]^(W×H×3) → {0, 1}^(W×H), which predicts whether each pixel is in the foreground (segmenter output equal to 1) or in the background (segmenter output equal to 0)
The number N of stitched or enhanced images to be created
The output, or return value, of the algorithm is the enhanced dataset D^*.
According to an embodiment, the algorithm proceeds as follows. The processor may initialize the enhanced dataset D^* := {}. Then, for i = 1, ..., N, the following steps may occur. First, the processor selects a source domain D_s and a target domain D_t, where D_s ≠ D_t.
It should be noted that there are many ways to select the source and target domains. For example, the algorithm may sample from the set of domains with some probability. Because it is desirable to improve performance on domains that are under-represented in the training set, the algorithm can be configured so that the probability of selecting a data domain decreases as the number of images in that domain increases.
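Purely as an illustration of this inverse-size weighting, a minimal Python sketch might look as follows; the function and variable names (`sample_source_and_target_domains`, `domains`) are illustrative and not taken from the disclosure, and the domains are assumed to be available as plain lists of image-label pairs.

```python
import numpy as np

def sample_source_and_target_domains(domains, rng=None):
    """Pick two distinct domain indices, favoring under-represented domains.

    `domains` is a list of datasets (e.g., lists of image-label pairs); the
    probability of picking a domain decreases as its number of images grows.
    """
    rng = rng or np.random.default_rng()
    sizes = np.array([len(d) for d in domains], dtype=float)
    weights = 1.0 / sizes                      # smaller domains -> larger weight
    probs = weights / weights.sum()
    s, t = rng.choice(len(domains), size=2, replace=False, p=probs)
    return s, t                                # indices of D_s and D_t, with s != t
```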
Next, the processor may select a source image-label pair (x^s, y^s) ∈ D_s, where x^s is an image and y^s is its label. Likewise, the processor may select a target image-label pair (x^t, y^t) ∈ D_t.
The source image and the target image are then passed through the image segmenter to generate segmentation masks z^s := G(x^s) and z^t := G(x^t).
The foreground of the target image is then removed by inpainting it with the average color of the target image background. This can be accomplished as follows. First, the processor(s) determine the average color c of the pixels in the background of the target image (i.e., the pixels (a, b) for which the target mask pixel z^t_(a,b) is 0). The processor(s) then assign x^t_(a,b) := c to every pixel for which the target mask pixel z^t_(a,b) is 1.
In this step, the algorithm essentially "paints over" the foreground by replacing the foreground pixels with the average color of the background. Other methods of removing the target foreground may be used; for example, many deep learning methods for inpainting or inferring missing pixels could be used instead.
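As a minimal sketch of this inpainting-by-average-color step, assuming NumPy arrays that follow the [0, 1]^(W×H×3) image and {0, 1}^(W×H) mask conventions above (the function name is illustrative):

```python
import numpy as np

def remove_foreground_with_mean_color(image, mask):
    """Replace foreground pixels (mask == 1) with the mean background color.

    image: float array of shape (W, H, 3) with values in [0, 1]
    mask:  array of shape (W, H) with 1 for foreground, 0 for background
    """
    out = image.copy()
    background_pixels = image[mask == 0]          # shape (num_background_pixels, 3)
    mean_color = background_pixels.mean(axis=0)   # the average background color c
    out[mask == 1] = mean_color                   # paint over the foreground
    return out
```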
Once the foreground of the target image is removed, the source foreground may be pasted onto the target image. To this end, for every pixel (a, b) for which the source mask pixel z^s_(a,b) is 1, the processor(s) assign x^t_(a,b) := x^s_(a,b).
The processor(s) then add the resulting enhanced image together with the source image label, i.e., the pair (x^t, y^s), to the enhanced dataset D^*. In other words, the enhanced image may be marked with the same label as the source image.
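A corresponding sketch of the paste-and-label step, under the same array conventions and assuming the source and target images already share the same size (names are illustrative):

```python
def paste_source_foreground(target_inpainted, source_image, source_mask, source_label):
    """Copy source foreground pixels (source_mask == 1) onto the inpainted target.

    Returns the enhanced image together with the source label y_s, which is the
    pair appended to the enhanced dataset D*.
    """
    augmented = target_inpainted.copy()
    augmented[source_mask == 1] = source_image[source_mask == 1]
    return augmented, source_label
```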
Then, the union D ∪ D^* of the original training set and the enhanced dataset may be used to train the machine learning model.
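If, for example, the training pipeline were built on PyTorch, the union D ∪ D^* could be formed by concatenating the two datasets. The toy tensors, model, and hyperparameters below are stand-ins chosen only to keep the sketch runnable; they are not part of the disclosure.

```python
import torch
from torch import nn, optim
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# Stand-in datasets: random tensors in place of the real image-label pairs.
original_dataset  = TensorDataset(torch.rand(8, 3, 32, 32), torch.randint(0, 5, (8,)))
augmented_dataset = TensorDataset(torch.rand(4, 3, 32, 32), torch.randint(0, 5, (4,)))

combined = ConcatDataset([original_dataset, augmented_dataset])   # D ∪ D*
loader = DataLoader(combined, batch_size=4, shuffle=True)

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 5))    # toy classifier
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

for images, labels in loader:                  # one pass over the combined data
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
```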
It should be noted that the algorithm assumes that the images in the dataset all have the same size. If they do not, the step of pasting the source foreground onto the target image may be modified to first resize the source image (and its mask) to the size of the target image before pasting, as in the sketch below.
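One dependency-free way to perform that resizing is a nearest-neighbor resize applied to both the source image and its mask; the following sketch is illustrative only, and the function name is an assumption.

```python
import numpy as np

def resize_nearest(array, new_w, new_h):
    """Nearest-neighbor resize along the first two (W, H) axes.

    Works for both images of shape (W, H, 3) and masks of shape (W, H).
    """
    w, h = array.shape[0], array.shape[1]
    rows = np.arange(new_w) * w // new_w       # source row index for each output row
    cols = np.arange(new_h) * h // new_h       # source column index for each output column
    return array[rows][:, cols]

# Before pasting: match the source to the target's spatial size, e.g.
#   source_resized = resize_nearest(source_image, *target_image.shape[:2])
#   mask_resized   = resize_nearest(source_mask,  *target_image.shape[:2])
```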
FIG. 12 illustrates a method 1200 for generating training data for a machine learning model according to embodiments described herein. The method is a summary of the above algorithm. Method 1200 may be performed by one or more processors and associated memory described herein and may be used with the teachings herein with respect to generating training data (e.g., as shown in fig. 1-4 and described above with respect to one embodiment of an algorithm).
At 1202, the processor selects a source image 302, and at 1204, the processor selects a target image 304. The selection of these images may be based on a need to improve the training data for a particular class of objects. In other words, since it may be desirable to improve performance on domains that are not adequately represented in the training set, probabilities may be utilized in steps 1202 and 1204 such that the probability of selecting a data domain decreases as the number of images in that domain increases. For example, if the machine learning system requires more training for a particular type of sign, the processor may select a source image 302 that has that type of sign in its foreground. Alternatively, these steps may be performed manually.
At 1206, the source image 302 and the target image 304 are passed through an image segmenter (e.g., the image segmenter 308 described above) to generate segmentation masks. With reference to the algorithm described above, the segmenter generates masks z^s := G(x^s) and z^t := G(x^t), in which background pixels are set to 0 and foreground pixels are set to 1.
At 1208, the foreground of the target image 304 is removed. This may be performed in accordance with the teachings herein, for example, by inpainting with the average color of the target image background, or by using other methods.
At 1210, the foreground of the source image is pasted onto the target image. Again, according to one embodiment, for each pixel where the source mask pixel z^s_(a,b) is 1, the corresponding pixel color in the target image is changed to x^s_(a,b). This results in an enhanced image.
At 1212, the enhanced image is added to the training data or data set to train the machine learning model.
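Tying steps 1206 through 1212 together, a compact end-to-end sketch of one augmentation pass might read as follows; the thresholding "segmenter" is only a stand-in for image segmenter 308, and all names and values are illustrative.

```python
import numpy as np

def augment_once(source_image, source_label, target_image, segmenter):
    """One pass of steps 1206-1212: segment, remove target foreground,
    paste source foreground, and return the (image, label) training pair."""
    z_s = segmenter(source_image)                  # step 1206: segmentation masks
    z_t = segmenter(target_image)

    augmented = target_image.copy()                # step 1208: inpaint target foreground
    augmented[z_t == 1] = target_image[z_t == 0].mean(axis=0)

    augmented[z_s == 1] = source_image[z_s == 1]   # step 1210: paste source foreground
    return augmented, source_label                 # step 1212: keep the source label

# Toy usage with a trivial brightness-threshold "segmenter" as a stand-in:
rng = np.random.default_rng(0)
src, tgt = rng.random((32, 32, 3)), rng.random((32, 32, 3))
toy_segmenter = lambda img: (img.mean(axis=-1) > 0.5).astype(np.uint8)
aug_image, aug_label = augment_once(src, 3, tgt, toy_segmenter)  # 3 = illustrative class
```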
While exemplary embodiments are described above, these embodiments are not intended to describe all possible forms encompassed by the claims. The words used in the specification are words of description rather than limitation, and it is understood that various changes may be made without departing from the spirit and scope of the disclosure. As previously mentioned, features of the various embodiments may be combined to form further embodiments of the invention, which may not be explicitly described or illustrated. While various embodiments may have been described as providing advantages or being superior to other embodiments or implementations of the prior art in terms of one or more desired characteristics, one of ordinary skill in the art will recognize that one or more features or characteristics may be weighted to achieve the desired overall system attributes, which may depend on the specific application and implementation. Such attributes may include, but are not limited to, cost, strength, durability, life cycle cost, marketability, appearance, packaging, size, applicability, weight, manufacturability, ease of assembly, and the like. Thus, to the extent that any embodiment is described as being less desirable in terms of one or more characteristics than other embodiments or prior art implementations, such embodiments are not outside the scope of this disclosure and may be desirable for a particular application.

Claims (20)

1. A system for performing at least one task under autonomous control of a component, the system comprising:
an image sensor configured to output an image of the component;
an actuator configured to sort the component based on detected defects in the component;
a processor; and
a memory comprising instructions that when executed by a processor cause the processor to:
generating a source image segmentation mask having a foreground region and a background region on a source image stored in a memory using an image segmenter;
generating a target image segmentation mask having a foreground region and a background region on a target image stored in a memory using an image segmenter;
determining a source image foreground and a source image background of the source image based on the source image segmentation mask;
determining a target image foreground and a target image background of the target image based on the target image segmentation mask;
removing a target image foreground from the target image;
inserting the source image foreground into the target image with the target image foreground removed to create an enhanced image having the source image foreground and the target image background;
updating training data of the machine learning model with the enhanced image;
determining a defect in the component using the machine learning model having the updated training data; and
actuating the actuator to sort the component based on the determined defect in the component.
2. The system of claim 1, wherein the instructions, when executed by the processor, cause the processor to remove the target image foreground from the target image background by:
determining an average color of pixels in the background of the target image; and
changing the pixels in the foreground of the target image to take on the average color.
3. A computer-implemented method for generating training data for a machine learning model, the computer-implemented method comprising:
selecting a source image from an image database;
selecting a target image from an image database;
generating a source image segmentation mask having a foreground region and a background region with a source image using an image segmenter;
generating a target image segmentation mask having a foreground region and a background region with the target image using an image segmenter;
determining a source image foreground and a source image background of the source image based on the source image segmentation mask;
determining a target image foreground and a target image background of the target image based on the target image segmentation mask;
removing a target image foreground from the target image;
inserting the source image foreground into the target image with the target image foreground removed to create an enhanced image having the source image foreground and the target image background; and
the training data of the machine learning model is updated with the enhanced image.
4. The computer-implemented method of claim 3, wherein the removing step comprises:
changing the color of at least one group of pixels in the foreground of the target image.
5. The computer-implemented method of claim 3, wherein the removing step comprises:
determining an average color of pixels in the background of the target image; and
changing the pixels in the foreground of the target image to take on the average color.
6. The computer-implemented method of claim 3, further comprising:
determining a color of a pixel in the source image at a location corresponding to a foreground of the source image; and
wherein the inserting step comprises assigning the color to a pixel in the target image at a position corresponding to the foreground of the target image.
7. The computer-implemented method of claim 3, wherein:
the source image segmentation mask and the target image segmentation mask are generated to have a first color in the respective foreground regions and a second color in the respective background regions.
8. The computer-implemented method of claim 3, further comprising outputting a trained machine learning model based on the updated training data.
9. The computer-implemented method of claim 8, further comprising:
receiving an input image of the component from the image sensor;
determining that a defect exists in the component using the trained machine learning model and the input image; and
sorting the component based on the determined defect in the component.
10. The computer-implemented method of claim 3, wherein the source images are part of a group of source images having a common domain, and wherein the step of selecting the source images is based on probabilities such that the probability of selecting a given source image decreases as the number of images having the common domain increases.
11. The computer-implemented method of claim 3, wherein the image segmenter is an image segmentation machine learning model.
12. A system for training a machine learning model, the system comprising:
a computer readable storage medium configured to store computer executable instructions; and
one or more processors configured to execute computer-executable instructions, the computer-executable instructions comprising:
generating a source image segmentation mask having a foreground region and a background region with a source image using an image segmenter;
generating a target image segmentation mask having a foreground region and a background region with the target image using an image segmenter;
determining a source image foreground and a source image background of the source image based on the source image segmentation mask;
determining a target image foreground and a target image background of the target image based on the target image segmentation mask;
removing a target image foreground from the target image;
inserting the source image foreground into the target image with the target image foreground removed to create an enhanced image having the source image foreground and the target image background; and
the training data of the machine learning model is updated with the enhanced image.
13. The system of claim 12, wherein the removing step comprises:
changing the color of at least one group of pixels in the foreground of the target image.
14. The system of claim 12, wherein the removing step comprises:
determining an average color of pixels in the background of the target image; and
changing the pixels in the foreground of the target image to take on the average color.
15. The system of claim 12, wherein the computer-executable instructions further comprise:
determining a color of a pixel in the source image at a location corresponding to a foreground of the source image; and
wherein the inserting step comprises assigning the color to a pixel in the target image at a position corresponding to the foreground of the target image.
16. The system of claim 12, wherein:
the source image segmentation mask and the target image segmentation mask are generated to have a first color in the respective foreground regions and a second color in the respective background regions.
17. The system of claim 12, wherein the computer-executable instructions further comprise:
the trained machine learning model is output based on the updated training data.
18. The system of claim 17, wherein the computer-executable instructions further comprise:
receiving an input image of the component from an image source;
determining that a defect exists in the component using the trained machine learning model and the input image; and
outputting a message indicating the presence of the defect in the component.
19. The system of claim 12, wherein the source images are part of a group of source images having a common domain, and wherein the step of selecting the source images is based on probabilities such that the probability of selecting a given source image decreases as the number of images having a common domain increases.
20. The system of claim 12, wherein the image segmenter is an image segmentation machine learning model.
CN202310376448.2A 2022-04-08 2023-04-10 Data enhancement for domain generalization Pending CN116894799A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US17/716590 2022-04-08
US17/716,590 US20230326005A1 (en) 2022-04-08 2022-04-08 Data augmentation for domain generalization

Publications (1)

Publication Number Publication Date
CN116894799A true CN116894799A (en) 2023-10-17

Family

ID=88094139

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310376448.2A Pending CN116894799A (en) 2022-04-08 2023-04-10 Data enhancement for domain generalization

Country Status (3)

Country Link
US (1) US20230326005A1 (en)
CN (1) CN116894799A (en)
DE (1) DE102023109072A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117952820A (en) * 2024-03-26 2024-04-30 杭州食方科技有限公司 Image augmentation method, apparatus, electronic device, and computer-readable medium

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2358098A (en) * 2000-01-06 2001-07-11 Sharp Kk Method of segmenting a pixelled image
GB2377333A (en) * 2001-07-07 2003-01-08 Sharp Kk Segmenting a pixellated image into foreground and background regions
US9269157B2 (en) * 2005-03-01 2016-02-23 Eyesmatch Ltd Methods for extracting objects from digital images and for performing color change on the object
KR20180127995A (en) * 2016-02-25 2018-11-30 코닝 인코포레이티드 Methods and apparatus for inspecting the edge surface of a moving glass web
US10074161B2 (en) * 2016-04-08 2018-09-11 Adobe Systems Incorporated Sky editing based on image composition
US10657638B2 (en) * 2017-04-28 2020-05-19 Mentor Graphics Corporation Wafer map pattern detection based on supervised machine learning
CN111164647B (en) * 2017-10-04 2024-05-03 谷歌有限责任公司 Estimating depth using a single camera
DE102017220954A1 (en) * 2017-11-23 2019-05-23 Robert Bosch Gmbh Method, device and computer program for determining an anomaly
US10937169B2 (en) * 2018-12-18 2021-03-02 Qualcomm Incorporated Motion-assisted image segmentation and object detection
US10949978B2 (en) * 2019-01-22 2021-03-16 Fyusion, Inc. Automatic background replacement for single-image and multi-view captures
CA3158287A1 (en) * 2019-11-12 2021-05-20 Brian Pugh Method and system for scene image modification
US11688074B2 (en) * 2020-09-30 2023-06-27 Nvidia Corporation Data augmentation including background modification for robust prediction using neural networks


Also Published As

Publication number Publication date
DE102023109072A1 (en) 2023-10-12
US20230326005A1 (en) 2023-10-12

Similar Documents

Publication Publication Date Title
CN110059558B (en) Orchard obstacle real-time detection method based on improved SSD network
US10318848B2 (en) Methods for object localization and image classification
US8379994B2 (en) Digital image analysis utilizing multiple human labels
US20220100850A1 (en) Method and system for breaking backdoored classifiers through adversarial examples
US11551084B2 (en) System and method of robust active learning method using noisy labels and domain adaptation
CN116523823A (en) System and method for robust pseudo tag generation for semi-supervised object detection
CN115880560A (en) Image processing via an isotonic convolutional neural network
US20210224646A1 (en) Method for generating labeled data, in particular for training a neural network, by improving initial labels
KR102597787B1 (en) A system and method for multiscale deep equilibrium models
CN116258865A (en) Image quantization using machine learning
CN117592542A (en) Expert guided semi-supervised system and method with contrast penalty for machine learning models
CN116894799A (en) Data enhancement for domain generalization
JP2024045070A (en) Systems and methods for multi-teacher group-distillation for long-tail classification
CN118279638A (en) System and method for training machine learning models using teacher and student frameworks
JP2021197184A (en) Device and method for training and testing classifier
CN117789502A (en) System and method for distributed awareness target prediction for modular autonomous vehicle control
JP2024035192A (en) System and method for universal purification of input perturbation with denoised diffusion model
CN116523952A (en) Estimating 6D target pose using 2D and 3D point-by-point features
US20230100765A1 (en) Systems and methods for estimating input certainty for a neural network using generative modeling
US12079995B2 (en) System and method for a hybrid unsupervised semantic segmentation
US20230100132A1 (en) System and method for estimating perturbation norm for the spectrum of robustness
US20220101116A1 (en) Method and system for probably robust classification with detection of adversarial examples
CN115482428A (en) System and method for pre-setting robustness device for pre-training model aiming at adversarial attack
US20220101143A1 (en) Method and system for learning joint latent adversarial training
US20230303084A1 (en) Systems and methods for multi-modal data augmentation for perception tasks in autonomous driving

Legal Events

Date Code Title Description
PB01 Publication