CN113221916A - Visual sorting method and device based on cross-domain rapid migration

Info

Publication number: CN113221916A
Application number: CN202110514446.6A
Authority: CN
Inventors: 张正; 赵书光; 卢光明
Original assignee: Shenzhen Graduate School Harbin Institute of Technology
Current assignee: Shenzhen Graduate School Harbin Institute of Technology
Priority date: 2021-05-08
Filing date: 2021-05-08
Publication date: 2021-08-06
Anticipated expiration: 2041-05-08
Also published as: CN113221916B

Abstract

The invention discloses a visual sorting method and device based on cross-domain rapid migration. Starting from the goal of reducing the difference between cross-domain features, and considering the limitations of any single similarity measure, it provides an algorithm based on weighted fusion of feature similarity measures: several similarity measures are fused by weighting to analyze and constrain the cross-domain features comprehensively, which reduces the feature difference and improves the accuracy and working efficiency of the sorting model in the real domain. It also provides an adversarial learning algorithm based on an attention mechanism: in addition to considering global features, the attention mechanism emphasizes, and thereby reduces, the difference in the local region where the part is located, so that cross-domain invariant features are learned, the gap between the simulation domain and the real domain is bridged, and the robustness of the model's performance after cross-domain migration is ensured.

Description

Visual sorting method and device based on cross-domain rapid migration

Technical Field

The invention belongs to the technical field of machine learning, and particularly relates to a visual sorting method and device.

Background

The target sorting model is a core technology in the development of intelligent and flexible manufacturing. It enables unmanned industrial processing, inspection and assembly, and has wide application prospects in fields such as machining, automobile manufacturing, warehouse logistics and 3C.

The current mainstream approach uses simulation to generate a large amount of labeled simulation data for training the visual sorting model. This improves the robustness and adaptive capacity of the model and addresses the lack of labeled data and the difficulty of meeting the performance required in real sorting scenes. However, because simulation data differ from real data, even a visual sorting model that predicts well in the simulation environment cannot be guaranteed to perform well when used directly in the real environment.

Disclosure of Invention

To solve the problems in the prior art, the invention provides a visual sorting method and device based on cross-domain fast migration. On the one hand, a domain randomization technique is adopted to increase the diversity of the training data, improving the generalization capability of the sorting model in a real sorting environment at the data level. On the other hand, a cross-domain fast migration technique based on feature distribution is studied, improving the performance of the sorting model at the feature level. Together these achieve efficient and robust operation of the visual sorting model in a real sorting scene.

In order to achieve the above object, an embodiment of the present invention provides a cross-domain fast migration-based visual sorting method, including:

acquiring parameter data of a simulation domain based on learning-based domain randomization;

acquiring simulation domain data characteristics and real domain data characteristics;

designing an adaptation layer of the learning model for different simulation domain data;

constraining the features at the adaptation layer using a similarity-measure weighted fusion algorithm and an attention-based adversarial learning algorithm;

wherein the similarity-measure weighted fusion algorithm calculates a plurality of similarity measures between the cross-domain features of the adaptation layer as losses, the losses acting on the adaptation layer, and

the attention-based adversarial learning algorithm uses an adversarial network module to distinguish the domains to which the features belong and calculates an adversarial loss, which makes the cross-domain features indistinguishable, that is, learns cross-domain invariant features;

and mapping the characteristics to corresponding task result spaces according to different tasks, and finally outputting task related results.

Further, acquiring parameter data of the simulation domain based on learning-based domain randomization specifically includes: combining the guiding information of the real domain, and, for the parameter z of the simulation domain, taking the prior distribution p(z) as the initial condition and solving for the distribution p_φ(z) based on the reinforcement learning technique, where φ denotes the learnable parameters of the distribution;

further, the domain randomization performs randomized sampling of the parameters of the simulation domain in two respects: visual parameters, mainly comprising the number of objects in the scene, target texture, background color, illumination and randomized noise; and kinetic parameters, mainly comprising the mass, friction coefficient and height of the object.

Further, the problem of solving for the distribution p_φ(z) based on the reinforcement learning technique is expressed as follows:

wherein the first term L_DR() represents a designed cost function that ensures the distribution p_φ(z) yields diversified samples; the second term D(), based on a distribution distance measure, ensures the similarity between the predicted distribution and the reference distribution and reduces the instability of the result, where α is … ….

Further, the loss function of the similarity-measure weighted fusion algorithm is as follows:

wherein L_task(f_s, f_a, f_c, x_s, y_s) is the task loss function on the source domain; x_s and y_s are the data of the simulation domain and its labels, respectively; x_t is the data of the real domain; the encoder, adaptation layer and decoder of the model are denoted f_s, f_a and f_c, respectively;

corresponding loss functions are defined for domain adaptation under each of the k different similarity measures, and λ_i is the weight corresponding to the i-th similarity measure.

Further, the adversarial network module f_d is used for distinguishing whether the data comes from the simulation domain or the real domain. Let y_d be the domain label, indicating the domain to which the data belongs, and let the loss function of the discriminator f_d be L_d(f_s, f_a, f_d, x_s, x_t, y_d). The overall loss function of the model is then as follows:

L′_all(f_s, f_a, f_c, f_d, x_s, x_t, y_s, y_d) = L_task(f_s, f_a, f_c, x_s, y_s) + λ L_d(f_s, f_a, f_d, x_s, x_t, y_d).

the embodiment of the invention also provides a visual sorting device based on cross-domain rapid migration, which comprises:

the input and feature distribution extraction module, which is used for acquiring parameter data of the simulation domain based on learning-based domain randomization, and for acquiring simulation domain data features and real domain data features;

the cross-domain fast migration module based on feature distribution, which is used for designing an adaptation layer of the learning model for different simulation domain data, and for constraining the features at the adaptation layer using a similarity-measure weighted fusion algorithm and an attention-based adversarial learning algorithm;

wherein the attention-based adversarial learning algorithm uses an adversarial network to distinguish the domains to which the features belong and calculates an adversarial loss, which makes the cross-domain features indistinguishable, that is, learns cross-domain invariant features;

and the task-driven feature distribution fitting output module is used for mapping the features to corresponding task result spaces according to different tasks and finally outputting task related results.

Further, the input of the device is a simulation domain image and a real domain image, and the output is a part segmentation result.

Embodiments of the present invention further provide a computer program product, which includes computer program instructions, and when the instructions are executed by a processor, the computer program product is configured to implement the foregoing visual sorting method based on cross-domain fast migration.

Embodiments of the present invention further provide a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed, the foregoing visual sorting method based on cross-domain fast migration is implemented.

The beneficial effects of the invention are as follows. Starting from the goal of reducing the difference between cross-domain features, and considering the limitations of any single similarity measure, the invention provides a method based on weighted fusion of feature similarity measures: several similarity measures are fused by weighting to analyze and constrain the cross-domain features comprehensively, which reduces the feature difference and improves the accuracy and working efficiency of the sorting model in the real domain. The invention also provides an adversarial learning method based on an attention mechanism: in addition to considering global features, the attention mechanism emphasizes, and thereby reduces, the difference in the local region where the part is located, so that cross-domain invariant features are learned, the gap between the simulation domain and the real domain is bridged, and the robustness of the model's performance after cross-domain migration is ensured.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.

FIG. 1 is a schematic diagram of a cross-domain fast migration technique of the present invention;

FIG. 2 is an effect diagram of the cross-domain fast migration module of the present invention;

FIG. 3 is a flow chart of a cross-domain fast migration method of the present invention.

Detailed Description

To facilitate understanding and implementing the present invention for those skilled in the art, the following technical solutions of the present invention are described clearly and completely with reference to the accompanying drawings, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The invention develops a complete, deep-learning-based technique for fast cross-domain migration of models. On the one hand, a domain randomization technique is adopted to increase the diversity of the training data and to improve, at the data level, the generalization capability of the sorting model in a real sorting environment; a learning-based domain randomization mechanism is further explored to provide effective simulation data for the real domain. On the other hand, a cross-domain fast migration technique based on feature distribution is studied to improve the performance of the sorting model at the feature level, achieving efficient and robust operation of the visual sorting model in a real sorting scene.

Simulation technology based on domain randomization

By collecting data in the simulation domain, the acquisition and labeling of training samples is moved from the high-cost real domain to the low-cost simulation domain, which effectively addresses the long deployment period and high cost caused by the difficulty of acquiring training data. However, the distribution of data acquired in the simulation domain differs from that of the real domain, so a model trained on it performs unstably in the real domain and cannot be used there directly. Domain randomization increases the complexity of the simulation-domain samples and improves the generalization capability of the trained model, thereby improving its performance in the real domain. Existing domain randomization techniques randomize the visual parameters and dynamic parameters of the simulation environment to obtain complex and varied labeled data. This approach has two shortcomings: on the one hand, it ignores the differences between simulation domains, which leads to redundant samples across simulation domains; on the other hand, it lacks modeling of the simulation domain guided by the real domain, which results in negative migration from the simulation domain to the real domain.

The invention combines deep learning to establish a prediction model for the visual parameters and dynamic parameters of the simulation domain, and studies a simulation-domain parameter distribution prediction model guided by the real domain. It focuses on the information gain between simulation domains and on the similarity between the simulation domains and the real domain, and predicts the simulation environment parameters in a learning-based manner. This improves the effectiveness of simulation data acquisition, prevents negative migration, and provides sufficient, higher-quality samples for model training and for simulation-to-real migration.

In order to eliminate the influence of environment-independent factors on the model performance, the domain randomization technology performs randomized sampling on parameters in the simulation platform from two aspects:

visual parameters: mainly including the number of objects in the scene, target texture, background color, illumination, randomized noise, etc.;

kinetic parameters: mainly including object mass, coefficient of friction, height, etc.

Current domain randomization methods sample the parameters from a uniform distribution and cannot selectively construct simulation environments that are effective for the real domain. As a result, the computational cost is high and a model trained in the simulation domain is difficult to migrate to the real domain.
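The uniform-sampling baseline criticized above can be illustrated with a minimal sketch. The field names, the `SceneParams` container, the `sample_scene` helper and all numeric ranges are hypothetical placeholders chosen for illustration; only the list of randomized quantities (object count, texture, background color, illumination, noise, mass, friction, height) comes from the text.

```python
import random
from dataclasses import dataclass


@dataclass
class SceneParams:
    # Visual parameters named in the text (field names are assumptions).
    num_objects: int
    texture_id: int
    background_rgb: tuple
    light_intensity: float
    noise_std: float
    # Kinetic parameters named in the text (field names are assumptions).
    mass_kg: float
    friction: float
    height_m: float


def sample_scene(rng: random.Random) -> SceneParams:
    """Draw one simulation configuration from independent uniform ranges.

    Every range below is an illustrative placeholder, not a value from the source.
    """
    return SceneParams(
        num_objects=rng.randint(1, 20),
        texture_id=rng.randrange(50),
        background_rgb=tuple(rng.randrange(256) for _ in range(3)),
        light_intensity=rng.uniform(0.2, 1.5),
        noise_std=rng.uniform(0.0, 0.05),
        mass_kg=rng.uniform(0.01, 2.0),
        friction=rng.uniform(0.1, 1.0),
        height_m=rng.uniform(0.05, 0.5),
    )


# A seeded generator makes the sampled scenes reproducible.
rng = random.Random(0)
scenes = [sample_scene(rng) for _ in range(5)]
```

Because every parameter is drawn independently and without reference to real data, many sampled scenes may be uninformative for the real domain, which is the inefficiency the learning-based approach is meant to address.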

To address this problem, the method combines guiding information from the real domain: for the simulation parameter z, it takes the prior distribution p(z) as the initial condition and solves for the distribution p_φ(z) based on the reinforcement learning technique, where φ denotes the learnable parameters of the distribution. It maximizes the differences among the multiple simulation domains and the similarity between the simulation domains and the real domain, thereby constructing a smaller but more precise simulation data acquisition environment and generating effective data for the stable migration of the model from simulation to reality. The problem to be solved can be expressed as follows:

where the first term L_DR() represents a designed cost function that ensures the distribution p_φ(z) ultimately yields diversified samples. The second term D(), based on a distribution distance measure, ensures the similarity between the predicted distribution and the reference distribution and reduces the instability of the result. α is a balance hyperparameter that weighs the importance of the two terms.
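One possible reading of this two-term objective is an expected cost under p_φ(z) plus α times a distance to a reference distribution. The sketch below evaluates such an objective under strong assumptions that are not stated in the source: z is a single scalar, p_φ(z) is a Gaussian with φ = (mean, standard deviation), the reference distribution is the Gaussian prior p(z), D() is the Kullback-Leibler divergence, and the objective is to be minimized. The `cost` callable stands in for L_DR and is hypothetical. The sketch only evaluates the objective; it does not implement the reinforcement-learning solver.

```python
import math
import random
from typing import Callable, Tuple


def gaussian_kl(mu_q: float, sigma_q: float, mu_p: float, sigma_p: float) -> float:
    """KL(q || p) between two univariate Gaussians (closed form)."""
    return (math.log(sigma_p / sigma_q)
            + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p ** 2)
            - 0.5)


def objective(phi: Tuple[float, float],
              prior: Tuple[float, float],
              cost: Callable[[float], float],
              alpha: float,
              n_samples: int = 1000,
              seed: int = 0) -> float:
    """Monte Carlo estimate of E_{z ~ p_phi}[cost(z)] + alpha * KL(p_phi || prior).

    `cost` plays the role of L_DR and the KL term plays the role of D();
    both choices are assumptions made for this sketch.
    """
    mu, sigma = phi
    rng = random.Random(seed)
    expected_cost = sum(cost(rng.gauss(mu, sigma)) for _ in range(n_samples)) / n_samples
    return expected_cost + alpha * gaussian_kl(mu, sigma, prior[0], prior[1])
```

With this form, a larger α keeps the learned distribution closer to the reference distribution, while a smaller α lets the cost term dominate.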

Cross-domain fast migration technology based on feature distribution

In the process of cross-domain rapid migration of a visual sorting model, the challenging problem is how to reduce the cross-domain characteristic difference between a simulation domain and a real domain, and ensure that the migrated model is efficiently adapted to the data of the real domain, so that the robust knowledge migration of the model from the simulation domain to the real domain is realized.

The invention addresses the problem of how to reduce the cross-domain feature difference during migration from the simulation domain to the real domain. It provides a similarity-measure weighted fusion method and an attention-based adversarial learning method, constructs a cross-domain fast migration module based on feature distribution, and achieves fast and effective knowledge migration from the simulation domain to the real domain. Taking image segmentation with an encoder-decoder deep network as an example, a schematic diagram is shown in FIG. 1. The existing single-domain deep learning model for image segmentation is improved: different adaptation layers are designed for the migration task and for different model structures, and the adaptation layers are embedded into the single-domain visual segmentation model. Selecting a suitable similarity measure is critical for learning cross-domain invariant, discriminative features in the adaptation layer. Because a single similarity measure has limited analytical power, the invention fuses multiple similarity measures by weighting and constrains the features comprehensively, so as to learn discriminative and effective data representations.

Let x_s and y_s be the data of the simulation domain and its labels, respectively, and let x_t be the data of the real domain. Taking part segmentation as an example, as shown in FIG. 1, the encoder, adaptation layer, and decoder of the model are denoted f_s, f_a, and f_c, respectively. Assume that the task loss function on the source domain is defined as L_task(f_s, f_a, f_c, x_s, y_s); this loss function is used to improve the accuracy of the corresponding task. d_i is the i-th similarity measure, and d = {d_1, d_2, ..., d_k} is the set of all similarity measures. Corresponding loss functions are defined for domain adaptation under each of the k different similarity measures, and the weight corresponding to the i-th similarity measure is λ_i, i = 1, 2, ..., k. The weighted fusion brings the combined strengths of the various similarity measures into play and constrains the features comprehensively, so that the similarity of the cross-domain features is computed more accurately. The loss of the model is defined as follows:
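The formula itself is not reproduced in this text. Assuming it has the same additive form as formula (3), that is, the task loss plus a λ_i-weighted sum of the k similarity-based losses, the fusion can be sketched as follows. The three measures used here (distance between batch means, gap between per-dimension variances, cosine distance between batch means) are illustrative choices, not measures named in the source, and all function names are hypothetical.

```python
from math import sqrt
from typing import Callable, List, Sequence

Batch = Sequence[Sequence[float]]  # rows are feature vectors from the adaptation layer


def batch_mean(x: Batch) -> List[float]:
    n = len(x)
    return [sum(col) / n for col in zip(*x)]


def batch_var(x: Batch) -> List[float]:
    n = len(x)
    mu = batch_mean(x)
    return [sum((v - m) ** 2 for v in col) / n for col, m in zip(zip(*x), mu)]


def mean_discrepancy(xs: Batch, xt: Batch) -> float:
    """Squared Euclidean distance between batch means (linear-kernel MMD)."""
    return sum((a - b) ** 2 for a, b in zip(batch_mean(xs), batch_mean(xt)))


def variance_gap(xs: Batch, xt: Batch) -> float:
    """Squared distance between per-dimension variances (a simple second-order statistic)."""
    return sum((a - b) ** 2 for a, b in zip(batch_var(xs), batch_var(xt)))


def cosine_distance(xs: Batch, xt: Batch) -> float:
    """One minus the cosine similarity of the two batch means."""
    ms, mt = batch_mean(xs), batch_mean(xt)
    norm = sqrt(sum(a * a for a in ms)) * sqrt(sum(b * b for b in mt))
    if norm == 0.0:
        return 0.0  # undefined for zero vectors; 0.0 is returned by convention here
    return 1.0 - sum(a * b for a, b in zip(ms, mt)) / norm


def fused_loss(task_loss: float, xs: Batch, xt: Batch,
               measures: Sequence[Callable[[Batch, Batch], float]],
               weights: Sequence[float]) -> float:
    """Assumed form: L_task + sum_i lambda_i * d_i(source features, target features)."""
    if len(measures) != len(weights):
        raise ValueError("one weight is required per similarity measure")
    return task_loss + sum(w * d(xs, xt) for d, w in zip(measures, weights))
```

For example, `fused_loss(1.0, [[1.0, 0.0], [3.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]], [mean_discrepancy, variance_gap], [0.5, 2.0])` adds 0.5 times the mean discrepancy and 2.0 times the variance gap to a task loss of 1.0. When the source and target feature batches are identical, every measure is zero and the fused loss reduces to the task loss.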

In addition, the invention provides an adversarial learning method based on an attention mechanism to reduce the difference between cross-domain features, so that the model learns the task more robustly during cross-domain migration. The basic idea of learning cross-domain invariant features through adversarial learning is to strengthen the consistency and discriminability of the multi-domain adaptation-layer features by means of an adversarial game: the feature learning (encoder) part of the original model is taken as the generator, and a discriminator based on adversarial learning theory is designed to identify the domain to which the sample behind a feature belongs. Since the target of sorting is a part, attention should be focused on the part regions in the simulation-domain and real-domain images. A discriminator with an attention mechanism is therefore designed: on top of the global features, it concentrates on a thorough analysis of the local features at the position of the part, thereby reducing the difference between cross-domain features and ensuring robust performance when the simulation-domain model is transferred to the real domain. Compared with the method based on weighted fusion of feature similarity measures, this method adds an attention-based adversarial network module to the model shown in FIG. 1. Let this module be f_d, used to discriminate whether data comes from the simulation domain or the real domain; let y_d be the domain label, indicating the domain to which the data belongs; and let the loss function of the discriminator f_d be L_d(f_s, f_a, f_d, x_s, x_t, y_d). The overall loss function of the model is then as follows:

L′_all(f_s, f_a, f_c, f_d, x_s, x_t, y_s, y_d) = L_task(f_s, f_a, f_c, x_s, y_s) + λ L_d(f_s, f_a, f_d, x_s, x_t, y_d)    (3)
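The text does not specify how attention enters the discriminator loss. One simple possibility, shown here purely as an assumption, is a per-location domain classifier whose binary cross-entropy is averaged with attention weights, so that locations on the part contribute more than the background. The names `bce_with_logit`, `attention_weighted_domain_loss` and `overall_loss` are hypothetical; only the additive form of `overall_loss` follows formula (3).

```python
import math
from typing import Sequence


def bce_with_logit(logit: float, label: int) -> float:
    """Numerically stable binary cross-entropy on a raw logit.

    Equals softplus(logit) - label * logit, with label 1 or 0 for the two domains.
    """
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def attention_weighted_domain_loss(logits: Sequence[float],
                                   attention: Sequence[float],
                                   domain_label: int) -> float:
    """Attention-weighted average of per-location domain classification losses.

    `logits` are discriminator outputs at each spatial location and `attention`
    are non-negative weights (assumed larger where the part is located).
    """
    if len(logits) != len(attention):
        raise ValueError("logits and attention must have the same length")
    total = sum(attention)
    if total <= 0.0:
        raise ValueError("attention weights must sum to a positive value")
    weighted = sum(a * bce_with_logit(z, domain_label) for z, a in zip(logits, attention))
    return weighted / total


def overall_loss(task_loss: float, domain_loss: float, lam: float) -> float:
    """Additive combination in the form of formula (3): L_task + lambda * L_d."""
    return task_loss + lam * domain_loss
```

In adversarial training the encoder and the discriminator are optimized with opposing goals with respect to L_d; how that is realized (for example alternating updates or gradient reversal) is not stated in the text and is left out of this sketch.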

Cross-domain fast migration effect of the sorting system

The main function of the cross-domain fast migration technique is to migrate the visual sorting system quickly from the simulation domain to the real domain, so that the simulation-domain model performs well on real-domain data. The basic idea is shown in FIG. 2.

Specifically, the basic effect of the cross-domain fast migration module based on feature distribution is illustrated with part segmentation as an example. The overall learning framework includes three modules: an input and feature distribution extraction module, a cross-domain fast migration module based on feature distribution, and a task-driven feature distribution fitting output module. The framework takes images of the simulation domain and the real domain as input and outputs a part segmentation result.

Because the difference between simulation-domain data and real-domain data inevitably leads to inconsistent feature distributions, the core task of this step is to bridge the simulation-to-real gap between the features of the two domains effectively. The cross-domain fast migration module learns the features of the different domains' data simultaneously and reduces the difference between them through the similarity-measure weighted fusion method and the attention-based adversarial learning method. In this way it learns data representations with cross-domain invariance and achieves high-performance, efficient cross-domain model migration based on domain-adaptive learning.

Method and process for cross-domain fast migration

A flow chart of the cross-domain fast migration method is shown in FIG. 3. First, effective simulation data are acquired based on learning-based domain randomization. Cross-domain visual analysis features are then learned by a simulation-domain feature learning module and a real-domain feature learning module, respectively. Because cross-domain data differ, the cross-domain feature distributions inevitably differ as well. To make it convenient to constrain the similarity of the cross-domain feature distributions, feature adaptation layers are designed for different tasks and are trained on both the simulation-domain data features and the real-domain data features. To increase the similarity of the cross-domain feature distributions and learn cross-domain invariant features, the invention constrains the features in the adaptation layer using the similarity-measure weighted fusion method and the attention-based adversarial learning method. The similarity-measure weighted fusion method calculates multiple similarity measures between the cross-domain features of the adaptation layer as losses; these losses act on the feature adaptation layer so that it learns more similar cross-domain features. The attention-based adversarial learning method uses an adversarial network module to distinguish the domain to which the features belong and calculates an adversarial loss, which makes the cross-domain features indistinguishable, that is, learns cross-domain invariant features. Under the constraints of these two methods, the feature adaptation layer learns cross-domain invariant features. The features are then fed into the task-driven feature distribution fitting output module, which maps them to the corresponding task result space according to the task and finally outputs the task-related results, thereby achieving fast and effective cross-domain migration of the model at the task level.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of methods, apparatus, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart and block diagrams may represent a module, segment, or portion of code, which comprises one or more computer-executable instructions for implementing the logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. It will also be noted that each block of the block diagrams and flowchart illustrations, and combinations of blocks in the block diagrams and flowchart illustrations, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by "comprising" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention, and is provided by way of illustration only and not limitation. It will be apparent to those skilled in the art from this disclosure that various other changes and modifications can be made without departing from the spirit and scope of the invention.

Claims

1. A cross-domain fast migration based visual sorting method is characterized by comprising the following steps:

2. The method of claim 1, wherein: acquiring parameter data of the simulation domain based on learning-based domain randomization specifically includes: combining the guiding information of the real domain, and, for the parameter z of the simulation domain, taking the prior distribution p(z) as the initial condition and solving for the distribution p_φ(z) based on the reinforcement learning technique, where φ denotes the learnable parameters of the distribution.

3. The method of claim 2, wherein: the domain randomization performs randomized sampling of the parameters of the simulation domain in two respects: visual parameters, mainly comprising the number of objects in the scene, target texture, background color, illumination and randomized noise; and kinetic parameters, mainly comprising the mass, friction coefficient and height of the object.

4. The method of claim 2, wherein: the problem of solving for the distribution p_φ(z) based on the reinforcement learning technique is expressed as follows:

wherein the first term L_DR() represents a designed cost function that ensures the distribution p_φ(z) yields diversified samples; the second term D(), based on a distribution distance measure, ensures the similarity between the predicted distribution and the reference distribution and reduces the instability of the result; and α is a balance hyperparameter that weighs the importance of the two terms.

5. The method of claim 1, wherein: the loss function of the similarity-measure weighted fusion algorithm is as follows:

wherein L_task(f_s, f_a, f_c, x_s, y_s) is the task loss function on the source domain; x_s and y_s are the data of the simulation domain and its labels, respectively; x_t is the data of the real domain; and the encoder, adaptation layer and decoder of the model are denoted f_s, f_a and f_c, respectively,

6. The method of claim 5, wherein: the adversarial network module f_d is used for distinguishing whether the data comes from the simulation domain or the real domain; let y_d be the domain label, indicating the domain to which the data belongs, and let the loss function of the discriminator f_d be L_d(f_s, f_a, f_d, x_s, x_t, y_d); the overall loss function of the model is then as follows:

7. A visual sorting device based on cross-domain fast migration, characterized in that the device comprises:

8. The apparatus of claim 7, wherein: the input of the device is simulation domain images and real domain images, and the output is a part segmentation result.

9. A computer program product comprising computer program instructions for implementing the method of any one of claims 1-6 when executed by a processor.

10. A computer-readable storage medium, on which a computer program is stored which, when executed, implements the method of any of claims 1-6.