CN115984662A

CN115984662A - Multi-mode data pre-training and recognition method, device, equipment and medium

Info

Publication number: CN115984662A
Application number: CN202310272537.2A
Authority: CN
Inventors: 罗亮; 林珠; 李海威; 马志平; 冯秩华
Original assignee: Guangdong Science & Technology Infrastructure Center
Current assignee: Guangdong Science & Technology Infrastructure Center
Priority date: 2023-03-21
Filing date: 2023-03-21
Publication date: 2023-04-18
Anticipated expiration: 2043-03-21
Also published as: CN115984662B

Abstract

The invention discloses a method, a device, equipment and a medium for multi-mode data pre-training and recognition, wherein a defect scene rule database is constructed by performing multi-source heterogeneous data fusion on acquired defect basic data; extracting defect type information, characteristic information and scene information from the defect scene rule database, performing data association, and extracting scene factors of the defect scene rule database; constructing a self-coding network structure model carrying defect scene information, integrating the scene factors into the self-coding network structure model, inputting a characteristic vector obtained by coding sample data of various defects, performing matching training of data and rules, and generating a modal identification model; and identifying the defects of the sample to be detected according to the modal identification model. The product defect detection accuracy and the model robustness can be improved.

Description

Multi-mode data pre-training and recognition method, device, equipment and medium

Technical Field

The invention relates to the field of image recognition, in particular to a method, a device, equipment and a medium for multi-mode data pre-training and recognition.

Background

With the rapid development of the precision manufacturing industry, losses caused by surface defects in high-precision instruments reach the billion-yuan level every year, and demand for high-precision defect detection of industrial products keeps growing. In particular, industrial production environments involve highly complex conditions such as noise, occlusion, vibration and dim light, so defect detection must be intelligent, highly precise, highly efficient and able to run for long periods.

Although applying deep learning algorithms has improved defect detection accuracy to some extent, in existing high-precision defect detection the defect samples are few and unbalanced and are easily affected by environmental factors such as occlusion, oxidation and vibration, so product defect detection accuracy remains low and model robustness remains weak.

Disclosure of Invention

In order to solve the technical problems, the invention provides a method, a device, equipment and a medium for multi-mode data pre-training and recognition, which improve the product defect detection accuracy and the robustness of a model.

The embodiment of the invention provides a multi-mode data pre-training and recognition method, which comprises the following steps:

performing multi-source heterogeneous data fusion on acquired defect basic data, and constructing a defect scene rule database;

extracting defect type information, characteristic information and scene information from the defect scene rule database, performing data association, and extracting scene factors of the defect scene rule database;

constructing a self-coding network structure model carrying defect scene information, integrating the scene factors into the self-coding network structure model, inputting feature vectors obtained by coding sample data of various defects, performing matching training of data and rules, and generating a modal identification model;

and identifying the defects of the sample to be detected according to the modal identification model.

Further, the multi-source heterogeneous data fusion is performed on the acquired defect basic data, and a defect scene rule database is constructed, specifically including:

performing multi-source heterogeneous data fusion on defect basic data consisting of historical experience data, common rule data and defect standard data to form a defect scene rule database which is associated with defect types, positions and scales;

the defect scene rules database includes: a surface defect data set, a defect rule data set, a detection system data set, and a process scene data set.

As an improvement of the above solution, the surface defect data set D1= [ surface defect ID, defect geometry, spatial distribution data, defect statistics, defect spectrum data ];

the defect rule data set D2= [ defect rule ID, detection object type, defect classification statistical data, damage mechanism data, defect cause rule, defect grade ];

the detection system data set D3= [ detection system ID, device type, production line design data, technology type ];

the process scene data set D4= [ process scene data ID, detection object type, scene factor, production process ];

the defect geometry includes: point-line-surface defects, boundaries, skeletons, shapes, positions, sizes, stretching and translation;

the spatial distribution data includes: entropy, contrast, consistency and correlation;

the defect statistical data comprise gray level co-occurrence matrixes, autocorrelation coefficients, mathematical morphology, histogram statistical characteristics, fractal values and defect frequency spectrum subsets;

the histogram statistics include range, mean, geometric mean, harmonic mean, standard deviation, variance, and median.

The fractal values comprise stretching and translation fractal dimension and porosity;

the defect frequency spectrum subset comprises a texture frequency spectrum, a taint frequency spectrum and a sawtooth frequency spectrum;

the defect classification statistical data is specifically a fault mode of automatic defect division;

the defect level comprises the detection object type;

the detection object types comprise semiconductors, circuit boards, wafers, fabrics, metal surfaces and woods;

the scene factors comprise operation scale and equipment type selection;

the production process comprises the steps of blank making, grinding, rolling, shearing, bundling and finished product forming.

Preferably, the extracting the defect type information, the feature information, and the scene information from the defect scene rule database, performing data association, and extracting the scene factor of the defect scene rule database specifically includes:

extracting defect type information from the surface defect dataset, extracting feature information from the surface defect dataset and the defect rule dataset, and extracting scene information from the inspection system dataset and the process scene dataset;

for the defect Z, constructing a hierarchical matrix Z × T × R from the extracted defect type information, feature information and scene information;

for the defect-feature associated information, using a first extraction factor a_ij to map and extract from the matrix Z × T, obtaining the antecedent defect scene factors [formula missing in source];

forming the antecedent scene factor [formula missing in source] from all the extracted antecedent defect scene factors;

for the feature-scene associated information, using a second extraction factor b_ij to map and extract from the matrix T × R, obtaining the consequent defect scene factors [formula missing in source];

forming the consequent scene factor [formula missing in source] from all the extracted consequent defect scene factors;

determining the scene factor according to the extracted antecedent scene factor and consequent scene factor;

wherein Z, T and R are the defect matrix, the feature information matrix and the scene information matrix [matrix definitions missing in source]; n is the number of defect classes, j is the feature vector dimension, Z_i^j is an element value in the defect matrix, T_i^j is an element value in the feature information matrix, R_i^j is an element value in the scene information matrix, and i = 1, 2, ..., n;

the first and second extractions are each defined piecewise: under one condition the result equals 0, and under a second condition it takes another value [conditions and expressions missing in source].

Preferably, the constructing a self-coding network structure model carrying defect scene information, merging the scene factors into the self-coding network structure model, inputting feature vectors obtained by coding sample data of various defects, performing matching training of data and rules, and generating a modal identification model specifically includes:

applying the antecedent scene factor in the scene factors to an encoder of the self-coding network structure model to extract effective features;

applying the consequent scene factor in the scene factors to a decoder of the self-coding network structure model to generate rules;

inputting a feature vector W obtained by encoding sample data of various defects and, borrowing the idea of a residual network, introducing the scene factor into the structure of each basic operation block when the blocks are stacked, so that the scene factor is implicitly embedded in the hierarchical structure of the stacked self-coding network structure model, then decoding to obtain the scene rule output [type, feature, scene];

passing the scene rule output through a semi-supervised stacked self-encoder, adding a classifier at the decoding stage to realize classification, optimizing the classifier of the self-coding network structure model through matching training of data and rules, and generating the modal identification model.
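The following is a minimal sketch of how a scene factor might be introduced into residual-style blocks of a stacked self-encoder with a classifier at the decoding stage. It is an illustration under stated assumptions, not the patented network: the gating form `x + tanh(Wx) * scene`, the equal layer widths (no bottleneck), the untrained random weights and all function names are hypothetical.

```python
import math
import random


def matvec(weights, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def scene_block(x, weights, scene):
    """Residual-style block: x + tanh(W x), gated element-wise by a scene factor.

    The gating form is an assumption; the source only says the scene factor is
    introduced into each basic operation block.
    """
    hidden = [math.tanh(v) for v in matvec(weights, x)]
    return [xi + hi * si for xi, hi, si in zip(x, hidden, scene)]


def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def forward(x, enc_weights, dec_weights, clf_weights, antecedent, consequent):
    """Return (code, decoded output, class probabilities)."""
    h = x
    for w in enc_weights:   # encoder blocks use the antecedent scene factor
        h = scene_block(h, w, antecedent)
    code = h
    for w in dec_weights:   # decoder blocks use the consequent scene factor
        h = scene_block(h, w, consequent)
    return code, h, softmax(matvec(clf_weights, h))


def random_matrix(rng, rows, cols):
    """Untrained placeholder weights in [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Toy usage with hypothetical sizes: 4-dimensional features, 3 defect classes.
rng = random.Random(0)
dim, n_classes = 4, 3
enc = [random_matrix(rng, dim, dim) for _ in range(2)]
dec = [random_matrix(rng, dim, dim) for _ in range(2)]
clf = random_matrix(rng, n_classes, dim)
x = [0.2, -0.1, 0.4, 0.0]
code, decoded, probs = forward(x, enc, dec, clf, [1.0] * dim, [0.5] * dim)
```

With this gating form, a scene factor of all zeros turns every block into the identity, which makes the influence of the factor easy to inspect in isolation.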

As a preferred scheme, the objective function of the self-coding network structure model is specifically:

[formula missing in source];

the loss function of the self-coding network structure model is specifically as follows:

[formula missing in source];

wherein V(G, D) is the overall objective function and N is the number of original labels; [symbol missing in source] represents the probability P that the defect sample x is judged to carry its original label in the output data after passing through the self-coding network; [symbol missing in source] represents the probability P that the sample carrying defect knowledge is judged to carry its original label in the output data z(x) after passing through the self-coding network; D(x) is a conditional probability calculation function, and G(z) is the probability of the output information y among the applied classification category data under the category model G(z); [symbol missing in source] indicates whether a defect of the class [symbol missing in source] is present; a, b, w, h and c are the variables describing each grid cell during defect detection, where a and b give the lower-left corner of the cell, w and h are its width and height, and c is its confidence; [symbol missing in source] represents the coordinate loss of the defect bounding box, computed as the mean square error of the position information; [symbol missing in source] represents the size loss of the defect bounding box, computed as the absolute mean square error of the size information; and [symbol missing in source] represents the confidence loss, computed by judging whether the cell belongs to the defect type [symbol missing in source].
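Because the loss formula itself is not legible in this text, the sketch below only illustrates the three components the description names (coordinate, size and confidence loss over grid cells with variables a, b, w, h, c). The squared-error form of every term, the unweighted sum and the names `GridCell` and `detection_loss` are assumptions for illustration, not the patent's definition.

```python
from typing import NamedTuple, Sequence


class GridCell(NamedTuple):
    a: float  # x coordinate of the lower-left corner
    b: float  # y coordinate of the lower-left corner
    w: float  # width of the cell
    h: float  # height of the cell
    c: float  # confidence


def detection_loss(pred: Sequence[GridCell], target: Sequence[GridCell]) -> dict:
    """Mean squared coordinate, size and confidence errors (assumed form)."""
    n = len(pred)
    pairs = list(zip(pred, target))
    coordinate = sum((p.a - t.a) ** 2 + (p.b - t.b) ** 2 for p, t in pairs) / n
    size = sum((p.w - t.w) ** 2 + (p.h - t.h) ** 2 for p, t in pairs) / n
    confidence = sum((p.c - t.c) ** 2 for p, t in pairs) / n
    return {
        "coordinate": coordinate,
        "size": size,
        "confidence": confidence,
        "total": coordinate + size + confidence,  # unweighted sum is an assumption
    }


# Toy usage: the first cell is shifted by 1 in a, the second is too tall by 2.
target = [GridCell(0.0, 0.0, 2.0, 2.0, 1.0), GridCell(4.0, 4.0, 1.0, 1.0, 0.0)]
pred = [GridCell(1.0, 0.0, 2.0, 2.0, 1.0), GridCell(4.0, 4.0, 1.0, 3.0, 0.0)]
loss = detection_loss(pred, target)
```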

Preferably, the scene rule output is further trained by a hidden layer of a stacked self-encoder, and the defect scene rule is continuously generated and updated and supplemented into the defect scene rule database.

The embodiment of the invention also provides a device for pre-training and recognizing the multi-modal data, which comprises:

the database construction module is used for carrying out multi-source heterogeneous data fusion on the acquired defect basic data and constructing a defect scene rule database;

the scene factor extraction module is used for extracting defect type information, characteristic information and scene information from the defect scene rule database, performing data association and extracting scene factors of the defect scene rule database;

the model generation module is used for constructing a self-coding network structure model carrying defect scene information, integrating the scene factors into the self-coding network structure model, inputting a characteristic vector obtained by coding sample data of various defects, performing matching training of data and rules and generating a modal identification model;

and the defect identification module is used for identifying the defects of the sample to be detected according to the modal identification model.

Preferably, the database construction module is specifically configured to:

Further, the surface defect data set D1= [ surface defect ID, defect geometry, spatial distribution data, defect statistics, defect spectrum data ];

the histogram statistical features include range, mean, geometric mean, harmonic mean, standard deviation, variance and median;

The fractal values include stretching, translating fractal dimension and porosity;

the defect classification statistical data are fault modes of automatic defect division;

the defect grade comprises the detection object type;

the scene factors comprise operation scale and equipment type selection;

the production process comprises blank making, grinding, rolling, shearing, bundling and finished product forming.

Preferably, the scene factor extraction module is specifically configured to:

for the defect-feature associated information, using a first extraction factor a_ij to map and extract from the matrix Z × T, obtaining the antecedent defect scene factors [formula missing in source];

forming the antecedent scene factor [formula missing in source] from all the extracted antecedent defect scene factors;

for the feature-scene associated information, using a second extraction factor b_ij to map and extract from the matrix T × R, obtaining the consequent defect scene factors [formula missing in source];

forming the consequent scene factor [formula missing in source] from all the extracted consequent defect scene factors;

determining the scene factor according to the extracted antecedent scene factor and the extracted consequent scene factor;

wherein Z, T and R are the defect matrix, the feature information matrix and the scene information matrix [matrix definitions missing in source]; n is the number of defect classes, j is the feature vector dimension, Z_i^j is an element value in the defect matrix, T_i^j is an element value in the feature information matrix, R_i^j is an element value in the scene information matrix, and i = 1, 2, ..., n;

the first and second extractions are each defined piecewise: under one condition the result equals 0, and under a second condition it takes another value [conditions and expressions missing in source].

Preferably, the model generation module is specifically configured to:

applying the antecedent scene factors in the scene factors to an encoder of the self-encoding network structure model to extract effective features;

Preferably, the objective function of the self-coding network structure model is specifically:

[formula missing in source];

[formula missing in source];

wherein [symbol missing in source] represents the probability P that the defect sample x is judged to carry its original label in the output data after passing through the self-coding network; [symbol missing in source] indicates whether a defect of the class [symbol missing in source] is present; a, b, w, h and c are the variables describing each grid cell during defect detection, where a and b give the lower-left corner of the cell, w and h are its width and height, and c is its confidence; [symbol missing in source] represents the coordinate loss of the defect bounding box, computed as the mean square error of the position information; and [symbol missing in source] represents the confidence loss, computed by judging whether the cell belongs to the defect type [symbol missing in source].

Further, the scene rule output is continuously generated and updated by hidden layer training of the stacked self-encoder, and is supplemented to the defect scene rule database.

The invention also provides a computer-readable storage medium, which includes a stored computer program, wherein when the computer program runs, the device where the computer-readable storage medium is located is controlled to execute the multimodal data pre-training and recognition method as described in any one of the above embodiments.

The invention further provides a terminal device, which comprises a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, wherein the processor executes the computer program to implement the multimodal data pre-training and recognition method as described in any one of the above embodiments.

The invention provides a multi-mode data pre-training and recognition method, a device, equipment and a medium, wherein a defect scene rule database is constructed by performing multi-source heterogeneous data fusion on acquired defect basic data; extracting defect type information, characteristic information and scene information from the defect scene rule database, performing data association, and extracting scene factors of the defect scene rule database; constructing a self-coding network structure model carrying defect scene information, integrating the scene factors into the self-coding network structure model, inputting feature vectors obtained by coding sample data of various defects, performing matching training of data and rules, and generating a modal identification model; and identifying the defects of the sample to be detected according to the modal identification model. The product defect detection accuracy and the model robustness can be improved.

Drawings

FIG. 1 is a schematic flow chart of a multi-modal data pre-training and recognition method according to an embodiment of the present invention;

FIG. 2 is a flow chart of a method for pre-training and recognizing multi-modal data according to another embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a multi-modal data pre-training and recognition apparatus according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a terminal device according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

An embodiment of the invention provides a multi-modal data pre-training and recognition method. Referring to FIG. 1, which is a schematic flow chart of the method provided by an embodiment of the invention, the method comprises the following steps S1 to S4:

S1, performing multi-source heterogeneous data fusion on acquired defect basic data to construct a defect scene rule database;

S2, extracting defect type information, characteristic information and scene information from the defect scene rule database, performing data association, and extracting scene factors of the defect scene rule database;

S3, constructing a self-coding network structure model carrying defect scene information, integrating the scene factors into the self-coding network structure model, inputting feature vectors obtained by coding sample data of various defects, performing matching training of data and rules, and generating a modal identification model;

and S4, identifying the defects of the sample to be detected according to the modal identification model.

In a specific implementation, defect basic data is collected; the defect basic data is specifically historical defect data of the sample to be detected. The multi-source heterogeneous data used for defect detection is fused, and through this fusion a basic defect scene rule database is constructed that contains information such as static defect representation, dynamic defect evolution, defect classification and defect-scene rules;

scene factors are refined from the defect scene rule database and together form a three-dimensional vector matrix containing defect type information, characteristic information and scene information. Applying this matrix as a constraint forces the self-encoder to decide which parts of the input data should be preferentially copied and which should be discarded, so that the self-encoder learns the effective features of the data and discards irrelevant ones, thereby generating more defect scene rules; data association is performed and the scene factors of the defect scene rule database are extracted;

the construction of a scene rule knowledge base based on a semi-supervised self-coding network is studied: a stacked self-coding network structure carrying defect scene information is designed and scene factors are introduced, so that the scene factors are implicitly embedded in the hierarchical structure of the stacked self-coding network; a feature vector obtained by encoding sample data of various defects is input, matching training of data and rules is performed, and a modal identification model is generated;

and the defects of the sample are identified according to the generated modal identification model.

Under conditions of low defect sampling rate and unbalanced samples, the method combines the production process scene and fuses material characteristics, manufacturing process data and sub-pixel features of high-resolution defect images. A scene rule knowledge base is constructed through sample generation based on material process data, sub-pixel feature encoding of high-resolution defect images, and deep learning classification. The self-coding network handles the various mapping relations in small-sample defect data well and performs feature encoding and knowledge modeling. This addresses several core problems in defect detection: defects are hard to identify and classify and robustness is weak against complex backgrounds such as occlusion, oxidation and vibration; the images to be inspected are large; and deep learning methods suffer from low computational efficiency and make the origin of defects hard to trace.

In another embodiment provided by the present invention, the step S1 specifically includes:

In a specific implementation of this embodiment, the sources of the defect basic data include historical experience data, common rule data and defect standard data; the historical experience data is specifically historical data of experts' judgments on defects;

common defects in industrial products mainly include lines, scratches, oil stains, points, shadows, textures and sawteeth. During defect detection these defects appear in the image in a different form, so the usual data representation of defect images is combined with the characteristics of the business activity and with scene analysis: the business stage to which the inspected industrial product belongs has an important influence on the scene judgment formed during defect detection. Finally, by associating the data sets, a defect scene rule database is formed in which defect scenes, defect types, defect positions and defect scales are associated;

the defect scene rule database comprises a surface defect data set, a defect rule data set, a detection system data set and a process scene data set.

Accurate defect identification is achieved by classifying and associating the complex backgrounds, such as occlusion, oxidation and vibration, that arise during defect detection on micron-level visual images.

In yet another embodiment provided by the present invention, the surface defect data set D1= [ surface defect ID, defect geometry, spatial distribution data, defect statistics, defect spectral data ];

the defect geometry comprises: point-line-surface defects, boundaries, skeletons, shapes, positions, sizes, stretching and translation;

the histogram statistical features include range, mean, geometric mean, harmonic mean, standard deviation, variance and median;

the defect grade comprises the detection object type;

the scene factors comprise operation scale and equipment type selection;

In this embodiment, the surface defect data set specifically includes: defect geometric features (point-line-surface defects, boundaries, skeletons, shapes, positions, sizes, stretching and translation); spatial distribution data (entropy, contrast, consistency and correlation); defect statistics (gray level co-occurrence matrix, autocorrelation coefficient, mathematical morphology, histogram statistical features (range, mean, geometric mean, harmonic mean, standard deviation, variance and median) and fractal values (stretching and translation fractal dimension, and porosity)); and defect spectrum data (texture spectrum, stain spectrum and sawtooth spectrum).

Entropy reflects the randomness of the image pixels: the larger the entropy, the coarser the image. Contrast is the average difference between light and dark in the defect scene image. Consistency is the degree of agreement, from a quantitative standpoint, across the batch of images. Correlation is the degree to which the acquired image correlates with the detected scene. In general, these data sets are detection data sets of the image data, classified from different angles into different subsets to facilitate image processing and recognition.
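As an illustration only, texture measures with these names are conventionally computed per image from a gray level co-occurrence matrix (GLCM). The sketch below uses those conventional per-image definitions, with homogeneity standing in for "consistency"; this is a swapped-in technique, since the description defines consistency and correlation at the batch and scene level and does not give formulas. The function names and the pixel offset are hypothetical.

```python
import math


def glcm(image, levels, dx=1, dy=0):
    """Normalized co-occurrence matrix for pixel offset (dx, dy).

    `image` is a list of rows of integer gray levels in [0, levels) and must
    contain at least one valid pixel pair for the chosen offset.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    return [[v / total for v in row] for row in counts]


def texture_features(p):
    """Entropy, contrast, homogeneity and correlation of a normalized GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    entropy = -sum(v * math.log2(v) for _, _, v in cells if v > 0)
    contrast = sum(v * (i - j) ** 2 for i, j, v in cells)
    homogeneity = sum(v / (1 + abs(i - j)) for i, j, v in cells)
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    var_i = sum(v * (i - mu_i) ** 2 for i, _, v in cells)
    var_j = sum(v * (j - mu_j) ** 2 for _, j, v in cells)
    cov = sum(v * (i - mu_i) * (j - mu_j) for i, j, v in cells)
    denom = math.sqrt(var_i * var_j)
    correlation = cov / denom if denom > 0 else 0.0  # flat image: define as 0
    return {
        "entropy": entropy,
        "contrast": contrast,
        "homogeneity": homogeneity,
        "correlation": correlation,
    }


# Toy usage: a 2 x 2 checkerboard with two gray levels.
checker = [[0, 1], [1, 0]]
features = texture_features(glcm(checker, levels=2))
```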

The defect rule data set includes defect classification statistics (defects are automatically classified into corresponding failure modes), damage mechanism data, defect cause rules, and defect classes (inspection object types (semiconductor, circuit board, wafer, fabric, metal surface, wood, etc.)). The detection system data set comprises equipment types, production line design data and technology model selection;

the process scene data includes inspection object type (semiconductor, circuit board, wafer, fabric, metal surface, wood, etc.), scene factor (operation scale, equipment selection), production process (blank making, grinding, rolling, cutting, bundling, finished product, etc.).

Respectively expressing a surface defect data set, a defect rule data set, a detection system data set and a process scene data set in a data set form as follows:

a surface defect data set D1= [ surface defect ID, defect geometry, spatial distribution data, defect statistics, defect spectral data ];

defect geometry subset = [ surface defect ID, defect geometry ID, point-line-surface defect, boundary, skeleton, shape, position, size, stretching, translation ];

spatial distribution subset = [ surface defect ID, spatial distribution ID, entropy, contrast, consistency, correlation ];

defect statistics subset = [ surface defect ID, defect statistics ID, gray level co-occurrence matrix, autocorrelation coefficient, mathematical morphology, histogram statistical characteristics, fractal value ];

the defect statistics subset holds values obtained by statistical calculation over the defect data. Although these values do not describe defect characteristics directly, they capture how the characteristics are distributed, which helps in analyzing the relationship between defect types and common characteristics. This subset intersects with the D2 data set: the statistics are ultimately associated with defect rules, which makes it easier to form defect scene rules.
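A minimal sketch of the histogram statistical features named in this document (range, mean, geometric mean, harmonic mean, standard deviation, variance and median), computed with Python's standard `statistics` module. The function name, the use of population (rather than sample) variance and the toy gray values are assumptions.

```python
import statistics


def histogram_statistics(gray_values):
    """Summary statistics of the gray values in a defect region.

    Values must be positive, because the geometric mean is undefined otherwise.
    """
    return {
        "range": max(gray_values) - min(gray_values),
        "mean": statistics.fmean(gray_values),
        "geometric_mean": statistics.geometric_mean(gray_values),
        "harmonic_mean": statistics.harmonic_mean(gray_values),
        "standard_deviation": statistics.pstdev(gray_values),
        "variance": statistics.pvariance(gray_values),
        "median": statistics.median(gray_values),
    }


# Toy usage with four hypothetical gray values.
stats = histogram_statistics([2, 4, 4, 8])
```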

Histogram statistical feature subset = [ surface defect ID, defect statistics ID, histogram statistics ID, range, mean, geometric mean, harmonic mean, standard deviation, variance, and median ];

fractal value subset = [ surface defect ID, defect statistics ID, fractal value ID, fractal dimension for stretching, translation, and porosity characteristics ];

the fractal value reflects the degree of stretching and deformation of a defect. Improper application of process levels during manufacturing often stretches a part as a whole, causing defects such as industrial gap defects.
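The description does not say how the fractal dimension is estimated. Box counting is one common estimator, sketched below on a binary defect mask; the choice of estimator, the box sizes and the function names are assumptions, and the mask must contain at least one defect pixel.

```python
import math


def box_count(mask, size):
    """Number of size x size boxes that contain at least one defect pixel."""
    rows, cols = len(mask), len(mask[0])
    count = 0
    for r0 in range(0, rows, size):
        for c0 in range(0, cols, size):
            if any(
                mask[r][c]
                for r in range(r0, min(r0 + size, rows))
                for c in range(c0, min(c0 + size, cols))
            ):
                count += 1
    return count


def box_counting_dimension(mask, sizes=(1, 2, 4, 8)):
    """Least-squares slope of log(count) against log(1 / size)."""
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(box_count(mask, s)) for s in sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Toy usage: a fully covered 16 x 16 patch and a one-pixel-thick line.
filled = [[1] * 16 for _ in range(16)]
line = [[1] * 16] + [[0] * 16 for _ in range(15)]
dim_filled = box_counting_dimension(filled)
dim_line = box_counting_dimension(line)
```

On these toy masks the estimate comes out near 2 for the filled patch and near 1 for the line, which matches the intuition that a stretched, thin defect has a lower dimension than a blob.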

Defect spectrum subset = [ surface defect ID, defect spectrum ID, texture spectrum, smear spectrum, sawtooth spectrum ];

the defect spectrum refers to the spectral characteristics exhibited by the defect image. Texture, stains and sawteeth each produce different spectral characteristics, and this data set holds the spectral characteristics of defect images, such as texture, stain and sawtooth, collected during image defect detection.

the equipment type refers to the detection equipment, and the detection object type refers to the object being inspected, for example in PCB inspection, steel inspection, chip inspection or mobile phone accessory inspection. Different detection objects have different detection scenes.

The detection system data set D3= [ detection system ID, device type, production line design data, technology model selection ];

the process scenario data set D4= [ process scenario data ID, detection object type, scenario factor, production procedure ].
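The four data sets D1 to D4 can be sketched as plain record types. This is an illustrative schema only: the field types, the class names and the helper `join_rules_to_scenes` are assumptions. The join relies on the one thing the text does state, namely that D2 and D4 both carry the detection object type.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SurfaceDefect:  # D1
    surface_defect_id: str
    geometry: Dict[str, float] = field(default_factory=dict)
    spatial_distribution: Dict[str, float] = field(default_factory=dict)
    statistics: Dict[str, float] = field(default_factory=dict)
    spectrum: Dict[str, List[float]] = field(default_factory=dict)


@dataclass
class DefectRule:  # D2
    defect_rule_id: str
    object_type: str
    classification_statistics: str  # failure mode the defect was assigned to
    damage_mechanism: str
    cause_rule: str
    grade: str


@dataclass
class DetectionSystem:  # D3
    detection_system_id: str
    device_type: str
    line_design: str
    technology_selection: str


@dataclass
class ProcessScene:  # D4
    process_scene_id: str
    object_type: str
    scene_factors: Dict[str, str]
    production_process: List[str]


def join_rules_to_scenes(rules, scenes):
    """Pair D2 and D4 records that share the same detection object type."""
    by_type = {}
    for scene in scenes:
        by_type.setdefault(scene.object_type, []).append(scene)
    return [(rule, scene) for rule in rules for scene in by_type.get(rule.object_type, [])]


# Toy usage with hypothetical records.
rules = [
    DefectRule("r1", "wafer", "scratch", "abrasion", "grinding pressure", "A"),
    DefectRule("r2", "fabric", "stain", "contamination", "oil leak", "B"),
]
scenes = [ProcessScene("s1", "wafer", {"operation_scale": "large"}, ["grinding"])]
pairs = join_rules_to_scenes(rules, scenes)
```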

In another embodiment provided by the present invention, the step S2 specifically includes:

for the defect-feature associated information, using a first extraction factor a_ij to map and extract from the matrix Z × T, obtaining the antecedent defect scene factors [formula missing in source];

forming the antecedent scene factor [formula missing in source] from all the extracted antecedent defect scene factors;

forming the consequent scene factor [formula missing in source] from all the extracted consequent defect scene factors;

wherein Z, T and R are the defect matrix, the feature information matrix and the scene information matrix [matrix definitions missing in source]; n is the number of defect classes, j is the feature vector dimension, Z_i^j is an element value in the defect matrix, T_i^j is an element value in the feature information matrix, R_i^j is an element value in the scene information matrix, and i = 1, 2, ..., n;

the first and second extractions are each defined piecewise: under one condition the result equals 0, and under a second condition it takes another value [conditions and expressions missing in source].

In a specific implementation, scene factors are extracted from the basic knowledge base and together form a three-dimensional vector matrix containing types, features and scenes. Applying this matrix as a constraint forces the self-encoder to decide which parts of the input data should be preferentially copied and which should be discarded, so that the self-encoder learns the effective features of the data, discards irrelevant ones, and in turn generates more defect scene rules.

And finally forming a three-dimensional vector matrix containing type information, characteristic information and scene information after carrying out data cleaning, data association and conversion on the defect scene rule database.

Extracting defect type information from the surface defect data set D1; extracting feature information from a surface defect data set D1 and the defect rule data set D2; extracting scene information from the detection system dataset D3 and the process scene dataset D4;

for defect Z, can be expressed as

For the characteristic information, it can be expressed as T->

For scene information, may be expressed as ≧>

Finally, a Z × T × R hierarchical matrix is formed.

Wherein n is the number of defect categories, j is the dimension of a feature vector, and j is the dimension of a vector, a sample or a feature vector; for example, for the defect Z, the surface defect data set D1 and the defect rule data set D2 represent feature information, and if the sum of the fields of the surface defect data set D1 and the defect rule data set D2 is 11, j represents 1 to 11;

Z_i^j is the value of an element in the defect matrix, T_i^j is the value of an element in the feature information matrix, and R_i^j is the value of an element in the scene information matrix, with i = 1, 2, …, n;
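The layered structure described above can be sketched as follows. This is a minimal illustration only: the sizes `n` and `j`, the helper `make_matrix`, the element values, and the choice to hold the three matrices in one dictionary are all assumptions, since the source text does not reproduce the matrix definitions.

```python
# Hypothetical sizes: n defect categories, j feature-vector dimensions.
n, j = 3, 4


def make_matrix(rows, cols, fill):
    """Return a rows x cols matrix (list of lists) built from fill(i, k)."""
    return [[fill(i, k) for k in range(cols)] for i in range(rows)]


# Illustrative element values only; real values would come from D1 to D4.
Z = make_matrix(n, j, lambda i, k: 1.0 if k == i else 0.0)  # defect type
T = make_matrix(n, j, lambda i, k: 0.1 * (i + k))           # feature information
R = make_matrix(n, j, lambda i, k: 0.5)                     # scene information

# Assumed layout: keep the three n x j matrices together as one layered structure.
layered = {"Z": Z, "T": T, "R": R}
shape = (len(layered), n, j)
```

Each layer has one row per defect category and one column per feature dimension, so an element such as `layered["T"][i][k]` corresponds to T_i^j in the text.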

for the defect-feature association information, mapping information is extracted from Z × T: a first extraction factor a_ij, from defects to features, is applied to obtain the antecedent defect scene factor [formula not reproduced in the source text];

where an intermediate symbol [not reproduced] serves as a staged representation during the calculation; when [condition not reproduced], the factor equals 0, and otherwise it takes the value given by [formula not reproduced];

all extracted antecedent defect scene factors together form the antecedent scene factor [formula not reproduced];

for the feature-scene association information, mapping information is extracted from T × R: a second extraction factor b_ij, from features to scenes, is applied to obtain the consequent defect scene factor [formula not reproduced in the source text];

where, when [condition not reproduced], the factor equals 0, and otherwise it takes the value given by [formula not reproduced];

all extracted consequent defect scene factors together form the consequent scene factor [formula not reproduced];

Scene factor = [antecedent scene factor, consequent scene factor].
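The extraction step can be illustrated with a short sketch. The patent's extraction formulas are not available in the source text, which shows only that a factor is set to 0 under one condition and computed otherwise. The elementwise product used as the association measure, the `threshold` parameter and the function name `extract_factor` are therefore assumptions made for illustration, not the patented formula.

```python
def extract_factor(left, right, threshold=0.2):
    """Associate two n x j matrices elementwise.

    Assumed rule: the association value is the elementwise product, and any
    value that does not exceed `threshold` is set to 0.
    """
    out = []
    for row_l, row_r in zip(left, right):
        out_row = []
        for l_val, r_val in zip(row_l, row_r):
            value = l_val * r_val  # assumed association measure
            out_row.append(value if value > threshold else 0.0)
        out.append(out_row)
    return out


# Illustrative 2 x 2 matrices for defects (Z), features (T) and scenes (R).
Z = [[1.0, 0.0], [0.0, 1.0]]
T = [[0.9, 0.4], [0.1, 0.5]]
R = [[0.5, 0.8], [0.3, 0.9]]

antecedent = extract_factor(Z, T)  # defect-feature association
consequent = extract_factor(T, R)  # feature-scene association
scene_factor = [antecedent, consequent]
```

Weak associations drop to 0 in both factors, which matches the stated purpose of reducing sample noise before encoding and filtering invalid rules after decoding.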

The antecedent scene factor represents the defect-feature association information; applied before the encoder, it guides effective feature extraction and reduces sample noise;

the consequent scene factor represents the feature-scene association information; applied after the decoder and before rule generation, it guides rule generation and filters out invalid rules.

In another embodiment provided by the present invention, step S3 specifically includes applying the antecedent scene factor among the scene factors to the encoder of the self-coding network structure model to perform effective feature extraction;

In the specific implementation of the present embodiment, referring to fig. 2, a schematic flow chart of a multi-modal data pre-training and recognition method according to another embodiment of the present invention is shown;

Fig. 2 covers the construction of a scene rule knowledge base based on a semi-supervised self-coding network, for which a stacked self-coding network structure carrying defect scene information is designed;

the antecedent scene factor, which covers defects and features, is applied to the encoder of the self-coding network structure model to extract effective features; the consequent scene factor, which covers features and scenes, is applied to the decoder of the self-coding network structure model to generate rules, so that the scene factors are embedded in the hierarchical structure of the stacked self-coding network; an encoding structure and the classification feature information are added after the stacked self-coding network so that the constructed model can perform both modal identification and scene prejudgment;

first, in the stacked self-coding network the encoder and the decoder are structurally symmetric, and the basic operation block of the network is designed in the encoding network. Borrowing the idea of a residual network, a scene factor is introduced at the superposition step of the basic operation block, so that the scene factor is embedded in the hierarchical structure of the stacked self-coding network;

the input sample data X1 to Xi are preprocessed into sample data W1 to Wi, which form the feature vector W that is input into the self-coding network structure model; borrowing the idea of a residual network, a scene factor is introduced at the superposition step of the basic operation block, so that the scene factor is embedded in the hierarchical structure of the stacked self-coding network structure model; decoding then yields the scene rule output [type, feature, scene];
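A basic operation block of this kind might look like the sketch below. It assumes a single fully connected layer with a ReLU activation and adds the scene factor at the same point where a residual network adds its skip connection. The function names, layer sizes and numeric values are hypothetical, and the patent does not specify the block at this level of detail.

```python
def relu(values):
    """Rectified linear activation applied elementwise."""
    return [max(0.0, x) for x in values]


def dense(values, weights, bias):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, values)) + b
            for row, b in zip(weights, bias)]


def scene_block(values, weights, bias, scene_factor):
    """Residual-style block: F(v) + v + scene_factor, followed by ReLU."""
    transformed = relu(dense(values, weights, bias))
    summed = [f + x + s for f, x, s in zip(transformed, values, scene_factor)]
    return relu(summed)


# Illustrative preprocessed feature vector W and scene factor entries.
w_vec = [1.0, -2.0]
weights = [[0.5, 0.0], [0.0, 0.5]]
bias = [0.0, 0.0]
scene = [0.25, 0.5]

out = scene_block(w_vec, weights, bias, scene)
```

Because the scene factor enters at the addition step of every block, stacking several such blocks carries the scene information through each level of the network rather than only at the input.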

The modal identification and scene prejudgment method based on a semi-supervised self-coding network first constructs, through multi-source heterogeneous data fusion, a basic defect scene knowledge base containing information such as static defect representation, dynamic defect evolution, defect classification and defect-scene rules. Then, based on the self-coding network, scene factors are fused into the stacked self-coding network; data samples of a given type are encoded into feature vectors through learning, a mapping from the image space of that type to a latent space is learned, feature models of various types, positions and degrees are generated, and matching training of data and rules is performed. By constructing and applying the defect scene knowledge base, the defect detection model gains a scene prejudgment function: it can infer the cause of a defect from the defect information, which helps production line design and process optimization for industrial products.

In another embodiment provided by the present invention, the objective function of the self-coding network structure model is specifically:

[two formulas not reproduced in the source text];

where the symbols, whose notation is likewise not reproduced, denote respectively: the probability P of the original label in the output data (x) after the defect sample x passes through the self-coding network; an indicator of whether [symbol not reproduced] is present in the image; the coordinate loss of the defect bounding box, calculated as the mean square error of the position information; and the confidence loss, calculated by judging whether the sample belongs to the given defect type.

In the specific implementation of this embodiment, the objective function used when the self-coding network structure model carrying defect scene information is applied to classification and identification is:

[formula not reproduced in the source text];

where V(G, D) is the overall objective function, defined from the perspective of maximum contribution. It is a conditional probability function obtained by improving the adversarial network equation with D(x), and it has three parts. The first part reflects the encoding stage: both this term and the overall function should be as large as possible, so that the most representative feature information is obtained. The second part reflects the decoding stage: the output value of this stage should be as small as possible while the overall equation stays as large as possible, so that the decoding difference is small. The third part covers target classification and identification: it is the probability of the output information y in the applied classification data, conditioned on the class model G(z), and indicates the accuracy of classification;

[symbol not reproduced] represents the probability P of the original label in the output data (x) after the defect sample x passes through the self-coding network;

[symbol not reproduced] represents the probability P of the original label in the output data z(x) after the sample x carrying defect knowledge passes through the self-coding network;

[symbol not reproduced] is the estimated central point;
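For orientation, the adversarial network equation that this objective is described as improving is commonly written in the standard two-term form below. It is given as background only: the patent's own three-part formula is not available in the source text, the patent writes the function as V(G, D) rather than V(D, G), and how its encoding, decoding and classification terms map onto D and G is not stated here.

```latex
\min_{G} \max_{D} V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

In this standard form the first term grows as D scores real samples highly, and the second grows as D(G(z)) shrinks, which parallels the description of one part being driven large and another part's output being driven small.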

the loss function is:

[formula not reproduced in the source text];

where a, b, w, h and c are the variables of each grid during defect detection; N is the number of original labels; a and b are the coordinates of the lower-left corner of the grid; w and h are the width and height of the grid; c is the confidence of the grid; and the final term is the confidence loss, calculated by judging whether the grid belongs to the given defect type.
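A loss of this general shape can be sketched as follows. Since the exact formula is not available in the source text, the sketch assumes a plain sum of squared errors over position (a, b), size (w, h) and confidence (c), averaged over the N labels. The weights `coord_weight` and `conf_weight` and the function name `detection_loss` are hypothetical.

```python
def detection_loss(pred, target, coord_weight=1.0, conf_weight=1.0):
    """Squared-error loss over grids described by (a, b, w, h, c) tuples.

    `pred` and `target` are equal-length lists, one tuple per labelled grid.
    """
    n_labels = len(target)
    coord = 0.0
    conf = 0.0
    for (pa, pb, pw, ph, pc), (ta, tb, tw, th, tc) in zip(pred, target):
        coord += (pa - ta) ** 2 + (pb - tb) ** 2  # lower-left corner position
        coord += (pw - tw) ** 2 + (ph - th) ** 2  # width and height
        conf += (pc - tc) ** 2                    # confidence
    return (coord_weight * coord + conf_weight * conf) / n_labels


# One predicted grid compared with its label.
pred = [(1.0, 1.0, 2.0, 2.0, 0.5)]
target = [(1.0, 2.0, 2.0, 4.0, 1.0)]
loss = detection_loss(pred, target)
```

Here the position and size errors contribute 1 + 4 and the confidence error contributes 0.25, so `loss` evaluates to 5.25; a perfect prediction gives 0.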

In another embodiment of the present invention, the scene rule output is further passed through hidden-layer training of the stacked self-encoder, which continuously generates and updates defect scene rules and supplements them into the defect scene rule database.

In the specific implementation of this embodiment, the output after the decoder passes through a semi-supervised stacked self-encoder. Adding a classifier in the decoding stage provides the classification function, while hidden-layer training of the stacked self-encoder continuously generates and updates defect-scene rule knowledge and supplements it into the defect scene rule database, further improving the knowledge base of defect-scene mapping rules.

In the specific implementation of this embodiment, referring to fig. 2, the scene rule knowledge base is supplemented with the rules generated from the decoder output. That is, the scene factor is extracted using the consequent factor [Yi-1] of the previous iteration, the scene hierarchical matrix of the self-coding network structure model is updated, and the vector matrix [Yi] of the extracted scene factor is also appended to the input feature vector;

the stacking is performed in the form of a scene factor structure. In the stacking substructure, the antecedent scene factor is merged into the first layer of training and the consequent scene factor into the second layer of training. Both are used in the same two ways: thresholding and weight amplification. Thresholding acts on the activation function: on top of the original full connection, entries are checked against the antecedent/consequent scene factor matrix, and defect features that fall below the threshold are discarded directly, which avoids carrying excessive feature/scene information and helps prevent overfitting in application. Weight amplification further amplifies the effective features, which helps counter the vanishing-gradient problem common in deep learning and prevents the loss of effective features. Through these two mechanisms, the rules formed by training the stacked self-coding network are better suited to defect scenes.
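The two uses of a scene factor, thresholding and weight amplification, can be sketched together. The source does not give threshold values or an amplification rule, so the `threshold` and `gain` parameters and the function name `apply_scene_factor` are assumptions chosen for illustration.

```python
def apply_scene_factor(features, factor, threshold=0.1, gain=1.5):
    """Gate and amplify features using the matching scene factor entries.

    Assumed rule: a feature whose factor entry is below `threshold` is
    discarded (set to 0); every other feature is multiplied by `gain`.
    """
    out = []
    for x, f in zip(features, factor):
        if f < threshold:
            out.append(0.0)       # thresholding: drop weakly associated feature
        else:
            out.append(x * gain)  # weight amplification of an effective feature
    return out


features = [2.0, 4.0, 6.0]
factor = [0.05, 0.5, 0.9]
result = apply_scene_factor(features, factor)
```

With these values the first feature is discarded and the other two are scaled, giving `[0.0, 6.0, 9.0]`.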

In another embodiment provided by the present invention, fig. 3 is a schematic structural diagram of the multi-modal data pre-training and recognition apparatus provided by the embodiment of the present invention. The apparatus includes:

It should be noted that the multi-modal data pre-training and recognition apparatus provided in the embodiment of the present invention can perform the multi-modal data pre-training and recognition method described in any embodiment of the above embodiments, and specific functions of the multi-modal data pre-training and recognition apparatus are not described in detail herein.

Fig. 4 is a schematic structural diagram of a terminal device according to an embodiment of the present invention. The terminal device of this embodiment includes: a processor, a memory, and a computer program, such as a multimodal data pre-training and recognition program, stored in the memory and executable on the processor. When the processor executes the computer program, the steps in each of the above embodiments of the method for pre-training and recognizing multimodal data, such as steps S1 to S5 shown in fig. 1, are implemented. Alternatively, the processor implements the functions of the modules in the above device embodiments when executing the computer program.

Illustratively, the computer program may be partitioned into one or more modules/units that are stored in the memory and executed by the processor to implement the invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used for describing the execution process of the computer program in the terminal device. For example, the computer program may be divided into modules, and the specific functions of the modules are not described again.

The terminal device can be a desktop computer, a notebook, a palm computer, a cloud server or another computing device. The terminal device may include, but is not limited to, a processor and a memory. It will be appreciated by those skilled in the art that the schematic diagram is merely an example of a terminal device and does not constitute a limitation of the terminal device, which may include more or fewer components than those shown, combine certain components, or use different components; for example, the terminal device may also include input/output devices, network access devices, buses, etc.

The Processor may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc. The general-purpose processor may be a microprocessor or the processor may be any conventional processor or the like, which is the control center of the terminal device and connects the various parts of the whole terminal device using various interfaces and lines.

The memory may be used for storing the computer programs and/or modules, and the processor may implement various functions of the terminal device by running or executing the computer programs and/or modules stored in the memory and calling data stored in the memory. The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the cellular phone, and the like. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or other volatile solid state storage device.

The modules/units integrated in the terminal device can be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as an independent product. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented. The computer program comprises computer program code, which may be in source code form, in object code form, in an executable file or in some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, a software distribution medium, and the like.

It should be noted that the above-described device embodiments are merely illustrative, where the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. In addition, in the drawings of the embodiment of the apparatus provided by the present invention, the connection relationship between the modules indicates that there is a communication connection between them, and may be specifically implemented as one or more communication buses or signal lines. One of ordinary skill in the art can understand and implement it without inventive effort.

While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims

1. A multi-modal data pre-training and recognition method, the method comprising:

2. The multi-modal data pre-training and recognition method of claim 1, wherein the multi-source heterogeneous data fusion is performed on the acquired defect base data to construct a defect scene rule database, specifically comprising:

3. The method of claim 2, wherein the surface defect dataset D1= [ surface defect ID, defect geometry, spatial distribution data, defect statistics, defect spectral data ];

the defect spectrum subset comprises a texture spectrum, a taint spectrum, and a sawtooth spectrum;

the defect level comprises the detection object type;

the scene factors comprise operation scale and equipment type selection;

4. The multi-modal data pre-training and recognition method of claim 2, wherein the extracting defect type information, feature information, and scene information from the defect scene rules database, performing data association, and extracting scene factors of the defect scene rules database, specifically comprises:

for the defect Z, constructing a layered matrix Z × T × R according to the extracted defect type information, the feature information and the scene information;

for the defect-feature association information, adopting a first extraction factor a_ij to map and extract from the matrix Z × T to obtain the antecedent defect scene factor [formula not reproduced in the source text];

forming the antecedent scene factor [formula not reproduced] from all extracted antecedent defect scene factors;

for the feature-scene association information, adopting a second extraction factor b_ij to map and extract from the matrix T × R to obtain the consequent defect scene factor [formula not reproduced];

forming the consequent scene factor [formula not reproduced] from all extracted consequent defect scene factors;

wherein Z, T and R are the defect, feature information and scene information matrices [matrix definitions not reproduced in the source text]; n is the number of defect classes, j is the feature vector dimension, Z_i^j is the value of an element in the defect matrix, T_i^j is the value of an element in the feature information matrix, and R_i^j is the value of an element in the scene information matrix, with i = 1, 2, …, n;

[Formulas not reproduced in the source text: the piecewise definitions of the extraction factors and of the antecedent and consequent defect scene factors. The surviving fragments show only that each factor is set to 0 under one condition and to a computed value otherwise.]

5. The method according to claim 1, wherein the constructing a self-coding network structure model carrying scene information of the defect, fusing the scene factor into the self-coding network structure model, inputting a feature vector obtained by coding sample data of various defects, performing matching training of data and rules, and generating a modal recognition model specifically comprises:

and outputting the scene rules through a semi-supervised stacking self-encoder, adding a classifier in a decoding stage to realize a classification function, and optimizing the self-encoding network structure model classifier through matching training of data and rules to generate the modal recognition model.

6. The method for pre-training and recognizing multimodal data as claimed in claim 1, wherein the objective function of the self-coding network structure model is specifically:

[two formulas not reproduced in the source text];

where the symbols, whose notation is likewise not reproduced, denote respectively: an indicator of whether [symbol not reproduced] is present; the size loss of the defect bounding box, calculated as the absolute mean square error of the size information; and the confidence loss, calculated by judging whether the sample belongs to the given defect type.

7. The method of claim 5, wherein the scene rule output is further trained through hidden layers of a stacked self-encoder, continuously generating and updating defect scene rules, and supplementing the defect scene rules into the defect scene rules database.

8. A multi-modal data pre-training and recognition device, the device comprising:

9. A computer-readable storage medium, comprising a stored computer program, wherein the computer program, when executed, controls a device on which the computer-readable storage medium is located to perform the multimodal data pre-training and recognition method as claimed in any one of claims 1 to 7.

10. A terminal device comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the multimodal data pre-training and recognition method as claimed in any one of claims 1 to 7 when executing the computer program.