CN117315377A - Image processing method and device based on machine vision and electronic equipment


Info

Publication number
CN117315377A
Authority
CN
China
Prior art keywords
image
image recognition
feature
data
data set
Prior art date
Legal status
Granted
Application number
CN202311606112.7A
Other languages
Chinese (zh)
Other versions
CN117315377B (en)
Inventor
师钰清
师以贺
刘静
郭跃华
Current Assignee
Shandong Polytechnic College
Original Assignee
Shandong Polytechnic College
Priority date
Filing date
Publication date
Application filed by Shandong Polytechnic College
Priority to CN202311606112.7A
Publication of CN117315377A
Application granted
Publication of CN117315377B
Legal status: Active
Anticipated expiration


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0004Industrial image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/30Noise filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30108Industrial image inspection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an image processing method and apparatus based on machine vision, and an electronic device. The method includes: acquiring image data; inputting the image data into a pre-trained image recognition model to determine a first image recognition result, where the model is trained on an image training data set obtained by labeling, preprocessing, expanding, feature-extracting, and classifying a plurality of collected video image data; inputting the first image recognition result and the image data back into the pre-trained model, which determines a second image recognition result based on a preset noise-cancellation intensity, the first image recognition result, and the noise component of the image data; and determining a final image recognition result based on the second image recognition result and a preset category decision threshold. This both enlarges the volume of training data and improves the recognition performance of the model.

Description

Image processing method and device based on machine vision and electronic equipment
Technical Field
The present invention relates to the field of image recognition technologies, and in particular, to a machine vision-based image processing method and apparatus, and an electronic device.
Background
With the rapid development of modern industrial technology, requirements for the precision and efficiency of production processes keep rising. Against this background, conventional manual inspection and manual operation can no longer meet production needs, especially on production lines that demand high speed, high precision, and many repetitions. Manual inspection also suffers from large errors and limitations for subtle differences that are invisible or indistinguishable to the human eye. Machine vision, a technology that simulates human visual functions and performs automatic analysis and processing, is therefore gaining popularity and attention in industry. Meanwhile, with advances in computing, sensing, and artificial intelligence technology, machine vision systems have made breakthrough progress in image acquisition, processing, and analysis. In practical applications, a machine vision system can not only detect and identify products rapidly and accurately, but can also automatically control and optimize the production process through integration with other industrial systems.
In the related art, machine vision can be used to process image data so as to classify the elements in an image. In particular, deep learning can identify and classify image data, with convolutional neural networks automatically extracting deep image features and improving recognition accuracy. However, these techniques depend heavily on large amounts of training data, and in certain fields the amount of data that can be collected is often limited. Moreover, although deep learning excels at feature extraction, it still struggles with noisy data, class imbalance, and model interpretability. In particular, in complex field environments with high noise, directly applying a deep learning model may degrade recognition performance and reduce classification accuracy.
Disclosure of Invention
Accordingly, an object of the present invention is to provide a machine vision-based image processing method, apparatus and electronic device, so as to solve the above-mentioned technical problems.
In a first aspect, an embodiment of the present invention provides a machine vision-based image processing method, including: acquiring image data; inputting the image data into a pre-trained image recognition model, and determining a first image recognition result; the pre-trained image recognition model is obtained by training based on an image training data set, and the image training data set is obtained by labeling, data preprocessing, data expansion, feature extraction and data classification of a plurality of acquired video image data; inputting the first image recognition result and the image data into the pre-trained image recognition model, and determining a second image recognition result by the pre-trained image recognition model based on preset noise cancellation intensity, the first image recognition result and noise components of the image data; and determining a final image recognition result based on the second image recognition result and a preset category decision threshold.
In a preferred embodiment of the present invention, the pre-trained image recognition model is trained by: labeling the collected multiple video image data to obtain a first training data set; performing data preprocessing on the first training data set to obtain a second training data set; performing data expansion on the second training data set to obtain a third training data set; performing feature extraction on the third training data set to obtain an image training data set; training an image recognition model based on the image training data set to obtain a pre-trained image recognition model.
In a preferred embodiment of the present invention, the labeling the collected plurality of video image data to obtain the first training data set includes: determining first element categories corresponding to the video image data respectively; and labeling the plurality of video image data based on each first element category to obtain a first training data set.
In a preferred embodiment of the present invention, the performing data preprocessing on the first training data set to obtain a second training data set includes: respectively carrying out normalization processing on a plurality of first image data included in a first training data set; and respectively denoising the plurality of normalized first image data to obtain a second training data set.
In a preferred embodiment of the present invention, the performing data expansion on the second training data set to obtain a third training data set includes: generating, by the generator, a plurality of first dummy data based on a plurality of second image data included in the second training data set; performing attention enhancement processing on the first dummy data to obtain second dummy data; screening the second dummy data through a discriminator to obtain a plurality of data samples; and respectively performing angle rotation on the plurality of data samples to obtain a third training data set.
In a preferred embodiment of the present invention, the feature extraction of the third training data set to obtain an image training data set includes: stretching the third training data set to obtain a one-dimensional vector; normalizing the one-dimensional vector to obtain a unit vector; determining a quantum state corresponding to the third training data set based on the unit vector; carrying out quantum operation on the quantum state to obtain a feature vector; an image training dataset is determined based on the feature vectors and the attention weights.
In a preferred embodiment of the present invention, the image training data set includes a plurality of third image data, a plurality of second element types included in the plurality of third image data, and feature information corresponding to the plurality of second element types, respectively, and the training of the image recognition model based on the image training data set to obtain a pre-trained image recognition model includes: determining the prior probability of each second element class; determining a conditional probability of each feature information based on the prior probability; taking the plurality of feature information as a feature set, and selecting the plurality of feature information from the feature set to construct a feature subset; selecting independent features from the complement of the feature subset based on the feature set; determining a gaussian conditional probability based on the independence feature, the feature subset, and the conditional probability; adding the independent feature with the maximum Gaussian condition probability into the feature subset until feature information in the feature subset meets a preset quantity threshold value, and obtaining a final feature subset; determining an attention weight for each individual feature in the final feature subset; determining a classification rule based on the attention weight and the final feature subset; a pre-trained image recognition model is determined based on the classification rules.
In a preferred embodiment of the present invention, adding the independent feature with the largest gaussian conditional probability to the feature subset until feature information in the feature subset meets a preset number threshold, to obtain a final feature subset, includes: initializing a feature subset as an empty set; selecting independent features in the complement of the initialized feature subset based on the feature set; and adding the independent feature with the maximum Gaussian condition probability into the feature subset until the feature information in the feature subset meets the preset quantity threshold value, and obtaining a final feature subset.
In a second aspect, an embodiment of the present invention further provides an image processing apparatus based on machine vision, including: the image data acquisition module is used for acquiring image data; the first image recognition result determining module is used for inputting the image data into a pre-trained image recognition model to determine a first image recognition result; the pre-trained image recognition model is obtained by training based on an image training data set, and the image training data set is obtained by labeling, data preprocessing, data expansion, feature extraction and data classification of a plurality of acquired video image data; a final image recognition result determining module, configured to input the first image recognition result and the image data into the pre-trained image recognition model, and determine a second image recognition result based on a pre-set noise cancellation intensity, the first image recognition result, and a noise component of the image data through the pre-trained image recognition model; and determining a final image recognition result based on the second image recognition result and a preset category decision threshold.
In a third aspect, an embodiment of the present invention further provides an electronic device, including a processor and a memory, where the memory stores computer executable instructions executable by the processor, where the processor executes the computer executable instructions to implement the machine vision based image processing method of the first aspect.
The embodiment of the invention has the following beneficial effects:
the embodiments of the invention provide an image processing method and apparatus based on machine vision, and an electronic device. Image data is first acquired and input into a pre-trained image recognition model to determine a first image recognition result. The first image recognition result and the image data are then input into the pre-trained image recognition model, which determines a second image recognition result based on a preset noise-cancellation intensity, the first image recognition result, and the noise component of the image data. Finally, a final image recognition result is determined based on the second image recognition result and a preset category decision threshold. The pre-trained image recognition model is trained on an image training data set obtained by labeling, preprocessing, expanding, feature-extracting, and classifying a plurality of collected video image data. This not only enlarges the volume of training data, solving the problem of insufficient training data, but also performs noise reduction during recognition, improving the recognition performance of the image recognition model and the accuracy of classification.
Additional features and advantages of the disclosure will be set forth in the description which follows, or in part will be obvious from the description, or may be learned by practice of the techniques of the disclosure.
The foregoing objects, features and advantages of the disclosure will be more readily apparent from the following detailed description of the preferred embodiments taken in conjunction with the accompanying drawings.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of an image processing method based on machine vision according to an embodiment of the present invention;
FIG. 2 is a flowchart of another image processing method based on machine vision according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an image processing device based on machine vision according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
With the rapid development of modern industrial technology, requirements for the precision and efficiency of production processes keep rising. Against this background, conventional manual inspection and manual operation can no longer meet production needs, especially on production lines that demand high speed, high precision, and many repetitions. Manual inspection also suffers from large errors and limitations for subtle differences that are invisible or indistinguishable to the human eye. Machine vision, a technology that simulates human visual functions and performs automatic analysis and processing, is therefore gaining popularity and attention in industry. Meanwhile, with advances in computing, sensing, and artificial intelligence technology, machine vision systems have made breakthrough progress in image acquisition, processing, and analysis. In practical applications, a machine vision system can not only detect and identify products rapidly and accurately, but can also automatically control and optimize the production process through integration with other industrial systems.
In the related art, machine vision can be used to process image data so as to classify the elements in an image. In particular, deep learning can identify and classify image data, with convolutional neural networks automatically extracting deep image features and improving recognition accuracy. However, these techniques depend heavily on large amounts of training data, and in certain fields the amount of data that can be collected is often limited. Moreover, although deep learning excels at feature extraction, it still struggles with noisy data, class imbalance, and model interpretability. In particular, in complex field environments with high noise, directly applying a deep learning model may degrade recognition performance and reduce classification accuracy.
Based on the above, the machine vision-based image processing method and apparatus and the electronic device provided by the embodiments of the invention first acquire image data and input it into a pre-trained image recognition model to determine a first image recognition result. The first image recognition result and the image data are then input into the pre-trained image recognition model, which determines a second image recognition result based on a preset noise-cancellation intensity, the first image recognition result, and the noise component of the image data. Finally, a final image recognition result is determined based on the second image recognition result and a preset category decision threshold. The pre-trained image recognition model is trained on an image training data set obtained by labeling, preprocessing, expanding, feature-extracting, and classifying a plurality of collected video image data. This not only enlarges the volume of training data, solving the problem of insufficient training data, but also performs noise reduction during recognition, improving the recognition performance of the image recognition model and the accuracy of classification.
For the sake of understanding the present embodiment, first, a detailed description is given of an image processing method based on machine vision disclosed in the present embodiment.
Example 1
An embodiment of the invention provides an image processing method based on machine vision, and fig. 1 is a flowchart of the image processing method based on machine vision provided by the embodiment of the invention. As shown in fig. 1, the machine vision-based image processing method may include the steps of:
step S101, image data is acquired.
The image data can be obtained through image acquisition equipment such as a camera, and may be video frame images from surveillance video.
Step S102, inputting the image data into a pre-trained image recognition model, and determining a first image recognition result.
The pre-trained image recognition model is obtained by training based on an image training data set, and the image training data set is obtained by labeling, data preprocessing, data expansion, feature extraction and data classification of a plurality of acquired video image data.
The first image recognition result characterizes the element categories in the image data, but may still be affected by noise and by class imbalance. An image may contain multiple elements, for example persons and articles, so the first image recognition result may be a person-behavior type, an article type, and so on. The first image recognition result may relate to target recognition elements in the image data: before using the image recognition model, the types of target elements to be recognized can be defined so that the corresponding first image recognition result is obtained. For example, if the person-behavior types to be recognized are defined, the image recognition model outputs only the person-behavior types in the image data and does not output the object types the image data contains.
Specifically, the first image recognition result is obtained based on the following expression (1):

$Y_1 = f(X)$  (1)

where $Y_1$ denotes the first image recognition result, $f(\cdot)$ denotes the image recognition model, and $X$ denotes the image data.
Step S103, inputting the first image recognition result and the image data into a pre-trained image recognition model, and determining a second image recognition result by the pre-trained image recognition model based on the pre-set noise elimination intensity, the first image recognition result and the noise component of the image data; and determining a final image recognition result based on the second image recognition result and a preset category decision threshold.
The second image recognition result characterizes the element category of the image data after noise cancellation has been applied to the first image recognition result, but it may still be affected by class imbalance. The final image recognition result characterizes the element category of the image data and is obtained by adjusting the second image recognition result with the preset category decision threshold.
Specifically, the second image recognition result is obtained based on the following expression (2):

$Y_2 = Y_1 - \lambda \cdot N(X)$  (2)

where $Y_2$ denotes the second image recognition result, $Y_1$ denotes the first image recognition result, $\lambda$ denotes the noise-cancellation intensity, $X$ denotes the image data, $N(X)$ denotes the noise component of the input image data, and $\cdot$ denotes multiplication.
Specifically, the final image recognition result is obtained based on the following expression (3):

$Y = \arg\max_{c:\, Y_2(c) \ge \tau} Y_2(c)$  (3)

where $Y$ denotes the final image recognition result, $Y_2$ denotes the second image recognition result, $\tau$ denotes the category decision threshold, and $\arg\max$ denotes selecting the category for which $Y_2$ is largest.
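For illustration only, the following is a minimal Python sketch of the three-step inference pipeline in expressions (1)-(3). The model, the noise estimator, and all names and default values here are hypothetical stand-ins, not taken from the patent; the noise component is treated as a per-category score correction, which is an assumption.

```python
import numpy as np

def recognize(f, estimate_noise, x, lam=0.1, tau=0.5):
    """Three-step inference: raw scores, noise cancellation, thresholded argmax.

    f              -- trained recognition model returning per-category scores (eq. 1)
    estimate_noise -- callable returning the noise component N(x) of the input
    lam            -- preset noise-cancellation intensity (lambda in eq. 2)
    tau            -- preset category decision threshold (eq. 3)
    """
    y1 = np.asarray(f(x), dtype=float)                          # eq. (1)
    y2 = y1 - lam * np.asarray(estimate_noise(x), dtype=float)  # eq. (2)
    valid = y2 >= tau                       # categories above the decision threshold
    if not valid.any():
        return None                         # no category passes the threshold
    return int(np.argmax(np.where(valid, y2, -np.inf)))         # eq. (3)
```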
According to the machine vision-based image processing method provided by this embodiment of the invention, image data is first acquired and input into a pre-trained image recognition model to determine a first image recognition result. The first image recognition result and the image data are then input into the pre-trained image recognition model, which determines a second image recognition result based on a preset noise-cancellation intensity, the first image recognition result, and the noise component of the image data. Finally, a final image recognition result is determined based on the second image recognition result and a preset category decision threshold. The pre-trained image recognition model is trained on an image training data set obtained by labeling, preprocessing, expanding, feature-extracting, and classifying a plurality of collected video image data, which solves the problem of insufficient training data, performs noise reduction during recognition, improves the recognition performance of the image recognition model, and improves classification accuracy.
Example 2
The embodiment of the invention also provides another image processing method based on machine vision; the method is realized on the basis of the method of the embodiment; the method focuses on specific training steps of a pre-trained image recognition model.
Fig. 2 is a flowchart of another image processing method based on machine vision according to an embodiment of the present invention, and as shown in fig. 2, a pre-trained image recognition model is trained by:
step S201, labeling the collected multiple video image data to obtain a first training data set.
The data format of the video image data is an RGB image format, and each video image includes three channels, namely, red (R), green (G) and blue (B), and the attribute of the video image data may include pixel values, length and width of the video image, and the like.
Specifically, labeling the collected multiple video image data to obtain a first training data set may include: determining first element categories corresponding to the video image data respectively; and labeling the plurality of video image data based on each first element category to obtain a first training data set.
It can be assumed that $I$ denotes a video image and $I(i,j)$ denotes the pixel in row $i$ and column $j$. Each pixel consists of three values, i.e. $I(i,j) = (R, G, B)$, where $R$ denotes the intensity of the red channel at that pixel, $G$ denotes the intensity of the green channel, and $B$ denotes the intensity of the blue channel. The size of the video image is defined as $H \times W$, where $H$ denotes the height of the video image and $W$ denotes its width.
Taking the element "person" and the behavior category of that person as an example, the behavior of the person in the video image needs to be labeled during annotation; for instance, when the person is performing maintenance, the video image is labeled "overhauling". Labels can be represented by category labels, e.g. $y \in \{1, 2, \ldots, C\}$, where $y$ denotes the category label, $c$ denotes a first element category, and $C$ denotes the total number of first element categories. Labeling in this way yields the first training data set $D_1 = \{(I_1, y_1), (I_2, y_2), \ldots, (I_N, y_N)\}$, where $D_1$ denotes the first training data set, $(I_k, y_k)$ denotes a video image-label pair in which $I_k$ is the $k$-th image and $y_k$ is its element category, and $N$ denotes the total number of video images.
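As a concrete illustration of this labeling scheme, here is a minimal sketch; the category names and the function are hypothetical examples, not taken from the patent.

```python
# First training set D1 = {(I_1, y_1), ..., (I_N, y_N)}: RGB frames paired with
# first-element-category labels drawn from {1, ..., C}.
CATEGORIES = {1: "overhauling", 2: "walking", 3: "idle"}  # hypothetical label map

def label_frames(frames, labels):
    """Pair each H x W x 3 video frame with its category label y in {1..C}."""
    if len(frames) != len(labels):
        raise ValueError("every frame needs exactly one label")
    if any(y not in CATEGORIES for y in labels):
        raise ValueError("unknown category label")
    return list(zip(frames, labels))  # the first training data set D1
```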
Step S202, data preprocessing is carried out on the first training data set to obtain a second training data set.
Specifically, performing data preprocessing on the first training data set to obtain a second training data set, including: respectively carrying out normalization processing on a plurality of first image data included in a first training data set; and respectively denoising the plurality of normalized first image data to obtain a second training data set.
Normalization converts the first image data, whose pixel values range from 0 to 255, into normalized first image data with pixel values in the range 0-1, which reduces computational complexity during training.
In particular, the first image data may be represented as $I = (R, G, B)$ and the normalized first image data as $I' = (R', G', B')$, where $R' = R / 255$, $G' = G / 255$, and $B' = B / 255$, so that the pixel values of the first image data are normalized to the range 0-1.
An autoencoder-based denoising method can be used to denoise each of the normalized first image data.
Specifically, the image data denoised by the autoencoder can be expressed as $\hat{I} = D_{ae}(E_{ae}(I'))$, where $E_{ae}(\cdot)$ denotes the encoding function of the autoencoder, $D_{ae}(\cdot)$ denotes its decoding function, and $I'$ denotes the normalized first image data. The sets of image data denoised by the autoencoder form the second training data set.
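The preprocessing chain can be sketched as follows, assuming NumPy arrays and a trained autoencoder exposed as two callables; the names are illustrative only, since the patent does not fix an architecture.

```python
import numpy as np

def preprocess(images, encode, decode):
    """Normalize RGB frames to [0, 1], then denoise with a trained autoencoder.

    encode/decode -- the autoencoder's encoding and decoding functions
                     (stand-in callables).
    """
    out = []
    for img in images:                         # img: H x W x 3, values 0-255
        norm = img.astype(np.float32) / 255.0  # R'=R/255, G'=G/255, B'=B/255
        out.append(decode(encode(norm)))       # denoised image D(E(I'))
    return out                                 # the second training data set
```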
Step S203, data expansion is performed on the second training data set to obtain a third training data set.
Specifically, performing data expansion on the second training data set to obtain a third training data set, including: generating, by the generator, a plurality of first dummy data based on a plurality of second image data included in the second training data set; performing attention enhancement processing on the first dummy data to obtain second dummy data; screening the second dummy data through a discriminator to obtain a plurality of data samples; and respectively performing angle rotation on the plurality of data samples to obtain a third training data set.
Data expansion is performed with a generative adversarial network (GAN). A GAN comprises a generator G and a discriminator D that learn the distribution of the data through an adversarial game. The generator G is responsible for generating first dummy data that imitates the second image data in an attempt to fool the discriminator D, while the task of the discriminator D is to distinguish the real second image data from the generated first dummy data as well as possible. When generating data samples, the generator G starts from random noise $z$ and produces second dummy data after the self-attention enhancement operation. The discriminator D then judges the second image data against the generated second dummy data, thereby screening out the data samples.
Specifically, the objective function of the generator G can be expressed by the following expression (4):

$L_G = \mathbb{E}_{z \sim \mathcal{N}(0, 1)}\left[\log\left(1 - D(G(z))\right)\right] + \lambda \left\lVert G(z) - \Phi \alpha \right\rVert_2^2$  (4)

where $L_G$ denotes the objective function of the generator G, $\mathbb{E}$ denotes the expected value, $G(z)$ denotes the first dummy data, $z$ denotes random noise sampled from a standard normal distribution, $\lambda$ denotes the weight parameter controlling the degree of sparsity, $\Phi$ denotes the dictionary, $\alpha$ denotes the sparse code used to construct the sparsity constraint term, i.e. the sparse representation of the generated first dummy data under the dictionary, and $\lVert \cdot \rVert_2$ denotes the L2 norm.
The sparse code of each first dummy data under the dictionary can be solved by the following expression (5):

$\alpha^* = \arg\min_{\alpha} \left\lVert G(z) - \Phi \alpha \right\rVert_2^2 + \beta \left\lVert \alpha \right\rVert_1$  (5)

where $\alpha$ denotes the sparse code, $G(z)$ denotes the first dummy data, $\Phi$ denotes the dictionary, $\lVert \cdot \rVert_2$ denotes the L2 norm, $\lVert \cdot \rVert_1$ denotes the L1 norm, and $\alpha^*$ denotes the optimal sparse code over all dictionary atoms.
Specifically, the objective function of the discriminator D can be expressed by the following expression (6):

$L_D = \mathbb{E}_{x}\left[\log D(x)\right] + \mathbb{E}_{z}\left[\log\left(1 - D(G(z))\right)\right]$  (6)

where $L_D$ denotes the objective function of the discriminator D, $\mathbb{E}$ denotes the expected value, $x$ denotes the second image data, and $G(z)$ denotes the first dummy data.
Specifically, at each layer of the generator G, a self-attention module is added alongside the conventional convolution operation; the second dummy data can therefore be generated by the following expression (7):

$x' = \mathrm{SA}\left(G(z)\right)$  (7)

where $x'$ denotes the second dummy data, $G(z)$ denotes the output of the generator G when the input is noise $z$, and $\mathrm{SA}(\cdot)$ denotes the self-attention module.
Specifically, the operation of the self-attention module can be represented by the following expression (8):

$\mathrm{SA}(x)_i = \sum_j \beta_{i,j} \, v_j, \qquad \beta_{i,j} = \frac{\exp\left(q_i^{\mathsf{T}} k_j\right)}{\sum_{j'} \exp\left(q_i^{\mathsf{T}} k_{j'}\right)}$  (8)

where $\mathrm{SA}$ denotes the self-attention module, $v$ denotes the values, $\beta_{i,j}$ denotes the attention score of the $j$-th feature for the $i$-th sample, $v_j$ denotes the value of the $j$-th feature, $\mathsf{T}$ denotes the matrix transpose, $q$ denotes the query, and $k$ denotes the key.
To increase the diversity and robustness of the data samples, a rotation operation is performed on the generated samples. Each generated second dummy data $x'$ is rotated by a certain angle to obtain sample data $x'_{\theta} = R_{\theta}(x')$, and the rotated sample data together form the third training data set, where $R_{\theta}$ denotes the rotation operation and $\theta$ is the rotation angle.
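A hedged sketch of this expansion loop follows. The generator, discriminator, and acceptance threshold are stand-in callables and assumptions: the patent states only that D screens the generated samples, not the exact criterion.

```python
import numpy as np
from scipy.ndimage import rotate

def expand_dataset(generator, discriminator, n_samples,
                   angles=(90, 180, 270), accept=0.5, noise_dim=100, seed=0):
    """GAN-based expansion: generate, screen with the discriminator, rotate.

    generator     -- G with its built-in self-attention stage (eq. 7); maps
                     noise z to second dummy data (stand-in callable)
    discriminator -- D returning a realness score in [0, 1] (stand-in callable)
    accept        -- assumed screening threshold on D's score
    """
    rng = np.random.default_rng(seed)
    expanded = []
    while len(expanded) < n_samples:
        z = rng.standard_normal(noise_dim)   # z ~ N(0, I)
        fake = generator(z)                  # second dummy data x' = SA(G(z))
        if discriminator(fake) >= accept:    # keep samples that look real
            expanded.append(fake)
            # rotate each accepted sample for diversity: x'_theta = R_theta(x')
            for a in angles:
                expanded.append(rotate(fake, a, axes=(0, 1), reshape=False))
    return expanded[:n_samples]              # the third training data set
```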
Step S204, extracting features of the third training data set to obtain an image training data set.
Specifically, performing feature extraction on the third training data set to obtain an image training data set, including: stretching the third training data set to obtain a one-dimensional vector; normalizing the one-dimensional vector to obtain a unit vector; determining a quantum state corresponding to the third training data set based on the unit vector; carrying out quantum operation on the quantum state to obtain a feature vector; an image training dataset is determined based on the feature vectors and the attention weights.
The unit vector can be obtained by the following expression (9):

$v = \frac{x}{\lVert x \rVert_2}$  (9)

where $X_3$ denotes the third training data set, $x$ denotes the one-dimensional vector obtained by stretching $X_3$, $v$ denotes the unit vector, and $\lVert x \rVert_2$ denotes the 2-norm of $x$.
Specifically, treating the unit vector as the amplitudes of a quantum state, the quantum state corresponding to the third training data set can be obtained by the following expression (10):

$\lvert \psi \rangle = \sum_{i=1}^{n} v_i \lvert i \rangle$  (10)

where $\lvert \psi \rangle$ denotes the quantum state, $v$ denotes the unit vector, $n$ denotes the total number of input sample values, $v_i$ denotes each element of the unit vector, and $\lvert i \rangle$ denotes the basis state corresponding to each element. A quantum operation, such as a Hadamard transformation or a phase rotation, is then applied to the quantum state to obtain a new quantum state $\lvert \psi' \rangle = U \lvert \psi \rangle$, where $U$ denotes the unitary matrix of the quantum operation, $\lvert \psi' \rangle$ denotes the new quantum state, and $\lvert \psi \rangle$ denotes the original quantum state. Measuring the new quantum state yields the feature vector, which can be obtained by the following expression (11):
$F_i = \langle i \mid \psi' \rangle$  (11)

where $F$ denotes the feature vector and $\langle i \mid \psi' \rangle$ denotes the inner product of the basis state with the new quantum state.
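To make the encoding concrete, the following classical simulation sketches expressions (9)-(11) in NumPy. The unitary used here (a normalized DFT matrix) is an assumed stand-in for the Hadamard/phase operations mentioned above, and taking the magnitudes of the amplitudes is one possible way to read out real-valued features.

```python
import numpy as np

def quantum_feature_extract(x3):
    """Classical simulation of the encoding in eqs. (9)-(11).

    The unit vector's entries serve as the amplitudes of |psi>, a normalized
    DFT matrix stands in for the unitary U, and the feature vector collects
    the magnitudes of <i|psi'>.
    """
    x = np.asarray(x3, dtype=np.float64).ravel()  # stretch to 1-D, eq. (9)
    v = x / np.linalg.norm(x)                     # unit vector: amplitudes of |psi>
    n = v.size
    U = np.fft.fft(np.eye(n)) / np.sqrt(n)        # stand-in unitary operation
    psi_new = U @ v.astype(complex)               # |psi'> = U |psi>
    return np.abs(psi_new)                        # measured features, eq. (11)
```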
Specifically, the attention weight can be obtained by the attention weight generator based on the following expression (12):

$w = A(F)$  (12)

where $w$ denotes the attention weight, $A(\cdot)$ denotes the attention weight generator, which is a fully connected layer, and $F$ denotes the feature vector.
The following expression (13) represents the output of the fully connected layer, where the dimension of the feature vector is $d$, the weight matrix of the fully connected layer has size $d \times d$, and the bias vector has size $d$:

$u = W F + b$  (13)

where $u$ denotes the output of the fully connected layer, which also has size $d$, $W$ denotes the weight matrix, $F$ denotes the feature vector, and $b$ denotes the bias vector.
Further, the output of the fully connected layer is converted into attention weights through a Sigmoid activation function: $w = \sigma(u)$, where $w$ denotes the attention weight, $\sigma$ denotes the Sigmoid activation function, and $u$ denotes the output of the fully connected layer. The main function of the Sigmoid is to compress each output of the fully connected layer to between 0 and 1 so that it can be interpreted as a probability or an importance.
Further, the feature vector is weighted by the following expression (14) to obtain the weighted feature vector, i.e. the image training data set:

$T = w \odot F$  (14)

where $T$ denotes the image training data set, $w$ denotes the attention weight, $\odot$ denotes element-wise multiplication, and $F$ denotes the feature vector.
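A minimal sketch of the attention re-weighting in expressions (12)-(14), assuming the fully connected layer's parameters W and b are already trained; the names are illustrative.

```python
import numpy as np

def attention_weight(F, W, b):
    """Attention re-weighting of the extracted features, per eqs. (12)-(14).

    W (d x d) and b (d,) are the trained parameters of the fully connected
    layer inside the attention weight generator A(.).
    """
    u = W @ F + b                  # fully connected layer output, eq. (13)
    w = 1.0 / (1.0 + np.exp(-u))   # sigmoid squashes each weight into (0, 1)
    return w * F                   # element-wise weighted features T, eq. (14)
```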
The quantum state coding can realize the compression of high-dimensional data, so that the high-dimensional data can be effectively processed, and the nonlinear characteristic extraction can be realized through quantum operation, so that more characteristic information can be obtained.
Step S205, training an image recognition model based on the image training data set to obtain a pre-trained image recognition model.
The image training data set comprises a plurality of third image data, a plurality of second element categories contained in the third image data and feature information respectively corresponding to the second element categories.
For easy understanding, the step of training the image recognition model based on the image training data set to obtain a pre-trained image recognition model is described in detail through steps A1 to A9.
Step A1, determining the prior probability of each second element category.
Suppose the classification task has $C$ different second element categories; the prior probability of a second element category can be determined by the following expression (15):

$P(c_k) = \frac{N_{c_k}}{N}$  (15)

where $k = 1, 2, \ldots, C$, $P(c_k)$ denotes the prior probability of the $k$-th second element category, $c_k$ denotes the $k$-th second element category, $N_{c_k}$ denotes the number of images in the image training data set belonging to the $k$-th category, and $N$ denotes the total number of images in the image training data set.
Step A2, determining the conditional probability of each piece of feature information based on the prior probability.
Here the conditional probability $P(x_i \mid c_k)$ of each input feature $x_i$ under a given element category is estimated.
Specifically, suppose there are $d$ pieces of feature information $x_1, x_2, \ldots, x_d$ in total; the conditional probability can then be represented by the following expression (16):

$P(x_1, x_2, \ldots, x_d \mid c_k) = \prod_{i=1}^{d} P(x_i \mid c_k)$  (16)
Step A3, taking the plurality of feature information as a feature set and selecting feature information from the feature set to construct a feature subset.
The most relevant feature information is selected from the feature set by a greedy algorithm and used as a feature subset, denoted $S$.
Step A4, selecting an independent feature from the complement of the feature subset based on the feature set.
A feature $x_j$ is selected from the complement of the feature subset such that $x_j$ has the greatest conditional independence from the currently selected features, i.e. $x_j$ is selected to maximize $P(x_j \mid S, c_k)$, where $x_j$ denotes a feature selected from the complement of the feature subset, $S$ denotes the feature subset, and $c_k$ denotes the $k$-th second element category.
Step A5, determining the Gaussian conditional probability based on the independent feature, the feature subset, and the conditional probability.
The Gaussian conditional probability can be determined by the following expression (17):

$P(x_j \mid S, c_k) = \frac{1}{\sqrt{2 \pi \sigma_{j,k}^2}} \exp\left(-\frac{(x_j - \mu_{j,k})^2}{2 \sigma_{j,k}^2}\right)$  (17)

where $P(x_j \mid S, c_k)$ denotes the Gaussian conditional probability, $S$ denotes the feature subset, $\mu_{j,k}$ denotes the mean of the $j$-th feature under the second element category $c_k$, and $\sigma_{j,k}^2$ denotes the variance of the $j$-th feature under the second element category $c_k$.
Step A6, adding the independent feature with the maximum Gaussian conditional probability to the feature subset until the feature information in the feature subset meets a preset number threshold, obtaining the final feature subset.
Specifically, this may include: initializing the feature subset as an empty set; selecting independent features from the complement of the initialized feature subset based on the feature set; and adding the independent feature with the maximum Gaussian conditional probability to the feature subset until the feature information in the feature subset meets the preset number threshold, obtaining the final feature subset.
The optimal number threshold and feature subset size are selected by cross-validation to avoid overfitting and obtain better classification performance.
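The greedy selection loop of steps A3-A6 might look as follows. The scoring function here is a simplified stand-in: it ranks candidates by their per-class Gaussian log-likelihood (eq. 17) rather than a full conditional-independence test, which the patent does not spell out.

```python
import numpy as np

def greedy_select(features, labels, k):
    """Greedy forward selection driven by per-class Gaussian likelihood (eq. 17).

    features -- (N, d) training matrix; labels -- (N,) class ids;
    k -- the preset number threshold for the subset size.
    """
    def gauss_score(j):
        score = 0.0
        for c in np.unique(labels):
            col = features[labels == c, j]
            mu, var = col.mean(), col.var() + 1e-9
            # average Gaussian log-likelihood of the class's own samples
            score += np.mean(-0.5 * np.log(2 * np.pi * var)
                             - (col - mu) ** 2 / (2 * var))
        return score

    subset = []                                 # initialize S as the empty set
    remaining = list(range(features.shape[1]))  # complement of S
    while len(subset) < k and remaining:
        best = max(remaining, key=gauss_score)  # candidate with max Gaussian score
        subset.append(best)
        remaining.remove(best)
    return subset                               # the final feature subset
```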
Step A7, determining the attention weight of each individual feature in the final feature subset.
Specifically, after the feature subset $S$ is selected by the greedy algorithm, a corresponding feature weight vector is obtained, denoted $w = (w_1, w_2, \ldots, w_m)$. The attention weight can be determined by the following expression (18):

$a_j = \frac{w_j}{\sum_{l=1}^{m} w_l}$  (18)

where $a_j$ denotes the attention weight, defined from the feature weight $w_j$, i.e. the attention weight of the $j$-th feature, and $w_j$ indicates the degree of importance of the $j$-th feature in element classification. The denominator normalizes the attention weights so that the attention weights of all features sum to 1.
Step A8, determining a classification rule based on the attention weight and the final feature subset.
Specifically, the classification rule can be expressed by the following expression (19):

$\hat{y} = \arg\max_{c_k} P(c_k)\, P(S \mid c_k, a)$  (19)

where $\hat{y}$ denotes the output of the classification rule and $P(S \mid c_k, a)$ denotes the conditional probability of the feature subset $S$ given the second element category $c_k$ and the attention weight vector $a$.
For the calculation, the Gaussian conditional probability $P(x_j \mid S, c_k)$ needs to be adjusted according to the attention weight $a_j$. The conditional probability of a feature after the attention weight is introduced, $P_a(x_j \mid S, c_k)$, can be expressed as the following expression (20):

$P_a(x_j \mid S, c_k) = P(x_j \mid S, c_k)^{a_j}$  (20)

Further, the conditional probability of the feature subset $S$ can be determined by the following expression (21):

$P(S \mid c_k, a) = \prod_{x_j \in S} P_a(x_j \mid S, c_k)$  (21)

where $x_j$ denotes a feature in the feature subset and $c_k$ denotes the $k$-th second element category.
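Finally, a sketch of the attention-weighted decision rule of expressions (19)-(21), computed in log space for numerical stability; the data structures (priors, stats) are hypothetical containers for the quantities estimated in steps A1-A7.

```python
import numpy as np

def classify(x, subset, attn, priors, stats):
    """Attention-weighted Gaussian naive Bayes decision, per eqs. (19)-(21).

    x      -- feature vector of the sample to classify
    subset -- indices of the final feature subset S
    attn   -- attention weights a_j for the features in S (eq. 18)
    priors -- {class: P(c_k)}, estimated as in eq. (15)
    stats  -- {class: (mu, var)} per-class mean/variance arrays over S
    """
    best_class, best_logp = None, -np.inf
    for c, p in priors.items():
        mu, var = stats[c]
        xs = np.asarray(x)[subset]
        loglik = -0.5 * np.log(2 * np.pi * var) - (xs - mu) ** 2 / (2 * var)
        # the attention exponent a_j becomes a multiplier in log space (eq. 20)
        logp = np.log(p) + np.sum(attn * loglik)  # product over S, eq. (21)
        if logp > best_logp:
            best_class, best_logp = c, logp
    return best_class                             # argmax over classes, eq. (19)
```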
Step A9, determining a pre-trained image recognition model based on the classification rule.
According to the machine vision-based image processing method provided by this embodiment of the invention, combining deep learning with quantum machine learning effectively extracts the features of the image data and strengthens the classification and recognition capability of the image recognition model. The data expansion and noise cancellation techniques mitigate the problems of noisy data and class imbalance; the GAN-generated data expansion and the greedy-algorithm feature selection improve the model's predictive ability on unknown image data; and the attention mechanism makes the model's decision process more transparent, which helps in understanding the model's focus on important features. Training the image recognition model in this way improves the accuracy of the trained model in image recognition.
Example 3
Corresponding to the above method embodiment, the embodiment of the present invention provides an image processing device based on machine vision, and fig. 3 is a schematic structural diagram of the image processing device based on machine vision provided by the embodiment of the present invention, as shown in fig. 3, the image processing device based on machine vision may include:
The image data acquisition module 301 is configured to acquire image data.
A first image recognition result determining module 302, configured to input the image data into a pre-trained image recognition model, and determine a first image recognition result; the pre-trained image recognition model is obtained by training based on an image training data set, and the image training data set is obtained by labeling, data preprocessing, data expansion, feature extraction and data classification of a plurality of acquired video image data.
A final image recognition result determining module 303, configured to input the first image recognition result and the image data into the pre-trained image recognition model, and determine a second image recognition result based on a preset noise cancellation intensity, the first image recognition result, and a noise component of the image data through the pre-trained image recognition model; and determining a final image recognition result based on the second image recognition result and a preset category decision threshold.
The machine vision-based image processing apparatus provided by this embodiment of the invention first acquires image data and inputs it into a pre-trained image recognition model to determine a first image recognition result. It then inputs the first image recognition result and the image data into the pre-trained image recognition model, which determines a second image recognition result based on a preset noise-cancellation intensity, the first image recognition result, and the noise component of the image data. Finally, it determines a final image recognition result based on the second image recognition result and a preset category decision threshold. The pre-trained image recognition model is trained on an image training data set obtained by labeling, preprocessing, expanding, feature-extracting, and classifying a plurality of collected video image data, which enlarges the volume of training data, solves the problem of insufficient training data, performs noise reduction during recognition, improves the recognition performance of the image recognition model, and improves classification accuracy.
In some embodiments, the first image recognition result determining module is further configured to annotate the collected plurality of video image data to obtain a first training data set; performing data preprocessing on the first training data set to obtain a second training data set; performing data expansion on the second training data set to obtain a third training data set; performing feature extraction on the third training data set to obtain an image training data set; training an image recognition model based on the image training data set to obtain a pre-trained image recognition model.
In some embodiments, the first image recognition result determining module is further configured to determine a first element class corresponding to each of the plurality of video image data; and labeling the plurality of video image data based on each first element category to obtain a first training data set.
In some embodiments, the first image recognition result determining module is further configured to normalize a plurality of first image data included in the first training data set, respectively; and respectively denoising the plurality of normalized first image data to obtain a second training data set.
In some embodiments, the first image recognition result determination module is further configured to generate, by the generator, a plurality of first dummy data based on a plurality of second image data included in the second training data set; performing attention enhancement processing on the first dummy data to obtain second dummy data; screening the second dummy data through a discriminator to obtain a plurality of data samples; and respectively performing angle rotation on the plurality of data samples to obtain a third training data set.
In some embodiments, the first image recognition result determining module is further configured to stretch the third training data set to obtain a one-dimensional vector; normalizing the one-dimensional vector to obtain a unit vector; determining a quantum state corresponding to the third training data set based on the unit vector; carrying out quantum operation on the quantum state to obtain a feature vector; an image training dataset is determined based on the feature vectors and the attention weights.
In some embodiments, the image training data set includes a plurality of third image data, a plurality of second element categories included in the plurality of third image data, and feature information respectively corresponding to the plurality of second element categories, and the first image recognition result determining module is further configured to determine a priori probabilities of the respective second element categories; determining a conditional probability of each feature information based on the prior probability; taking the plurality of feature information as a feature set, and selecting the plurality of feature information from the feature set to construct a feature subset; selecting independent features from the complement of the feature subset based on the feature set; determining a gaussian conditional probability based on the independence feature, the feature subset, and the conditional probability; adding the independent feature with the maximum Gaussian condition probability into the feature subset until feature information in the feature subset meets a preset quantity threshold value, and obtaining a final feature subset; determining an attention weight for each individual feature in the final feature subset; determining a classification rule based on the attention weight and the final feature subset; a pre-trained image recognition model is determined based on the classification rules.
In some embodiments, the first image recognition result determination module is further configured to initialize the feature subset to an empty set; selecting independent features in the complement of the initialized feature subset based on the feature set; and adding the independent feature with the maximum Gaussian condition probability into the feature subset until the feature information in the feature subset meets the preset quantity threshold value, and obtaining a final feature subset.
The device provided by the embodiment of the present invention has the same implementation principle and technical effects as those of the foregoing method embodiment, and for the sake of brevity, reference may be made to the corresponding content in the foregoing method embodiment where the device embodiment is not mentioned.
Example 4
The embodiment of the invention also provides electronic equipment for running the image processing method based on machine vision; referring to a schematic structural diagram of an electronic device shown in fig. 4, the electronic device includes a memory 400 and a processor 401, where the memory 400 is configured to store one or more computer instructions, and the one or more computer instructions are executed by the processor 401 to implement the above-mentioned image processing method based on machine vision.
Further, the electronic device shown in fig. 4 further comprises a bus 402 and a communication interface 403, and the processor 401, the communication interface 403 and the memory 400 are connected by the bus 402.
The memory 400 may include a high-speed random access memory (RAM, Random Access Memory) and may further include non-volatile memory, such as at least one disk memory. The communication connection between the system network element and at least one other network element is implemented via at least one communication interface 403 (which may be wired or wireless), which may use the Internet, a wide area network, a local area network, a metropolitan area network, etc. Bus 402 may be an ISA bus, a PCI bus, an EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one bi-directional arrow is shown in FIG. 4, but this does not mean there is only one bus or one type of bus.
The processor 401 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor 401 or by instructions in the form of software. The processor 401 may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU for short), a network processor (Network Processor, NP for short), etc.; but also digital signal processors (Digital Signal Processor, DSP for short), application specific integrated circuits (Application Specific Integrated Circuit, ASIC for short), field-programmable gate arrays (Field-Programmable Gate Array, FPGA for short) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components. The disclosed methods, steps, and logic blocks in the embodiments of the present invention may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be embodied directly in the execution of a hardware decoding processor, or in the execution of a combination of hardware and software modules in a decoding processor. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in the memory 400, and the processor 401 reads the information in the memory 400, and in combination with its hardware, performs the steps of the method of the previous embodiment.
The embodiment of the invention also provides a computer-readable storage medium storing computer-executable instructions which, when invoked and executed by a processor, cause the processor to implement the above machine vision-based image processing method; for the specific implementation, reference may be made to the method embodiments, which are not repeated here.
The computer program product of the machine vision-based image processing method provided by the embodiment of the present invention includes a computer-readable storage medium storing non-volatile program code executable by a processor, where the instructions included in the program code may be used to execute the method described in the foregoing method embodiments; for the specific implementation, reference may be made to the method embodiments, which are not repeated here.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided by the present invention, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. The apparatus embodiments described above are merely illustrative; for example, the division of units is merely a logical functional division, and there may be other divisions in actual implementation; for another example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some communication interfaces, devices, or units, and may be in electrical, mechanical, or other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disk.
Finally, it should be noted that the above embodiments are only specific implementations of the present invention, used to illustrate the technical solutions of the present invention rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the art may, within the technical scope disclosed by the present invention, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes, or make equivalent substitutions of some of the technical features; such modifications, changes, or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A machine vision-based image processing method, the method comprising:
acquiring image data;
inputting the image data into a pre-trained image recognition model, and determining a first image recognition result; the pre-trained image recognition model is obtained by training based on an image training data set, and the image training data set is obtained by labeling, data preprocessing, data expansion, feature extraction and data classification of a plurality of acquired video image data;
inputting the first image recognition result and the image data into the pre-trained image recognition model, and determining, through the pre-trained image recognition model, a second image recognition result based on a preset noise cancellation intensity, the first image recognition result, and a noise component of the image data; and determining a final image recognition result based on the second image recognition result and a preset category decision threshold.
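A minimal Python sketch of one plausible reading of this two-pass scheme follows. The `model.predict` interface, the Gaussian-residual noise estimate, and the equal blending of the two passes are all illustrative assumptions, not details fixed by the claim:

```python
import numpy as np
from scipy import ndimage

def recognize(model, image, noise_strength=0.5, decision_threshold=0.8):
    # Pass 1: coarse recognition on the raw image.
    first_scores = model.predict(image)        # assumed to return class probabilities

    # Estimate the noise component as the residual against a smoothed copy.
    smoothed = ndimage.gaussian_filter(image, sigma=1.0)
    noise = image - smoothed

    # Pass 2: re-score the partially denoised image, blended with pass 1.
    denoised = image - noise_strength * noise
    second_scores = 0.5 * (first_scores + model.predict(denoised))

    # Final decision: accept the top class only if it clears the threshold.
    best = int(np.argmax(second_scores))
    return best if second_scores[best] >= decision_threshold else None
```

Raising `noise_strength` toward 1 removes more of the estimated noise before the second pass, while `decision_threshold` controls how confident the blended score must be before a class is returned.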
2. The method of claim 1, wherein the pre-trained image recognition model is trained by:
labeling the collected multiple video image data to obtain a first training data set;
performing data preprocessing on the first training data set to obtain a second training data set;
performing data expansion on the second training data set to obtain a third training data set;
performing feature extraction on the third training data set to obtain an image training data set;
and training an image recognition model based on the image training data set to obtain a pre-trained image recognition model.
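The five steps above form a linear pipeline. The sketch below compresses them into one function; the GAN-based expansion of claim 5 and the quantum feature extraction of claim 6 are replaced by simple stand-ins (rotation and flattening) purely to keep the example self-contained and runnable:

```python
import numpy as np

def build_training_set(frames, labels):
    labeled = list(zip(frames, labels))                                # labeling (claim 3)
    prepped = [(f.astype(np.float32) / 255.0, y) for f, y in labeled]  # preprocessing (claim 4)
    expanded = prepped + [(np.rot90(f), y) for f, y in prepped]        # expansion stand-in (claim 5)
    return [(f.ravel(), y) for f, y in expanded]                       # feature stand-in (claim 6)
```

The resulting list of (feature vector, label) pairs is what the classifier training of claims 7 and 8 would consume.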
3. The method of claim 2, wherein labeling the plurality of captured video image data to obtain a first training data set comprises:
determining a plurality of first element categories respectively corresponding to the plurality of video image data;
and labeling the video image data based on the first element categories to obtain the first training data set.
4. The method of claim 2, wherein the data preprocessing the first training data set to obtain a second training data set comprises:
respectively carrying out normalization processing on a plurality of first image data included in the first training data set;
and respectively denoising the plurality of normalized first image data to obtain the second training data set.
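A short sketch of this preprocessing step. Min-max normalization and a median filter are assumptions here; the claim does not fix a particular normalization or denoising method:

```python
import numpy as np
from scipy import ndimage

def preprocess(images):
    out = []
    for img in images:
        img = img.astype(np.float32)
        lo, hi = img.min(), img.max()
        norm = (img - lo) / (hi - lo + 1e-8)             # normalize to [0, 1]
        out.append(ndimage.median_filter(norm, size=3))  # suppress impulse noise
    return out
```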
5. The method of claim 2, wherein the data augmenting the second training data set to obtain a third training data set comprises:
generating, by a generator, a plurality of first dummy data based on a plurality of second image data included in the second training data set;
performing attention enhancement processing on the first dummy data to obtain second dummy data;
screening the second dummy data through a discriminator to obtain a plurality of data samples;
and respectively performing angle rotation on the plurality of data samples to obtain the third training data set.
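A sketch of this expansion step, assuming `generator` and `discriminator` are the two halves of an already-trained GAN (callables mapping a latent vector to an image, and an image to a realism score, respectively); the "attention enhancement" below is a crude contrast boost standing in for the unspecified attention mechanism, and the latent size of 64 is likewise an assumption:

```python
import numpy as np

def augment(generator, discriminator, second_set, n_fake=100, accept=0.5):
    latent = lambda: np.random.randn(64)                        # assumed latent size
    fakes = [generator(latent()) for _ in range(n_fake)]        # first dummy data
    enhanced = [f + 0.1 * (f - f.mean()) for f in fakes]        # second dummy data
    kept = [f for f in enhanced if discriminator(f) >= accept]  # screening
    samples = second_set + kept
    return samples + [np.rot90(s) for s in samples]             # angle rotation
```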
6. The method of claim 2, wherein the feature extraction of the third training data set to obtain an image training data set comprises:
stretching the third training data set to obtain a one-dimensional vector;
normalizing the one-dimensional vector to obtain a unit vector;
determining a quantum state corresponding to the third training data set based on the unit vector;
performing a quantum operation on the quantum state to obtain a feature vector;
and determining the image training data set based on the feature vector and attention weights.
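The sketch below illustrates the amplitude-encoding idea behind these steps on classical hardware: a flattened, L2-normalized image is a valid vector of quantum-state amplitudes, and a fixed orthogonal matrix (the real-valued analogue of a unitary) stands in for the unspecified quantum operation. This is a classical simulation for intuition only, practical for small images:

```python
import numpy as np

def quantum_style_features(image, seed=0):
    v = image.ravel().astype(np.float64)   # stretch to a one-dimensional vector
    v = v / (np.linalg.norm(v) + 1e-12)    # unit vector = amplitudes of a state
    rng = np.random.default_rng(seed)
    m = rng.standard_normal((v.size, v.size))
    q, _ = np.linalg.qr(m)                 # orthogonal matrix as a real "unitary"
    return q @ v                           # feature vector after the operation
```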
7. The method according to claim 2, wherein the image training data set includes a plurality of third image data, a plurality of second element categories included in the plurality of third image data, and feature information respectively corresponding to the plurality of second element categories, and wherein training the image recognition model based on the image training data set to obtain the pre-trained image recognition model comprises:
determining a priori probabilities of the second element classes;
determining a conditional probability of each of the feature information based on the prior probabilities;
taking a plurality of the feature information as a feature set, and selecting a plurality of the feature information from the feature set to construct a feature subset;
selecting an independent feature from a complement of the feature subset based on the feature set;
determining a Gaussian conditional probability based on the independent feature, the feature subset, and the conditional probability;
adding the independent feature with the maximum Gaussian conditional probability into the feature subset until feature information in the feature subset meets a preset quantity threshold value, and obtaining a final feature subset;
determining an attention weight for each of the independent features in the final feature subset;
determining a classification rule based on the attention weight and the final feature subset;
and determining the pre-trained image recognition model based on the classification rule.
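One plausible reading of the selection loop in claims 7 and 8 is sketched below: starting from an empty subset, the remaining (complement) feature whose per-class Gaussian fit scores highest is added until `k` features are chosen. Scoring each feature independently of those already chosen mirrors the naive-Bayes independence assumption, and the mean log-likelihood used here as the "Gaussian conditional probability" is an assumption, since the claims do not fix the exact formula:

```python
import numpy as np

def greedy_feature_subset(X, y, k):
    def gaussian_score(col):
        # Mean log-likelihood of the column under per-class Gaussians.
        ll = 0.0
        for c in np.unique(y):
            vals = col[y == c]
            mu, sigma = vals.mean(), vals.std() + 1e-8
            ll += np.sum(-0.5 * ((vals - mu) / sigma) ** 2 - np.log(sigma))
        return ll / len(col)

    subset = []                                                   # start empty (claim 8)
    while len(subset) < k:                                        # quantity threshold
        rest = [j for j in range(X.shape[1]) if j not in subset]  # complement
        best = max(rest, key=lambda j: gaussian_score(X[:, j]))   # max Gaussian score
        subset.append(best)
    return subset
```

For instance, `greedy_feature_subset(np.random.rand(200, 16), np.random.randint(0, 3, 200), k=5)` returns the indices of five selected feature columns.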
8. The method of claim 7, wherein adding the independent feature with the maximum Gaussian conditional probability into the feature subset until feature information in the feature subset meets a preset quantity threshold value, to obtain a final feature subset, comprises:
initializing the feature subset to an empty set;
selecting independent features from the complement of the initialized feature subset based on the feature set;
and adding the independent feature with the maximum Gaussian conditional probability into the feature subset until feature information in the feature subset meets a preset quantity threshold value, and obtaining a final feature subset.
9. An image processing apparatus based on machine vision, the apparatus comprising:
an image data acquisition module, configured to acquire image data;
a first image recognition result determining module, configured to input the image data into a pre-trained image recognition model and determine a first image recognition result; wherein the pre-trained image recognition model is obtained by training based on an image training data set, and the image training data set is obtained by labeling, data preprocessing, data expansion, feature extraction, and data classification of a plurality of acquired video image data; and
a final image recognition result determining module, configured to input the first image recognition result and the image data into the pre-trained image recognition model, determine, through the pre-trained image recognition model, a second image recognition result based on a preset noise cancellation intensity, the first image recognition result, and a noise component of the image data, and determine a final image recognition result based on the second image recognition result and a preset category decision threshold.
10. An electronic device comprising a processor and a memory, the memory storing computer executable instructions executable by the processor, the processor executing the computer executable instructions to implement the machine vision based image processing method of any one of claims 1 to 8.
CN202311606112.7A 2023-11-29 2023-11-29 Image processing method and device based on machine vision and electronic equipment Active CN117315377B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311606112.7A CN117315377B (en) 2023-11-29 2023-11-29 Image processing method and device based on machine vision and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311606112.7A CN117315377B (en) 2023-11-29 2023-11-29 Image processing method and device based on machine vision and electronic equipment

Publications (2)

Publication Number Publication Date
CN117315377A 2023-12-29
CN117315377B CN117315377B (en) 2024-02-27

Family

ID=89255615

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311606112.7A Active CN117315377B (en) 2023-11-29 2023-11-29 Image processing method and device based on machine vision and electronic equipment

Country Status (1)

Country Link
CN (1) CN117315377B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117706361A (en) * 2024-02-05 2024-03-15 山东理工职业学院 Motor fault diagnosis method and system based on machine learning


Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110084166A (en) * 2019-04-19 2019-08-02 山东大学 Intelligent smoke and fire recognition and monitoring method for substations based on deep learning
US20210142104A1 (en) * 2019-11-11 2021-05-13 Aveva Software, Llc Visual artificial intelligence in scada systems
CN112001269A (en) * 2020-08-03 2020-11-27 浙江大华技术股份有限公司 Vehicle identification method and device, computer equipment and storage medium
US20220196861A1 (en) * 2020-12-23 2022-06-23 Halliburton Energy Services, Inc. Mitigation of fiber optic cable coupling for distributed acoustic sensing
CN113743484A (en) * 2021-08-20 2021-12-03 宁夏大学 Image classification method and system based on space and channel attention mechanism
CN114186639A (en) * 2021-12-13 2022-03-15 国网宁夏电力有限公司营销服务中心(国网宁夏电力有限公司计量中心) Electrical accident classification method based on dual-weighted naive Bayes
CN114511774A (en) * 2021-12-22 2022-05-17 江苏蓝视海洋科技有限公司 Ship target comprehensive identification method, medium and system
CN116563898A (en) * 2022-01-26 2023-08-08 广州麦仑信息科技有限公司 Palm vein image recognition method, device, equipment and medium based on GhostNet network
CN115690434A (en) * 2022-10-12 2023-02-03 北京师范大学 Noise image identification method and system based on expert field denoising result optimization
CN116187548A (en) * 2023-01-30 2023-05-30 合肥本源量子计算科技有限责任公司 Photovoltaic power generation power prediction method and device, storage medium and electronic device
CN117115581A (en) * 2023-07-26 2023-11-24 三峡金沙江云川水电开发有限公司 Intelligent misoperation early warning method and system based on multi-mode deep learning

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
KAI XIAO et al.: "Noise or Signal: The Role of Image Backgrounds in Object Recognition", arXiv:2006.09994v1, pages 1-24 *
SHIVANI GUPTA et al.: "Dealing with Noise Problem in Machine Learning Data-sets: A Systematic Review", Procedia Computer Science, page 466 *
路志英 et al.: "A Village Area Extraction Algorithm for Satellite Remote Sensing Images", 《传感器与微系统》, vol. 36, no. 6, pages 122-125 *
陈池梅 et al.: "Research on Multi-dimensional Classification Learning Methods for Massive Data Based on Bayesian Networks", 《计算机应用研究》, vol. 33, no. 3, pages 689-692 *


Also Published As

Publication number Publication date
CN117315377B (en) 2024-02-27

Similar Documents

Publication Publication Date Title
CN112966691B (en) Multi-scale text detection method and device based on semantic segmentation and electronic equipment
Ye et al. Real-time no-reference image quality assessment based on filter learning
CN109472209B (en) Image recognition method, device and storage medium
JP5591178B2 (en) Method for classifying objects in test images
CN117315377B (en) Image processing method and device based on machine vision and electronic equipment
JP6897749B2 (en) Learning methods, learning systems, and learning programs
CN111768457B (en) Image data compression method, device, electronic equipment and storage medium
Akusok et al. Arbitrary category classification of websites based on image content
CN112163114B (en) Image retrieval method based on feature fusion
CN111694954B (en) Image classification method and device and electronic equipment
CN113095333A (en) Unsupervised feature point detection method and unsupervised feature point detection device
CN117197904A (en) Training method of human face living body detection model, human face living body detection method and human face living body detection device
CN111401485A (en) Practical texture classification method
CN112818774A (en) Living body detection method and device
CN116595525A (en) Threshold mechanism malicious software detection method and system based on software map
Mukundan Local Tchebichef moments for texture analysis
US11715288B2 (en) Optical character recognition using specialized confidence functions
CN116955138A (en) Acceptance method, acceptance device, acceptance equipment and storage medium
CN108710915B (en) Multi-feature fusion gastroscope image processing method based on multi-kernel learning
Nga et al. Extending color properties for texture descriptor based on local ternary patterns to classify rice varieties
CN113505783B (en) Oracle word recognition method and oracle word recognition device based on less learning
CN110781812A (en) Method for automatically identifying target object by security check instrument based on machine learning
CN116405330B (en) Network abnormal traffic identification method, device and equipment based on transfer learning
CN113822304B (en) Small sample learning method based on deep learning
Xu et al. A novel method of aerial image classification based on attention-based local descriptors

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant