CN115641517A - Machine vision defect identification method and system, edge side device and storage medium - Google Patents


Info

Publication number
CN115641517A
Authority
CN
China
Prior art keywords
image
processing
defect identification
model
encoder
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211102395.7A
Other languages
Chinese (zh)
Inventor
史敏锐
韩韬
王慧芬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Corp Ltd filed Critical China Telecom Corp Ltd
Priority claimed from CN202211102395.7A
Publication of CN115641517A
Related PCT application: PCT/CN2023/097756 (published as WO2024051222A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/10 Terrestrial scenes
    • G06V 20/17 Terrestrial scenes taken from planes or by drones

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Remote Sensing (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a machine vision defect identification method and system, an edge side device and a storage medium, and relates to the technical field of machine vision. The machine vision defect identification method comprises the following steps: receiving an image to be detected sent by an image acquisition device; dividing an image to be detected into a preset number D of mutually non-overlapping blocks; carrying out block embedding processing on each block to obtain a corresponding image token, wherein the image token comprises visual token information and position embedding information of the corresponding block; processing the image token by using a preset first defect identification model to obtain a defect identification result; and sending the defect identification result to the user terminal under the condition that the defect identification result shows that the image to be detected does not belong to the defect image.

Description

Machine vision defect identification method and system, edge side device and storage medium
Technical Field
The disclosure relates to the technical field of machine vision, and in particular relates to a machine vision defect identification method and system, an edge side device and a storage medium.
Background
A machine vision system oriented to intelligent manufacturing uses a machine or a computer in place of human vision to detect, classify, measure, or judge. Machine vision provides imaging-based automated inspection and analysis for applications such as automated inspection, process control, and robot guidance, and is widely used in the smart-industry field. A machine vision system converts the inspected target into image signals and transmits them to a dedicated image processing system; that system performs various computations on the data and signals to extract the features of the target, and the equipment actions of the industrial site are then controlled according to the judgment result. Machine vision systems for intelligent manufacturing are of great value in fields such as industrial product defect detection, product quality control, and automatic sorting.
A salient feature of machine vision is that it raises the intelligence and automation of production. For example, machine vision can replace human vision in hazardous working environments where manual work is unsuitable or where human vision cannot meet the requirements. As another example, in mass industrial production, machine-vision-based defect identification can greatly improve production efficiency and the degree of automation. Machine vision also lends itself to information integration and is a foundational technology for computer-integrated manufacturing.
Disclosure of Invention
The inventor has noticed that, in the related art, defect identification needs to be realized through cooperative work of the edge side device and the cloud server, which may cause a long task delay and low computational efficiency, and meanwhile, interaction between the edge side device and the cloud server may occupy more network resources, which may affect network resources used by other network applications.
Accordingly, the machine vision defect identification scheme provided by the present disclosure can effectively reduce task delay, improve computational efficiency, and avoid impacting network resources used by other network applications.
According to a first aspect of the embodiments of the present disclosure, there is provided a machine vision defect identification method, performed by an edge-side device, including: receiving an image to be detected sent by an image acquisition device; dividing the image to be detected into a preset number D of mutually non-overlapping blocks; carrying out block embedding processing on each block to obtain a corresponding image token, wherein the image token comprises visual token information and position embedding information of the corresponding block; processing the image token by using a preset first defect identification model to obtain a defect identification result; and sending the defect identification result to a user terminal under the condition that the defect identification result shows that the image to be detected does not belong to the defect image.
In some embodiments, the first defect identification model includes N encoders and a normalization module, where N is a natural number greater than 0. Processing the image token using the first defect identification model includes: encoding the input information of the i-th encoder using the i-th encoder to obtain the output information of the i-th encoder, where 1 ≤ i ≤ N and the image token is the input information of the 1st encoder; taking the output information of the i-th encoder as the input information of the (i+1)-th encoder for i < N; and normalizing the output information of the N-th encoder using the normalization module to obtain the defect identification result.
In some embodiments, encoding the input information of the ith encoder by the ith encoder includes: performing multi-head self-attention processing on the input information of the ith encoder by using a multi-head self-attention model to obtain a first processing result; fusing the first processing result and the input information of the ith encoder to obtain a second processing result; carrying out normalization processing on the second processing result by using a layer normalization model to obtain a third processing result; performing multilayer perception processing on the third processing result by using a multilayer perceptron model to obtain a fourth processing result; fusing the third processing result and the fourth processing result to obtain a fifth processing result; and carrying out normalization processing on the fifth processing result by using a layer normalization model to obtain output information of the ith encoder.
In some embodiments, performing multi-head self-attention processing on the input information of the i-th encoder using the multi-head self-attention model includes the following steps: for the input information of the i-th encoder, determining the corresponding first vector matrix Q_t, second vector matrix K_t, and third vector matrix V_t according to the first attention weight matrix W_t^Q, the second attention weight matrix W_t^K, and the third attention weight matrix W_t^V of each single head; determining an attention value for each single head according to the first vector matrix Q_t, the second vector matrix K_t, and the third vector matrix V_t; and determining a corresponding multi-head attention value according to the attention values of all the single heads to serve as the first processing result.
In some embodiments, the first defect identification model comprises a Vision Transformer model.
In some embodiments, when the defect identification result indicates that the image to be detected belongs to a defect image, the image to be detected is sent to a cloud server, so that the cloud server trains a preset second defect identification model by using the image to be detected.
In some embodiments, the first defect identification model is weight updated using model weight information sent by the cloud server.
According to a second aspect of embodiments of the present disclosure, there is provided an edge side device including: the first processing module is configured to receive an image to be detected sent by the image acquisition device; the second processing module is configured to divide the image to be detected into a preset number D of mutually non-overlapping blocks, and perform block embedding processing on each block to obtain a corresponding image token, wherein the image token comprises visual token information and position embedding information of the corresponding block; the third processing module is configured to process the image token by using a preset first defect identification model to obtain a defect identification result; and the fourth processing module is configured to send the defect identification result to the user terminal under the condition that the defect identification result indicates that the image to be detected does not belong to the defect image.
In some embodiments, the first defect identification model includes N encoders and a normalization module, where N is a natural number greater than 0. The third processing module is configured to encode the input information of the i-th encoder using the i-th encoder to obtain the output information of the i-th encoder, where 1 ≤ i ≤ N and the image token is the input information of the 1st encoder; to take the output information of the i-th encoder as the input information of the (i+1)-th encoder for i < N; and to normalize the output information of the N-th encoder using the normalization module to obtain the defect identification result.
In some embodiments, the third processing module is configured to perform multi-head self-attention processing on the input information of the i-th encoder by using a multi-head self-attention model to obtain a first processing result, fuse the first processing result and the input information of the i-th encoder to obtain a second processing result, perform normalization processing on the second processing result by using a layer normalization model to obtain a third processing result, perform multi-layer perception processing on the third processing result by using a multi-layer perceptron model to obtain a fourth processing result, fuse the third processing result and the fourth processing result to obtain a fifth processing result, and perform normalization processing on the fifth processing result by using a layer normalization model to obtain the output information of the i-th encoder.
In some embodiments, the third processing module is configured to: determine, for the input information of the i-th encoder, the corresponding first vector matrix Q_t, second vector matrix K_t, and third vector matrix V_t according to the first attention weight matrix W_t^Q, the second attention weight matrix W_t^K, and the third attention weight matrix W_t^V of each single head; determine the attention value of each single head according to the first vector matrix Q_t, the second vector matrix K_t, and the third vector matrix V_t; and determine the corresponding multi-head attention value according to the attention values of all the single heads as the first processing result.
In some embodiments, the first defect identification model comprises a Vision Transformer model.
In some embodiments, the fourth processing module is configured to send the image to be detected to a cloud server when the defect identification result indicates that the image to be detected belongs to a defect image, so that the cloud server trains a preset second defect identification model by using the image to be detected.
In some embodiments, the fourth processing module is configured to update the weight of the first defect identification model using model weight information sent by the cloud server.
According to a third aspect of embodiments of the present disclosure, there is provided an edge side device including: a memory configured to store instructions; a processor coupled to the memory, the processor configured to perform a method implementing any of the embodiments described above based on instructions stored by the memory.
According to a fourth aspect of embodiments of the present disclosure, there is provided a machine vision defect identification system, including: an edge side device as in any of the above embodiments; and the image acquisition device is configured to acquire an image to be detected and send the image to be detected to the edge side equipment.
In some embodiments, the system further comprises: the cloud server is configured to label the image to be detected after receiving the image to be detected sent by the edge side equipment, store the image to be detected in a training data set, train a preset second defect recognition model by using the training data set, and send current model weight information of the second defect recognition model to the edge side equipment under the condition that the performance of the trained second defect recognition model is greater than a preset performance threshold.
In some embodiments, the cloud server is configured to train a preset second defect recognition model using the training data set if the number of images in the training data set is greater than a preset number threshold.
In some embodiments, the second defect identification model comprises a Vision Transformer model.
According to a fifth aspect of the embodiments of the present disclosure, there is provided a non-transitory computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions, and the instructions, when executed by a processor, implement the method according to any one of the embodiments.
Other features of the present disclosure and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.
Drawings
In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only some embodiments of the present disclosure, and for those skilled in the art, other drawings can be obtained according to the drawings without inventive exercise.
Fig. 1 is a schematic flow chart diagram of a machine vision defect identification method according to an embodiment of the present disclosure;
FIGS. 2A-2B are schematic images according to some embodiments of the present disclosure;
FIG. 3 is a schematic structural diagram of a classification head model according to an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of an encoder according to an embodiment of the present disclosure;
FIG. 5 is a schematic flow chart of a machine vision defect identification method according to another embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of an edge-side device according to an embodiment of the present disclosure;
fig. 7 is a schematic structural diagram of an edge side device according to another embodiment of the present disclosure;
FIG. 8 is a schematic diagram of a machine vision defect identification system according to an embodiment of the present disclosure;
FIG. 9 is a schematic diagram of a machine vision defect identification system according to another embodiment of the present disclosure;
fig. 10 is a flowchart illustrating a machine vision defect identification method according to yet another embodiment of the disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, and not all of the embodiments. The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
The relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as exemplary only and not as limiting. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
Fig. 1 is a schematic flow chart of a machine vision defect identification method according to an embodiment of the present disclosure. In some embodiments, the following machine vision defect identification method is performed by an edge side device.
In step 101, an image to be detected sent by an image acquisition device is received.
In some embodiments, the image capture device is a camera or other hardware device for capturing images and video, including, for example, industrial cameras in the field of smart manufacturing.
In step 102, the image to be detected is segmented into a predetermined number D of mutually non-overlapping segments.
In some embodiments, D = n_1 × n_2. For example, the image to be detected shown in fig. 2A is divided into 4 × 4 = 16 mutually non-overlapping blocks as shown in fig. 2B.
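The splitting step can be sketched as follows. This is an illustrative NumPy sketch, not code from the disclosure; the function name and the block-grid convention (n_1 rows by n_2 columns) are assumptions:

```python
import numpy as np

def split_into_blocks(image: np.ndarray, n1: int, n2: int) -> np.ndarray:
    """Split an H x W x C image into D = n1 * n2 mutually non-overlapping blocks.

    Returns an array of shape (n1 * n2, H // n1, W // n2, C).
    Assumes H and W are divisible by n1 and n2, respectively.
    """
    h, w, c = image.shape
    bh, bw = h // n1, w // n2
    return (
        image.reshape(n1, bh, n2, bw, c)   # carve the two spatial axes
             .transpose(0, 2, 1, 3, 4)     # group the block indices together
             .reshape(n1 * n2, bh, bw, c)  # flatten to a sequence of D blocks
    )

# A 4 x 4 grid over a 224 x 224 RGB image yields D = 16 blocks of 56 x 56.
img = np.zeros((224, 224, 3))
print(split_into_blocks(img, 4, 4).shape)  # (16, 56, 56, 3)
```

Because the blocks are non-overlapping, a reshape/transpose suffices and no pixel is duplicated or dropped.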
In step 103, block Embedding (Patch Embedding) processing is performed on each block to obtain a corresponding Image Token (Image Token), where the Image Token includes Visual Token information and position Embedding (Positional Embedding) information of the corresponding block.
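Step 103 can be sketched as a linear projection of each flattened block (the visual token information) plus a learned per-block position embedding. The following NumPy sketch is illustrative only; the embedding dimension (768) and all parameter shapes are assumptions, not values from the disclosure:

```python
import numpy as np

def patch_embed(blocks, weight, pos_embed):
    """Project each flattened block to a visual token and add position embeddings.

    blocks:    (D, bh, bw, C) mutually non-overlapping blocks
    weight:    (bh*bw*C, E) linear projection to embedding dimension E
    pos_embed: (D, E) position embedding, one row per block
    Returns the image tokens of shape (D, E).
    """
    d = blocks.shape[0]
    flat = blocks.reshape(d, -1)      # flatten each block to a vector
    visual_tokens = flat @ weight     # visual token information
    return visual_tokens + pos_embed  # add position embedding information

rng = np.random.default_rng(0)
blocks = rng.normal(size=(16, 56, 56, 3))
w = rng.normal(size=(56 * 56 * 3, 768))
pos = rng.normal(size=(16, 768))
print(patch_embed(blocks, w, pos).shape)  # (16, 768)
```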
In step 104, the image token is processed by using a preset first defect identification model to obtain a defect identification result.
In some embodiments, the first defect identification model comprises a Vision Transformer model.
For example, as shown in fig. 3, N encoders and a normalization (Softmax) module are included in a Classification Head (Classification Head) module of the first defect identification model, where N is a natural number greater than 0.
In some embodiments, the input information of the i-th encoder is encoded by the i-th encoder to obtain the output information of the i-th encoder, where 1 ≤ i ≤ N and the image token is the input information of the 1st encoder. The output information of the i-th encoder serves as the input information of the (i+1)-th encoder for i < N. The output information of the N-th encoder is normalized by the normalization module to obtain the defect identification result.
In some embodiments, the output information of the N-th encoder is linearly processed by a Linear module, and the linearized result is normalized by the normalization module to obtain the defect identification result.
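A classification head of this shape (a Linear module followed by Softmax normalization) can be sketched as follows; the two-class output (normal vs. defect) and all parameter shapes are illustrative assumptions:

```python
import numpy as np

def classification_head(enc_out, W, b):
    """Linear projection of encoder output followed by Softmax normalization.

    enc_out: (E,) feature vector taken from the final encoder output
    W: (E, num_classes) linear weights; b: (num_classes,) bias
    Returns a probability vector over the classes.
    """
    logits = enc_out @ W + b
    e = np.exp(logits - logits.max())  # numerically stable Softmax
    return e / e.sum()

rng = np.random.default_rng(0)
probs = classification_head(rng.normal(size=(64,)), rng.normal(size=(64, 2)), np.zeros(2))
print(probs.shape)  # (2,)
```

The larger of the two probabilities would then decide whether the image to be detected is treated as a defect image.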
In some embodiments, the structure of the encoder is as shown in fig. 4, and the corresponding encoding flow is as follows:
1) The input information of the i-th encoder is subjected to multi-head self-attention processing using a Multi-Head Self-Attention model to obtain a first processing result.
In some embodiments, the multi-headed self-attention processing includes the following:
Firstly, for the input information of the i-th encoder, the corresponding first vector matrix Q_t, second vector matrix K_t, and third vector matrix V_t are determined according to the first attention weight matrix W_t^Q, the second attention weight matrix W_t^K, and the third attention weight matrix W_t^V of each single head.
For example, the corresponding calculation formula is shown in formula (1), where F is the input information of the i-th encoder:

Q_t = F·W_t^Q,  K_t = F·W_t^K,  V_t = F·W_t^V    (1)
Secondly, the attention value s_t of each single head is determined according to the first vector matrix Q_t, the second vector matrix K_t, and the third vector matrix V_t, as shown in formula (2):

s_t = τ(Q_t, K_t, V_t) = ρ(Q_t·K_t^T / √d_k)·V_t    (2)

where K_t^T is the transpose of the matrix K_t, d_k is the column dimension of K_t, τ is the attention calculation function, and ρ is the Softmax logistic regression function.
Next, the corresponding multi-head attention value is determined according to the attention values of all the single heads as the first processing result, as shown in formula (3):

MultiHead = ε(s_1, s_2, …, s_h)·W^O    (3)

where ε is the Concatenate function, h is the number of single heads, and W^O is a parameter matrix.
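The three formulas together describe standard multi-head self-attention, which can be sketched as follows. The per-head dimension d_k, the head count, and all names are illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(F, Wq, Wk, Wv, Wo):
    """Multi-head self-attention following formulas (1)-(3).

    F:  (D, E) input information of the encoder
    Wq, Wk, Wv: per-head weight matrices W_t^Q, W_t^K, W_t^V, each (E, d_k)
    Wo: (h * d_k, E) output parameter matrix
    """
    heads = []
    for wq, wk, wv in zip(Wq, Wk, Wv):
        Q, K, V = F @ wq, F @ wk, F @ wv         # formula (1)
        d_k = K.shape[-1]
        s = softmax(Q @ K.T / np.sqrt(d_k)) @ V  # formula (2)
        heads.append(s)
    return np.concatenate(heads, axis=-1) @ Wo   # formula (3)

rng = np.random.default_rng(0)
F = rng.normal(size=(16, 64))
h, dk = 4, 16
Wq = [rng.normal(size=(64, dk)) for _ in range(h)]
Wk = [rng.normal(size=(64, dk)) for _ in range(h)]
Wv = [rng.normal(size=(64, dk)) for _ in range(h)]
Wo = rng.normal(size=(h * dk, 64))
print(multi_head_self_attention(F, Wq, Wk, Wv, Wo).shape)  # (16, 64)
```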
2) And fusing the first processing result and the input information of the ith encoder to obtain a second processing result. For example, the merging includes concatenating the first processing result and the input information of the ith encoder.
3) And carrying out Normalization processing on the second processing result by using a Layer Normalization (Layer Normalization) model to obtain a third processing result.
4) And performing multi-layer perception processing on the third processing result by using a multi-layer Perceptron (Multilayer Perceptron) model to obtain a fourth processing result.
5) And fusing the third processing result and the fourth processing result to obtain a fifth processing result. For example, the merging includes stitching the third processing result and the fourth processing result.
6) And carrying out normalization processing on the fifth processing result by using the layer normalization model to obtain the output information of the ith encoder.
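The encoding flow 1)-6) can be sketched as follows. Here the "fusing" in steps 2) and 5) is realized as element-wise addition (a residual connection), which is one reading of the fusion described above; a concatenation reading would change the shapes. The attention and perceptron models are passed in as stand-in callables:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Layer normalization over the last (feature) axis."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def encoder_block(x, msa, mlp):
    """One encoder pass following steps 1)-6) above.

    x:   (D, E) input information of the i-th encoder
    msa: callable, the multi-head self-attention model
    mlp: callable, the multilayer perceptron model
    """
    r1 = msa(x)            # 1) multi-head self-attention -> first result
    r2 = r1 + x            # 2) fuse with the encoder input -> second result
    r3 = layer_norm(r2)    # 3) layer normalization -> third result
    r4 = mlp(r3)           # 4) multilayer perception processing -> fourth result
    r5 = r3 + r4           # 5) fuse third and fourth results -> fifth result
    return layer_norm(r5)  # 6) layer normalization -> output information

x = np.random.default_rng(0).normal(size=(16, 64))
out = encoder_block(x, msa=lambda t: t, mlp=np.tanh)  # identity/tanh stand-ins
print(out.shape)  # (16, 64)
```

Chaining N such blocks and normalizing the last output reproduces the stack described for the classification head.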
In step 105, in the case that the defect identification result indicates that the image to be detected does not belong to the defect image, the defect identification result is sent to the user terminal.
In some embodiments, when the defect identification result indicates that the image to be detected does not belong to the defect image, the defect identification result may be further utilized to perform subsequent statistical analysis and system display.
In the machine vision defect identification method provided by the embodiment of the disclosure, the trained defect identification model is set in the edge side device, so that the edge side device can automatically identify the defect of the image to be detected, thereby effectively reducing the task delay time, improving the calculation efficiency, and avoiding the influence on network resources used by other network applications.
Fig. 5 is a schematic flow chart of a machine vision defect identification method according to another embodiment of the disclosure. In some embodiments, the following machine vision defect identification method is performed by an edge side device.
In step 501, an image to be detected sent by an image acquisition device is received.
In some embodiments, the image capture device is a camera or other hardware device for capturing images and video, including, for example, industrial cameras in the field of smart manufacturing.
In step 502, the image to be detected is segmented into a predetermined number D of mutually non-overlapping segments.
In some embodiments, D = n_1 × n_2. For example, the image to be detected shown in fig. 2A is divided into 4 × 4 = 16 mutually non-overlapping blocks as shown in fig. 2B.
In step 503, a block Embedding (Patch Embedding) process is performed on each block to obtain a corresponding Image Token (Image Token), where the Image Token includes Visual Token information and position Embedding (Positional Embedding) information of the corresponding block.
In step 504, the image token is processed by using a preset first defect identification model to obtain a defect identification result.
For example, the defect recognition result may be obtained using the embodiments shown in fig. 3 and 4 described above.
In step 505, under the condition that the defect identification result indicates that the image to be detected belongs to the defect image, the image to be detected is sent to the cloud server, so that the cloud server trains a preset second defect identification model by using the image to be detected.
In some embodiments, the second defect identification model comprises a Vision Transformer model.
In step 506, the weight of the first defect identification model is updated by using the model weight information sent by the cloud server.
It should be noted that, when the edge device identifies that the image to be detected has a defect, the edge device sends the image to be detected to the cloud server, so that the cloud server trains the second defect identification model arranged on the cloud server side by using the image to be detected. And under the condition that the performance evaluation result of the trained second defect identification model meets the preset condition, the cloud server sends the current model weight information of the second defect identification model to the edge side equipment, so that the edge side equipment updates the weight of the first defect identification model arranged on the edge side equipment by using the current model weight information of the second defect identification model. Therefore, the first defect identification model arranged on the edge side equipment side can be continuously updated, and the defect identification capability of the first defect identification model on the edge side equipment side is continuously improved.
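The edge-cloud cooperation described above can be sketched as a simple control flow. Everything below (the dict-based models, the callables, the 0.95 performance threshold) is an illustrative assumption, not part of the disclosure:

```python
def cloud_update_cycle(edge_model, cloud_model, image, is_defect,
                       train, evaluate, perf_threshold=0.95):
    """Edge-cloud cooperation sketch: defect images feed cloud-side training of
    the second model, and improved weights flow back to the edge-side first model.

    edge_model / cloud_model: dicts holding a "weights" entry (stand-in models)
    train(model, image): trains the cloud model on the new defect image
    evaluate(model): returns a performance score for the trained model
    """
    if not is_defect:
        return "result sent to user terminal"
    train(cloud_model, image)                # cloud trains the 2nd model
    if evaluate(cloud_model) > perf_threshold:
        # weight update of the edge-side 1st model
        edge_model["weights"] = dict(cloud_model["weights"])
    return "image sent to cloud server"

edge = {"weights": {"w": 0}}
cloud = {"weights": {"w": 1}}
msg = cloud_update_cycle(edge, cloud, image=None, is_defect=True,
                         train=lambda m, x: None, evaluate=lambda m: 0.99)
print(edge["weights"])  # {'w': 1}
```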
Fig. 6 is a schematic structural diagram of an edge-side device according to an embodiment of the present disclosure. As shown in fig. 6, the edge side device includes a first processing module 61, a second processing module 62, a third processing module 63, and a fourth processing module 64.
The first processing module 61 is configured to receive an image to be detected sent by the image acquisition device.
In some embodiments, the image capture device is a camera or other hardware device for capturing images and video, including, for example, industrial cameras in the field of smart manufacturing.
The second processing module 62 is configured to divide the image to be detected into a predetermined number D of mutually non-overlapping blocks, and perform block embedding processing on each block to obtain a corresponding image token, where the image token includes visual token information and position embedding information of the corresponding block.
In some embodiments, D = n_1 × n_2. For example, the image to be detected shown in fig. 2A is divided into 4 × 4 = 16 mutually non-overlapping blocks as shown in fig. 2B.
The third processing module 63 is configured to process the image token by using a preset first defect identification model to obtain a defect identification result.
In some embodiments, the first defect identification model comprises a Vision Transformer model.
For example, as shown in fig. 3, N encoders and a normalization (Softmax) module are included in a Classification Head (Classification Head) module of the first defect identification model, where N is a natural number greater than 0.
In some embodiments, the third processing module 63 encodes the input information of the i-th encoder using the i-th encoder to obtain the output information of the i-th encoder, where 1 ≤ i ≤ N and the image token is the input information of the 1st encoder. The third processing module 63 takes the output information of the i-th encoder as the input information of the (i+1)-th encoder for i < N, and normalizes the output information of the N-th encoder using the normalization module to obtain the defect identification result.
In some embodiments, the output information of the N-th encoder is linearly processed by a Linear module, and the linearized result is normalized by the normalization module to obtain the defect identification result.
In some embodiments, the structure of the encoder is as shown in fig. 4, and the corresponding encoding flow is as follows:
1) The third processing module 63 performs multi-head self-attention processing on the input information of the ith encoder by using a Multi-Head Self-Attention model to obtain a first processing result.
In some embodiments, the multi-headed self-attention processing includes the following:
First, for the input information of the ith encoder, the third processing module 63 determines the corresponding first vector matrix Q_t, second vector matrix K_t, and third vector matrix V_t according to each single head's first attention weight matrix W_t^Q, second attention weight matrix W_t^K, and third attention weight matrix W_t^V.
For example, the corresponding calculation formula is as shown in formula (1).
Next, based on the first vector matrix Q_t, the second vector matrix K_t, and the third vector matrix V_t, the third processing module 63 determines the attention value s_t of each single head, as shown in equation (2).
Next, the third processing module 63 determines a corresponding multi-head attention value from the attention values of all the single heads as a first processing result, as shown in equation (3).
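The three steps above, corresponding to equations (1)–(3), can be sketched as follows (the head count, dimensions, and the output projection `WO` combining the single-head values are illustrative assumptions):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, WQ, WK, WV, WO):
    """X: (tokens, d_model); WQ/WK/WV: per-single-head weight matrix lists."""
    heads = []
    for Wq, Wk, Wv in zip(WQ, WK, WV):
        Q, K, V = X @ Wq, X @ Wk, X @ Wv          # equation (1): per-head Q_t, K_t, V_t
        d_k = Q.shape[-1]
        s = softmax(Q @ K.T / np.sqrt(d_k)) @ V   # equation (2): single-head attention value
        heads.append(s)
    return np.concatenate(heads, axis=-1) @ WO    # equation (3): multi-head attention value

rng = np.random.default_rng(2)
d_model, d_k, n_heads, n_tok = 32, 8, 4, 16
WQ = [rng.standard_normal((d_model, d_k)) for _ in range(n_heads)]
WK = [rng.standard_normal((d_model, d_k)) for _ in range(n_heads)]
WV = [rng.standard_normal((d_model, d_k)) for _ in range(n_heads)]
WO = rng.standard_normal((n_heads * d_k, d_model))
X = rng.standard_normal((n_tok, d_model))
attn_out = multi_head_self_attention(X, WQ, WK, WV, WO)
```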
2) The third processing module 63 fuses the first processing result and the input information of the ith encoder to obtain a second processing result. For example, the fusing includes concatenating the first processing result and the input information of the ith encoder.
3) The third processing module 63 performs Normalization processing on the second processing result by using a Layer Normalization (Layer Normalization) model to obtain a third processing result.
4) The third processing module 63 performs multi-layer perceptual processing on the third processing result using a multi-layer Perceptron (Multilayer Perceptron) model to obtain a fourth processing result.
5) The third processing module 63 fuses the third processing result and the fourth processing result to obtain a fifth processing result. For example, the fusing includes concatenating the third processing result and the fourth processing result.
6) The third processing module 63 performs normalization processing on the fifth processing result by using the layer normalization model to obtain output information of the ith encoder.
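The six-step encoding flow above can be sketched as follows. Residual addition is used here as the fusion operation (the text describes concatenation; addition keeps dimensions fixed and is the common Vision Transformer choice), and the stand-in `msa`/`mlp` callables are illustrative assumptions:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Layer Normalization over the last dimension."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def encoder_block(x, msa, mlp):
    """Steps 1)-6): MSA -> fuse -> LayerNorm -> MLP -> fuse -> LayerNorm."""
    r1 = msa(x)                  # 1) multi-head self-attention -> first result
    r2 = x + r1                  # 2) fuse with the encoder input -> second result
    r3 = layer_norm(r2)          # 3) layer normalization -> third result
    r4 = mlp(r3)                 # 4) multi-layer perceptron -> fourth result
    r5 = r3 + r4                 # 5) fuse third and fourth results -> fifth result
    return layer_norm(r5)        # 6) layer normalization -> encoder output

rng = np.random.default_rng(3)
W1, W2 = rng.standard_normal((32, 64)), rng.standard_normal((64, 32))
msa = lambda x: x @ rng.standard_normal((32, 32)) * 0.1   # stand-in attention
mlp = lambda x: np.maximum(x @ W1, 0) @ W2                # two-layer perceptron
out = encoder_block(rng.standard_normal((16, 32)), msa, mlp)
```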
The fourth processing module 64 is configured to send the defect recognition result to the user terminal if the defect recognition result indicates that the image to be detected does not belong to a defect image.
In some embodiments, when the defect recognition result indicates that the image to be detected does not belong to a defect image, the fourth processing module 64 may further use the defect recognition result for subsequent statistical analysis and system display.
In some embodiments, the fourth processing module 64 is configured to, in a case that the defect identification result indicates that the image to be detected belongs to the defect image, send the image to be detected to the cloud server, so that the cloud server trains the preset second defect identification model by using the image to be detected.
In some embodiments, the fourth processing module 64 updates the weight of the first defect identification model by using the model weight information sent by the cloud server.
Fig. 7 is a schematic structural diagram of an edge side device according to another embodiment of the present disclosure. As shown in fig. 7, the edge side device includes a memory 71 and a processor 72.
The memory 71 is used for storing instructions, the processor 72 is coupled to the memory 71, and the processor 72 is configured to execute the method according to any one of the embodiments in fig. 1 and 5 based on the instructions stored in the memory.
As shown in fig. 7, the edge device further includes a communication interface 73 for information interaction with other devices. Meanwhile, the edge side device further includes a bus 74, and the processor 72, the communication interface 73, and the memory 71 complete mutual communication through the bus 74.
The memory 71 may comprise high-speed RAM, and may also include non-volatile memory, such as at least one disk storage device. The memory 71 may also be a memory array. The memory 71 may also be partitioned into blocks, and the blocks may be combined into virtual volumes according to certain rules.
Further, the processor 72 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present disclosure.
The present disclosure also relates to a computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions, and the instructions, when executed by a processor, implement the method according to any one of the embodiments in fig. 1 and 5.
Fig. 8 is a schematic structural diagram of a machine vision defect identification system according to an embodiment of the present disclosure. As shown in fig. 8, the machine vision defect recognition system includes an image pickup device 81 and an edge side apparatus 82. The edge device 82 is the edge device according to any one of the embodiments of fig. 6 or fig. 7.
The image pickup device 81 is configured to pick up an image to be picked up and send the image to be picked up to the edge side apparatus 82.
In some embodiments, the image capturing device 81 is a camera or other hardware device for capturing images and video, for example an industrial camera used in the smart manufacturing field.
In the machine vision defect identification system provided by the embodiment of the disclosure, the trained defect identification model is deployed on the edge side device, so that the edge side device can automatically perform defect identification on the image to be detected, thereby effectively reducing task latency, improving computational efficiency, and avoiding impact on network resources used by other network applications.
Fig. 9 is a schematic structural diagram of a machine vision defect identification system according to another embodiment of the disclosure. Fig. 9 differs from fig. 8 in that, in the embodiment shown in fig. 9, the machine vision defect identification system further includes a cloud server 83.
The cloud server 83 is configured to, after receiving the image to be detected sent by the edge side device 82, perform image annotation on the image to be detected, store the image to be detected in the training data set, and train the preset second defect recognition model by using the training data set.
In some embodiments, the second defect identification model comprises a Vision Transformer model.
When the performance of the trained second defect identification model is greater than the preset performance threshold, the cloud server 83 sends the current model weight information of the second defect identification model to the edge-side device 82, so that the edge-side device 82 updates the weight of the first defect identification model locally arranged on the edge-side device 82.
For example, if the defect recognition rate of the trained second defect recognition model is higher than the original defect recognition rate, the cloud server 83 transmits the current model weight information of the second defect recognition model to the edge side device 82.
In some embodiments, the cloud server 83 trains a preset second defect recognition model with the training data set if the number of images in the training data set is greater than a preset number threshold.
It should be noted that, in the case that the number of images in the training data set is greater than the preset number threshold, the cloud server 83 can train the second defect identification model using enough images, so that the training effect of the second defect identification model can be improved.
Fig. 10 is a flowchart illustrating a machine vision defect identification method according to yet another embodiment of the disclosure.
In step 1001, the user terminal sends a service invocation request to the cloud server.
In step 1002, the cloud server verifies the authority of the user terminal.
In step 1003, after the authority of the user terminal passes the verification, the cloud server sends the service call request to the edge device.
In step 1004, the edge side device sends a service invocation request to the image capture device.
In step 1005, the image acquisition device acquires an image to be detected according to the service call request.
In step 1006, the image capture device sends the image to be detected to the edge side device.
In step 1007, the edge device processes the image to be detected by using the first defect identification model set locally to obtain a defect identification result.
In step 1008, the defect recognition result is sent to the user terminal when the defect recognition result indicates that the image to be detected does not belong to the defect image.
In step 1009, the image to be detected is sent to the cloud server when the defect identification result indicates that the image to be detected belongs to the defect image.
In step 1010, the cloud server performs image annotation on the image to be detected, and stores the image to be detected in the training data set. And under the condition that the number of the images in the training data set is larger than a preset number threshold, the cloud server trains a second defect recognition model arranged locally by using the training data set.
In step 1011, the cloud server sends the current model weight information of the second defect identification model to the edge side device when the performance of the trained second defect identification model is greater than the preset performance threshold.
In step 1012, the edge device updates the weight of the first defect identification model set locally using the model weight information sent by the cloud server.
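The decision logic of steps 1007–1012 can be sketched as plain control flow (the `EdgeModel`/`CloudServer` stubs, the thresholds, and the stubbed retraining performance are illustrative assumptions, not the actual system):

```python
class EdgeModel:
    """Stand-in for the first defect identification model on the edge device."""
    def __init__(self):
        self.weights = "v1"
    def predict(self, image):
        return image["defect"]                        # stub for step 1007 inference

class CloudServer:
    """Stand-in for the cloud server holding the second defect identification model."""
    def __init__(self, count_threshold, perf_threshold):
        self.training_set = []
        self.count_threshold = count_threshold
        self.perf_threshold = perf_threshold
    def receive(self, image, edge_model):
        self.training_set.append(image)               # step 1010: annotate and store
        if len(self.training_set) > self.count_threshold:
            perf = 0.97                               # stubbed retraining result (step 1010)
            if perf > self.perf_threshold:            # step 1011: performance gate
                edge_model.weights = "v2"             # steps 1011-1012: weight sync

def handle(image, edge_model, cloud, sent_to_user):
    if not edge_model.predict(image):                 # step 1007: edge inference
        sent_to_user.append(image)                    # step 1008: not a defect image
    else:
        cloud.receive(image, edge_model)              # step 1009: defect image to cloud

edge, cloud, sent = EdgeModel(), CloudServer(2, 0.95), []
for img in [{"defect": False}, {"defect": True}, {"defect": True}, {"defect": True}]:
    handle(img, edge, cloud, sent)
```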
By implementing the above embodiments of the present disclosure, the following beneficial effects can be obtained:
1) The joint design of the image acquisition device, the edge side device, and the cloud server effectively shortens task latency and improves computational efficiency, without disrupting the machine vision defect identification task.
2) The method imposes a low load on the industrial network, does not affect network resources of other industrial applications, and offers good system real-time performance.
3) Through the cooperative work of the edge side device and the cloud server, the performance of the server deployed on the industrial site does not limit the identification performance of the whole system.
4) The machine vision defect recognition model on the cloud server side is retrained and its performance updated, and the updated model weights are fed back to the machine vision defect recognition model on the edge side device, so that the model in the system can be continuously updated.
In some embodiments, the functional units described above can be implemented as general-purpose processors, programmable logic controllers (PLCs), digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or any suitable combination thereof for performing the functions described in this disclosure.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The description of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the disclosure in the form disclosed. Many modifications and variations will be apparent to practitioners skilled in this art. The embodiment was chosen and described in order to best explain the principles of the disclosure and the practical application, and to enable others of ordinary skill in the art to understand the disclosure for various embodiments with various modifications as are suited to the particular use contemplated.

Claims (20)

1. A machine vision defect identification method, performed by an edge-side device, comprising:
receiving an image to be detected sent by an image acquisition device;
dividing the image to be detected into a preset number D of mutually non-overlapping blocks;
carrying out block embedding processing on each block to obtain a corresponding image token, wherein the image token comprises visual token information and position embedding information of the corresponding block;
processing the image token by using a preset first defect identification model to obtain a defect identification result;
and sending the defect identification result to a user terminal under the condition that the defect identification result indicates that the image to be detected does not belong to a defect image.
2. The method of claim 1, wherein the first defect identification model comprises N encoders and a normalization module, wherein N is a natural number greater than 0;
processing the image token using a first defect identification model comprises:
encoding input information of an ith encoder by using the ith encoder to obtain output information of the ith encoder, wherein i is more than or equal to 1 and is less than N, and the image token is input information of the 1 st encoder;
taking the output information of the ith encoder as the input information of the (i + 1) th encoder;
and normalizing the output information of the Nth encoder by using the normalization module to obtain the defect identification result.
3. The method of claim 2, wherein the encoding the input information of the ith encoder by the ith encoder comprises:
performing multi-head self-attention processing on the input information of the ith encoder by using a multi-head self-attention model to obtain a first processing result;
fusing the first processing result and the input information of the ith encoder to obtain a second processing result;
carrying out normalization processing on the second processing result by using a layer normalization model to obtain a third processing result;
performing multilayer perception processing on the third processing result by using a multilayer perceptron model to obtain a fourth processing result;
fusing the third processing result and the fourth processing result to obtain a fifth processing result;
and carrying out normalization processing on the fifth processing result by using a layer normalization model to obtain output information of the ith encoder.
4. The method of claim 3, wherein the multi-headed self-attention processing of the input information of the i-th encoder using a multi-headed self-attention model comprises:
aiming at the input information of the ith encoder, determining a corresponding first vector matrix Q_t, second vector matrix K_t, and third vector matrix V_t respectively according to each single head's first attention weight matrix W_t^Q, second attention weight matrix W_t^K, and third attention weight matrix W_t^V;
determining an attention value of each of the single heads according to the first vector matrix Q_t, the second vector matrix K_t, and the third vector matrix V_t;
and determining corresponding multi-head attention values according to the attention values of all the single heads to serve as the first processing result.
5. The method of claim 1, wherein,
the first defect identification model comprises a Vision Transformer model.
6. The method of any of claims 1-5, further comprising:
and sending the image to be detected to a cloud server under the condition that the defect identification result shows that the image to be detected belongs to a defect image, so that the cloud server can train a preset second defect identification model by using the image to be detected.
7. The method of claim 6, further comprising:
and updating the weight of the first defect identification model by using the model weight information sent by the cloud server.
8. An edge side device comprising:
the first processing module is configured to receive an image to be detected sent by the image acquisition device;
the second processing module is configured to divide the image to be detected into a preset number D of mutually non-overlapping blocks, and perform block embedding processing on each block to obtain a corresponding image token, wherein the image token comprises visual token information and position embedding information of the corresponding block;
the third processing module is configured to process the image token by using a preset first defect identification model to obtain a defect identification result;
and the fourth processing module is configured to send the defect identification result to the user terminal under the condition that the defect identification result indicates that the image to be detected does not belong to the defect image.
9. The apparatus of claim 8, wherein the first defect identification model comprises N encoders and a normalization module, wherein N is a natural number greater than 0;
the third processing module is configured to perform encoding processing on input information of an ith encoder by using the ith encoder to obtain output information of the ith encoder, wherein i is more than or equal to 1 and less than N, the image token is input information of the 1 st encoder, the output information of the ith encoder is used as input information of an (i + 1) th encoder, and the normalization module is used for performing normalization processing on the output information of the Nth encoder to obtain the defect identification result.
10. The apparatus of claim 9, wherein,
the third processing module is configured to perform multi-head self-attention processing on input information of the ith encoder by using a multi-head self-attention model to obtain a first processing result, fuse the first processing result with the input information of the ith encoder to obtain a second processing result, perform normalization processing on the second processing result by using a layer normalization model to obtain a third processing result, perform multi-layer perception processing on the third processing result by using a multi-layer perceptron model to obtain a fourth processing result, fuse the third processing result with the fourth processing result to obtain a fifth processing result, and perform normalization processing on the fifth processing result by using the layer normalization model to obtain output information of the ith encoder.
11. The apparatus of claim 10, wherein,
the third processing module is configured to, for the input information of the ith encoder, determine a corresponding first vector matrix Q_t, second vector matrix K_t, and third vector matrix V_t respectively according to each single head's first attention weight matrix W_t^Q, second attention weight matrix W_t^K, and third attention weight matrix W_t^V, determine the attention value of each single head according to the first vector matrix Q_t, the second vector matrix K_t, and the third vector matrix V_t, and determine the corresponding multi-head attention value according to the attention values of all the single heads to serve as the first processing result.
12. The apparatus of claim 8, wherein,
the first defect identification model comprises a Vision Transformer model.
13. The apparatus of any one of claims 8-12,
the fourth processing module is configured to send the image to be detected to a cloud server under the condition that the defect identification result indicates that the image to be detected belongs to a defect image, so that the cloud server can train a preset second defect identification model by using the image to be detected.
14. The apparatus of claim 13, wherein,
the fourth processing module is configured to update the weight of the first defect identification model by using the model weight information sent by the cloud server.
15. An edge side device comprising:
a memory configured to store instructions;
a processor coupled to the memory, the processor configured to perform the method of any of claims 1-7 based on the instructions stored by the memory.
16. A machine vision defect identification system, comprising:
the edge side device of any one of claims 8-15;
and the image acquisition device is configured to acquire an image to be detected and send the image to be detected to the edge side equipment.
17. The system of claim 16, further comprising:
the cloud server is configured to label the image to be detected after receiving the image to be detected sent by the edge side equipment, store the image to be detected in a training data set, train a preset second defect recognition model by using the training data set, and send current model weight information of the second defect recognition model to the edge side equipment under the condition that the performance of the trained second defect recognition model is greater than a preset performance threshold.
18. The system of claim 17, wherein,
the cloud server is configured to train a preset second defect recognition model by using the training data set when the number of images in the training data set is greater than a preset number threshold.
19. The system of claim 17, wherein,
the second defect identification model comprises a Vision Transformer model.
20. A non-transitory computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions which, when executed by a processor, implement the method of any one of claims 1-7.
CN202211102395.7A 2022-09-09 2022-09-09 Machine vision defect identification method and system, edge side device and storage medium Pending CN115641517A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211102395.7A CN115641517A (en) 2022-09-09 2022-09-09 Machine vision defect identification method and system, edge side device and storage medium
PCT/CN2023/097756 WO2024051222A1 (en) 2022-09-09 2023-06-01 Machine vision defect recognition method and system, edge side device, and storage medium

Publications (1)

Publication Number Publication Date
CN115641517A true CN115641517A (en) 2023-01-24

Family

ID=84942486

Country Status (2)

Country Link
CN (1) CN115641517A (en)
WO (1) WO2024051222A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115879119A (en) * 2023-03-02 2023-03-31 北京科技大学 Robust visual Transformer visual perception method and device for resisting general patch attack
WO2024051222A1 (en) * 2022-09-09 2024-03-14 中国电信股份有限公司 Machine vision defect recognition method and system, edge side device, and storage medium



Also Published As

Publication number Publication date
WO2024051222A1 (en) 2024-03-14


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination