WO2024060917A1 - Defect identification method, device and system - Google Patents

Defect identification method, device and system

Info

Publication number
WO2024060917A1
WO2024060917A1 (PCT/CN2023/114426, CN2023114426W)
Authority
WO
WIPO (PCT)
Prior art keywords
processing result
model
vector matrix
attention
encoding
Prior art date
Application number
PCT/CN2023/114426
Other languages
English (en)
French (fr)
Inventor
张园
韩韬
梁伟
杨明川
Original Assignee
中国电信股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国电信股份有限公司
Publication of WO2024060917A1

Links

Classifications

    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01N - INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N21/00 - Investigating or analysing materials by the use of optical means, i.e. using sub-millimetre waves, infrared, visible or ultraviolet light
    • G01N21/84 - Systems specially adapted for particular applications
    • G01N21/88 - Investigating the presence of flaws or contamination
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/0464 - Convolutional networks [CNN, ConvNet]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Definitions

  • The present disclosure relates to the field of machine vision technology, and in particular to a defect identification method, device and system.
  • Machine vision systems for smart industry are systems that use machines or computers in place of human vision to perform detection, classification, measurement or judgment.
  • Machine vision is used in visual inspection, visual positioning and other fields, and is widely applied in the industrial Internet.
  • The machine vision system converts the captured target into an image signal through machine vision products and transmits it to a dedicated image processing system.
  • The image system performs various computations on these data and signals to extract the features of the target, and then controls equipment actions at the industrial site according to the discrimination results.
  • Machine vision systems based on intelligent manufacturing are of great value in industrial visual defect inspection, visual classification, industrial dimensional measurement and other fields.
  • A machine vision system is characterized by increasing the intelligence and automation of production. For example, in some dangerous working environments that are not suitable for manual work, or where human vision cannot meet the requirements, machine vision can be used in place of human vision. As another example, in large-scale industrial production, machine vision-based defect identification methods can greatly improve production efficiency and the degree of automation. In addition, machine vision facilitates information integration and is a fundamental technology for computer-integrated manufacturing.
  • A defect identification method is provided, which is executed by an edge-side device, including: receiving an image to be detected sent by an image acquisition device; extracting a feature map of the image to be detected using an image feature extraction model; flattening the feature map to obtain multiple visual tokens; using the encoder in a first defect identification model to process the multiple visual tokens to obtain multiple encoding results; using the decoder in the first defect identification model to process the multiple encoding results to obtain multiple decoding results; using the head model in the first defect identification model to process the multiple decoding results to obtain a defect identification result; and, when the defect identification result indicates that the image to be detected is not a defective image, sending the defect identification result to the user terminal.
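Taken together, this flow resembles a DETR-style detection pipeline (backbone, Transformer encoder/decoder, prediction head). Below is a minimal sketch of the data flow, assuming PyTorch; the names `backbone`, `encoder`, `decoder`, `head` and `queries` are illustrative placeholders rather than identifiers from the disclosure:

```python
import torch
from torch import nn

def identify_defects(image: torch.Tensor, backbone: nn.Module, encoder: nn.Module,
                     decoder: nn.Module, head: nn.Module, queries: torch.Tensor):
    feat = backbone(image)                    # feature map of the image, (C, H, W)
    tokens = feat.flatten(1).transpose(0, 1)  # flatten into H*W visual tokens, (H*W, C)
    memory = encoder(tokens)                  # multiple encoding results
    decoded = decoder(queries, memory)        # multiple decoding results
    return head(decoded)                      # defect identification result
```

The result would then go to the user terminal or to the cloud server, depending on whether a defect was found.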
  • In some embodiments, using the encoder to process the multiple visual tokens includes: using a normalization model to normalize the i-th visual token to obtain a first encoding processing result, where 1≤i≤N and N is the total number of visual tokens; using a multi-head self-attention model to perform multi-head self-attention processing on the first encoding processing result and the corresponding positional encoding information to obtain a second encoding processing result; fusing the second encoding processing result with the i-th visual token to obtain a third encoding processing result; using a normalization model to normalize the third encoding processing result to obtain a fourth encoding processing result; using a multi-layer perceptron model to perform multi-layer perception processing on the fourth encoding processing result to obtain a fifth encoding processing result; and fusing the fifth encoding processing result with the fourth encoding processing result to obtain the encoding result of the i-th visual token.
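Read as a whole, the encoder step above matches a pre-norm Transformer encoder layer. A minimal PyTorch sketch under that reading follows; class and parameter names are illustrative, and note that, following the claim wording, the final fusion adds the MLP output to the normalized fourth result rather than to the third:

```python
import torch
from torch import nn

class EncoderLayer(nn.Module):
    # One encoder layer as described in the claim (illustrative sketch).
    def __init__(self, dim: int, heads: int, mlp_dim: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)                 # normalization model
        self.attn = nn.MultiheadAttention(dim, heads)  # multi-head self-attention model
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(                      # multi-layer perceptron model
            nn.Linear(dim, mlp_dim), nn.ReLU(), nn.Linear(mlp_dim, dim))

    def forward(self, tokens: torch.Tensor, pos: torch.Tensor) -> torch.Tensor:
        x1 = self.norm1(tokens)      # first encoding processing result
        q = k = x1 + pos             # positional encoding added to Q and K only
        x2, _ = self.attn(q, k, x1)  # second result
        x3 = x2 + tokens             # fused with the i-th visual token (third)
        x4 = self.norm2(x3)          # fourth result
        x5 = self.mlp(x4)            # fifth result
        return x5 + x4               # encoding result of the token
```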
  • In some embodiments, using a multi-head self-attention model to perform multi-head self-attention processing on the first encoding processing result and the corresponding positional encoding information includes: for the first encoding processing result, determining the corresponding first vector matrix Q_t, second vector matrix K_t and third vector matrix V_t according to each single head's first attention weight matrix W_t^Q, second attention weight matrix W_t^K and third attention weight matrix W_t^V, respectively; adding the first vector matrix Q_t and the second vector matrix K_t respectively to the corresponding positional encoding information to obtain an updated first vector matrix Q_t and an updated second vector matrix K_t; determining the attention value of each single head according to the updated first vector matrix Q_t, the updated second vector matrix K_t and the third vector matrix V_t; and determining the corresponding multi-head attention value according to the attention values of all single heads, as the second encoding processing result.
  • In some embodiments, using the decoder to process the multiple encoding results includes: using a normalization model to normalize the preset object query information to obtain a first decoding processing result; using a multi-head self-attention model to perform multi-head self-attention processing on the first decoding processing result and the corresponding positional encoding information to obtain a second decoding processing result; fusing the second decoding processing result with the object query information to obtain a third decoding processing result; using a normalization model to normalize the third decoding processing result to obtain a fourth decoding processing result; using a multi-head self-attention model to perform multi-head self-attention processing on the fourth decoding processing result, the j-th encoding result and the corresponding positional encoding information to obtain a fifth decoding processing result, where 1≤j≤N and N is the total number of encoding results; fusing the fifth decoding processing result with the third decoding processing result to obtain a sixth decoding processing result; using a normalization model to normalize the sixth decoding processing result to obtain a seventh decoding processing result; using a multi-layer perceptron model to perform multi-layer perception processing on the seventh decoding processing result to obtain an eighth decoding processing result; and fusing the eighth decoding processing result with the seventh decoding processing result to obtain the decoding result of the j-th encoding result.
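The decoding steps amount to a Transformer decoder layer with self-attention over the object queries followed by cross-attention over the encoding results. A loose PyTorch sketch is below; note it approximates the cross-attention step with the common DETR-style Q/K/V split, whereas the disclosure derives Q_t, K_t, V_t from the fourth decoding processing result and adds the j-th encoding result to V_t:

```python
import torch
from torch import nn

class DecoderLayer(nn.Module):
    # One decoder layer following the claimed step order (illustrative sketch).
    def __init__(self, dim: int, heads: int, mlp_dim: int):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.self_attn = nn.MultiheadAttention(dim, heads)
        self.norm2 = nn.LayerNorm(dim)
        self.cross_attn = nn.MultiheadAttention(dim, heads)
        self.norm3 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, mlp_dim), nn.ReLU(), nn.Linear(mlp_dim, dim))

    def forward(self, queries, memory, query_pos, mem_pos):
        d1 = self.norm1(queries)                  # first decoding processing result
        q = k = d1 + query_pos                    # positional encoding on Q and K
        d2, _ = self.self_attn(q, k, d1)          # second result
        d3 = d2 + queries                         # fused with object queries (third)
        d4 = self.norm2(d3)                       # fourth result
        d5, _ = self.cross_attn(d4 + query_pos,   # fifth result: attend over the
                                memory + mem_pos, # encoding results
                                memory)
        d6 = d5 + d3                              # sixth result
        d7 = self.norm3(d6)                       # seventh result
        d8 = self.mlp(d7)                         # eighth result
        return d8 + d7                            # decoding result
```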
  • In some embodiments, using a multi-head self-attention model to perform multi-head self-attention processing on the first decoding processing result and the corresponding positional encoding information includes: for the first decoding processing result, determining the corresponding first vector matrix Q_t, second vector matrix K_t and third vector matrix V_t according to each single head's first attention weight matrix W_t^Q, second attention weight matrix W_t^K and third attention weight matrix W_t^V, respectively; adding the first vector matrix Q_t and the second vector matrix K_t respectively to the corresponding positional encoding information to obtain an updated first vector matrix Q_t and an updated second vector matrix K_t; determining the attention value of each single head according to the updated first vector matrix Q_t, the updated second vector matrix K_t and the third vector matrix V_t; and determining the corresponding multi-head attention value according to the attention values of all single heads, as the second decoding processing result.
  • In some embodiments, using a multi-head self-attention model to perform multi-head self-attention processing on the fourth decoding processing result, the j-th encoding result and the corresponding positional encoding information includes: for the fourth decoding processing result, determining the corresponding first vector matrix Q_t, second vector matrix K_t and third vector matrix V_t according to each single head's first attention weight matrix W_t^Q, second attention weight matrix W_t^K and third attention weight matrix W_t^V, respectively; adding the first vector matrix Q_t and the second vector matrix K_t respectively to the corresponding positional encoding information to obtain an updated first vector matrix Q_t and an updated second vector matrix K_t; adding the third vector matrix V_t to the j-th encoding result to obtain an updated third vector matrix V_t; determining the attention value of each single head according to the updated first vector matrix Q_t, the updated second vector matrix K_t and the updated third vector matrix V_t; and determining the corresponding multi-head attention value according to the attention values of all single heads, as the fifth decoding processing result.
  • In some embodiments, using the head model in the first defect identification model to process the multiple decoding results includes: using the first fully connected network model in the head model to process the multiple decoding results to calculate the category to which the target belongs; and using the second fully connected network model in the head model to process the multiple decoding results to calculate the location information of the target.
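A head of this shape can be sketched as two parallel fully connected networks; the extra "no object" class and the 4-value box format below are assumptions borrowed from DETR-style detectors, not details stated in the disclosure:

```python
import torch
from torch import nn

class Head(nn.Module):
    def __init__(self, dim: int, num_classes: int):
        super().__init__()
        # First fully connected network: category to which the target belongs.
        self.cls = nn.Linear(dim, num_classes + 1)  # "+1" no-object slot is an assumption
        # Second fully connected network: location information of the target.
        self.box = nn.Linear(dim, 4)                # (cx, cy, w, h) format is an assumption

    def forward(self, decoded: torch.Tensor):
        return self.cls(decoded), self.box(decoded).sigmoid()
```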
  • In some embodiments, the first defect identification model includes a Vision Transformer model.
  • In some embodiments, when the defect identification result indicates that the image to be detected is a defective image, the image to be detected is sent to a cloud server so that the cloud server trains a preset second defect identification model using the image to be detected.
  • In some embodiments, the weights of the first defect identification model are updated using model weight information sent by the cloud server.
  • An edge-side device is provided, including: a first processing module configured to receive an image to be detected sent by an image acquisition device; a second processing module configured to extract a feature map of the image to be detected using an image feature extraction model and flatten the feature map to obtain multiple visual tokens; a third processing module configured to use the encoder in a first defect identification model to process the multiple visual tokens to obtain multiple encoding results; a fourth processing module configured to use the decoder in the first defect identification model to process the multiple encoding results to obtain multiple decoding results; a fifth processing module configured to use the head model in the first defect identification model to process the multiple decoding results to obtain a defect identification result; and a sixth processing module configured to send the defect identification result to the user terminal when the defect identification result indicates that the image to be detected is not a defective image.
  • In some embodiments, the third processing module is configured to normalize the i-th visual token using a normalization model to obtain a first encoding processing result, where 1≤i≤N and N is the total number of visual tokens, perform multi-head self-attention processing on the first encoding processing result and the corresponding positional encoding information using a multi-head self-attention model to obtain a second encoding processing result, fuse the second encoding processing result with the i-th visual token to obtain a third encoding processing result, normalize the third encoding processing result using a normalization model to obtain a fourth encoding processing result, perform multi-layer perception processing on the fourth encoding processing result using a multi-layer perceptron model to obtain a fifth encoding processing result, and fuse the fifth encoding processing result with the fourth encoding processing result to obtain the encoding result of the i-th visual token.
  • In some embodiments, the third processing module is configured to: for the first encoding processing result, determine the corresponding first vector matrix Q_t, second vector matrix K_t and third vector matrix V_t according to each single head's first attention weight matrix W_t^Q, second attention weight matrix W_t^K and third attention weight matrix W_t^V, respectively; add the first vector matrix Q_t and the second vector matrix K_t respectively to the corresponding positional encoding information to obtain an updated first vector matrix Q_t and an updated second vector matrix K_t; determine the attention value of each single head according to the updated first vector matrix Q_t, the updated second vector matrix K_t and the third vector matrix V_t; and determine the corresponding multi-head attention value according to the attention values of all single heads, as the second encoding processing result.
  • In some embodiments, the fourth processing module is configured to normalize the preset object query information using a normalization model to obtain a first decoding processing result, perform multi-head self-attention processing on the first decoding processing result and the corresponding positional encoding information using a multi-head self-attention model to obtain a second decoding processing result, fuse the second decoding processing result with the object query information to obtain a third decoding processing result, normalize the third decoding processing result using a normalization model to obtain a fourth decoding processing result, perform multi-head self-attention processing on the fourth decoding processing result, the j-th encoding result and the corresponding positional encoding information using a multi-head self-attention model to obtain a fifth decoding processing result, where 1≤j≤N and N is the total number of encoding results, fuse the fifth decoding processing result with the third decoding processing result to obtain a sixth decoding processing result, normalize the sixth decoding processing result using a normalization model to obtain a seventh decoding processing result, perform multi-layer perception processing on the seventh decoding processing result using a multi-layer perceptron model to obtain an eighth decoding processing result, and fuse the eighth decoding processing result with the seventh decoding processing result to obtain the decoding result of the j-th encoding result.
  • In some embodiments, the fourth processing module is configured to: for the first decoding processing result, determine the corresponding first vector matrix Q_t, second vector matrix K_t and third vector matrix V_t according to each single head's first attention weight matrix W_t^Q, second attention weight matrix W_t^K and third attention weight matrix W_t^V, respectively; add the first vector matrix Q_t and the second vector matrix K_t respectively to the corresponding positional encoding information to obtain an updated first vector matrix Q_t and an updated second vector matrix K_t; determine the attention value of each single head according to the updated first vector matrix Q_t, the updated second vector matrix K_t and the third vector matrix V_t; and determine the corresponding multi-head attention value according to the attention values of all single heads, as the second decoding processing result.
  • In some embodiments, the fourth processing module is configured to: for the fourth decoding processing result, determine the corresponding first vector matrix Q_t, second vector matrix K_t and third vector matrix V_t according to each single head's first attention weight matrix W_t^Q, second attention weight matrix W_t^K and third attention weight matrix W_t^V, respectively; add the first vector matrix Q_t and the second vector matrix K_t respectively to the corresponding positional encoding information to obtain an updated first vector matrix Q_t and an updated second vector matrix K_t; add the third vector matrix V_t to the j-th encoding result to obtain an updated third vector matrix V_t; determine the attention value of each single head according to the updated first vector matrix Q_t, the updated second vector matrix K_t and the updated third vector matrix V_t; and determine the corresponding multi-head attention value according to the attention values of all single heads, as the fifth decoding processing result.
  • In some embodiments, the fifth processing module is configured to use the first fully connected network model in the head model to process the multiple decoding results to calculate the category to which the target belongs, and use the second fully connected network model in the head model to process the multiple decoding results to calculate the location information of the target.
  • In some embodiments, the first defect identification model includes a Vision Transformer model.
  • In some embodiments, the sixth processing module is configured to send the image to be detected to the cloud server when the defect identification result indicates that the image to be detected is a defective image, so that the cloud server trains the preset second defect identification model using the image to be detected.
  • In some embodiments, the sixth processing module is configured to update the weights of the first defect identification model using the model weight information sent by the cloud server.
  • An edge-side device is provided, including: a memory configured to store instructions; and a processor coupled to the memory, the processor being configured to execute, based on the instructions stored in the memory, the method described in any of the above embodiments.
  • A defect identification system is provided, including: the edge-side device described in any of the above embodiments; and an image acquisition device configured to acquire an image to be detected and send the image to be detected to the edge-side device.
  • In some embodiments, the system further includes: a cloud server configured to, after receiving the image to be detected sent by the edge-side device, annotate the image to be detected, store the image to be detected in a training data set, train a preset second defect identification model using the training data set, and, when the performance of the trained second defect identification model is greater than a preset performance threshold, send the current model weight information of the second defect identification model to the edge-side device.
  • In some embodiments, the cloud server is configured to train the preset second defect identification model using the training data set when the number of images in the training data set is greater than a preset number threshold.
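A sketch of that trigger on the cloud side might look as follows (all names are hypothetical; `train_fn` stands in for whatever training routine the cloud server uses):

```python
def maybe_train(second_model, training_set, number_threshold: int, train_fn) -> bool:
    # Train the second defect identification model only once the training
    # data set holds more images than the preset number threshold.
    if len(training_set) > number_threshold:
        train_fn(second_model, training_set)
        return True
    return False
```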
  • In some embodiments, the second defect identification model includes a Vision Transformer model.
  • A non-transitory computer-readable storage medium is provided, which stores computer instructions that, when executed by a processor, implement the method described in any of the above embodiments.
  • A computer program product is provided, including computer instructions, wherein the computer instructions, when executed by a processor, implement the method described in any of the above embodiments.
  • Figure 1 is a schematic flowchart of a machine vision-based defect identification method according to an embodiment of the present disclosure.
  • Figure 2 is a schematic structural diagram of an encoder according to an embodiment of the present disclosure.
  • Figure 3 is a schematic structural diagram of a decoder according to an embodiment of the present disclosure.
  • Figure 4 is a schematic structural diagram of a head model according to an embodiment of the present disclosure.
  • Figure 5 is a schematic flowchart of a machine vision-based defect identification method according to another embodiment of the present disclosure.
  • Figure 6 is a schematic structural diagram of an edge-side device according to an embodiment of the present disclosure.
  • Figure 7 is a schematic structural diagram of an edge-side device according to another embodiment of the present disclosure.
  • Figure 8 is a schematic structural diagram of a machine vision-based defect identification system according to an embodiment of the present disclosure.
  • Figure 9 is a schematic structural diagram of a machine vision-based defect identification system according to another embodiment of the present disclosure.
  • Figure 10 is a schematic flowchart of a machine vision-based defect identification method according to yet another embodiment of the present disclosure.
  • In all examples shown and discussed herein, any specific values are to be construed as illustrative only and not as limiting. Accordingly, other examples of the exemplary embodiments may have different values.
  • In the related art, edge-side devices and cloud servers need to work together to achieve defect identification. Such collaboration leads to long task delay and low computing efficiency.
  • Moreover, the interaction between the edge-side devices and the cloud servers occupies considerable network resources, affecting the network resources used by other network applications.
  • Accordingly, the present disclosure provides a machine vision-based defect identification solution that can effectively reduce task delay, improve computing efficiency, and avoid affecting the network resources used by other network applications.
  • Figure 1 is a schematic flowchart of a machine vision-based defect identification method according to an embodiment of the present disclosure.
  • In some embodiments, the following machine vision-based defect identification method is performed by an edge-side device.
  • In step 101, the image to be detected sent by the image acquisition device is received.
  • In some embodiments, the image acquisition device may include a 2D camera, a point cloud camera, an IoT camera, or other hardware devices used to acquire images and video, such as industrial cameras in the field of smart manufacturing.
  • In step 102, the image feature extraction model is used to extract the feature map of the image to be detected.
  • In some embodiments, the image feature extraction (Image Feature Extraction) model includes an image feature extraction model designed using a residual network structure.
  • In step 103, the feature map is flattened to obtain multiple visual tokens.
  • For example, if the feature map has dimensions H×W×C, flattening it yields H×W visual tokens, each a C-dimensional vector.
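Concretely, the flattening is a single reshape; a small sketch with illustrative sizes (C=256, H=32, W=24):

```python
import torch

feat = torch.randn(256, 32, 24)           # C x H x W feature map (illustrative sizes)
tokens = feat.flatten(1).transpose(0, 1)  # H*W visual tokens of dimension C
print(tokens.shape)                       # torch.Size([768, 256])
```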
  • In step 104, the encoder in the first defect identification model is used to process the multiple visual tokens to obtain multiple encoding results.
  • In some embodiments, the first defect identification model is a Vision Transformer model.
  • In some embodiments, the encoder in the first defect identification model is as shown in Figure 2, and the corresponding encoding process applies, in order, a normalization model, a multi-head self-attention (Multi-head Self Attention) model, fusion with the input token, a second normalization model, a multi-layer perceptron model and a final fusion.
  • In some embodiments, the processing of the multi-head self-attention model 22 includes the following:
  • The first vector matrix Q_t and the second vector matrix K_t are respectively added to the corresponding positional encoding (Positional Encoding) information to obtain the updated first vector matrix Q_t and the updated second vector matrix K_t.
  • The attention value s_t of each single head is determined according to the updated first vector matrix Q_t, the updated second vector matrix K_t and the third vector matrix V_t, as shown in formula (2).
  • The corresponding multi-head attention value is determined from the attention values of all single heads as the second encoding processing result, as shown in formula (3), where ε is the Concatenate function and W^O is a parameter matrix.
  • In step 105, the decoder in the first defect identification model is used to process the multiple encoding results to obtain multiple decoding results.
  • In some embodiments, the decoder in the first defect identification model is as shown in Figure 3, and the corresponding decoding process is as follows:
  • The processing of the multi-head self-attention model 32 includes the following:
  • For the first decoding processing result, the corresponding first vector matrix Q_t, second vector matrix K_t and third vector matrix V_t are determined according to each single head's first attention weight matrix W_t^Q, second attention weight matrix W_t^K and third attention weight matrix W_t^V.
  • The first vector matrix Q_t and the second vector matrix K_t are respectively added to the corresponding positional encoding information to obtain the updated first vector matrix Q_t and the updated second vector matrix K_t.
  • The attention value of each single head is determined according to the updated first vector matrix Q_t, the updated second vector matrix K_t and the third vector matrix V_t.
  • The attention value is calculated as shown in formula (2) above.
  • The corresponding multi-head attention value is determined from the attention values of all single heads as the second decoding processing result.
  • The processing of the multi-head self-attention model 34 includes the following:
  • For the fourth decoding processing result, the corresponding first vector matrix Q_t, second vector matrix K_t and third vector matrix V_t are determined according to each single head's first attention weight matrix W_t^Q, second attention weight matrix W_t^K and third attention weight matrix W_t^V.
  • The first vector matrix Q_t and the second vector matrix K_t are respectively added to the corresponding positional encoding information to obtain the updated first vector matrix Q_t and the updated second vector matrix K_t.
  • The third vector matrix V_t is added to the j-th encoding result to obtain the updated third vector matrix V_t.
  • The attention value of each single head is determined according to the updated first vector matrix Q_t, the updated second vector matrix K_t and the updated third vector matrix V_t.
  • The corresponding multi-head attention value is determined from the attention values of all single heads as the fifth decoding processing result.
  • In step 106, the head (heads) model in the first defect identification model is used to process the multiple decoding results to obtain a defect identification result.
  • In some embodiments, the head model includes a first fully connected (Fully Connected) network model 41 and a second fully connected network model 42.
  • The first fully connected network model 41 is used to process the multiple decoding results to calculate the category to which the target belongs.
  • The second fully connected network model 42 is used to process the multiple decoding results to calculate the location information of the target.
  • In step 107, if the defect identification result indicates that the image to be detected is not a defective image, the defect identification result is sent to the user terminal.
  • In this way, a trained defect identification model is set up on the edge-side device so that the edge-side device can perform defect identification on the image to be detected by itself, thereby effectively reducing task delay, improving computing efficiency, and avoiding impact on the network resources used by other network applications.
  • Figure 5 is a schematic flowchart of a machine vision-based defect identification method according to another embodiment of the present disclosure.
  • In some embodiments, the following machine vision-based defect identification method is performed by an edge-side device.
  • In step 501, the image to be detected sent by the image acquisition device is received.
  • In some embodiments, the image acquisition device may include a 2D camera, a point cloud camera, an Internet of Things camera, or other hardware devices used to acquire images and video, such as industrial cameras in the field of smart manufacturing.
  • In step 502, the image feature extraction model is used to extract the feature map of the image to be detected.
  • In some embodiments, the image feature extraction (Image Feature Extraction) model includes an image feature extraction model designed using a residual network structure.
  • In step 503, the feature map is flattened to obtain multiple visual tokens (Visual Token).
  • For example, if the feature map has dimensions H×W×C, flattening it yields H×W visual tokens.
  • In step 504, the encoder in the first defect identification model is used to process the multiple visual tokens to obtain multiple encoding results.
  • In some embodiments, the first defect identification model is a Vision Transformer model.
  • In some embodiments, the encoder in the first defect identification model is as shown in Figure 2.
  • In step 505, the decoder in the first defect identification model is used to process the multiple encoding results to obtain multiple decoding results.
  • In some embodiments, the decoder in the first defect identification model is as shown in Figure 3.
  • In step 506, the multiple decoding results are processed using the head model in the first defect identification model to obtain a defect identification result.
  • In some embodiments, the head model is as shown in Figure 4.
  • In step 507, if the defect identification result indicates that the image to be detected is a defective image, the image to be detected is sent to the cloud server so that the cloud server trains the preset second defect identification model using the image to be detected.
  • In some embodiments, the second defect identification model is a Vision Transformer model.
  • In step 508, the weights of the first defect identification model are updated using the model weight information sent by the cloud server.
  • It should be noted that, when the edge-side device identifies that the image to be detected has a defect, the edge-side device sends the image to be detected to the cloud server so that the cloud server uses the image to be detected to train the second defect identification model set on the cloud server side.
  • When the performance evaluation result of the trained second defect identification model satisfies a preset condition, the cloud server sends the current model weight information of the second defect identification model to the edge-side device, so that the edge-side device can use this current model weight information to update the weights of the first defect identification model set on the edge-side device. In this way, the first defect identification model provided on the edge-side device can be continuously updated, thereby continuously improving its defect identification capability.
  • Figure 6 is a schematic structural diagram of an edge-side device according to an embodiment of the present disclosure.
  • As shown in Figure 6, the edge-side device includes a first processing module 61, a second processing module 62, a third processing module 63, a fourth processing module 64, a fifth processing module 65 and a sixth processing module 66.
  • The first processing module 61 is configured to receive the image to be detected sent by the image acquisition device.
  • In some embodiments, the image acquisition device may include a 2D camera, a point cloud camera, an IoT camera, or other hardware devices used to acquire images and video, such as industrial cameras in the field of smart manufacturing.
  • The second processing module 62 is configured to use an image feature extraction model to extract a feature map of the image to be detected, and flatten the feature map to obtain multiple visual tokens.
  • In some embodiments, the image feature extraction model includes an image feature extraction model designed using a residual network structure.
  • For example, if the feature map has dimensions H×W×C, flattening it yields H×W visual tokens.
  • The third processing module 63 is configured to use the encoder in the first defect identification model to process the multiple visual tokens to obtain multiple encoding results.
  • In some embodiments, the first defect identification model is a Vision Transformer model.
  • In some embodiments, the encoder in the first defect identification model is as shown in Figure 2.
  • In some embodiments, the third processing module 63 is configured to normalize the i-th visual token using the normalization model to obtain the first encoding processing result, where 1≤i≤N and N is the total number of visual tokens, perform multi-head self-attention processing on the first encoding processing result and the corresponding positional encoding information using the multi-head self-attention model to obtain the second encoding processing result, fuse the second encoding processing result with the i-th visual token to obtain the third encoding processing result, normalize the third encoding processing result using the normalization model to obtain the fourth encoding processing result, perform multi-layer perception processing on the fourth encoding processing result using the multi-layer perceptron model to obtain the fifth encoding processing result, and fuse the fifth encoding processing result with the fourth encoding processing result to obtain the encoding result of the i-th visual token.
  • In some embodiments, the third processing module 63 is configured to: for the first encoding processing result, determine the corresponding first vector matrix Q_t, second vector matrix K_t and third vector matrix V_t according to each single head's first attention weight matrix W_t^Q, second attention weight matrix W_t^K and third attention weight matrix W_t^V, respectively; add the first vector matrix Q_t and the second vector matrix K_t respectively to the corresponding positional encoding information to obtain an updated first vector matrix Q_t and an updated second vector matrix K_t; determine the attention value of each single head according to the updated first vector matrix Q_t, the updated second vector matrix K_t and the third vector matrix V_t; and determine the corresponding multi-head attention value according to the attention values of all single heads, as the second encoding processing result.
  • The fourth processing module 64 is configured to use the decoder in the first defect identification model to process the multiple encoding results to obtain multiple decoding results.
  • In some embodiments, the decoder in the first defect identification model is as shown in Figure 3.
  • In some embodiments, the fourth processing module 64 is configured to normalize the preset object query information using a normalization model to obtain the first decoding processing result, perform multi-head self-attention processing on the first decoding processing result and the corresponding positional encoding information using a multi-head self-attention model to obtain the second decoding processing result, fuse the second decoding processing result with the object query information to obtain the third decoding processing result, normalize the third decoding processing result using a normalization model to obtain the fourth decoding processing result, perform multi-head self-attention processing on the fourth decoding processing result, the j-th encoding result and the corresponding positional encoding information using a multi-head self-attention model to obtain the fifth decoding processing result, where 1≤j≤N and N is the total number of encoding results, fuse the fifth decoding processing result with the third decoding processing result to obtain the sixth decoding processing result, normalize the sixth decoding processing result using a normalization model to obtain the seventh decoding processing result, perform multi-layer perception processing on the seventh decoding processing result using a multi-layer perceptron model to obtain the eighth decoding processing result, and fuse the eighth decoding processing result with the seventh decoding processing result to obtain the decoding result of the j-th encoding result.
  • In some embodiments, the fourth processing module 64 is configured to: for the first decoding processing result, determine the corresponding first vector matrix Q_t, second vector matrix K_t and third vector matrix V_t according to each single head's first attention weight matrix W_t^Q, second attention weight matrix W_t^K and third attention weight matrix W_t^V, respectively; add the first vector matrix Q_t and the second vector matrix K_t respectively to the corresponding positional encoding information to obtain an updated first vector matrix Q_t and an updated second vector matrix K_t; determine the attention value of each single head according to the updated first vector matrix Q_t, the updated second vector matrix K_t and the third vector matrix V_t; and determine the corresponding multi-head attention value according to the attention values of all single heads, as the second decoding processing result.
  • In some embodiments, the fourth processing module 64 is configured to: for the fourth decoding processing result, determine the corresponding first vector matrix Q_t, second vector matrix K_t and third vector matrix V_t according to each single head's first attention weight matrix W_t^Q, second attention weight matrix W_t^K and third attention weight matrix W_t^V, respectively; add the first vector matrix Q_t and the second vector matrix K_t respectively to the corresponding positional encoding information to obtain an updated first vector matrix Q_t and an updated second vector matrix K_t; add the third vector matrix V_t to the j-th encoding result to obtain an updated third vector matrix V_t; determine the attention value of each single head according to the updated first vector matrix Q_t, the updated second vector matrix K_t and the updated third vector matrix V_t; and determine the corresponding multi-head attention value according to the attention values of all single heads, as the fifth decoding processing result.
  • The fifth processing module 65 is configured to process the multiple decoding results using the head model in the first defect identification model to obtain a defect identification result.
  • In some embodiments, the head model is as shown in Figure 4.
  • In some embodiments, the fifth processing module 65 is configured to use the first fully connected network model in the head model to process the multiple decoding results to calculate the category to which the target belongs, and use the second fully connected network model in the head model to process the multiple decoding results to calculate the location information of the target.
  • The sixth processing module 66 is configured to send the defect identification result to the user terminal if the defect identification result indicates that the image to be detected is not a defective image.
  • In some embodiments, the sixth processing module 66 is configured to send the image to be detected to the cloud server when the defect identification result indicates that the image to be detected is a defective image, so that the cloud server uses the image to be detected to train the preset second defect identification model.
  • In some embodiments, the sixth processing module 66 is configured to update the weights of the first defect identification model using the model weight information sent by the cloud server.
  • Figure 7 is a schematic structural diagram of an edge-side device according to another embodiment of the present disclosure.
  • The edge-side device includes a memory 71 and a processor 72.
  • The memory 71 is used to store instructions, and the processor 72 is coupled to the memory 71.
  • The processor 72 is configured to execute, based on the instructions stored in the memory, the method involved in any of the embodiments of Figures 1 and 5.
  • In some embodiments, the edge-side device also includes a communication interface 73 for exchanging information with other devices.
  • In some embodiments, the edge-side device also includes a bus 74, through which the processor 72, the communication interface 73 and the memory 71 communicate with each other.
  • In some embodiments, the memory 71 may include high-speed RAM, and may also include non-volatile memory, such as at least one disk memory.
  • The memory 71 may also be a memory array.
  • The memory 71 may also be divided into blocks, and the blocks may be combined into virtual volumes according to certain rules.
  • The processor 72 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present disclosure.
  • The present disclosure also relates to a computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions which, when executed by a processor, implement the method involved in any of the embodiments of Figures 1 and 5.
  • Figure 8 is a schematic structural diagram of a machine vision-based defect identification system according to an embodiment of the present disclosure.
  • The machine vision-based defect identification system includes an image acquisition device 81 and an edge-side device 82.
  • The edge-side device 82 is the edge-side device involved in any of the embodiments of Figure 6 or Figure 7.
  • The image acquisition device 81 is configured to acquire an image to be detected and send the image to be detected to the edge-side device 82.
  • In some embodiments, the image acquisition device 81 may include a 2D camera, a point cloud camera, an IoT camera, or other hardware devices used to acquire images and video, such as industrial cameras in the field of smart manufacturing.
  • By setting up a trained defect identification model on the edge-side device, the edge-side device can perform defect identification on the image to be detected by itself, thereby effectively reducing task delay, improving computing efficiency, and avoiding impact on the network resources used by other network applications.
  • Figure 9 is a schematic structural diagram of a machine vision-based defect identification system according to another embodiment of the present disclosure. The difference between Figure 9 and Figure 8 is that, in the embodiment shown in Figure 9, the machine vision-based defect identification system also includes a cloud server 83.
  • The cloud server 83 is configured to, after receiving the image to be detected sent by the edge-side device 82, annotate the image to be detected, store the image to be detected in a training data set, and train the preset second defect identification model using the training data set.
  • In some embodiments, the second defect identification model includes a Vision Transformer model.
  • In some embodiments, the cloud server 83 sends the current model weight information of the second defect identification model to the edge-side device 82 so that the edge-side device 82 can update the weights of the first defect identification model set locally on the edge-side device 82.
  • For example, when the performance of the trained second defect identification model is greater than a preset performance threshold, the cloud server 83 sends the current model weight information of the second defect identification model to the edge-side device 82.
  • In some embodiments, when the number of images in the training data set is greater than a preset number threshold, the cloud server 83 trains the preset second defect identification model using the training data set.
  • In this way, the cloud server 83 can use a sufficient number of images to train the second defect identification model, thereby improving the training effect of the second defect identification model.
  • Figure 10 is a schematic flowchart of a machine vision-based defect identification method according to yet another embodiment of the present disclosure.
  • In step 1001, the user terminal sends a service invocation request to the cloud server.
  • In step 1002, the cloud server verifies the authority of the user terminal.
  • In step 1003, after the authority of the user terminal is verified, the cloud server sends the service invocation request to the edge-side device.
  • In step 1004, the edge-side device sends the service invocation request to the image acquisition device.
  • In step 1005, the image acquisition device acquires the image to be detected according to the service invocation request.
  • In step 1006, the image acquisition device sends the image to be detected to the edge-side device.
  • In step 1007, the edge-side device processes the image to be detected using the locally set first defect identification model to obtain a defect identification result.
  • In step 1008, if the defect identification result indicates that the image to be detected is not a defective image, the defect identification result is sent to the user terminal.
  • In step 1009, if the defect identification result indicates that the image to be detected is a defective image, the image to be detected is sent to the cloud server.
  • In step 1010, the cloud server annotates the image to be detected, stores the image to be detected in a training data set, and trains the locally set second defect identification model using the training data set.
  • In step 1011, when the performance of the trained second defect identification model is greater than the preset performance threshold, the cloud server sends the current model weight information of the second defect identification model to the edge-side device.
  • In step 1012, the edge-side device uses the model weight information sent by the cloud server to update the weights of the locally set first defect identification model.
  • The present disclosure is based on the joint design of the image acquisition device, the edge-side device and the cloud server, which can effectively shorten task delay and improve computing efficiency while keeping the machine vision-based defect identification task unaffected.
  • In addition, the present disclosure has a low occupancy rate on the industrial network and does not affect the network resources of other industrial applications.
  • The real-time performance of the system is good.
  • The present disclosure retrains the machine vision-based defect identification model on the cloud server side and feeds the updated model weights back to the machine vision-based defect identification model on the edge-side device, so that the defect identification model in the system can be continuously updated.
  • The functional units described above can be implemented as a general-purpose processor, a programmable logic controller (PLC), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any appropriate combination thereof for performing the functions described in this disclosure.

Abstract

The present disclosure provides a defect identification method, device and system, relating to the field of machine vision technology. The defect identification method includes: receiving an image to be detected sent by an image acquisition device; extracting a feature map of the image to be detected using an image feature extraction model; flattening the feature map to obtain multiple visual tokens; processing the multiple visual tokens using the encoder in a first defect identification model to obtain multiple encoding results; processing the multiple encoding results using the decoder in the first defect identification model to obtain multiple decoding results; processing the multiple decoding results using the head model in the first defect identification model to obtain a defect identification result; and, when the defect identification result indicates that the image to be detected is not a defective image, sending the defect identification result to a user terminal.

Description

Defect identification method, device and system
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is based on and claims priority to CN application No. 202211163804.4, filed on September 23, 2022, the disclosure of which is incorporated into this application by reference in its entirety.
TECHNICAL FIELD
The present disclosure relates to the field of machine vision technology, and in particular to a defect identification method, device and system.
BACKGROUND
A machine vision system for smart industry is a system that uses machines or computers in place of human vision to perform detection, classification, measurement or judgment. Machine vision is used in fields such as visual inspection and visual positioning, and is widely applied in the industrial Internet. A machine vision system converts the captured target into an image signal through machine vision products and transmits it to a dedicated image processing system; the image system performs various computations on these data and signals to extract the features of the target, and then controls equipment actions at the industrial site according to the discrimination results. Machine vision systems based on intelligent manufacturing are of great value in fields such as industrial visual defect inspection, visual classification and industrial dimensional measurement.
A machine vision system is characterized by increasing the intelligence and automation of production. For example, in some dangerous working environments that are not suitable for manual work, or where human vision cannot meet the requirements, machine vision can be used in place of human vision. As another example, in large-scale industrial production, machine vision-based defect identification methods can greatly improve production efficiency and the degree of automation. Moreover, machine vision facilitates information integration and is a fundamental technology for computer-integrated manufacturing.
SUMMARY
According to a first aspect of the embodiments of the present disclosure, a defect identification method is provided, executed by an edge-side device, including: receiving an image to be detected sent by an image acquisition device; extracting a feature map of the image to be detected using an image feature extraction model; flattening the feature map to obtain multiple visual tokens; processing the multiple visual tokens using the encoder in a first defect identification model to obtain multiple encoding results; processing the multiple encoding results using the decoder in the first defect identification model to obtain multiple decoding results; processing the multiple decoding results using the head model in the first defect identification model to obtain a defect identification result; and, when the defect identification result indicates that the image to be detected is not a defective image, sending the defect identification result to a user terminal.
In some embodiments, processing the multiple visual tokens using the encoder includes: normalizing the i-th visual token using a normalization model to obtain a first encoding processing result, where 1≤i≤N and N is the total number of visual tokens; performing multi-head self-attention processing on the first encoding processing result and the corresponding positional encoding information using a multi-head self-attention model to obtain a second encoding processing result; fusing the second encoding processing result with the i-th visual token to obtain a third encoding processing result; normalizing the third encoding processing result using a normalization model to obtain a fourth encoding processing result; performing multi-layer perception processing on the fourth encoding processing result using a multi-layer perceptron model to obtain a fifth encoding processing result; and fusing the fifth encoding processing result with the fourth encoding processing result to obtain the encoding result of the i-th visual token.
In some embodiments, performing multi-head self-attention processing on the first encoding processing result and the corresponding positional encoding information using a multi-head self-attention model includes: for the first encoding processing result, determining the corresponding first vector matrix Q_t, second vector matrix K_t and third vector matrix V_t according to each single head's first attention weight matrix W_t^Q, second attention weight matrix W_t^K and third attention weight matrix W_t^V, respectively; adding the first vector matrix Q_t and the second vector matrix K_t respectively to the corresponding positional encoding information to obtain an updated first vector matrix Q_t and an updated second vector matrix K_t; determining the attention value of each single head according to the updated first vector matrix Q_t, the updated second vector matrix K_t and the third vector matrix V_t; and determining the corresponding multi-head attention value according to the attention values of all single heads, as the second encoding processing result.
In some embodiments, processing the multiple encoding results using the decoder includes: normalizing preset object query information using a normalization model to obtain a first decoding processing result; performing multi-head self-attention processing on the first decoding processing result and the corresponding positional encoding information using a multi-head self-attention model to obtain a second decoding processing result; fusing the second decoding processing result with the object query information to obtain a third decoding processing result; normalizing the third decoding processing result using a normalization model to obtain a fourth decoding processing result; performing multi-head self-attention processing on the fourth decoding processing result, the j-th encoding result and the corresponding positional encoding information using a multi-head self-attention model to obtain a fifth decoding processing result, where 1≤j≤N and N is the total number of encoding results; fusing the fifth decoding processing result with the third decoding processing result to obtain a sixth decoding processing result; normalizing the sixth decoding processing result using a normalization model to obtain a seventh decoding processing result; performing multi-layer perception processing on the seventh decoding processing result using a multi-layer perceptron model to obtain an eighth decoding processing result; and fusing the eighth decoding processing result with the seventh decoding processing result to obtain the decoding result of the j-th encoding result.
In some embodiments, performing multi-head self-attention processing on the first decoding processing result and the corresponding positional encoding information using a multi-head self-attention model includes: for the first decoding processing result, determining the corresponding first vector matrix Q_t, second vector matrix K_t and third vector matrix V_t according to each single head's first attention weight matrix W_t^Q, second attention weight matrix W_t^K and third attention weight matrix W_t^V, respectively; adding the first vector matrix Q_t and the second vector matrix K_t respectively to the corresponding positional encoding information to obtain an updated first vector matrix Q_t and an updated second vector matrix K_t; determining the attention value of each single head according to the updated first vector matrix Q_t, the updated second vector matrix K_t and the third vector matrix V_t; and determining the corresponding multi-head attention value according to the attention values of all single heads, as the second decoding processing result.
In some embodiments, performing multi-head self-attention processing on the fourth decoding processing result, the j-th encoding result and the corresponding positional encoding information using a multi-head self-attention model includes: for the fourth decoding processing result, determining the corresponding first vector matrix Q_t, second vector matrix K_t and third vector matrix V_t according to each single head's first attention weight matrix W_t^Q, second attention weight matrix W_t^K and third attention weight matrix W_t^V, respectively; adding the first vector matrix Q_t and the second vector matrix K_t respectively to the corresponding positional encoding information to obtain an updated first vector matrix Q_t and an updated second vector matrix K_t; adding the third vector matrix V_t to the j-th encoding result to obtain an updated third vector matrix V_t; determining the attention value of each single head according to the updated first vector matrix Q_t, the updated second vector matrix K_t and the updated third vector matrix V_t; and determining the corresponding multi-head attention value according to the attention values of all single heads, as the fifth decoding processing result.
In some embodiments, processing the multiple decoding results using the head model in the first defect identification model includes: processing the multiple decoding results using a first fully connected network model in the head model to calculate the category to which the target belongs; and processing the multiple decoding results using a second fully connected network model in the head model to calculate the location information of the target.
In some embodiments, the first defect identification model includes a Vision Transformer model.
In some embodiments, when the defect identification result indicates that the image to be detected is a defective image, the image to be detected is sent to a cloud server so that the cloud server trains a preset second defect identification model using the image to be detected.
In some embodiments, the weights of the first defect identification model are updated using model weight information sent by the cloud server.
According to a second aspect of the embodiments of the present disclosure, an edge-side device is provided, including: a first processing module configured to receive an image to be detected sent by an image acquisition device; a second processing module configured to extract a feature map of the image to be detected using an image feature extraction model and flatten the feature map to obtain multiple visual tokens; a third processing module configured to process the multiple visual tokens using the encoder in a first defect identification model to obtain multiple encoding results; a fourth processing module configured to process the multiple encoding results using the decoder in the first defect identification model to obtain multiple decoding results; a fifth processing module configured to process the multiple decoding results using the head model in the first defect identification model to obtain a defect identification result; and a sixth processing module configured to send the defect identification result to a user terminal when the defect identification result indicates that the image to be detected is not a defective image.
In some embodiments, the third processing module is configured to normalize the i-th visual token using a normalization model to obtain a first encoding processing result, where 1≤i≤N and N is the total number of visual tokens, perform multi-head self-attention processing on the first encoding processing result and the corresponding positional encoding information using a multi-head self-attention model to obtain a second encoding processing result, fuse the second encoding processing result with the i-th visual token to obtain a third encoding processing result, normalize the third encoding processing result using a normalization model to obtain a fourth encoding processing result, perform multi-layer perception processing on the fourth encoding processing result using a multi-layer perceptron model to obtain a fifth encoding processing result, and fuse the fifth encoding processing result with the fourth encoding processing result to obtain the encoding result of the i-th visual token.
In some embodiments, the third processing module is configured to, for the first encoding processing result, determine the corresponding first vector matrix Q_t, second vector matrix K_t and third vector matrix V_t according to each single head's first attention weight matrix W_t^Q, second attention weight matrix W_t^K and third attention weight matrix W_t^V, respectively, add the first vector matrix Q_t and the second vector matrix K_t respectively to the corresponding positional encoding information to obtain an updated first vector matrix Q_t and an updated second vector matrix K_t, determine the attention value of each single head according to the updated first vector matrix Q_t, the updated second vector matrix K_t and the third vector matrix V_t, and determine the corresponding multi-head attention value according to the attention values of all single heads, as the second encoding processing result.
In some embodiments, the fourth processing module is configured to normalize preset object query information using a normalization model to obtain a first decoding processing result, perform multi-head self-attention processing on the first decoding processing result and the corresponding positional encoding information using a multi-head self-attention model to obtain a second decoding processing result, fuse the second decoding processing result with the object query information to obtain a third decoding processing result, normalize the third decoding processing result using a normalization model to obtain a fourth decoding processing result, perform multi-head self-attention processing on the fourth decoding processing result, the j-th encoding result and the corresponding positional encoding information using a multi-head self-attention model to obtain a fifth decoding processing result, where 1≤j≤N and N is the total number of encoding results, fuse the fifth decoding processing result with the third decoding processing result to obtain a sixth decoding processing result, normalize the sixth decoding processing result using a normalization model to obtain a seventh decoding processing result, perform multi-layer perception processing on the seventh decoding processing result using a multi-layer perceptron model to obtain an eighth decoding processing result, and fuse the eighth decoding processing result with the seventh decoding processing result to obtain the decoding result of the j-th encoding result.
In some embodiments, the fourth processing module is configured to, for the first decoding processing result, determine the corresponding first vector matrix Q_t, second vector matrix K_t and third vector matrix V_t according to each single head's first attention weight matrix W_t^Q, second attention weight matrix W_t^K and third attention weight matrix W_t^V, respectively, add the first vector matrix Q_t and the second vector matrix K_t respectively to the corresponding positional encoding information to obtain an updated first vector matrix Q_t and an updated second vector matrix K_t, determine the attention value of each single head according to the updated first vector matrix Q_t, the updated second vector matrix K_t and the third vector matrix V_t, and determine the corresponding multi-head attention value according to the attention values of all single heads, as the second decoding processing result.
In some embodiments, the fourth processing module is configured to, for the fourth decoding processing result, determine the corresponding first vector matrix Q_t, second vector matrix K_t and third vector matrix V_t according to each single head's first attention weight matrix W_t^Q, second attention weight matrix W_t^K and third attention weight matrix W_t^V, respectively, add the first vector matrix Q_t and the second vector matrix K_t respectively to the corresponding positional encoding information to obtain an updated first vector matrix Q_t and an updated second vector matrix K_t, add the third vector matrix V_t to the j-th encoding result to obtain an updated third vector matrix V_t, determine the attention value of each single head according to the updated first vector matrix Q_t, the updated second vector matrix K_t and the updated third vector matrix V_t, and determine the corresponding multi-head attention value according to the attention values of all single heads, as the fifth decoding processing result.
In some embodiments, the fifth processing module is configured to process the multiple decoding results using a first fully connected network model in the head model to calculate the category to which the target belongs, and process the multiple decoding results using a second fully connected network model in the head model to calculate the location information of the target.
In some embodiments, the first defect identification model includes a Vision Transformer model.
In some embodiments, the sixth processing module is configured to send the image to be detected to a cloud server when the defect identification result indicates that the image to be detected is a defective image, so that the cloud server trains a preset second defect identification model using the image to be detected.
In some embodiments, the sixth processing module is configured to update the weights of the first defect identification model using model weight information sent by the cloud server.
According to a third aspect of the embodiments of the present disclosure, an edge-side device is provided, including: a memory configured to store instructions; and a processor coupled to the memory, the processor being configured to execute, based on the instructions stored in the memory, the method described in any of the above embodiments.
According to a fourth aspect of the embodiments of the present disclosure, a defect identification system is provided, including: the edge-side device described in any of the above embodiments; and an image acquisition device configured to acquire an image to be detected and send the image to be detected to the edge-side device.
In some embodiments, the system further includes: a cloud server configured to, after receiving the image to be detected sent by the edge-side device, annotate the image to be detected, store the image to be detected in a training data set, train a preset second defect identification model using the training data set, and, when the performance of the trained second defect identification model is greater than a preset performance threshold, send the current model weight information of the second defect identification model to the edge-side device.
In some embodiments, the cloud server is configured to train the preset second defect identification model using the training data set when the number of images in the training data set is greater than a preset number threshold.
In some embodiments, the second defect identification model includes a Vision Transformer model.
According to a fifth aspect of the embodiments of the present disclosure, a non-transitory computer-readable storage medium is provided, wherein the computer-readable storage medium stores computer instructions which, when executed by a processor, implement the method described in any of the above embodiments.
According to a sixth aspect of the embodiments of the present disclosure, a computer program product is provided, including computer instructions, wherein the computer instructions, when executed by a processor, implement the method described in any of the above embodiments.
Other features and advantages of the present disclosure will become apparent from the following detailed description of exemplary embodiments of the present disclosure with reference to the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
In order to more clearly explain the technical solutions in the embodiments of the present disclosure or in the prior art, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present disclosure, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Figure 1 is a schematic flowchart of a machine vision-based defect identification method according to an embodiment of the present disclosure;
Figure 2 is a schematic structural diagram of an encoder according to an embodiment of the present disclosure;
Figure 3 is a schematic structural diagram of a decoder according to an embodiment of the present disclosure;
Figure 4 is a schematic structural diagram of a head model according to an embodiment of the present disclosure;
Figure 5 is a schematic flowchart of a machine vision-based defect identification method according to another embodiment of the present disclosure;
Figure 6 is a schematic structural diagram of an edge-side device according to an embodiment of the present disclosure;
Figure 7 is a schematic structural diagram of an edge-side device according to another embodiment of the present disclosure;
Figure 8 is a schematic structural diagram of a machine vision-based defect identification system according to an embodiment of the present disclosure;
Figure 9 is a schematic structural diagram of a machine vision-based defect identification system according to another embodiment of the present disclosure;
Figure 10 is a schematic flowchart of a machine vision-based defect identification method according to yet another embodiment of the present disclosure.
DETAILED DESCRIPTION
The technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the drawings in the embodiments of the present disclosure. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. The following description of at least one exemplary embodiment is in fact merely illustrative and in no way limits the present disclosure or its application or use. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the protection scope of the present disclosure.
Unless specifically stated otherwise, the relative arrangement of components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present disclosure.
At the same time, it should be understood that, for ease of description, the dimensions of the various parts shown in the drawings are not drawn to actual scale.
Techniques, methods and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, such techniques, methods and devices should be regarded as part of the granted specification.
In all examples shown and discussed here, any specific value should be interpreted as merely illustrative and not as a limitation. Therefore, other examples of the exemplary embodiments may have different values.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further discussed in subsequent drawings.
The inventors have noted that, in the related art, edge-side devices and cloud servers need to work together to achieve defect identification. Such collaboration leads to long task delay and low computing efficiency, and the interaction between the edge-side devices and the cloud servers occupies considerable network resources, affecting the network resources used by other network applications.
Accordingly, the present disclosure provides a machine vision-based defect identification solution that can effectively reduce task delay, improve computing efficiency, and avoid affecting the network resources used by other network applications.
Figure 1 is a schematic flowchart of a machine vision-based defect identification method according to an embodiment of the present disclosure. In some embodiments, the following machine vision-based defect identification method is performed by an edge-side device.
In step 101, an image to be detected sent by an image acquisition device is received.
In some embodiments, the image acquisition device may include a 2D camera, a point cloud camera, an IoT camera, or other hardware devices used to acquire images and video, such as industrial cameras in the field of smart manufacturing.
In step 102, a feature map of the image to be detected is extracted using an image feature extraction model.
In some embodiments, the image feature extraction (Image Feature Extraction) model includes an image feature extraction model designed using a residual network structure.
In step 103, the feature map is flattened to obtain multiple visual tokens (Visual Token).
For example, if the feature map has dimensions H×W×C, flattening it yields H×W visual tokens.
In step 104, the multiple visual tokens are processed using the encoder in the first defect identification model to obtain multiple encoding results.
In some embodiments, the first defect identification model is a Vision Transformer model.
In some embodiments, the encoder in the first defect identification model is as shown in Figure 2, and the corresponding encoding process is as follows:
1) The i-th visual token is normalized by the normalization (Normalize) model 21 to obtain the first encoding processing result, where 1≤i≤N and N is the total number of visual tokens.
2) Multi-head self-attention processing is performed on the first encoding processing result and the corresponding positional encoding information by the multi-head self-attention (Multi-head Self Attention) model 22 to obtain the second encoding processing result.
In some embodiments, the processing of the multi-head self-attention model 22 includes the following:
First, for the first encoding processing result, the corresponding first vector matrix Q_t, second vector matrix K_t and third vector matrix V_t are determined according to each single head's first attention weight matrix W_t^Q, second attention weight matrix W_t^K and third attention weight matrix W_t^V, respectively.
For example, the corresponding calculation is as shown in formula (1), where F_t is the first encoding processing result:
    Q_t = F_t W_t^Q,  K_t = F_t W_t^K,  V_t = F_t W_t^V    (1)
Next, the first vector matrix Q_t and the second vector matrix K_t are respectively added to the corresponding positional encoding (Positional Encoding) information to obtain the updated first vector matrix Q_t and the updated second vector matrix K_t.
Next, the attention value s_t of each single head is determined according to the updated first vector matrix Q_t, the updated second vector matrix K_t and the third vector matrix V_t, as shown in formula (2):
    s_t = τ(Q_t, K_t, V_t) = ρ(Q_t K_t^T / √d_k) V_t    (2)
where d_k is the dimension of the matrix K_t, τ is the attention calculation function, and ρ is the Softmax logistic regression function.
Then, the corresponding multi-head attention value is determined from the attention values of all single heads as the second encoding processing result, as shown in formula (3):
    MultiHead = ε(s_1, s_2, ..., s_h) W^O    (3)
where ε is the Concatenate function and W^O is a parameter matrix.
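Formulas (1) to (3) correspond to standard scaled dot-product attention with the positional-encoding addition described above; a small PyTorch sketch of that reading follows (tensor shapes and the per-head weight list are illustrative):

```python
import torch

def single_head_attention(F_t, W_q, W_k, W_v, pos):
    # Formula (1): project the input F_t into the Q_t, K_t, V_t vector matrices.
    Q, K, V = F_t @ W_q, F_t @ W_k, F_t @ W_v
    # Add the positional encoding information to Q_t and K_t only.
    Q, K = Q + pos, K + pos
    # Formula (2): s_t = softmax(Q_t K_t^T / sqrt(d_k)) V_t.
    d_k = K.shape[-1]
    return torch.softmax(Q @ K.transpose(-2, -1) / d_k ** 0.5, dim=-1) @ V

def multi_head_attention(F_t, head_weights, W_o, pos):
    # Formula (3): concatenate all single-head attention values and project
    # with the parameter matrix W_o.
    heads = [single_head_attention(F_t, W_q, W_k, W_v, pos)
             for (W_q, W_k, W_v) in head_weights]
    return torch.cat(heads, dim=-1) @ W_o
```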
3)将第二编码处理结果和第i个视觉令牌进行融合,以得到第三编码处理结果。
4)利用归一化模型23对第三编码处理结果进行归一化处理,以得第四编码处理结果。
5)利用多层感知器(Multilayer Perceptron)模型24对第四编码处理结果进行多层感知处理,以得到第五编码处理结果。
6)将第五编码处理结果和第四编码处理结果进行融合,以得到第i个视觉令牌的编码结果。
In step 105, the multiple encoding results are processed with the decoder in the first defect recognition model to obtain multiple decoding results.
In some embodiments, the decoder in the first defect recognition model is as shown in Fig. 3, and the corresponding decoding flow is as follows:
1) Normalize the preset object query (Object Queries) information with the normalization model 31 to obtain a first decoding result.
2) Perform multi-head self-attention processing on the first decoding result and the corresponding positional encoding information with the multi-head self-attention model 32 to obtain a second decoding result.
In some embodiments, the processing of the multi-head self-attention model 32 includes the following:
First, for the first decoding result, the corresponding first vector matrix Q_t, second vector matrix K_t and third vector matrix V_t are determined from the first attention weight matrix W^Q, the second attention weight matrix W^K and the third attention weight matrix W^V of each head, respectively.
For example, the corresponding calculation is as in formula (1) above.
Next, the first vector matrix Q_t and the second vector matrix K_t are each added to the corresponding positional encoding information to obtain an updated first vector matrix Q_t and an updated second vector matrix K_t.
Next, the attention value of each head is determined from the updated first vector matrix Q_t, the updated second vector matrix K_t and the third vector matrix V_t.
For example, the attention value is calculated as in formula (2) above.
Then, the corresponding multi-head attention value is determined from the attention values of all heads, as the second decoding result.
For example, the corresponding calculation is as in formula (3) above.
3) Fuse the second decoding result with the object query information to obtain a third decoding result.
4) Normalize the third decoding result with the normalization model 33 to obtain a fourth decoding result.
5) Perform multi-head self-attention processing on the fourth decoding result, the j-th encoding result and the corresponding positional encoding information with the multi-head self-attention model 34 to obtain a fifth decoding result, where 1≤j≤N and N is the total number of encoding results.
In some embodiments, the processing of the multi-head self-attention model 34 includes the following:
First, for the fourth decoding result, the corresponding first vector matrix Q_t, second vector matrix K_t and third vector matrix V_t are determined from the first attention weight matrix W^Q, the second attention weight matrix W^K and the third attention weight matrix W^V of each head, respectively.
For example, the corresponding calculation is as in formula (1) above.
Next, the first vector matrix Q_t and the second vector matrix K_t are each added to the corresponding positional encoding information to obtain an updated first vector matrix Q_t and an updated second vector matrix K_t. The third vector matrix V_t is added to the j-th encoding result to obtain an updated third vector matrix V_t.
Next, the attention value of each head is determined from the updated first vector matrix Q_t, the updated second vector matrix K_t and the updated third vector matrix V_t.
For example, the corresponding calculation is as in formula (2) above.
Then, the corresponding multi-head attention value is determined from the attention values of all heads, as the fifth decoding result (see the decoder sketch after step 9) below).
For example, the corresponding calculation is as in formula (3) above.
6) Fuse the fifth decoding result with the third decoding result to obtain a sixth decoding result.
7) Normalize the sixth decoding result with the normalization model 35 to obtain a seventh decoding result.
8) Perform multilayer perception processing on the seventh decoding result with the multilayer perceptron model 36 to obtain an eighth decoding result.
9) Fuse the eighth decoding result with the seventh decoding result to obtain the decoding result of the j-th encoding result.
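The following single-head Python sketch (PyTorch) strings decoder steps 1)-9) together. It reads step 5) literally: V_t is the projection of the fourth decoding result plus the (broadcast) encoding result, and the same positional encoding is added to Q_t and K_t. The single-head simplification, all dimensions and all names are assumptions of this sketch, not values fixed by the disclosure:

```python
import math
import torch
import torch.nn as nn

def attention(q, k, v):
    """Scaled dot-product attention of formula (2), shown single-head."""
    return torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(k.shape[-1]),
                         dim=-1) @ v

class DecoderBlock(nn.Module):
    def __init__(self, d=256):
        super().__init__()
        self.norm1, self.norm2, self.norm3 = (nn.LayerNorm(d) for _ in range(3))
        self.q1, self.k1, self.v1 = (nn.Linear(d, d, bias=False) for _ in range(3))
        self.q2, self.k2, self.v2 = (nn.Linear(d, d, bias=False) for _ in range(3))
        self.mlp = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(),
                                 nn.Linear(4 * d, d))

    def forward(self, queries, enc, pos):
        x1 = self.norm1(queries)                              # 1) first decoding result
        x2 = attention(self.q1(x1) + pos, self.k1(x1) + pos,
                       self.v1(x1))                           # 2) second decoding result
        x3 = x2 + queries                                     # 3) third decoding result
        x4 = self.norm2(x3)                                   # 4) fourth decoding result
        x5 = attention(self.q2(x4) + pos, self.k2(x4) + pos,
                       self.v2(x4) + enc)                     # 5) V_t + encoding result
        x6 = x5 + x3                                          # 6) sixth decoding result
        x7 = self.norm3(x6)                                   # 7) seventh decoding result
        x8 = self.mlp(x7)                                     # 8) eighth decoding result
        return x8 + x7                                        # 9) decoding result

queries = torch.randn(100, 256)   # preset object queries
enc = torch.randn(100, 256)       # encoding results, broadcast to query shape
pos = torch.randn(100, 256)
print(DecoderBlock()(queries, enc, pos).shape)  # torch.Size([100, 256])
```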
In step 106, the multiple decoding results are processed with the head (heads) model in the first defect recognition model to obtain a defect recognition result.
In some embodiments, as shown in Fig. 4, the head model includes a first fully connected (Fully Connected) network model 41 and a second fully connected network model 42.
For example, the first fully connected network model 41 processes the multiple decoding results to compute the class to which a target belongs, and the second fully connected network model 42 processes the multiple decoding results to compute the location information of the target.
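A minimal sketch of the head model, assuming PyTorch; the class count, the extra "no object" class and the 4-number box format are assumptions borrowed from common detection practice, not specified by the disclosure:

```python
import torch
import torch.nn as nn

class Heads(nn.Module):
    """Two fully connected networks as in Fig. 4: one computes the target
    class, the other the target location information."""
    def __init__(self, d=256, num_classes=10):
        super().__init__()
        self.cls_head = nn.Linear(d, num_classes + 1)  # +1 for "no object" (assumption)
        self.box_head = nn.Linear(d, 4)                # (cx, cy, w, h) (assumption)

    def forward(self, decodings):
        return self.cls_head(decodings), self.box_head(decodings)

decodings = torch.randn(100, 256)     # the multiple decoding results
logits, boxes = Heads()(decodings)
print(logits.shape, boxes.shape)      # torch.Size([100, 11]) torch.Size([100, 4])
```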
In step 107, when the defect recognition result indicates that the image to be inspected is not a defect image, the defect recognition result is sent to a user terminal.
In the machine-vision-based defect recognition method provided by the above embodiments of the present disclosure, a trained defect recognition model is deployed on the edge-side device so that the edge-side device can perform defect recognition on the image to be inspected by itself, thereby effectively reducing task latency, improving computational efficiency, and avoiding impact on the network resources used by other network applications.
Fig. 5 is a schematic flowchart of a machine-vision-based defect recognition method according to another embodiment of the present disclosure. In some embodiments, the following machine-vision-based defect recognition method is executed by an edge-side device.
In step 501, an image to be inspected sent by an image acquisition device is received.
In some embodiments, the image acquisition device may include a 2D camera, a point-cloud camera, an IoT camera or other hardware device for capturing images and video, for example an industrial camera in the smart manufacturing field.
In step 502, a feature map of the image to be inspected is extracted with an image feature extraction model.
In some embodiments, the image feature extraction (Image Feature Extraction) model includes an image feature extraction model designed with a residual network structure.
In step 503, the feature map is flattened to obtain multiple visual tokens (Visual Tokens).
For example, if the feature map has dimensions H×W×C, flattening it yields H×W visual tokens.
In step 504, the multiple visual tokens are processed with the encoder in the first defect recognition model to obtain multiple encoding results.
In some embodiments, the first defect recognition model is a Vision Transformer model.
In some embodiments, the encoder in the first defect recognition model is as shown in Fig. 2.
In step 505, the multiple encoding results are processed with the decoder in the first defect recognition model to obtain multiple decoding results.
In some embodiments, the decoder in the first defect recognition model is as shown in Fig. 3.
In step 506, the multiple decoding results are processed with the head model in the first defect recognition model to obtain a defect recognition result.
In some embodiments, the head model is as shown in Fig. 4.
In step 507, when the defect recognition result indicates that the image to be inspected is a defect image, the image to be inspected is sent to a cloud server so that the cloud server trains a preset second defect recognition model with the image to be inspected.
In some embodiments, the second defect recognition model is a Vision Transformer model.
In step 508, the weights of the first defect recognition model are updated with the model weight information sent by the cloud server.
It should be noted that, when the edge-side device recognizes that the image to be inspected has a defect, the edge-side device sends that image to the cloud server so that the cloud server trains the second defect recognition model deployed on the cloud-server side with it. When the performance evaluation of the trained second defect recognition model satisfies a preset condition, the cloud server sends the current model weight information of the second defect recognition model to the edge-side device, so that the edge-side device updates the weights of the first defect recognition model deployed on the edge-side-device side with that information. In this way the first defect recognition model on the edge-side device can be continuously updated, and its defect recognition capability continuously improved.
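On the edge side, the weight update of step 508 can be as simple as loading a state dict, as in this hedged sketch; serializing the weight information as a torch state_dict is an assumption of the sketch, not a format fixed by the disclosure:

```python
import io
import torch

def update_edge_model(edge_model: torch.nn.Module, weight_blob: bytes):
    """Load the model weight information sent by the cloud server into the
    locally deployed first defect recognition model."""
    state_dict = torch.load(io.BytesIO(weight_blob), map_location="cpu")
    edge_model.load_state_dict(state_dict)  # weight update of the first model
    edge_model.eval()                       # back to inference mode
    return edge_model
```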
Fig. 6 is a schematic structural diagram of an edge-side device according to an embodiment of the present disclosure. As shown in Fig. 6, the edge-side device includes a first processing module 61, a second processing module 62, a third processing module 63, a fourth processing module 64, a fifth processing module 65 and a sixth processing module 66.
The first processing module 61 is configured to receive an image to be inspected sent by an image acquisition device.
In some embodiments, the image acquisition device may include a 2D camera, a point-cloud camera, an IoT camera or other hardware device for capturing images and video, for example an industrial camera in the smart manufacturing field.
The second processing module 62 is configured to extract a feature map of the image to be inspected with an image feature extraction model and to flatten the feature map to obtain multiple visual tokens.
In some embodiments, the image feature extraction model includes an image feature extraction model designed with a residual network structure.
For example, if the feature map has dimensions H×W×C, flattening it yields H×W visual tokens.
The third processing module 63 is configured to process the multiple visual tokens with the encoder in the first defect recognition model to obtain multiple encoding results.
In some embodiments, the first defect recognition model is a Vision Transformer model.
In some embodiments, the encoder in the first defect recognition model is as shown in Fig. 2.
In some embodiments, the third processing module 63 is configured to: normalize the i-th visual token with a normalization model to obtain a first encoding result, where 1≤i≤N and N is the total number of visual tokens; perform multi-head self-attention processing on the first encoding result and the corresponding positional encoding information with a multi-head self-attention model to obtain a second encoding result; fuse the second encoding result with the i-th visual token to obtain a third encoding result; normalize the third encoding result with a normalization model to obtain a fourth encoding result; perform multilayer perception processing on the fourth encoding result with a multilayer perceptron model to obtain a fifth encoding result; and fuse the fifth encoding result with the fourth encoding result to obtain the encoding result of the i-th visual token.
In some embodiments, the third processing module 63 is configured to: for the first encoding result, determine the corresponding first vector matrix Q_t, second vector matrix K_t and third vector matrix V_t from the first attention weight matrix W^Q, the second attention weight matrix W^K and the third attention weight matrix W^V of each head, respectively; add the first vector matrix Q_t and the second vector matrix K_t to the corresponding positional encoding information, respectively, to obtain an updated first vector matrix Q_t and an updated second vector matrix K_t; determine the attention value of each head from the updated first vector matrix Q_t, the updated second vector matrix K_t and the third vector matrix V_t; and determine the corresponding multi-head attention value from the attention values of all heads, as the second encoding result.
The fourth processing module 64 is configured to process the multiple encoding results with the decoder in the first defect recognition model to obtain multiple decoding results.
In some embodiments, the decoder in the first defect recognition model is as shown in Fig. 3.
In some embodiments, the fourth processing module 64 is configured to: normalize preset object query information with a normalization model to obtain a first decoding result; perform multi-head self-attention processing on the first decoding result and the corresponding positional encoding information with a multi-head self-attention model to obtain a second decoding result; fuse the second decoding result with the object query information to obtain a third decoding result; normalize the third decoding result with a normalization model to obtain a fourth decoding result; perform multi-head self-attention processing on the fourth decoding result, the j-th encoding result and the corresponding positional encoding information with a multi-head self-attention model to obtain a fifth decoding result, where 1≤j≤N and N is the total number of encoding results; fuse the fifth decoding result with the third decoding result to obtain a sixth decoding result; normalize the sixth decoding result with a normalization model to obtain a seventh decoding result; perform multilayer perception processing on the seventh decoding result with a multilayer perceptron model to obtain an eighth decoding result; and fuse the eighth decoding result with the seventh decoding result to obtain the decoding result of the j-th encoding result.
In some embodiments, the fourth processing module 64 is configured to: for the first decoding result, determine the corresponding first vector matrix Q_t, second vector matrix K_t and third vector matrix V_t from the first attention weight matrix W^Q, the second attention weight matrix W^K and the third attention weight matrix W^V of each head, respectively; add the first vector matrix Q_t and the second vector matrix K_t to the corresponding positional encoding information, respectively, to obtain an updated first vector matrix Q_t and an updated second vector matrix K_t; determine the attention value of each head from the updated first vector matrix Q_t, the updated second vector matrix K_t and the third vector matrix V_t; and determine the corresponding multi-head attention value from the attention values of all heads, as the second decoding result.
In some embodiments, the fourth processing module 64 is configured to: for the fourth decoding result, determine the corresponding first vector matrix Q_t, second vector matrix K_t and third vector matrix V_t from the first attention weight matrix W^Q, the second attention weight matrix W^K and the third attention weight matrix W^V of each head, respectively; add the first vector matrix Q_t and the second vector matrix K_t to the corresponding positional encoding information, respectively, to obtain an updated first vector matrix Q_t and an updated second vector matrix K_t; add the third vector matrix V_t to the j-th encoding result to obtain an updated third vector matrix V_t; determine the attention value of each head from the updated first vector matrix Q_t, the updated second vector matrix K_t and the updated third vector matrix V_t; and determine the corresponding multi-head attention value from the attention values of all heads, as the fifth decoding result.
The fifth processing module 65 is configured to process the multiple decoding results with the head model in the first defect recognition model to obtain a defect recognition result.
In some embodiments, the head model is as shown in Fig. 4.
In some embodiments, the fifth processing module 65 is configured to process the multiple decoding results with the first fully connected network model in the head model to compute the class to which a target belongs, and to process the multiple decoding results with the second fully connected network model in the head model to compute the location information of the target.
The sixth processing module 66 is configured to, when the defect recognition result indicates that the image to be inspected is not a defect image, send the defect recognition result to a user terminal.
In some embodiments, the sixth processing module 66 is configured to, when the defect recognition result indicates that the image to be inspected is a defect image, send the image to be inspected to a cloud server so that the cloud server trains a preset second defect recognition model with the image to be inspected.
In some embodiments, the sixth processing module 66 is configured to update the weights of the first defect recognition model with the model weight information sent by the cloud server.
Fig. 7 is a schematic structural diagram of an edge-side device according to another embodiment of the present disclosure. As shown in Fig. 7, the edge-side device includes a memory 71 and a processor 72.
The memory 71 is used to store instructions. The processor 72 is coupled to the memory 71 and is configured to execute, based on the instructions stored in the memory, the method according to any of the embodiments of Figs. 1 and 5.
As shown in Fig. 7, the edge-side device further includes a communication interface 73 for exchanging information with other devices. The edge-side device also includes a bus 74, through which the processor 72, the communication interface 73 and the memory 71 communicate with one another.
The memory 71 may contain high-speed RAM and may also include non-volatile memory, for example at least one disk memory. The memory 71 may also be a memory array. The memory 71 may further be partitioned into blocks, and the blocks may be combined into virtual volumes according to certain rules.
Furthermore, the processor 72 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present disclosure.
The present disclosure also relates to a computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions which, when executed by a processor, implement the method according to any of the embodiments of Figs. 1 and 5.
Fig. 8 is a schematic structural diagram of a machine-vision-based defect recognition system according to an embodiment of the present disclosure. As shown in Fig. 8, the machine-vision-based defect recognition system includes an image acquisition device 81 and an edge-side device 82. The edge-side device 82 is the edge-side device according to any of the embodiments of Fig. 6 or Fig. 7.
The image acquisition device 81 is configured to acquire an image to be inspected and send the image to be inspected to the edge-side device 82.
In some embodiments, the image acquisition device 81 may include a 2D camera, a point-cloud camera, an IoT camera or other hardware device for capturing images and video, for example an industrial camera in the smart manufacturing field.
In the machine-vision-based defect recognition system provided by the above embodiments of the present disclosure, a trained defect recognition model is deployed on the edge-side device so that the edge-side device can perform defect recognition on the image to be inspected by itself, thereby effectively reducing task latency, improving computational efficiency, and avoiding impact on the network resources used by other network applications.
Fig. 9 is a schematic structural diagram of a machine-vision-based defect recognition system according to another embodiment of the present disclosure. Fig. 9 differs from Fig. 8 in that, in the embodiment shown in Fig. 9, the machine-vision-based defect recognition system further includes a cloud server 83.
The cloud server 83 is configured to, after receiving the image to be inspected sent by the edge-side device 82, annotate the image, store it in a training data set, and train a preset second defect recognition model with the training data set.
In some embodiments, the second defect recognition model includes a Vision Transformer model.
When the performance of the trained second defect recognition model exceeds a preset performance threshold, the cloud server 83 sends the current model weight information of the second defect recognition model to the edge-side device 82, so that the edge-side device 82 updates the weights of the first defect recognition model deployed locally on the edge-side device 82.
For example, if the defect recognition rate of the trained second defect recognition model is higher than the previous defect recognition rate, the cloud server 83 sends the current model weight information of the second defect recognition model to the edge-side device 82.
In some embodiments, the cloud server 83 trains the preset second defect recognition model with the training data set when the number of images in the training data set exceeds a preset count threshold.
It should be noted that, when the number of images in the training data set exceeds the preset count threshold, the cloud server 83 can train the second defect recognition model with a sufficient number of images, which improves the training effect of the second defect recognition model.
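The cloud-side policy of this and the preceding paragraphs can be sketched as follows; train_fn, evaluate_fn and both threshold values are hypothetical placeholders, not values fixed by the disclosure:

```python
def cloud_update_cycle(training_set, second_model, train_fn, evaluate_fn,
                       count_threshold=1000, performance_threshold=0.95):
    """Retrain the second defect recognition model only once the training
    set is large enough, and return weights for the edge only if the
    retrained model beats the performance threshold."""
    if len(training_set) <= count_threshold:
        return None                        # keep accumulating annotated images
    train_fn(second_model, training_set)   # retrain on the enlarged data set
    if evaluate_fn(second_model) > performance_threshold:
        return second_model.state_dict()   # weight info to send to the edge
    return None                            # performance not good enough yet
```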
Fig. 10 is a schematic flowchart of a machine-vision-based defect recognition method according to yet another embodiment of the present disclosure.
In step 1001, the user terminal sends a service invocation request to the cloud server.
In step 1002, the cloud server verifies the permissions of the user terminal.
In step 1003, after the permissions of the user terminal are verified, the cloud server sends the service invocation request to the edge-side device.
In step 1004, the edge-side device sends the service invocation request to the image acquisition device.
In step 1005, the image acquisition device acquires an image to be inspected according to the service invocation request.
In step 1006, the image acquisition device sends the image to be inspected to the edge-side device.
In step 1007, the edge-side device processes the image to be inspected with the locally deployed first defect recognition model to obtain a defect recognition result.
In step 1008, when the defect recognition result indicates that the image to be inspected is not a defect image, the defect recognition result is sent to the user terminal.
In step 1009, when the defect recognition result indicates that the image to be inspected is a defect image, the image to be inspected is sent to the cloud server.
In step 1010, the cloud server annotates the image to be inspected and stores it in the training data set. When the number of images in the training data set exceeds the preset count threshold, the cloud server trains the locally deployed second defect recognition model with the training data set.
In step 1011, when the performance of the trained second defect recognition model exceeds the preset performance threshold, the cloud server sends the current model weight information of the second defect recognition model to the edge-side device.
In step 1012, the edge-side device updates the weights of the locally deployed first defect recognition model with the model weight information sent by the cloud server. (A glue-code sketch of steps 1001-1012 follows below.)
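The following pseudocode-style sketch strings steps 1001-1012 together; every object and method name is hypothetical glue, not an API defined by the disclosure:

```python
def run_inspection_flow(user_terminal, cloud, edge, camera, request):
    """End-to-end flow of Fig. 10 under the naming assumptions above."""
    if not cloud.verify_permission(request):       # 1001-1002
        return
    edge.receive(request)                          # 1003
    image = camera.capture(request)                # 1004-1006
    result = edge.first_model.recognize(image)     # 1007
    if not result.is_defect:
        user_terminal.send(result)                 # 1008
        return
    cloud.annotate_and_store(image)                # 1009-1010
    weights = cloud.train_if_ready()               # 1010-1011
    if weights is not None:
        edge.update_weights(weights)               # 1012
```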
Implementing the above embodiments of the present disclosure yields the following beneficial effects:
1) Through the joint design of the image acquisition device, the edge-side device and the cloud server, the present disclosure can effectively shorten task latency and improve computational efficiency without impairing the machine-vision-based defect recognition task.
2) The present disclosure occupies little of the industrial network's capacity and does not affect the network resources of other industrial applications, so the system has good real-time performance.
3) Because the edge-side device and the cloud server work in concert, the performance of the servers deployed on the industrial site does not limit the recognition performance of the overall system.
4) The present disclosure retrains and updates the machine-vision-based defect recognition model on the cloud-server side and feeds the updated model weights back to the machine-vision-based defect recognition model on the edge-side device, so that the machine-vision-based defect recognition model in the system can be continuously updated.
In some embodiments, the functional units described above may be implemented as a general-purpose processor, a programmable logic controller (PLC), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any suitable combination thereof, for performing the functions described in the present disclosure.
A person of ordinary skill in the art will understand that all or part of the steps for implementing the above embodiments may be completed by hardware, or by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk or an optical disc.
The description of the present disclosure is given for the sake of example and description and is not exhaustive, nor does it limit the present disclosure to the disclosed form. Many modifications and variations will be obvious to a person of ordinary skill in the art. The embodiments were chosen and described to better explain the principles and practical applications of the present disclosure, and to enable a person of ordinary skill in the art to understand the present disclosure and to design various embodiments, with various modifications, suited to particular uses.

Claims (27)

  1. A defect recognition method, executed by an edge-side device, comprising:
    receiving an image to be inspected sent by an image acquisition device;
    extracting a feature map of the image to be inspected with an image feature extraction model;
    flattening the feature map to obtain multiple visual tokens;
    processing the multiple visual tokens with an encoder in a first defect recognition model to obtain multiple encoding results;
    processing the multiple encoding results with a decoder in the first defect recognition model to obtain multiple decoding results;
    processing the multiple decoding results with a head model in the first defect recognition model to obtain a defect recognition result; and
    when the defect recognition result indicates that the image to be inspected is not a defect image, sending the defect recognition result to a user terminal.
  2. The method according to claim 1, wherein processing the multiple visual tokens with the encoder comprises:
    normalizing the i-th visual token with a normalization model to obtain a first encoding result, where 1≤i≤N and N is the total number of visual tokens;
    performing multi-head self-attention processing on the first encoding result and the corresponding positional encoding information with a multi-head self-attention model to obtain a second encoding result;
    fusing the second encoding result with the i-th visual token to obtain a third encoding result;
    normalizing the third encoding result with a normalization model to obtain a fourth encoding result;
    performing multilayer perception processing on the fourth encoding result with a multilayer perceptron model to obtain a fifth encoding result; and
    fusing the fifth encoding result with the fourth encoding result to obtain the encoding result of the i-th visual token.
  3. The method according to claim 2, wherein performing multi-head self-attention processing on the first encoding result and the corresponding positional encoding information with the multi-head self-attention model comprises:
    for the first encoding result, determining the corresponding first vector matrix Q_t, second vector matrix K_t and third vector matrix V_t from the first attention weight matrix W^Q, the second attention weight matrix W^K and the third attention weight matrix W^V of each head, respectively;
    adding the first vector matrix Q_t and the second vector matrix K_t to the corresponding positional encoding information, respectively, to obtain an updated first vector matrix Q_t and an updated second vector matrix K_t;
    determining the attention value of each head from the updated first vector matrix Q_t, the updated second vector matrix K_t and the third vector matrix V_t; and
    determining the corresponding multi-head attention value from the attention values of all heads, as the second encoding result.
  4. The method according to claim 1, wherein processing the multiple encoding results with the decoder comprises:
    normalizing preset object query information with a normalization model to obtain a first decoding result;
    performing multi-head self-attention processing on the first decoding result and the corresponding positional encoding information with a multi-head self-attention model to obtain a second decoding result;
    fusing the second decoding result with the object query information to obtain a third decoding result;
    normalizing the third decoding result with a normalization model to obtain a fourth decoding result;
    performing multi-head self-attention processing on the fourth decoding result, the j-th encoding result and the corresponding positional encoding information with a multi-head self-attention model to obtain a fifth decoding result, where 1≤j≤N and N is the total number of encoding results;
    fusing the fifth decoding result with the third decoding result to obtain a sixth decoding result;
    normalizing the sixth decoding result with a normalization model to obtain a seventh decoding result;
    performing multilayer perception processing on the seventh decoding result with a multilayer perceptron model to obtain an eighth decoding result; and
    fusing the eighth decoding result with the seventh decoding result to obtain the decoding result of the j-th encoding result.
  5. The method according to claim 4, wherein performing multi-head self-attention processing on the first decoding result and the corresponding positional encoding information with the multi-head self-attention model comprises:
    for the first decoding result, determining the corresponding first vector matrix Q_t, second vector matrix K_t and third vector matrix V_t from the first attention weight matrix W^Q, the second attention weight matrix W^K and the third attention weight matrix W^V of each head, respectively;
    adding the first vector matrix Q_t and the second vector matrix K_t to the corresponding positional encoding information, respectively, to obtain an updated first vector matrix Q_t and an updated second vector matrix K_t;
    determining the attention value of each head from the updated first vector matrix Q_t, the updated second vector matrix K_t and the third vector matrix V_t; and
    determining the corresponding multi-head attention value from the attention values of all heads, as the second decoding result.
  6. The method according to claim 4, wherein performing multi-head self-attention processing on the fourth decoding result, the j-th encoding result and the corresponding positional encoding information with the multi-head self-attention model comprises:
    for the fourth decoding result, determining the corresponding first vector matrix Q_t, second vector matrix K_t and third vector matrix V_t from the first attention weight matrix W^Q, the second attention weight matrix W^K and the third attention weight matrix W^V of each head, respectively;
    adding the first vector matrix Q_t and the second vector matrix K_t to the corresponding positional encoding information, respectively, to obtain an updated first vector matrix Q_t and an updated second vector matrix K_t;
    adding the third vector matrix V_t to the j-th encoding result to obtain an updated third vector matrix V_t;
    determining the attention value of each head from the updated first vector matrix Q_t, the updated second vector matrix K_t and the updated third vector matrix V_t; and
    determining the corresponding multi-head attention value from the attention values of all heads, as the fifth decoding result.
  7. The method according to claim 1, wherein processing the multiple decoding results with the head model in the first defect recognition model comprises:
    processing the multiple decoding results with a first fully connected network model in the head model to compute the class to which a target belongs; and
    processing the multiple decoding results with a second fully connected network model in the head model to compute the location information of the target.
  8. The method according to claim 1, wherein
    the first defect recognition model includes a Vision Transformer model.
  9. The method according to any one of claims 1-8, further comprising:
    when the defect recognition result indicates that the image to be inspected is a defect image, sending the image to be inspected to a cloud server so that the cloud server trains a preset second defect recognition model with the image to be inspected.
  10. The method according to claim 9, further comprising:
    updating the weights of the first defect recognition model with the model weight information sent by the cloud server.
  11. An edge-side device, comprising:
    a first processing module configured to receive an image to be inspected sent by an image acquisition device;
    a second processing module configured to extract a feature map of the image to be inspected with an image feature extraction model and to flatten the feature map to obtain multiple visual tokens;
    a third processing module configured to process the multiple visual tokens with an encoder in a first defect recognition model to obtain multiple encoding results;
    a fourth processing module configured to process the multiple encoding results with a decoder in the first defect recognition model to obtain multiple decoding results;
    a fifth processing module configured to process the multiple decoding results with a head model in the first defect recognition model to obtain a defect recognition result; and
    a sixth processing module configured to, when the defect recognition result indicates that the image to be inspected is not a defect image, send the defect recognition result to a user terminal.
  12. The edge-side device according to claim 11, wherein
    the third processing module is configured to: normalize the i-th visual token with a normalization model to obtain a first encoding result, where 1≤i≤N and N is the total number of visual tokens; perform multi-head self-attention processing on the first encoding result and the corresponding positional encoding information with a multi-head self-attention model to obtain a second encoding result; fuse the second encoding result with the i-th visual token to obtain a third encoding result; normalize the third encoding result with a normalization model to obtain a fourth encoding result; perform multilayer perception processing on the fourth encoding result with a multilayer perceptron model to obtain a fifth encoding result; and fuse the fifth encoding result with the fourth encoding result to obtain the encoding result of the i-th visual token.
  13. The edge-side device according to claim 12, wherein
    the third processing module is configured to: for the first encoding result, determine the corresponding first vector matrix Q_t, second vector matrix K_t and third vector matrix V_t from the first attention weight matrix W^Q, the second attention weight matrix W^K and the third attention weight matrix W^V of each head, respectively; add the first vector matrix Q_t and the second vector matrix K_t to the corresponding positional encoding information, respectively, to obtain an updated first vector matrix Q_t and an updated second vector matrix K_t; determine the attention value of each head from the updated first vector matrix Q_t, the updated second vector matrix K_t and the third vector matrix V_t; and determine the corresponding multi-head attention value from the attention values of all heads, as the second encoding result.
  14. The edge-side device according to claim 11, wherein
    the fourth processing module is configured to: normalize preset object query information with a normalization model to obtain a first decoding result; perform multi-head self-attention processing on the first decoding result and the corresponding positional encoding information with a multi-head self-attention model to obtain a second decoding result; fuse the second decoding result with the object query information to obtain a third decoding result; normalize the third decoding result with a normalization model to obtain a fourth decoding result; perform multi-head self-attention processing on the fourth decoding result, the j-th encoding result and the corresponding positional encoding information with a multi-head self-attention model to obtain a fifth decoding result, where 1≤j≤N and N is the total number of encoding results; fuse the fifth decoding result with the third decoding result to obtain a sixth decoding result; normalize the sixth decoding result with a normalization model to obtain a seventh decoding result; perform multilayer perception processing on the seventh decoding result with a multilayer perceptron model to obtain an eighth decoding result; and fuse the eighth decoding result with the seventh decoding result to obtain the decoding result of the j-th encoding result.
  15. The edge-side device according to claim 14, wherein
    the fourth processing module is configured to: for the first decoding result, determine the corresponding first vector matrix Q_t, second vector matrix K_t and third vector matrix V_t from the first attention weight matrix W^Q, the second attention weight matrix W^K and the third attention weight matrix W^V of each head, respectively; add the first vector matrix Q_t and the second vector matrix K_t to the corresponding positional encoding information, respectively, to obtain an updated first vector matrix Q_t and an updated second vector matrix K_t; determine the attention value of each head from the updated first vector matrix Q_t, the updated second vector matrix K_t and the third vector matrix V_t; and determine the corresponding multi-head attention value from the attention values of all heads, as the second decoding result.
  16. The edge-side device according to claim 14, wherein
    the fourth processing module is configured to: for the fourth decoding result, determine the corresponding first vector matrix Q_t, second vector matrix K_t and third vector matrix V_t from the first attention weight matrix W^Q, the second attention weight matrix W^K and the third attention weight matrix W^V of each head, respectively; add the first vector matrix Q_t and the second vector matrix K_t to the corresponding positional encoding information, respectively, to obtain an updated first vector matrix Q_t and an updated second vector matrix K_t; add the third vector matrix V_t to the j-th encoding result to obtain an updated third vector matrix V_t; determine the attention value of each head from the updated first vector matrix Q_t, the updated second vector matrix K_t and the updated third vector matrix V_t; and determine the corresponding multi-head attention value from the attention values of all heads, as the fifth decoding result.
  17. The edge-side device according to claim 11, wherein
    the fifth processing module is configured to process the multiple decoding results with a first fully connected network model in the head model to compute the class to which a target belongs, and to process the multiple decoding results with a second fully connected network model in the head model to compute the location information of the target.
  18. The edge-side device according to claim 11, wherein
    the first defect recognition model includes a Vision Transformer model.
  19. The edge-side device according to any one of claims 11-18, wherein
    the sixth processing module is configured to, when the defect recognition result indicates that the image to be inspected is a defect image, send the image to be inspected to a cloud server so that the cloud server trains a preset second defect recognition model with the image to be inspected.
  20. The edge-side device according to claim 19, wherein
    the sixth processing module is configured to update the weights of the first defect recognition model with the model weight information sent by the cloud server.
  21. An edge-side device, comprising:
    a memory configured to store instructions; and
    a processor coupled to the memory, the processor being configured to execute, based on the instructions stored in the memory, the method according to any one of claims 1-10.
  22. A defect recognition system, comprising:
    the edge-side device according to any one of claims 11-21; and
    an image acquisition device configured to acquire an image to be inspected and send the image to be inspected to the edge-side device.
  23. The system according to claim 22, further comprising:
    a cloud server configured to: after receiving the image to be inspected sent by the edge-side device, annotate the image, store it in a training data set, and train a preset second defect recognition model with the training data set; and, when the performance of the trained second defect recognition model exceeds a preset performance threshold, send the current model weight information of the second defect recognition model to the edge-side device.
  24. The system according to claim 23, wherein
    the cloud server is configured to train the preset second defect recognition model with the training data set when the number of images in the training data set exceeds a preset count threshold.
  25. The system according to claim 22, wherein
    the second defect recognition model includes a Vision Transformer model.
  26. A non-transitory computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions which, when executed by a processor, implement the method according to any one of claims 1-10.
  27. A computer program product, comprising computer instructions which, when executed by a processor, implement the method according to any one of claims 1-8.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211163804.4A CN117808726A (zh) 2022-09-23 2022-09-23 Machine-vision defect recognition method, device and system for cloud-edge collaboration
CN202211163804.4 2022-09-23


