CN112991280A - Visual detection method and system and electronic equipment - Google Patents

Visual detection method and system and electronic equipment

Info

Publication number
CN112991280A
Authority
CN
China
Prior art keywords
target detection
detection object
key frame
visual inspection
frame
Prior art date
Legal status
Pending
Application number
CN202110235680.5A
Other languages
Chinese (zh)
Inventor
刁梁
朱樊
顾海松
Current Assignee
Wangzhi Technology Shenzhen Co ltd
Original Assignee
Wangzhi Technology Shenzhen Co ltd
Priority date
Filing date
Publication date
Application filed by Wangzhi Technology Shenzhen Co ltd filed Critical Wangzhi Technology Shenzhen Co ltd
Priority to CN202110235680.5A priority Critical patent/CN112991280A/en
Publication of CN112991280A publication Critical patent/CN112991280A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20092Interactive image processing based on input by user
    • G06T2207/20104Interactive definition of region of interest [ROI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30241Trajectory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Abstract

One or more embodiments of this specification disclose a visual inspection method, a system, and an electronic device. The visual inspection method comprises the following steps: performing target detection object detection on each frame in a video based on a target detection algorithm; if a first target detection object appears, taking the frame in which the first target detection object exists as a key frame, and determining a first coordinate and a first number of the first target detection object in the key frame; acquiring coordinates of a second target detection object in frames adjacent to the key frame based on the first coordinate and the first number; judging whether the second target detection object in the adjacent frame is the same as the first target detection object in the key frame; performing identity association on the first target detection object to obtain the motion track of the first target detection object; and, when the length of the motion track is determined to be greater than a length threshold, determining the related information of the first target detection object. The method can improve the accuracy and robustness of the visual inspection system.

Description

Visual detection method and system and electronic equipment
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a visual inspection method, a system, and an electronic device.
Background
Product quality inspection based on image processing is widely used. Current inspection techniques mainly rely on traditional pattern recognition and on deep learning models; as deep learning models have continued to evolve, the accuracy of visual inspection has improved, and their advantages in actual production have gradually become prominent. Because a deep learning model is a high abstraction of an objective function, a large amount of target sample data is needed to train it, in exchange for which it offers strong feature extraction and encoding capability.
In actual industrial scenes, insufficient data (target samples that are difficult to acquire, few in number, and unevenly distributed) degrades the performance of a deep learning model, resulting in low precision and poor robustness of the visual inspection system. How to improve the precision and robustness of the visual inspection system has therefore become a technical problem to be solved urgently.
Disclosure of Invention
An object of one or more embodiments of the present disclosure is to provide a visual inspection method, a system, and an electronic device, which can improve the accuracy and robustness of a visual inspection system.
To solve the above technical problem, one or more embodiments of the present specification are implemented as follows:
in a first aspect, a visual inspection method is provided, applicable where the sampled image used by a visual inspection system for visual inspection is a video and a target detection object exists in the video, the method including: performing target detection object detection on each frame in the video based on a target detection algorithm; if a first target detection object appears, taking the frame in which the first target detection object exists as a key frame, and determining a first coordinate and a first number of the first target detection object in the key frame; acquiring coordinates of a second target detection object in frames adjacent to the key frame based on the first coordinate and the first number; judging whether the second target detection object in the adjacent frame is the same as the first target detection object in the key frame; performing identity association on the first target detection object to obtain a motion track of the first target detection object; and determining the related information of the first target detection object when the length of the motion track is determined to be greater than a length threshold.
In a second aspect, a visual inspection system is provided, in which a sampled image for visual inspection is a video, and a scene of a target inspection object exists in the video, the system including: the detection module is used for detecting a target detection object for each frame in the video based on a target detection algorithm; a key frame determination module, configured to, if a first target detection object appears, take a frame in which the first target detection object exists as a key frame, and determine a first coordinate and a first number of the first target detection object in the key frame; an information acquisition module, configured to acquire coordinates of a second target detection object in adjacent frames of the key frame based on a first coordinate and a first number of the first target detection object in the key frame; the judging module is used for judging that the second target detection object in the adjacent frame is the same as the first target detection object in the key frame; the identity correlation module is used for performing identity correlation on the first target detection object to obtain a motion track of the first target detection object; and the information acquisition module is used for determining the related information of the first target detection object under the condition that the length of the motion trail is determined to be greater than the length threshold value.
In a third aspect, an electronic device is provided, including: a processor; and a memory arranged to store computer executable instructions that, when executed, cause the processor to perform a visual inspection method as described above.
As can be seen from the technical solutions provided in one or more embodiments of the present disclosure, the visual inspection method provided in the present application is applicable to the scene in which the sampled image used by the visual inspection system for visual inspection is a video and a target detection object exists in the video. The method performs target detection object detection on each frame in the video based on a target detection algorithm; if a first target detection object appears, takes the frame in which it exists as a key frame and determines a first coordinate and a first number of the first target detection object in the key frame; acquires coordinates of a second target detection object in the frames adjacent to the key frame based on the first coordinate and the first number; judges whether the second target detection object in the adjacent frame is the same as the first target detection object in the key frame; performs identity association on the first target detection object to obtain its motion track; and, when the length of the motion track is determined to be greater than the length threshold, determines the related information of the first target detection object, including its type and position. The method can process video and, after judging that the target detection objects appearing in consecutive frames are the same target detection object, judge the length of that object's motion track before determining its related information; it is suitable for industrial scenes where target samples are difficult to acquire and few in number, and can improve the precision and robustness of the visual inspection system.
Drawings
In order to more clearly illustrate one or more embodiments or prior art solutions of the present specification, reference will now be made briefly to the attached drawings, which are needed in the description of one or more embodiments or prior art, and it should be apparent that the drawings in the description below are only some of the embodiments described in the specification, and that other drawings may be obtained by those skilled in the art without inventive exercise.
Fig. 1 is a schematic step diagram of a visual inspection method provided in an embodiment of the present disclosure.
Fig. 2 is a schematic step diagram of another visual inspection method provided in an embodiment of the present disclosure.
Fig. 3 is a schematic step diagram of another visual inspection method provided in an embodiment of the present disclosure.
Fig. 4 is a schematic step diagram of another visual inspection method provided in an embodiment of the present disclosure.
Fig. 5 is a schematic step diagram of another visual inspection method provided in an embodiment of the present specification.
Fig. 6 is a schematic step diagram of another visual inspection method provided in an embodiment of the present specification.
Fig. 7 is a schematic step diagram of another visual inspection method provided in an embodiment of the present specification.
Fig. 8 is a schematic step diagram of another visual inspection method provided in an embodiment of the present specification.
Fig. 9 is a schematic step diagram of another visual inspection method provided in an embodiment of the present specification.
Fig. 10 is a schematic step diagram of another visual inspection method provided in an embodiment of the present specification.
Fig. 11 is a schematic step diagram of another visual inspection method provided in an embodiment of the present specification.
Fig. 12 is a schematic step diagram of another visual inspection method provided in an embodiment of the present specification.
Fig. 13 is a schematic step diagram of another visual inspection method provided in an embodiment of the present specification.
Fig. 14 is a schematic structural diagram of a visual inspection system provided in an embodiment of the present disclosure.
Fig. 15 is a schematic structural diagram of another visual inspection system provided in an embodiment of the present disclosure.
Fig. 16 is a schematic structural diagram of an electronic device provided in an embodiment of the present specification.
Detailed Description
In order to make the technical solutions in the present specification better understood, the technical solutions in one or more embodiments of the present specification will be clearly and completely described below with reference to the accompanying drawings in one or more embodiments of the present specification, and it is obvious that the one or more embodiments described are only a part of the embodiments of the present specification, and not all embodiments. All other embodiments that can be derived by a person skilled in the art from one or more of the embodiments described herein without making any inventive step shall fall within the scope of protection of this document.
The visual inspection method is suitable for scenes in which the sampled image used by the visual inspection system for visual inspection is a video, and it can improve the precision and robustness of the visual inspection system under conditions where target samples are difficult to collect and few in number. The visual inspection method provided in the present application and its respective steps are described in detail below.
It should be noted that the target detection object mentioned in the present application is a defect existing in a product, such as black particles, hairs, or wool fibers mixed into a liquid medicine bag.
Example one
Referring to fig. 1, a schematic step diagram of a visual inspection method provided in an embodiment of the present disclosure is shown. The visual detection method can be suitable for a visual detection system to perform visual detection, wherein the sampling image is a video, and a scene of a target detection object exists in the video. The visual inspection method comprises the following steps:
step 80: performing target detection object detection on each frame in the video based on a target detection algorithm;
at present, the target sample of an industrial scene is difficult to collect, and meanwhile, the consistency of the space-time sequence of a target detection object is poor. In view of the above problem, the visual inspection method provided in the embodiments of the present invention obtains the related information of the target detection object through tracking, determining and analyzing the target detection object appearing in consecutive frames after detecting the target detection object in each frame of the video, so as to reduce the false detection rate of the target detection object and improve the detection accuracy of the target detection object.
Step 81: if the first target detection object appears, taking a frame with the first target detection object as a key frame, and determining a first coordinate and a first number of the first target detection object in the key frame;
after the first target detection object is detected, taking the frame with the first target detection object as a key frame, and simultaneously determining a first coordinate and a first number of the first target detection object in the key frame. There may be multiple first target detection objects in the same keyframe, and the number of first target detection objects and the first coordinates of the first target detection objects are determined.
Step 82: acquiring coordinates of a second target detection object in adjacent frames of the key frame based on the first coordinates and the first quantity of the first target detection object in the key frame;
the coordinates of the second target detection object in the adjacent frames of the key frame may then be obtained based on a variety of ways, such as using correlation filtering, based on the first coordinates and the first number of first target detection objects in the key frame. And when the adjacent frames do not have correlation filtering response to the first target detection object, the key frame is considered to have false detection, the first target detection object detected in the key frame is removed, and whether the false detection occurs or not can be judged one by one under the condition of a plurality of first target detection objects. When the adjacent frame has a correlation filtering response to the first target detection object, the second target detection object needs to be added into the detection sequence (the first target detection object which has been detected before is added into the detection sequence), and whether the second target detection object which appears in the adjacent frame has the same defect as the first target detection object in the key frame or not is continuously tracked. It should be noted that the first target object and the second target object are used herein to distinguish between the impurities found in the key frame and the defects found in the neighboring frames of the key frame.
Step 83: judging that the second target detection object in the adjacent frame is the same as the first target detection object in the key frame;
and under the condition that the second target detection object in the adjacent frame is judged to be the same as the first target detection object in the key frame, associating the first target detection object in the continuous frames so as to obtain the motion track of the first target detection object. The first target detection object and the second target detection object can be directly matched by adopting a Hungarian algorithm through the characteristic distance.
Step 84: carrying out identity association on the first target detection object to obtain a motion track of the first target detection object;
when the number of detected first target detection object lines in continuous frames exceeds a threshold value N, performing identity correlation on a first target detection object in the continuous frames, taking a key frame at the time t and an adjacent frame at the time t +1 as examples, after judging that the first target detection object in the key frame is the same as a second target detection object in the adjacent frame, respectively extracting a first extraction feature map of the first target detection object in the key frame at the time t and a second extraction feature map of the second target detection object in the adjacent frame at the time t +1 through regional pooling ROI posing, respectively combining features in the first extraction feature map and features in the second extraction feature map with Kalman filtering prediction center point coordinates to form a first feature vector and a second feature vector, and calculating a distance matrix of the first feature vector obtained from the key frame at the time t and the second feature vector obtained from the adjacent frame at the time t +1, and performing minimum distance matching by using a Hungarian algorithm to generate a motion trail of the first target detection object.
Step 85: and determining the related information of the first target detection object under the condition that the length of the motion track is determined to be larger than the length threshold value.
After the motion track of the first target detection object is obtained, its length is calculated. When the length of the motion track is greater than the length threshold, the related information of the first target detection object, including its type and position, is determined, and the detected defect is confirmed. The length threshold here may be set from target detection objects obtained in actual detection.
Referring to FIG. 2, in some embodiments where the sampled image is a video, before step 80 of detecting the target detection object for each frame in the video based on a target detection algorithm, the visual inspection method provided by the embodiment of the invention further comprises:

Step 86: performing video classification on the sampled image using a two-stream network model with 3D convolution.

The visual inspection method provided by the embodiment of the invention adopts end-to-end 3D convolution as a component of the two-stream network model and performs video classification on the sampled image. The video classification may be based on the type of the target detection object or on other classification principles, which is not limited herein.
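For illustration only, the sketch below shows one 3D-convolution stream of the kind such a two-stream model might contain for clip-level classification; the single-stream simplification and all layer sizes are assumptions rather than details from the patent:

```python
# Hedged sketch: one 3D-convolution stream for video classification.
# Clips are assumed to be tensors of shape (batch, channels, frames, H, W).
import torch
import torch.nn as nn

class Stream3D(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),       # global spatio-temporal pooling
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, clip):               # clip: (B, 3, T, H, W)
        x = self.features(clip).flatten(1)
        return self.classifier(x)          # clip-level class scores

logits = Stream3D()(torch.randn(1, 3, 16, 64, 64))
```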
Referring to fig. 3, in some embodiments, step 82 of the visual inspection method provided by the embodiment of the invention, acquiring coordinates of a second target detection object in adjacent frames of the key frame based on the first coordinates and the first number of the first target detection object in the key frame, specifically comprises:

Step 820: performing forward neural network processing on the key frame to obtain an initial feature map of the key frame;

Here, the key frame image is processed by the forward neural network to obtain the initial feature map of the key frame, from which the first pooled feature map of the first target detection object is subsequently obtained.

Step 821: if the second number of second target detection objects appearing in the adjacent frames of the key frame is smaller than the first number, performing region pooling on the initial feature map using the coordinates of the first target detection object in the key frame to obtain a first pooled feature map;

For the key frame at time t, if no target detection object is detected in the adjacent frames of the key frame, or the number of detected target detection objects is smaller than the first number, region-of-interest pooling (ROI pooling) is performed, based on the first coordinate of the first target detection object in the key frame, on the initial feature map obtained after the key frame is processed by the forward neural network, to obtain the first pooled feature map. The first pooled feature map has a set size, which can be configured for the region pooling.

Step 822: processing the first pooled feature map with a forward neural network to obtain a kernel feature map;

The first pooled feature map is input into a forward neural network for processing to obtain a kernel feature map with a kernel size of 3 × 3 × l.

Step 823: expanding the boundary used for region pooling on the key frame, and performing region pooling on the adjacent frames of the key frame to obtain a second pooled feature map;

The boundary of the region pooling (ROI pooling) on the key frame is expanded by a certain proportion, and region pooling is performed on the adjacent frames under this spatio-temporal constraint to obtain the second pooled feature map. The second pooled feature map likewise has a set size, and the size of the first pooled feature map is larger than the size of the second pooled feature map. The specific proportion may be set according to actual conditions.

Step 824: convolving the second pooled feature map with the kernel feature map to obtain a second coordinate of the second target detection object in the adjacent frame.

The second pooled feature map is convolved with the kernel feature map, and the second coordinate of the second target detection object in the adjacent frame is obtained by correlation filtering.
Specifically, taking the floating impurities in the liquid medicine bag as an example:
(1) each frame in the video is detected using the target detection algorithm; if a first target detection object is detected, the frame is taken as key frame k_t, the number of detected first target detection objects is denoted n_t, and the first target detection object is added to the detection sequence;
(2) for the key frame at time t, if no second target detection object is detected in an adjacent frame, or the number of detected second target detection objects is less than n_t, ROI pooling is performed, using the first coordinate of the first target detection object in the key frame, on the feature map obtained by processing the key frame through the forward neural network, to obtain a first pooled feature map F_t with a fixed scale;
(3) the first pooled feature map F_t is processed by a neural network to obtain a kernel feature map conv_key_t with a kernel size of 3 × 3 × l, and the boundary used for ROI pooling on the key frame is expanded by a certain proportion. ROI pooling is performed on the adjacent frames under this spatio-temporal constraint, i.e., using the expanded region pooling boundary, to obtain a second pooled feature map F_i of size w × h × l; F_i is convolved with the kernel feature map conv_key_t, which acts like correlation filtering and yields a position heat map of the second target detection object in the adjacent frame. Note that if the correlation filtering of the adjacent frame produces no response, it is confirmed that the key frame contains a false detection, and the detected first target detection object is removed from the detection sequence; if the correlation filtering of the adjacent frame produces a response, the position of the second target detection object in the adjacent frame is confirmed, tracking continues to check whether the second target detection object has the same characteristics as the first target detection object, and the second target detection object is added to the detection sequence;
(4) when the number of consecutive frames in which the target detection object can be detected exceeds the threshold N, identity association is performed on the second target detection object of the adjacent frames. Taking the key frame at time t and the adjacent frame at time t+1 as an example: after the processing of the previous three steps, the number of second target detection objects in the adjacent frame is consistent with the number of first target detection objects in the key frame. A first extracted feature map of the first target detection object in the key frame at time t and a second extracted feature map of the second target detection object in the adjacent frame at time t+1 are extracted by ROI pooling; the features in the first and second extracted feature maps are combined with the Kalman-filter-predicted center point coordinates into a first feature vector and a second feature vector; the distance matrix between the first feature vector (from the key frame at time t) and the second feature vector (from frame t+1) is calculated; and minimum-distance matching with the Hungarian algorithm generates the motion track of the first target detection object;
(5) the length of the motion track is calculated; when it is greater than the length threshold, it is confirmed that a floating impurity has been detected, and the position of the impurity is output.
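As a hedged sketch of the correlation filtering in (3), the snippet below convolves a kernel feature map of assumed size 3 × 3 × l over the adjacent frame's pooled feature map to obtain a position heat map; the channel count, the response threshold, and all names are illustrative:

```python
# Hedged sketch: locate the second target detection object in an adjacent
# frame by convolving the key-frame kernel feature map over the expanded-ROI
# feature map, in the manner of a correlation filter.
import torch
import torch.nn.functional as F

def locate_in_adjacent(conv_key, feat_adj, resp_thresh=0.5):
    # conv_key: (l, 3, 3) kernel feature map from the key-frame ROI.
    # feat_adj: (l, h, w) pooled feature map of the adjacent frame.
    heat = F.conv2d(feat_adj.unsqueeze(0),      # (1, l, h, w)
                    conv_key.unsqueeze(0),      # (1, l, 3, 3)
                    padding=1).squeeze()        # (h, w) position heat map
    if heat.max() < resp_thresh:
        return None    # no response: treat the key-frame hit as a false detection
    idx = torch.argmax(heat)
    return divmod(idx.item(), heat.shape[1])    # (row, col) of the peak response

pos = locate_in_adjacent(torch.randn(8, 3, 3), torch.randn(8, 24, 24))
```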
Referring to FIG. 4, in some embodiments, the vision inspection system performs vision inspection using a vision inspection model, step 80: before detecting a target detection object for each frame in a video based on a target detection algorithm, the visual detection method provided by the embodiment of the invention further comprises:
step 87: the video is input to a visual inspection model.
The visual inspection method provided by the embodiment of the invention is carried out by a visual inspection model, which is a deep learning model comprising a forward neural network; the visual inspection model can be trained before visual inspection to automatically learn the clustering of similar semantic features.
Referring to fig. 5, in some embodiments, step 87: after the sampling image is input to the visual inspection model, the visual inspection method provided by the embodiment of the invention further comprises the following steps:
step 30: carrying out feature decomposition on the sampled image based on the common features to obtain a feature sequence;
after the feature decomposition, a feature sequence corresponding to the sampling image is obtained, different sampling images can obtain different feature sequences, but the sequence of semantic features in the feature sequences is consistent. Here, the feature sequence is a sequence of semantic features.
Aiming at the problems that target samples which can be identified in the current industrial scene are difficult to collect, the number of the target samples is small, and the sample data is unevenly distributed, the visual detection method provided by the embodiment of the invention can improve the precision and the robustness of the visual detection system.
In the visual inspection method provided by the embodiment of the invention, performing feature decomposition on the sampled image means decomposing the semantic features of the image features of the target detection object into a refined feature sequence that serves as a multi-label annotation, in contrast to the one-hot annotation currently used in the industry. Each semantic feature in the feature sequence represents a certain common feature, or possibly an individual feature; during model training, the semantic features (such as common features) obtained from each sampled image can be ordered in a fixed sequence to form the feature sequence, so that the visual inspection method provided by the embodiment of the invention realizes regression of semantic feature labels.
Step 31: coding the characteristic sequence to obtain a coding sequence;
after the characteristic sequence is obtained, each semantic characteristic in the characteristic sequence is coded to obtain a coding sequence corresponding to the characteristic sequence.
The semantic features of the target detection object are encoded, for example, the target detection object appearing in the liquid medicine bag is as follows:
hair has the following semantic features: black, elongated, flexible;
the wool has the following semantic features: white, elongated, flexible;
the bubble has the following semantic features: white, round;
the length of the coding sequence of each target detection object is set to be 5, and semantic features corresponding to each dimension are shown in the following table, namely, characters in the second row in the table are shown as feature sequences.
TABLE 1
1 2 3 4 5
Whether it is black or not Whether or not it is white Whether or not the shape is elongated Whether or not it is flexible Whether the shape is circular or not
Encoding the semantic features of the three detection targets gives the coding sequence [1,0,1,1,0] for hair, [0,1,1,1,0] for wool, and [0,1,0,0,1] for bubbles. These multi-label coding sequences can be used to train the visual inspection model in its deep learning stage. Consequently, when a new target detection object appears, such as a black dot with the semantic features black and round and the corresponding coding sequence [1,0,0,0,1], the trained visual inspection model already has the capability to abstract multivariate semantic features: it can be used directly for visual inspection of the sampled image, or it can reach the required precision with only a small amount of defect data for the new target detection object, without modifying the network structure or parameters of the visual inspection model.
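For illustration, the coding of Table 1 can be reproduced in a few lines; the feature order and the three example defects follow the text above, while the helper name is an assumption:

```python
# Hedged sketch: multi-label coding of semantic features per Table 1.
FEATURES = ["black", "white", "elongated", "flexible", "circular"]

def encode(semantics):
    # Turn a set of semantic features into a multi-label coding sequence.
    return [1 if f in semantics else 0 for f in FEATURES]

assert encode({"black", "elongated", "flexible"}) == [1, 0, 1, 1, 0]  # hair
assert encode({"white", "elongated", "flexible"}) == [0, 1, 1, 1, 0]  # wool
assert encode({"white", "circular"}) == [0, 1, 0, 0, 1]               # bubble
# A new defect such as a black dot reuses the same feature space:
assert encode({"black", "circular"}) == [1, 0, 0, 0, 1]
```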
Step 32: the coding sequence is added to the output of the visual inspection model.
The coding sequence may be an output of the visual inspection model, and it may be a vector of fixed length. To enhance the flexibility and portability of the visual inspection model, the network head can share the feature map or intermediate feature semantic vector output by the model and produce multiple binary results through cascaded multitask fully connected layers. If a new coding dimension then needs to be added, only a newly initialized fully connected layer needs to be cascaded, and only the parameters of that layer need fine-tuning, which makes the network of the visual inspection model portable and flexible enough for the visual inspection of new small samples that may appear.
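A minimal sketch of such a cascaded multitask head is given below, assuming a shared intermediate feature vector; adding a new coding dimension appends one freshly initialized fully connected layer, whose parameters alone are then fine-tuned. All sizes and names are assumptions:

```python
# Hedged sketch: shared features feeding cascaded binary heads.
import torch
import torch.nn as nn

class MultiLabelHead(nn.Module):
    def __init__(self, feat_dim=128, num_labels=5):
        super().__init__()
        # One binary output layer per semantic feature in the coding sequence.
        self.heads = nn.ModuleList(nn.Linear(feat_dim, 1) for _ in range(num_labels))

    def add_label(self):
        # New coding dimension -> cascade one newly initialized layer;
        # only its parameters need fine-tuning.
        self.heads.append(nn.Linear(self.heads[0].in_features, 1))

    def forward(self, shared_feat):        # shared_feat: (B, feat_dim)
        return torch.cat([torch.sigmoid(h(shared_feat)) for h in self.heads], dim=1)

head = MultiLabelHead()
probs = head(torch.randn(2, 128))          # (2, 5) per-feature probabilities
head.add_label()                           # now outputs 6 binary results
```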
Referring to fig. 6, in some embodiments, the visual inspection model includes a scene of a forward neural network, step 87: after the sampling image is input to the visual inspection model, the visual inspection method provided by the embodiment of the invention further comprises the following steps:
step 10: carrying out initial processing on the sampled image to obtain an initial semantic vector;
the initial processing is performed on the sampled image to obtain an initial semantic vector, for example, a backbone network may be used to perform the initial processing on the sampled image, for example, the sampled image is a convolution network such as a residual error network Resnet and Alexnet, or the initial semantic vector is obtained by performing the initial processing using a conventional algorithm such as a color histogram and a HOG operator, and the initial semantic vector is a semantic vector description of the sampled image and may be a feature map or a feature vector.
The initial processing is to extract the image features to be collected in the sampling image as corresponding semantic features and classify the semantic features.
Step 11: respectively inputting the initial semantic vectors into a plurality of first forward neural networks to obtain a plurality of intermediate semantic vectors;
target samples which can be identified in the current industrial scene are difficult to collect, the number of the target samples is small, and the data types are unevenly distributed, so that the target detection system has low precision and poor robustness. In the visual inspection method provided by the embodiment of the invention, the initial semantic vector is input into the first forward neural networks to obtain a plurality of intermediate semantic vectors [ v1, v 2.. vn ], and the relationship between the initial semantic vector and the intermediate semantic vectors can be that the initial semantic vector is subjected to further feature extraction through the first forward neural networks to obtain the intermediate semantic vectors.
Step 12: inputting the initial semantic vector into a second forward neural network to obtain an activation vector corresponding to the intermediate semantic vector;
and inputting the initial semantic vector into a second forward neural network to obtain an activation vector corresponding to the intermediate semantic vector, wherein the relationship between the initial semantic vector and the activation vector is that the initial semantic vector is subjected to further characteristic activation through the second forward neural network to obtain the activation vector.
The activation vector may be a feature vector; specifically, it may be an n-dimensional feature vector, where n is the number of intermediate semantic vectors. The second forward neural network performs softmax or sigmoid regression (or the like) on the initial semantic vector to obtain the activation vector W = [w1, w2, ..., wn].
Step 13: taking the activation vector as the weight of the intermediate semantic vector to obtain a final semantic vector;
and then, taking the activation vector as the weight of a plurality of intermediate semantic vectors to obtain a final semantic vector vlast, and finally outputting an identification result through another forward neural network, wherein the forward neural network encodes the vlast and then outputs the encoded vlast. The calculation formula of the final semantic vector vlast is as follows:
v_last = w1·v1 + w2·v2 + ... + wn·vn
the visual detection method provided by the embodiment of the invention can automatically induce semantic features and classify and cluster. For example, v1 can correspond to semantic features of colors, v2 corresponds to semantic features of shapes, and the activation vector W represents whether corresponding description operators are needed in the global semantic space of the target detection object, so that automatic analysis and semantic automatic clustering of semantic features of small samples and few samples can be realized through the response condition of the activation vector W and the intermediate semantic vectors [ v1, v 2.
Step 14: and taking the final semantic vector as the output of the visual detection model.
The intermediate semantic vectors v1 to vn carry actual semantics. Taking the detection of liquid-medicine flaws in a medicine bag as an example: when black dots and hairs in the liquid medicine bag are visually inspected, there exists some vi among v1 to vn on which the black dots and the hairs both produce responses, and wi assigns it a higher weight; this intermediate semantic vector roughly represents the response to black. The intermediate semantic vector vi provides a detailed description of the target detection object, such as what kind of black it is, its depth, its brightness, and so on, so the expression of the semantics is richer.
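For illustration, Steps 10 to 14 might be sketched as follows, assuming n parallel first forward neural networks and one gating second forward neural network; all layer sizes are assumptions, and the code implements v_last = w1·v1 + ... + wn·vn:

```python
# Hedged sketch: weighted aggregation of intermediate semantic vectors.
import torch
import torch.nn as nn

class SemanticAggregator(nn.Module):
    def __init__(self, in_dim=256, sem_dim=64, n=8):
        super().__init__()
        # First forward neural networks: one per intermediate semantic vector v_i.
        self.branches = nn.ModuleList(nn.Linear(in_dim, sem_dim) for _ in range(n))
        # Second forward neural network: produces the activation vector W.
        self.gate = nn.Linear(in_dim, n)

    def forward(self, v0):                 # v0: initial semantic vector (B, in_dim)
        vs = torch.stack([b(v0) for b in self.branches], dim=1)  # (B, n, sem_dim)
        w = torch.softmax(self.gate(v0), dim=1)                  # (B, n) weights
        return (w.unsqueeze(-1) * vs).sum(dim=1)                 # v_last: (B, sem_dim)

v_last = SemanticAggregator()(torch.randn(4, 256))
```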
Referring to FIG. 7, in some embodiments, step 80: before detecting a target detection object for each frame in a video based on a target detection algorithm, the visual detection method provided by the embodiment of the invention further comprises:
step 40: optimizing the hyper-parameters by adopting gradient descent;
the visual detection method provided by the embodiment of the invention integrates sharpening processing into a network framework of a visual detection model, k is used as a derivative, the shape of the sharpening kernel is expanded into three-channel sharpening kernels (3,3,3), wherein the sharpening kernel is a constant matrix, and the constant matrix is adjusted through k. Since the hyperparameter k is in the neural network and is a variable, the optimization can be performed using a gradient descent approach:
k = k − η · ∂Loss/∂k (η being the learning rate)
and finally, obtaining the optimal value of the hyper-parameter k through self-deep learning of the visual inspection model.
Step 41: and sharpening the sampling image by adopting a sharpening core, wherein the sharpening core comprises a product of the hyper-parameter and the sharpening core.
Because the amount of target sample data in current industrial scenes is small, and the target detection object is low-frequency in the image gradient with high-frequency background interference, the optimal path found on the image gradient by the visual inspection model during deep learning is likely to learn redundant information and ultimately overfit, so that the generalization capability of the visual inspection model is poor.
According to the visual inspection method provided by the embodiment of the invention, the low-frequency information in the original sampled image is converted into high-frequency information and then fed to the visual inspection model for deep learning, which reduces the learning burden of the model and improves its generalization capability. However, traditional algorithms are mainly designed by hand and involve considerable prior knowledge, which appears as hyper-parameters when transferred to the deep learning of a visual inspection model. This is called prior preprocessing and introduces the hyper-parameters; this patent proposes to tune these hyper-parameters during the deep learning stage. A specific example follows:
for a black spot of a target detection object in a liquid medicine bag, a sharpening core adopted by the visual detection method provided by the embodiment of the invention is as follows, and an example of the sharpening core is used for the black spot:
[Sharpening kernel: a 3 × 3 constant matrix scaled by the hyper-parameter k; the specific matrix is given in the original figure.]
before detection is carried out by using a visual detection model, a sharpening core is used for sharpening a sampling image, specific parameters of the sharpening core are determined by the specific defect type and shape serving as a target detection object, a super parameter k is used for controlling the sharpening strength, the visual detection model inputs the sampling image serving as an original three-channel image, and then the sampling image is respectively sharpened by using channel separation convolution, and then the processed three-channel image is output, wherein the processed three-channel image comprises a feature map and a feature vector. And subsequently, performing subsequent visual detection on the sharpened feature map and the feature vector.
Referring to FIG. 8, in some embodiments, where a target detection object is present in the sampled image, step 80: before detecting a target detection object for each frame in the video based on a target detection algorithm, the visual detection method provided by the embodiment of the invention further comprises:
step 60: optimizing the hyper-parameters by adopting gradient descent;
for a partial flaw as a target object, such as a flaw with a pixel characteristic highly similar to that of the background, an area pixel enhancement module including 4 hyper-parameters (x1, x2, y1, y2) can be used as a pre-processing of visual inspection, and the specific formula is as follows:
g(u, v) = (y1 / x1) · f(u, v), for 0 ≤ f(u, v) < x1
g(u, v) = ((y2 − y1) / (x2 − x1)) · (f(u, v) − x1) + y1, for x1 ≤ f(u, v) ≤ x2
g(u, v) = ((255 − y2) / (255 − x2)) · (f(u, v) − x2) + y2, for x2 < f(u, v) ≤ 255
the hyper-parameters (x1, x2, y1, y2) can be as small as 0-255, so that the hyper-parameters can be embedded into a deep learning network framework of a visual inspection model, and gradient descent self-optimization is used, and the optimization method of the hyper-parameter k in the previous embodiment can be referred to by adopting the gradient descent self-optimization. f (u, v) is the pixel value of the sampled image
Step 61: and enhancing the pixels of the area where the target detection object is located based on the hyper-parameters.
The visual inspection method provided by the embodiment of the invention converts the target detection object from low frequency to high frequency through the region pixel enhancement module with differentiable hyper-parameters, and the hyper-parameters are automatically optimized by gradient descent. The region pixel enhancement module can be a plug-in used to process the sampled image input for visual inspection; its output is a feature map, and the hyper-parameters can be automatically optimized during deep learning.
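A minimal sketch of the region pixel enhancement module is given below, assuming the formula is the piecewise-linear gray-level stretch shown above; the initial parameter values are illustrative:

```python
# Hedged sketch: differentiable region pixel enhancement with learnable
# break points (x1, y1) and (x2, y2) embedded as network weights.
import torch
import torch.nn as nn

class RegionPixelEnhance(nn.Module):
    def __init__(self):
        super().__init__()
        # Four hyper-parameters in [0, 255], self-optimized by gradient descent.
        self.x1 = nn.Parameter(torch.tensor(60.))
        self.x2 = nn.Parameter(torch.tensor(180.))
        self.y1 = nn.Parameter(torch.tensor(20.))
        self.y2 = nn.Parameter(torch.tensor(230.))

    def forward(self, f):                  # f: pixel values in [0, 255]
        low = f * self.y1 / self.x1
        mid = (f - self.x1) * (self.y2 - self.y1) / (self.x2 - self.x1) + self.y1
        high = (f - self.x2) * (255. - self.y2) / (255. - self.x2) + self.y2
        # torch.where keeps the transform differentiable on each segment.
        return torch.where(f < self.x1, low, torch.where(f <= self.x2, mid, high))

enhanced = RegionPixelEnhance()(torch.rand(1, 1, 16, 16) * 255)
```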
Referring to FIG. 9, in some embodiments, step 87: before the sampling image is input to the visual inspection model, the visual inspection method provided by the embodiment of the invention further comprises the following steps:
step 62: and training a visual inspection model according to the random sample model.
At present, the qualitative standard boundary of flaws serving as target detection objects is not clear in industrial scenes, so the training data labels for target detection objects are prone to mislabeling and missing labels before the deep learning of the visual inspection model, and the resulting data noise seriously affects the precision of the trained model. Aiming at the problems of mislabeled and missing-labeled data and the noise they cause, the visual inspection method provided by the embodiment of the invention can train the visual inspection model according to a random sample model.
The following describes training the visual inspection model based on a Random Sample Consensus (RANSAC) random sample model.
Setting the original data with data noise as Dr, the total amount of data as N, the verification set as Dt, the iteration counts as D1 and D2, and the sampling probabilities as [p1, p2], the implementation steps are as follows:
1. set an initial discard proportion coefficient D;
2. randomly discard a proportion D of the data in Dr, generate a new data set from the remaining data of Dr, and train the visual inspection model;
3. using the trained visual inspection model, test with Dr and Dt respectively to obtain the test precision on Dr; put the samples whose errors in the Dr test results are smaller than a threshold into Dri, and let the remaining samples of the Dr test results be Dro, where the threshold can be selected according to the actual situation, Dri denotes the in-class point queue, and Dro denotes the out-of-class point queue;
4. randomly discard a proportion D of the data from Dri and Dro, with the selection probability of the discarded data obeying [p1, p2]; train the visual inspection model with the non-discarded data in Dri and Dro; if the current visual inspection model is verified to have higher precision on the verification set Dt than the previous visual inspection model, keep the current model, and re-partition Dri and Dro according to whether the errors in the Dr test results are smaller than the threshold;
5. repeat (4) until the number of iterations reaches D1;
6. D = D + step (gradually increasing the discard proportion), and repeat (4) until the number of iterations reaches D2;
7. output the optimal visual inspection model.
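For illustration, the loop above might be sketched as follows; train, sample_error, and accuracy are assumed user-supplied callables, and the toy usage at the end only shows the call shape:

```python
# Hedged sketch: RANSAC-style training on a noisy data set Dr with
# validation set Dt; thresholds, probabilities, and names are illustrative.
import random

def ransac_train(Dr, Dt, train, sample_error, accuracy,
                 d=0.1, step=0.05, D1=5, D2=10, err_thresh=0.5, p=(0.7, 0.3)):
    model = train([s for s in Dr if random.random() > d])    # steps (1)-(2)
    best_model, best_acc = model, accuracy(model, Dt)
    for it in range(D2):
        errs = [(s, sample_error(model, s)) for s in Dr]     # step (3)
        Dri = [s for s, e in errs if e < err_thresh]         # in-class queue
        Dro = [s for s, e in errs if e >= err_thresh]        # out-of-class queue
        # Step (4): discard with probabilities biased by p1 / p2.
        kept = [s for s in Dri if random.random() > d * p[0]] + \
               [s for s in Dro if random.random() > d * p[1]]
        model = train(kept)
        acc = accuracy(model, Dt)
        if acc > best_acc:                                   # keep the better model
            best_model, best_acc = model, acc
        if it + 1 >= D1:                                     # steps (5)-(6)
            d += step                                        # raise discard ratio
    return best_model                                        # step (7)

# Toy usage with a trivial "model" (the mean of the kept values):
toy_train = lambda data: sum(data) / max(len(data), 1)
toy_err = lambda m, s: abs(s - m)
toy_acc = lambda m, data: -sum(abs(s - m) for s in data)
best = ransac_train([1.0, 1.1, 0.9, 5.0], [1.0], toy_train, toy_err, toy_acc)
```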
Referring to FIG. 10, in some embodiments, step 87: before the sampling image is input to the visual inspection model, the visual inspection method provided by the embodiment of the invention further comprises the following steps:
step 71: iteratively training a visual inspection model by using an original dirty data set, wherein training data in the original dirty data set are labeled;
at present, the qualitative standard boundary of the flaw serving as the target detection object in the industrial scene is not clear, so that the problem that the training data label of the target detection object is easy to generate wrong label and missing label before the visual detection model is deeply learned is solved, and the precision of the trained visual detection model is seriously influenced by data noise. Aiming at the problems of wrong labeling and missing labeling of data and noise caused by the wrong labeling and missing labeling, the visual detection method provided by the embodiment of the invention can adopt a semi-supervised labeling and noise reduction method to train the visual detection model, and can realize that the visual detection model has self-error correction capability and reduce the noise.
Step 72: after the iteration is carried out for the set times, a prediction result is output to the training data by using a visual detection model;
the initial state has an original dirty Data set Data _ t0, the visual inspection model is trained by using the Data set, after a certain number of iterations, the visual inspection model is used for predicting the training set and obtaining an output result.
Step 73: if the prediction result is inconsistent with the label of the training data, determining that error-marked data occur in the training data;
at this time, the prediction result is inconsistent with the training data set, and how the visual inspection method provided by the embodiment of the present invention searches for the misregistered data is described below by taking fastercnn as an example.
Step 74: comparing the prediction result with the true value, determining the type of the target detection object, and setting confidence degree labels for the corresponding training data;
for the truth value and the prediction result, it is assumed that the following format [ xmin, ymin, xmax, ymax, c, c _ index ] has been generated by decoding, the former four dimensions are coordinates, c is a confidence vector, and c _ index is a prediction category, which includes a background class, where:
c_index=argmax(c)
it can be seen that, since the detection network introduces coordinate parameters, it is first necessary to set labels for the training data, where the labels include type labels and position labels of the target detection objects, in the embodiment of the present invention, when the Intersection ratio IOU (overall name: Intersection over unit) > t of the manual label and the prediction result is set, t is a set threshold, it is considered that the prediction result matches the manual label, and each manual label only matches the prediction box with the largest IOU and the highest confidence, and when the prediction result and the IOU of the manual label are < t and c _ index! If it is 0 (not the background class), the manually labeled numeric index of the prediction box is considered to be 0. In order to subsequently obtain the positions of the target detection objects, if n target detection objects exist in a certain training picture, n regional pooling ROIs (English full name: region of interest) detected as background class at the fastercnnn stage are extracted, and the specific extraction mode is as follows: and sequencing the confidence degrees of whether the target exists in one stage, taking n with the highest confidence degrees of the background class, and simultaneously satisfying the requirement of the n ROIs and manually marking IOU < t.
Step 75: calculating a distance queue of confidence degrees based on the confidence degrees;
the numerical index (index) for a certain class of target detection objects is c _ i, ctFor confidence, the numerical index may be a code corresponding to the type of the target detection object, for example, the breakage index is 1, the fragmentation index is 2, the confidence threshold is calculated, the training data labeled as c _ i is c _ n, any one of which is t, the predicted confidence vector is c:
C_c_i = (1 / c_n) · Σ c_t[c_i] (summed over the c_n samples labeled c_i)
calculating a distance queue of confidence levels for the training data, wherein the calculation formula for each artificially labeled distance queue is as follows:
margin_c_i = C_c_i − c_t[c_i];
step 76: sorting the confidence levels based on the distance queues;
the confidence levels of the training data are ranked based on the distance queue.
Step 77: and based on the sorting result of the distance queue, at least selecting the training data label corresponding to the confidence coefficient with the largest distance as error label data.
From the sorted distance queue, the confidences with the largest distances are selected first, and the training data whose labels correspond to these confidences are taken as the mislabeled data.
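A minimal sketch of the margin-based ranking in Steps 75 to 77 follows, assuming labels[t] holds the annotated class index c_i of sample t and confs[t] its predicted confidence vector; all names are illustrative:

```python
# Hedged sketch: rank annotations by confidence margin to flag likely mislabels.
import numpy as np

def rank_suspect_labels(labels, confs, top=10):
    confs = np.asarray(confs)          # (num_samples, num_classes)
    labels = np.asarray(labels)        # (num_samples,) annotated class indices
    margins = np.empty(len(labels))
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        C_c = confs[idx, c].mean()     # class-average confidence C_c_i
        # margin = C_c_i - c_t[c_i]: a large margin means the sample's own
        # confidence is far below the class average, i.e. a suspect label.
        margins[idx] = C_c - confs[idx, c]
    return np.argsort(-margins)[:top]  # largest distances first

suspects = rank_suspect_labels([0, 0, 1, 1], np.random.rand(4, 2), top=2)
```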
Referring to fig. 11, in some embodiments, the visual inspection system includes a product grasping device, step 87: before the sampling image is input to the visual inspection model, the visual inspection method provided by the embodiment of the invention further comprises the following steps:
step 70: and correcting the grabbing parameters of the grabbing device by adopting a network grabbing model.
In current industrial inspection scenes, the visual inspection system comprises a product gripping device and an image acquisition device: the product gripping device grips the product, and the image acquisition device photographs the product to obtain the sampled image, using a depth camera to acquire the spatial three-dimensional coordinates of the target detection object.
It should be mentioned that adjusting the gripping parameters of the gripping device requires a lot of manual experiments and is not robust. The visual inspection method provided by the embodiment of the invention corrects the gripping parameters of the gripping device with the network grabbing model. The gripping parameters may include, for example, gripping coordinates or the rotation angles of a robot arm over multiple degrees of freedom. Both the network grabbing model and the visual inspection model are deep learning models and can perform accurate target recognition and classification.
Referring to fig. 12, in some embodiments, in the visual inspection method provided by the embodiments of the present invention, the visual inspection system further includes an image capturing device, and step 70: adopting the network to snatch the parameter of snatching of model correction grabbing device, specifically include:
step 700: and training a network grabbing model based on the reward information output by the image acquisition device.
The visual inspection method provided by the embodiment of the invention can train the network grabbing model using the reward information provided by the image acquisition device. The reward information reflects the interference factors affecting visual inspection: the more reward information there is, the smaller the interference factor, and the less reward information there is, the larger the interference factor. The following description takes a bag holding liquid medicine as an example.
In the impurity detection of the liquid medicine bag, the motor of the gripping device rotates to grip the liquid medicine bag so that any impurities in the bag move, but the generation of bubbles must be suppressed; thus, the fewer the bubbles, the more the reward information, and the more the bubbles, the less the reward information. The gripping target of the gripping device is therefore to minimize the number of bubbles B_num and maximize the motor speed R. The current state quantity s includes the motor speed V_t used in the previous stage and the number of generated bubbles B_num_t; the network grabbing model of the gripping device outputs the motor speed V_t+1 of the next stage, and the motor speed boundary is set to V_b. The reward function can then be set as follows:
Reward = -B_num + α * R
Here α is a scaling factor. The reward function provides the optimization direction for the network grabbing model: the larger the reward, the stronger the optimization incentive. As the expression shows, the reward function is a weighted combination of the reward contributed by the bubbles and that contributed by the motor speed.
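For illustration only, the reward can be written as a one-line function; the default value of α below is an assumption, not a value taken from the embodiment:

def reward(b_num: int, motor_speed: float, alpha: float = 0.1) -> float:
    # Reward = -B_num + alpha * R: fewer bubbles and a higher motor
    # speed both increase the reward.
    return -b_num + alpha * motor_speed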
The following objective function of the semi-offline parameter update algorithm may be used to optimize the model parameters of the network grabbing model:
Objective = -output * Reward
The state quantity s ([V_t, B_num_t]) is taken as the input of the semi-offline parameter update algorithm of the network grabbing model. Because the motor speed is a continuous action space, the output Vp produced by the forward neural network in the network grabbing model is transformed, with the specific formula as follows:
V = tanh(Vp) * V_b
Here tanh is an activation function that maps values from negative and positive infinity into the interval (-1, 1); the function therefore bounds the result within the maximum speed. The objective function of the semi-offline parameter update algorithm is solved using gradient descent, specifically as follows:
After the network grabbing model outputs the motor speed, it detects the number of bubbles with a visual algorithm and adds the current state quantity to a memory base. Each time the grabbing parameters of the grabbing device are updated, in addition to the gradient generated in the previous stage, several samples are selected from the memory base as training samples, and the gradient is reinforced through manually set sample weights [w1, w2, w3, ..., wn], so as to balance the influence of the gradient from the previous stage against past experience when correcting the agent.
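A minimal sketch of this semi-offline update is given below, assuming a small two-layer forward network, a replay memory and manually set sample weights; the architecture, learning rate, boundary V_b and weights are illustrative assumptions rather than the embodiment's actual configuration:

import random
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.SGD(policy.parameters(), lr=1e-3)
V_B = 100.0      # motor speed boundary V_b (assumed value)
memory = []      # memory base of past (state, reward) samples

def update(state, reward_value, replay_weights=(0.5, 0.3, 0.2)):
    """One gradient step on Objective = -output * Reward, mixing the
    current transition with re-weighted samples from the memory base."""
    memory.append((state, reward_value))
    replay = random.sample(memory, min(len(replay_weights), len(memory)))
    loss = torch.zeros(())
    for (s, r), w in zip([(state, reward_value)] + replay, (1.0,) + replay_weights):
        vp = policy(torch.tensor(s, dtype=torch.float32))
        v = torch.tanh(vp) * V_B              # V = tanh(Vp) * V_b
        loss = loss + w * (-v.squeeze() * r)  # weighted objective term
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# State s = [V_t, B_num_t]; the reward follows Reward = -B_num + alpha * R.
update([40.0, 3.0], reward_value=-3.0 + 0.1 * 40.0)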
Referring to fig. 13, in some embodiments the visual inspection system includes an industrial personal computer and a cloud server. Before step 87 (inputting the sampled image to the visual inspection model), the visual inspection method provided by the embodiment of the invention further comprises the following step:
step 90: the industrial personal computer collects the sampling image and then sends the sampling image to the cloud server;
the industrial computer collects the sampling images collected by the camera through the USB interface, stores the samples of the sampling images, trains the models and identifies the target detection objects, and a user can complete all operations locally, particularly, the industrial computer can complete the operations efficiently by adopting an embedded system.
Correspondingly, after step 85 (determining the information related to the first target detection object), the method further comprises:
step 91: at least the sample of the sampled image and the information related to the first target detection object are stored to a cloud server.
Alternatively, the sample storage and model training can be completed in the cloud; combining the characteristics of the cloud improves efficiency and reduces the required hardware cost. The front-end industrial personal computer then retains only the camera for acquiring the sampled image, a 5G wireless module, and the basic storage and computing capability required to hold the model.
This can be realized by sending the sampled images acquired by the industrial personal computer to the cloud server in real time for sample storage, model training, identification of target detection objects and storage of the final semantic vector samples.
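A minimal sketch of the front-end loop under this scheme is given below; the endpoint URL and the upload field name are hypothetical, and in a field deployment the traffic would typically pass through the 5G wireless module:

import cv2
import requests

CLOUD_ENDPOINT = "https://cloud.example.com/upload"  # hypothetical URL

cap = cv2.VideoCapture(0)          # USB camera on the industrial PC
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    ok, jpg = cv2.imencode(".jpg", frame)
    if ok:
        # Send each sampled frame to the cloud server in real time for
        # sample storage, model training and target recognition.
        requests.post(CLOUD_ENDPOINT, files={"frame": jpg.tobytes()})
cap.release()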
Through the above technical scheme, the visual inspection method is suitable for a visual inspection system in which the sampled image is a video and a scene containing a target detection object exists in the video. The method detects target detection objects in each frame of the video based on a target detection algorithm; if a first target detection object appears, takes the frame containing it as a key frame and determines the first coordinates and the first number of the first target detection object in the key frame; obtains the coordinates of a second target detection object in frames adjacent to the key frame based on those first coordinates and the first number; judges whether the second target detection object in the adjacent frame is the same as the first target detection object in the key frame; associates the identity of the first target detection object to obtain its motion trajectory; and, when the length of the motion trajectory is determined to be greater than a length threshold, determines the related information of the first target detection object, including its type and position. The method can process video and, after judging that the target detection objects appearing in consecutive frames are the same object, use the length of the motion trajectory to determine the related information of the target detection object. It is suitable for industrial scenes where target samples are difficult to collect and few in number, and can improve the precision and robustness of a visual inspection system.
Example two
Referring to fig. 14, which is a schematic structural diagram of a visual inspection system provided in an embodiment of this specification: the sampled image used for visual inspection is a video, and a scene containing a target detection object exists in the video. The visual inspection system includes:
the detection module 10 is used for detecting a target detection object for each frame in the video based on a target detection algorithm;
At present, target samples in industrial scenes are difficult to collect, and the spatio-temporal consistency of target detection objects is poor. In view of this problem, the visual inspection system provided in the embodiment of the present invention, after detecting target detection objects in each frame of the video, tracks, judges and analyzes the target detection objects appearing in consecutive frames to obtain their related information, thereby reducing the false detection rate and improving the detection accuracy of the target detection object.
A key frame determining module 20, configured to, if the first target detection object appears, take a frame in which the first target detection object exists as a key frame, and determine a first coordinate and a first number of the first target detection object in the key frame;
After the first target detection object is detected, the frame containing it is taken as the key frame, and the first coordinates and the first number of the first target detection object in the key frame are determined at the same time. Multiple first target detection objects may exist in the same key frame; in that case the number of first target detection objects and the first coordinates of each of them are determined.
An information obtaining module 30, configured to obtain coordinates of a second target detection object in adjacent frames of the key frame based on a first coordinate and a first number of the first target detection object in the key frame;
The coordinates of the second target detection object in the frames adjacent to the key frame can then be obtained in a variety of ways, for example using correlation filtering, based on the first coordinates and the first number of the first target detection object in the key frame. If an adjacent frame produces no correlation filter response to the first target detection object, the key frame is considered to contain a false detection, and the first target detection object detected in the key frame is removed; when there are multiple first target detection objects, each one can be checked for false detection in turn. If the adjacent frame does produce a correlation filter response, the second target detection object is added to the detection sequence (the previously detected first target detection object has already been added), and tracking continues to check whether the second target detection object appearing in the adjacent frame has the same defect as the first target detection object in the key frame. It should be noted that the terms first target detection object and second target detection object are used here only to distinguish the target detection object (e.g. an impurity or defect) found in the key frame from that found in the frames adjacent to the key frame.
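The response-based bookkeeping can be sketched as follows; the response threshold, the NumPy heat-map input and the representation of the detection sequence are assumptions for illustration:

RESPONSE_THRESHOLD = 0.5   # assumed value, tuned on real data

def update_detection_sequence(detection_sequence, first_obj, response_map):
    """response_map: correlation-filter heat map (2-D NumPy array) of the
    adjacent frame. No response means a false detection in the key frame."""
    if response_map.max() < RESPONSE_THRESHOLD:
        detection_sequence.remove(first_obj)   # drop the false detection
        return None
    # Response found: take the peak as the second target object's position
    # and keep tracking it alongside the first one.
    row, col = divmod(int(response_map.argmax()), response_map.shape[1])
    detection_sequence.append(("second_target", (row, col)))
    return row, col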
The judging module 40 is configured to judge that the second target detection object in the adjacent frame is the same as the first target detection object in the key frame;
When the second target detection object in the adjacent frame is judged to be the same as the first target detection object in the key frame, the first target detection object is associated across the consecutive frames to obtain its motion trajectory. The first target detection object and the second target detection object can be matched directly by feature distance using the Hungarian algorithm.
The identity correlation module 50 is configured to perform identity correlation on the first target detection object to obtain a motion trajectory of the first target detection object; and the number of the first and second groups,
When the number of consecutive frames in which the first target detection object is detected exceeds a threshold N, identity association is performed on the first target detection object across those frames. Taking the key frame at time t and the adjacent frame at time t+1 as an example: after judging that the first target detection object in the key frame is the same as the second target detection object in the adjacent frame, a first extracted feature map of the first target detection object at time t and a second extracted feature map of the second target detection object at time t+1 are respectively obtained through region-of-interest pooling (ROI pooling). The features in the first and second extracted feature maps are each combined with the centre point coordinates predicted by Kalman filtering to form a first feature vector and a second feature vector. The distance matrix between the first feature vector from the key frame at time t and the second feature vector from the adjacent frame at time t+1 is then calculated, and minimum-distance matching is performed with the Hungarian algorithm to generate the motion trajectory of the first target detection object.
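A minimal sketch of this minimum-distance matching, assuming the feature vectors have already been formed as described (pooled ROI features concatenated with the Kalman-predicted centre points), can use the Hungarian solver from SciPy:

import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(features_t, features_t1):
    """features_t, features_t1: (N, d) arrays of feature vectors from the
    key frame at time t and the adjacent frame at time t+1."""
    # Pairwise Euclidean distance matrix between the two frames.
    dist = np.linalg.norm(features_t[:, None, :] - features_t1[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(dist)   # minimum-distance matching
    return list(zip(rows, cols))               # (index at t, index at t+1)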
And an information obtaining module 60, configured to determine relevant information of the first target detection object under the condition that it is determined that the length of the motion trajectory is greater than the length threshold.
After the motion trajectory of the first target detection object is obtained, its length is calculated. When the length of the motion trajectory exceeds the length threshold, the related information of the first target detection object, including its type and position, is determined, and the detected defect is confirmed. The length threshold here can be set from target detection objects obtained in actual inspection.
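For illustration, the trajectory-length test can be sketched as follows; the threshold value is an assumption and would in practice be set from actually detected objects:

import numpy as np

LENGTH_THRESHOLD = 5.0   # assumed value

def trajectory_length(points):
    # Polyline length of an (N, 2) array of centre points over frames.
    pts = np.asarray(points, dtype=float)
    return float(np.linalg.norm(np.diff(pts, axis=0), axis=1).sum())

def is_confirmed(points):
    # Report the target detection object only once its track is long enough.
    return trajectory_length(points) > LENGTH_THRESHOLD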
Referring to fig. 15, in some embodiments, the visual inspection system according to embodiments of the present invention further includes a video classification module 70, where before performing target detection on each frame in the video based on the target detection algorithm, the video classification module 70 is configured to:
perform video classification on the sampled image using a two-stream network model with 3D convolution.
The visual inspection method provided by the embodiment of the invention uses end-to-end 3D convolution as a component of the two-stream network model to perform video classification on the sampled image. The video classification may be based on the type of the target detection object or on other classification principles, which is not limited here.
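A minimal sketch of a two-stream classifier built from end-to-end 3D convolutions is given below; the layer sizes, the two-channel motion stream and the score-averaging fusion are illustrative assumptions, as the embodiment does not disclose a concrete architecture:

import torch
import torch.nn as nn

class TinyStream(nn.Module):
    # One 3D-convolutional stream over a clip of shape (B, C, T, H, W).
    def __init__(self, in_ch, n_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(in_ch, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1))
        self.head = nn.Linear(16, n_classes)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

class TwoStream3D(nn.Module):
    # RGB stream plus a motion stream (e.g. optical flow), fused by
    # averaging the class scores of the two streams.
    def __init__(self, n_classes=2):
        super().__init__()
        self.rgb = TinyStream(3, n_classes)
        self.flow = TinyStream(2, n_classes)

    def forward(self, rgb_clip, flow_clip):
        return (self.rgb(rgb_clip) + self.flow(flow_clip)) / 2

# Classify a 16-frame clip end to end.
model = TwoStream3D()
scores = model(torch.randn(1, 3, 16, 64, 64), torch.randn(1, 2, 16, 64, 64))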
In some embodiments of the visual inspection system provided by the embodiments of the present invention, the information obtaining module 30 is further configured to perform:
Step 820: perform forward neural network processing on the key frame to obtain an initial feature map of the key frame;
Here, the key frame image is processed by the forward neural network to obtain the initial feature map of the key frame, from which the first pooled feature map of the first target detection object will be obtained.
Step 821: if the second number of second target detection objects appearing in the frames adjacent to the key frame is smaller than the first number, perform region pooling on the initial feature map using the coordinates of the first target detection object in the key frame to obtain a first pooled feature map;
For the key frame at time t, if no target detection object is detected in its adjacent frames, or the number of detected target detection objects is smaller than the first number, region-of-interest pooling (ROI pooling) is performed, based on the first coordinates of the first target detection object in the key frame, on the initial feature map obtained after the key frame has been processed by the forward neural network, yielding the first pooled feature map. The first pooled feature map has a set size, which can be specified for the region pooling.
Step 822: process the first pooled feature map with a forward neural network to obtain a kernel feature map;
The first pooled feature map is input into a forward neural network for processing to obtain a kernel feature map with a kernel size of 3 × l.
Step 823: expand the boundary used for region pooling on the key frame, and perform region pooling on the frames adjacent to the key frame to obtain a second pooled feature map;
The boundary of the ROI pooling performed on the key frame is expanded by a certain ratio, and ROI pooling is performed on the adjacent frames under this spatio-temporal constraint to obtain the second pooled feature map. Here again, the second pooled feature map has a set size, where the size of the first pooled feature map is larger than the size of the second pooled feature map. In addition, the expansion ratio can be set according to actual conditions.
Step 824: convolve the second pooled feature map with the kernel feature map to obtain the second coordinates of the second target detection object in the adjacent frame.
The second pooled feature map is convolved with the kernel feature map, and the second coordinates of the second target detection object in the adjacent frame are obtained by correlation filtering.
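Steps 820 to 824 can be sketched as follows, assuming PyTorch feature maps; the pooled sizes (a 3 x 3 kernel and a 9 x 9 search region) and the expansion ratio are illustrative assumptions:

import torch
import torch.nn.functional as F
from torchvision.ops import roi_align

def locate_in_adjacent_frame(key_feat, adj_feat, box, expand=1.5):
    """key_feat, adj_feat: (1, l, H, W) feature maps from the forward
    network; box: (1, 4) tensor [x1, y1, x2, y2] of the first target."""
    # Pool the key frame at the detection box: the kernel feature map.
    kernel = roi_align(key_feat, [box], output_size=(3, 3))     # (1, l, 3, 3)
    # Expand the pooling boundary around the box centre.
    cx = (box[:, 0:1] + box[:, 2:3]) / 2
    cy = (box[:, 1:2] + box[:, 3:4]) / 2
    w = (box[:, 2:3] - box[:, 0:1]) * expand / 2
    h = (box[:, 3:4] - box[:, 1:2]) * expand / 2
    big = torch.cat([cx - w, cy - h, cx + w, cy + h], dim=1)
    # Pool the adjacent frame over the expanded region (search area).
    search = roi_align(adj_feat, [big], output_size=(9, 9))     # (1, l, 9, 9)
    # Convolve the search region with the kernel, in the manner of
    # correlation filtering, to obtain a position heat map.
    heat = F.conv2d(search, kernel)                             # (1, 1, 7, 7)
    return heat   # a weak peak suggests a false detection in the key frame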
Specifically, taking floating impurities in a liquid medicine bag as an example:
(1) Each frame in the video is detected with the target detection algorithm. If a first target detection object is detected, the frame is taken as a key frame k_t, the number of detected first target detection objects is denoted n_t, and the first target detection object is added to the detection sequence;
(2) For the key frame at time t, if no second target detection object is detected in an adjacent frame, or the number of detected second target detection objects is smaller than n_t, ROI pooling is performed, using the first coordinates of the first target detection object in the key frame, on the feature map obtained by processing the key frame with the forward neural network, yielding a first pooled feature map F_t of fixed scale;
(3) The first pooled feature map F_t is processed by a forward neural network to obtain a kernel feature map conv_key_t with a kernel size of 3 × l, and the boundary of the ROI pooling on the key frame is expanded by a certain ratio. ROI pooling is performed on the adjacent frames under this spatio-temporal constraint, i.e. using the expanded pooling boundary, to obtain a second pooled feature map F_i of size w × h × l. The second pooled feature map F_i is convolved with the kernel feature map conv_key_t, in the manner of correlation filtering, to obtain a position heat map of the second target detection object in the adjacent frame. Two cases should be noted: if the correlation filtering in the adjacent frame produces no response, a false detection in the key frame is confirmed, and the detected first target detection object is removed from the detection sequence; if the correlation filtering in the adjacent frame produces a response, the position of the second target detection object in the adjacent frame is confirmed, tracking continues to check whether the second target detection object has the same features as the first target detection object, and the second target detection object is added to the detection sequence. (The first target detection object was added to the detection sequence earlier and is removed from it when a false detection in the key frame is confirmed.);
(4) When the number of consecutive frames in which the target detection object can be detected exceeds the threshold N, identity association is performed on the second target detection object of the adjacent frame. Taking the key frame at time t and the adjacent frame at time t+1 as an example: after the preceding three steps, the number of second target detection objects in the adjacent frame is consistent with the number of first target detection objects in the key frame. A first extracted feature map of the first target detection object at time t and a second extracted feature map of the second target detection object at time t+1 are respectively extracted through ROI pooling; the features in each extracted feature map are combined with the centre point coordinates predicted by Kalman filtering into a first feature vector and a second feature vector; the distance matrix between the first feature vector from the key frame at time t and the second feature vector from the frame at time t+1 is calculated; and minimum-distance matching is performed with the Hungarian algorithm to generate the motion trajectory of the first target detection object;
(5) The length of the motion trajectory is calculated; when it is greater than the length threshold, the detection of floating impurities is confirmed, and the positions of the impurities are output.
Through the above technical scheme, this embodiment achieves the same beneficial effects as the visual inspection method described above, which are not repeated here.
Example three
Fig. 16 is a schematic structural diagram of an electronic device according to an embodiment provided in this specification. At the hardware level, the electronic device comprises a processor and, optionally, an internal bus, a network interface and a memory. The memory may include volatile memory, such as random-access memory (RAM), and may further include non-volatile memory, such as at least one disk memory. Of course, the electronic device may also include hardware required for other services.
The processor, the network interface, and the memory may be connected to each other via an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 16, but that does not indicate only one bus or one type of bus.
The memory is used for storing a program. In particular, the program may include program code comprising computer operating instructions. The memory may include both volatile memory and non-volatile storage, and provides instructions and data to the processor.
The processor reads the corresponding computer program from the non-volatile memory into the memory and runs it, forming the visual inspection apparatus at the logical level. The processor executes the program stored in the memory and is specifically configured to execute the method steps corresponding to each execution subject in the embodiments of this specification.
The method disclosed in the embodiments of fig. 1 to 13 of this specification may be applied to, or implemented by, a processor. The processor may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP) and the like; it may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components, which can implement or execute the methods, steps and logical blocks disclosed in one or more embodiments of this specification. A general-purpose processor may be a microprocessor, or any conventional processor. The steps of the method disclosed in connection with one or more embodiments of this specification may be embodied as being executed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well established in the art, such as RAM, flash memory, ROM, PROM, EPROM or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
The electronic device may further perform the method in the embodiments shown in fig. 1 to 13, and implement the functions of the corresponding apparatus in the embodiments shown in fig. 14 to 15, which are not described herein again in this specification.
Of course, besides the software implementation, the electronic device of the embodiment of this specification does not exclude other implementations, such as a logic device or a combination of software and hardware; that is, the execution subject of the above processing flow is not limited to individual logic units, and may also be hardware or a logic device.
Through the above technical scheme, the electronic device of this embodiment achieves the same beneficial effects as the visual inspection method described above, which are not repeated here.
Example four
An embodiment of this specification further provides a computer-readable storage medium storing one or more programs. The one or more programs include instructions which, when executed by an electronic device comprising a plurality of application programs, cause the electronic device to perform the method of the embodiments shown in fig. 1 to 13.
Through the above technical scheme, the storage medium of this embodiment achieves the same beneficial effects as the visual inspection method described above, which are not repeated here.
In short, the above description covers only preferred embodiments of this specification and is not intended to limit its protection scope. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of this specification shall be included in its protection scope.
The system, apparatus, module or unit illustrated in one or more of the above embodiments may be implemented by a computer chip or an entity, or by an article of manufacture with a certain functionality. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
It should also be noted that the terms "comprise", "comprising" or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article or apparatus that comprises the element.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

Claims (10)

1. A visual detection method is applicable to a visual detection system, wherein a sampling image for visual detection is a video, and a scene with a target detection object exists in the video, and the method comprises the following steps:
performing target detection object detection on each frame in the video based on a target detection algorithm;
if a first target detection object appears, taking a frame with the first target detection object as a key frame, and determining a first coordinate and a first number of the first target detection object in the key frame;
acquiring coordinates of a second target detection object in adjacent frames of the key frame based on the first coordinates and the first number of the first target detection object in the key frame;
judging that the second target detection object in the adjacent frame is the same as the first target detection object in the key frame;
carrying out identity association on the first target detection object to obtain a motion track of the first target detection object;
and determining the related information of the first target detection object under the condition that the length of the motion trail is determined to be larger than a length threshold value.
2. The visual inspection method of claim 1, wherein obtaining coordinates of a second target object in a frame adjacent to the keyframe based on the first coordinates and the first number of the first target object in the keyframe comprises:
carrying out forward neural network processing on the key frame to obtain an initial feature map of the key frame;
if the second number of second target detection objects appearing in the adjacent frames of the key frame is smaller than the first number, performing area pooling on the initial feature map by using the coordinates of the first target detection objects in the key frame to obtain a first pooled feature map;
processing the first pooling feature map by a forward neural network to obtain a core feature map;
expanding the boundary for performing regional pooling on the key frame, and performing regional pooling on the adjacent frames of the key frame to obtain a second pooling feature map;
and performing convolution on the second pooling feature map by using the kernel feature map to obtain a second coordinate of the second target detection object in the adjacent frame.
3. The visual inspection method of claim 1 or 2, wherein the visual inspection system performs visual inspection using a visual inspection model, and before performing target detection on each frame in the video based on a target detection algorithm, the method further comprises:
inputting a video to the visual inspection model.
4. The visual inspection method of claim 3, after inputting the sampled image to the visual inspection model, the method further comprising:
performing feature decomposition on the sampling image based on the common features to obtain a feature sequence;
encoding the feature sequence to obtain an encoded sequence;
adding the encoded sequence to the output of the visual inspection model.
5. The visual inspection method of claim 3, wherein the visual inspection model comprises forward neural networks, and after inputting the sampled image to the visual inspection model, the method further comprises:
carrying out initial processing on the sampled image to obtain an initial semantic vector;
respectively inputting the initial semantic vectors into a plurality of first forward neural networks to obtain a plurality of intermediate semantic vectors;
inputting the initial semantic vector into a second forward neural network to obtain an activation vector corresponding to the intermediate semantic vector;
taking the activation vector as the weight of the intermediate semantic vector to obtain a final semantic vector;
and taking the final semantic vector as the output of the visual detection model.
6. The visual inspection method of claim 1, prior to performing target detection object detection on each frame in the video based on a target detection algorithm, the method further comprising:
optimizing a hyper-parameter by gradient descent;
and sharpening the sampled image by using a sharpening kernel, wherein the sharpening kernel comprises a product of the hyper-parameter and the sharpening kernel, or enhancing, based on the hyper-parameter, the pixels of the region where the target detection object is located.
7. The visual inspection method of claim 3, prior to inputting the sampled image to the visual inspection model, the method further comprising:
iteratively training the visual inspection model using an original dirty data set, wherein training data in the original dirty data set has been annotated;
after iterating for a set number of times, outputting a prediction result for the training data by using the visual inspection model;
if the prediction result is inconsistent with the label of the training data, determining that error marked data occur in the training data;
comparing the training data with the truth value, determining the type of the target detection object, and setting confidence coefficient for the corresponding training data;
calculating a distance queue for the confidence level based on the confidence level;
ranking the confidence levels based on the distance queue;
and based on the sorting result of the distance queue, selecting at least the labeled training data corresponding to the confidence with the largest distance as the mislabeled data.
8. The visual inspection method of claim 3, the visual inspection system including a product gripping device, prior to inputting the sample image to the visual inspection model, the method further comprising:
correcting the grabbing parameters of the grabbing device by adopting a network grabbing model;
wherein correcting the grabbing parameters of the grabbing device by using the network grabbing model specifically comprises:
and training the network grabbing model based on the reward information output by the image acquisition device.
9. A vision inspection system adapted to visually inspect a sample image as a video in which a scene of a target inspection object is present, the system comprising:
the detection module is used for detecting a target detection object for each frame in the video based on a target detection algorithm;
a key frame determination module, configured to, if a first target detection object appears, take a frame in which the first target detection object exists as a key frame, and determine a first coordinate and a first number of the first target detection object in the key frame;
an information acquisition module, configured to acquire coordinates of a second target detection object in adjacent frames of the key frame based on a first coordinate and a first number of the first target detection object in the key frame;
the judging module is used for judging that the second target detection object in the adjacent frame is the same as the first target detection object in the key frame;
the identity correlation module is used for performing identity correlation on the first target detection object to obtain a motion track of the first target detection object; and the number of the first and second groups,
the information acquisition module is used for determining the related information of the first target detection object under the condition that the length of the motion trail is determined to be larger than a length threshold value.
10. An electronic device, comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to perform the visual inspection method of any one of claims 1 to 8.
CN202110235680.5A 2021-03-03 2021-03-03 Visual detection method and system and electronic equipment Pending CN112991280A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110235680.5A CN112991280A (en) 2021-03-03 2021-03-03 Visual detection method and system and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110235680.5A CN112991280A (en) 2021-03-03 2021-03-03 Visual detection method and system and electronic equipment

Publications (1)

Publication Number Publication Date
CN112991280A true CN112991280A (en) 2021-06-18

Family

ID=76352380

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110235680.5A Pending CN112991280A (en) 2021-03-03 2021-03-03 Visual detection method and system and electronic equipment

Country Status (1)

Country Link
CN (1) CN112991280A (en)


Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110182469A1 (en) * 2010-01-28 2011-07-28 Nec Laboratories America, Inc. 3d convolutional neural networks for automatic human action recognition
WO2018107488A1 (en) * 2016-12-16 2018-06-21 深圳大学 Boosted intuitionistic fuzzy tree-based method and device for target tracking
CN107221005A (en) * 2017-05-04 2017-09-29 美的集团股份有限公司 Object detecting method and device
WO2018227491A1 (en) * 2017-06-15 2018-12-20 深圳大学 Method and device for association of fuzzy data of multiple targets in video
WO2019045586A1 (en) * 2017-08-30 2019-03-07 Артем Андреевич ЯКОВЛЕВ Method for monitoring moving objects
US20210319565A1 (en) * 2018-09-06 2021-10-14 Zhengzhou Yunhai Information Technology Co., Ltd. Target detection method, apparatus and device for continuous images, and storage medium
WO2020248386A1 (en) * 2019-06-14 2020-12-17 平安科技(深圳)有限公司 Video analysis method and apparatus, computer device and storage medium
CN110264493A (en) * 2019-06-17 2019-09-20 北京影谱科技股份有限公司 A kind of multiple target object tracking method and device under motion state
WO2021034211A1 (en) * 2019-08-16 2021-02-25 Станислав Игоревич АШМАНОВ Method and system of transfer of motion of subject from video onto animated character
CN110675418A (en) * 2019-09-26 2020-01-10 深圳市唯特视科技有限公司 Target track optimization method based on DS evidence theory
CN110689562A (en) * 2019-09-26 2020-01-14 深圳市唯特视科技有限公司 Trajectory loop detection optimization method based on generation of countermeasure network
KR102163108B1 (en) * 2019-11-28 2020-10-08 가천대학교 산학협력단 Method and system for detecting in real time an object of interest in image
CN111553915A (en) * 2020-05-08 2020-08-18 深圳前海微众银行股份有限公司 Article identification detection method, device, equipment and readable storage medium
CN112036253A (en) * 2020-08-06 2020-12-04 海纳致远数字科技(上海)有限公司 Face key point positioning method based on deep learning
CN112132776A (en) * 2020-08-11 2020-12-25 苏州跨视科技有限公司 Visual inspection method and system based on federal learning, storage medium and equipment
CN112183517A (en) * 2020-09-22 2021-01-05 平安科技(深圳)有限公司 Certificate card edge detection method, equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Jia Kaikai: "Novel method on features tracking for multi-image matching", Journal of System Simulation, 1 May 2017, pages 1147-1152 *
Yang Fugang; Sun Tongjing; Song Songlin: "Research on key technologies of a fully automatic lamp-inspection machine based on machine vision", Chinese Journal of Scientific Instrument, no. 03, pages 116-120 *
Wang Tong: "Automatic detection algorithm for ship trajectories based on machine vision technology", Ship Science and Technology, no. 16, pages 56-58 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113808123A (en) * 2021-09-27 2021-12-17 杭州跨视科技有限公司 Machine vision-based dynamic detection method for liquid medicine bag
CN113808123B (en) * 2021-09-27 2024-03-29 杭州跨视科技有限公司 Dynamic detection method for liquid medicine bag based on machine vision
CN114897929A (en) * 2022-05-31 2022-08-12 工业云制造(四川)创新中心有限公司 Robot movement method based on visual noise reduction

Similar Documents

Publication Publication Date Title
Busta et al. Deep textspotter: An end-to-end trainable scene text localization and recognition framework
US20240062369A1 (en) Detection model training method and apparatus, computer device and storage medium
CN110135231B (en) Animal face recognition method and device, computer equipment and storage medium
CN111178197B (en) Mass R-CNN and Soft-NMS fusion based group-fed adherent pig example segmentation method
CN109086811B (en) Multi-label image classification method and device and electronic equipment
CN107633226B (en) Human body motion tracking feature processing method
CN109492706B (en) Chromosome classification prediction device based on recurrent neural network
CN111783576A (en) Pedestrian re-identification method based on improved YOLOv3 network and feature fusion
CN111310622A (en) Fish swarm target identification method for intelligent operation of underwater robot
CN109165658B (en) Strong negative sample underwater target detection method based on fast-RCNN
CN112991280A (en) Visual detection method and system and electronic equipment
CN111968124B (en) Shoulder musculoskeletal ultrasonic structure segmentation method based on semi-supervised semantic segmentation
CN112991281B (en) Visual detection method, system, electronic equipment and medium
CN116206334A (en) Wild animal identification method and device
CN111723852A (en) Robust training method for target detection network
CN113487610B (en) Herpes image recognition method and device, computer equipment and storage medium
CN112200862B (en) Training method of target detection model, target detection method and device
Rosales et al. Faster r-cnn based fish detector for smart aquaculture system
CN113780145A (en) Sperm morphology detection method, sperm morphology detection device, computer equipment and storage medium
CN113223614A (en) Chromosome karyotype analysis method, system, terminal device and storage medium
CN113177554A (en) Thyroid nodule identification and segmentation method, system, storage medium and equipment
CN115862119B (en) Attention mechanism-based face age estimation method and device
CN112818946A (en) Training of age identification model, age identification method and device and electronic equipment
CN109558883B (en) Blade feature extraction method and device
CN110956157A (en) Deep learning remote sensing image target detection method and device based on candidate frame selection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination