CN112991280B - Visual detection method, visual detection system and electronic equipment


Info

Publication number
CN112991280B
CN112991280B (application CN202110235680.5A)
Authority
CN
China
Prior art keywords
target detection
detection object
key frame
frame
visual inspection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110235680.5A
Other languages
Chinese (zh)
Other versions
CN112991280A (en)
Inventor
刁梁
朱樊
顾海松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wangzhi Technology Shenzhen Co ltd
Original Assignee
Wangzhi Technology Shenzhen Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wangzhi Technology Shenzhen Co ltd filed Critical Wangzhi Technology Shenzhen Co ltd
Priority to CN202110235680.5A priority Critical patent/CN112991280B/en
Publication of CN112991280A publication Critical patent/CN112991280A/en
Application granted granted Critical
Publication of CN112991280B publication Critical patent/CN112991280B/en
Legal status: Active


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20092Interactive image processing based on input by user
    • G06T2207/20104Interactive definition of region of interest [ROI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30241Trajectory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

One or more embodiments of the present specification disclose a visual inspection method, a visual inspection system, and an electronic device. The visual inspection method comprises the following steps: detecting target detection objects in each frame of a video based on a target detection algorithm; if a first target detection object appears, taking the frame in which it appears as a key frame and determining the first coordinates and the first number of first target detection objects in the key frame; acquiring the coordinates of second target detection objects in frames adjacent to the key frame based on those first coordinates and that first number; judging whether the second target detection object in the adjacent frame is the same as the first target detection object in the key frame; associating the identity of the first target detection object to obtain its motion trajectory; and, when the length of the motion trajectory is greater than a length threshold, determining the relevant information of the first target detection object. This can improve the accuracy and robustness of the visual inspection system.

Description

Visual detection method, visual detection system and electronic equipment
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a visual detection method, a visual detection system and electronic equipment.
Background
Product quality inspection based on image processing is widely used. Existing inspection techniques are mainly implemented with traditional pattern recognition and with deep learning models; as deep learning models have continued to evolve, the accuracy of visual inspection has improved, and the advantages of deep learning models in actual production have gradually become apparent. Because a deep learning model is a high abstraction of the objective function, a large amount of target sample data is needed to train the model so that it acquires good feature extraction and encoding capability.
In actual industrial scenes, target samples are difficult to collect, few in number, and unevenly distributed, which degrades the performance of deep learning models and leaves visual inspection systems with low accuracy and poor robustness. How to improve the accuracy and robustness of a visual inspection system is therefore a technical problem to be solved.
Disclosure of Invention
It is an object of one or more embodiments of the present disclosure to provide a visual inspection method, a visual inspection system, and an electronic device that can improve the accuracy and robustness of the visual inspection system.
To solve the above technical problems, one or more embodiments of the present specification are implemented as follows:
In a first aspect, a visual detection method is provided, applicable to the scene in which the sampled image used by a visual detection system for visual detection is a video and a target detection object appears in the video. The method includes: detecting target detection objects in each frame of the video based on a target detection algorithm; if a first target detection object appears, taking the frame in which it appears as a key frame and determining the first coordinates and the first number of first target detection objects in the key frame; acquiring the coordinates of second target detection objects in frames adjacent to the key frame based on the first coordinates and the first number; judging whether the second target detection object in the adjacent frame is the same as the first target detection object in the key frame; associating the identity of the first target detection object to obtain its motion trajectory; and, when the length of the motion trajectory is greater than a length threshold, determining the relevant information of the first target detection object.
In a second aspect, a visual inspection system is provided, applicable to the scene in which the sampled image used for visual inspection is a video and a target detection object appears in the video. The system includes: a detection module for detecting target detection objects in each frame of the video based on a target detection algorithm; a key frame determining module for taking the frame in which a first target detection object appears as a key frame and determining the first coordinates and the first number of first target detection objects in the key frame; an information acquisition module for acquiring the coordinates of second target detection objects in frames adjacent to the key frame based on the first coordinates and the first number; a judging module for judging whether the second target detection object in the adjacent frame is the same as the first target detection object in the key frame; an identity association module for associating the identity of the first target detection object to obtain its motion trajectory; and an information determination module for determining the relevant information of the first target detection object when the length of the motion trajectory is determined to be greater than a length threshold.
In a third aspect, an electronic device is provided, comprising: a processor; and a memory arranged to store computer executable instructions that, when executed, cause the processor to perform the visual inspection method as described above.
As can be seen from the technical solutions provided in one or more embodiments of the present disclosure, the visual detection method provided by the present application is suitable for the scene in which the sampled image used by a visual detection system for visual detection is a video and a target detection object appears in the video. The method detects target detection objects in each frame of the video based on a target detection algorithm; if a first target detection object appears, it takes the frame in which it appears as a key frame and determines the first coordinates and the first number of first target detection objects in the key frame; it acquires the coordinates of second target detection objects in frames adjacent to the key frame based on the first coordinates and the first number; it judges whether the second target detection object in the adjacent frame is the same as the first target detection object in the key frame; it associates the identity of the first target detection object to obtain its motion trajectory; and, when the length of the motion trajectory is determined to be greater than the length threshold, it determines the relevant information of the first target detection object, including its type and position. The visual detection method provided by the present application can thus process video and, after judging that the target detection objects appearing in consecutive frames are the same object, checks the length of that object's motion trajectory before determining its relevant information. The method is suitable for industrial scenes in which target samples are difficult to collect and few in number, and it can improve the accuracy and robustness of the visual inspection system.
Drawings
For a clearer description of one or more embodiments of the present description or of the solutions in the prior art, the accompanying drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below are evidently only some of the embodiments in this specification; for a person skilled in the art, other drawings can be obtained from them without inventive effort.
Fig. 1 is a schematic step diagram of a visual inspection method according to an embodiment of the present disclosure.
Fig. 2 is a schematic step diagram of another visual inspection method according to an embodiment of the present disclosure.
Fig. 3 is a schematic step diagram of another visual inspection method according to an embodiment of the present disclosure.
Fig. 4 is a schematic step diagram of another visual inspection method according to an embodiment of the present disclosure.
Fig. 5 is a schematic step diagram of another visual inspection method according to an embodiment of the present disclosure.
Fig. 6 is a schematic step diagram of a further visual inspection method according to an embodiment of the present disclosure.
Fig. 7 is a schematic step diagram of a further visual inspection method according to an embodiment of the present disclosure.
Fig. 8 is a schematic step diagram of a further visual inspection method according to an embodiment of the present disclosure.
Fig. 9 is a schematic step diagram of a further visual inspection method according to an embodiment of the present disclosure.
Fig. 10 is a schematic step diagram of a further visual inspection method according to an embodiment of the present disclosure.
Fig. 11 is a schematic step diagram of a further visual inspection method according to an embodiment of the present disclosure.
Fig. 12 is a schematic step diagram of a further visual inspection method according to an embodiment of the present disclosure.
Fig. 13 is a schematic step diagram of a further visual inspection method according to an embodiment of the present disclosure.
Fig. 14 is a schematic structural diagram of a visual inspection system according to an embodiment of the present disclosure.
Fig. 15 is a schematic structural diagram of another visual inspection system according to an embodiment of the present disclosure.
Fig. 16 is a schematic structural view of an electronic device according to an embodiment of the present disclosure.
Detailed Description
In order that those skilled in the art will better understand the technical solutions in this specification, a clear and complete description of the technical solutions in one or more embodiments is provided below with reference to the accompanying drawings. The embodiments described are evidently only a part of the embodiments of this specification, not all of them; all other embodiments that a person skilled in the art can derive from them without inventive effort are intended to fall within the scope of the present disclosure.
The visual detection method provided by the present application is suitable for the scene in which the sampled image used by a visual detection system for visual detection is a video, and it can improve the accuracy and robustness of the visual detection system when target samples are difficult to collect and few in number. The visual inspection method and each of its steps are described in detail below.
The object to be detected in the present application is a flaw present in a product, such as black particles, hair, or knitting wool mixed into a chemical liquid (medicine) bag.
Example 1
Referring to fig. 1, a schematic diagram of the steps of a visual inspection method according to an embodiment of the present disclosure is shown. The visual detection method can be applied where the sampled image used by a visual detection system for visual detection is a video and a target detection object appears in the video. The visual inspection method comprises the following steps:
Step 80: detecting a target detection object for each frame in the video based on a target detection algorithm;
Target samples are difficult to acquire in current industrial scenes, and the space-time sequence consistency of target detection objects is poor. To address this, the visual detection method provided by the embodiment of the invention, after detecting target detection objects in each frame of the video, tracks, judges, and analyzes the target detection objects that appear in consecutive frames to obtain their relevant information, which reduces the false detection rate and improves the detection precision of the target detection object.
Step 81: if the first target detection object appears, taking a frame with the first target detection object as a key frame, and determining a first coordinate and a first quantity of the first target detection object in the key frame;
After the first target detection object is detected, the frame in which it appears is taken as a key frame, and the first coordinates and the first number of first target detection objects in the key frame are determined. There may be multiple first target detection objects in the same key frame, in which case the number of first target detection objects and the first coordinates of each are determined.
Step 82: acquiring coordinates of a second target detection object in adjacent frames of the key frame based on the first coordinates and the first number of the first target detection objects in the key frame;
The coordinates of the second target detection object in the frames adjacent to the key frame can then be obtained, based on the first coordinates and the first number of first target detection objects in the key frame, in a number of ways, such as by correlation filtering. If the adjacent frames show no correlation-filter response to a first target detection object, a false detection is considered to have occurred in the key frame and that first target detection object is removed; when there are multiple first target detection objects, whether a false detection has occurred can of course be judged one by one. If an adjacent frame does produce a correlation-filter response for the first target detection object, the second target detection object is added to the detection sequence (to which the previously detected first target detection object has already been added), and tracking continues to establish whether the second target detection object appearing in the adjacent frame is the same defect as the first target detection object in the key frame. It should be noted that the terms first target detection object and second target detection object merely distinguish the flaw found in the key frame from the flaw found in the adjacent frames of the key frame.
Step 83: judging that the second target detection object in the adjacent frame is the same as the first target detection object in the key frame;
When the second target detection object in the adjacent frame is judged to be the same as the first target detection object in the key frame, the first target detection objects in the consecutive frames are associated, and the motion trajectory of the first target detection object is thereby obtained. The first and second target detection objects can be matched directly by feature distance using the Hungarian algorithm.
Step 84: carrying out identity association on the first target detection object to obtain a motion track of the first target detection object;
When the number of consecutive frames in which the first target detection object is detected exceeds a threshold N, identity association is performed on the first target detection object across those frames. Taking the key frame at time t and the adjacent frame at time t+1 as an example: after judging that the first target detection object in the key frame is identical to the second target detection object in the adjacent frame, a first extracted feature map of the first target detection object at time t and a second extracted feature map of the second target detection object at time t+1 are obtained through regional pooling (ROI pooling). The features in each extracted feature map are combined with the center-point coordinates predicted by Kalman filtering to form a first feature vector and a second feature vector. The distance matrix between the first feature vectors from the key frame at time t and the second feature vectors from the adjacent frame at time t+1 is then calculated, and the Hungarian algorithm performs minimum-distance matching to generate the motion trajectory of the first target detection object.
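As an illustration only, and not the patent's reference implementation, the minimum-distance matching step can be sketched with the Hungarian solver from SciPy; the feature dimensions and the helper name below are assumptions:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(key_feats: np.ndarray, adj_feats: np.ndarray):
    """Match detections between frames t and t+1 by minimum feature distance.

    key_feats: (n, d) feature vectors (pooled features + predicted center) at time t
    adj_feats: (m, d) feature vectors at time t+1
    Returns a list of (i, j) index pairs extending each trajectory.
    """
    # Pairwise Euclidean distance matrix between the two frames' detections
    dist = np.linalg.norm(key_feats[:, None, :] - adj_feats[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(dist)  # Hungarian algorithm
    return list(zip(rows.tolist(), cols.tolist()))

# Hypothetical usage: three defects tracked across two consecutive frames
t0 = np.random.rand(3, 16).astype(np.float32)
t1 = t0 + 0.01 * np.random.rand(3, 16).astype(np.float32)
print(associate(t0, t1))  # e.g. [(0, 0), (1, 1), (2, 2)]
```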
Step 85: and under the condition that the length of the motion trail is larger than the length threshold value, determining the related information of the first target detection object.
After the motion trajectory of the first target detection object is obtained, its length is calculated. When the length exceeds the length threshold, the relevant information of the first target detection object, including its type and position, is determined, and the flaw is confirmed as detected. The length threshold may be set according to target detection objects obtained in actual detection.
Referring to FIG. 2, in some embodiments, the sampled image is video, step 80: before detecting the target detection object for each frame in the video based on the target detection algorithm, the visual detection method provided by the embodiment of the invention further comprises the following steps:
step 86: performing video classification on the sampled images using a 3D-convolution two-stream network model.
The visual detection method provided by the embodiment of the invention adopts end-to-end 3D convolution as a component of the two-stream network model and performs video classification on the sampled images. The classification may be based on the type of the target detection object or on other classification principles, which are not limited here.
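A minimal sketch of such an end-to-end 3D-convolution two-stream component, assuming PyTorch and purely illustrative layer sizes (the patent does not specify the architecture):

```python
import torch
import torch.nn as nn

class Stream3D(nn.Module):
    """One stream of a two-stream classifier built from 3D convolutions.

    Input: a video clip of shape (batch, channels, frames, height, width).
    """
    def __init__(self, in_ch: int, num_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(in_ch, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.head = nn.Linear(32, num_classes)

    def forward(self, clip):
        return self.head(self.features(clip).flatten(1))

class TwoStream3D(nn.Module):
    """Fuse an RGB stream and a motion (e.g. frame-difference) stream by averaging logits."""
    def __init__(self, num_classes: int):
        super().__init__()
        self.rgb = Stream3D(3, num_classes)
        self.motion = Stream3D(1, num_classes)

    def forward(self, rgb_clip, motion_clip):
        return (self.rgb(rgb_clip) + self.motion(motion_clip)) / 2

model = TwoStream3D(num_classes=4)
logits = model(torch.randn(2, 3, 8, 64, 64), torch.randn(2, 1, 8, 64, 64))
print(logits.shape)  # torch.Size([2, 4])
```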
Referring to fig. 3, in some embodiments, a visual inspection method provided in an embodiment of the present invention, step 82: acquiring coordinates of a second target detection object in adjacent frames of the key frame based on the first coordinates and the first number of the first target detection objects in the key frame specifically comprises:
step 820: performing forward neural network processing on the key frame to obtain an initial feature map of the key frame;
Forward neural network processing is performed on the key frame image to obtain an initial feature map of the key frame, the purpose being to subsequently obtain a first pooled feature map of the first target detection object.
Step 821: if the second number of the second target detection objects appears in the adjacent frames of the key frame is smaller than the first number, carrying out regional pooling on the initial feature map by utilizing the coordinates of the first target detection objects in the key frame to obtain a first pooled feature map;
If no target detection object is detected in the adjacent frames of the key frame, or the number detected there is smaller than the first number, regional pooling (ROI pooling) is applied, based on the first coordinates of the first target detection object in the key frame, to the initial feature map obtained after the key frame is processed by the forward neural network, yielding a first pooled feature map. The size of the first pooled feature map obtained after regional pooling may be set in advance.
Step 822: processing the first pooled feature map through a forward neural network to obtain a nuclear feature map;
The first pooled feature map is input into a forward neural network to obtain a kernel feature map, for example of kernel size 3 × 3 × l.
Step 823: expanding the boundary of the regional pooling on the key frame, and regional pooling is carried out on the adjacent frames of the key frame to obtain a second pooling feature map;
After the boundary of the regional pooling (ROI pooling) on the key frame is expanded by a certain proportion, regional pooling is performed on the adjacent frames of the key frame under space-time constraints, obtaining a second pooled feature map. The second pooled feature map also has a set size; here the size of the first pooled feature map is larger than that of the second pooled feature map. The expansion proportion may be set according to the actual situation.
Step 824: and convolving the second pooled feature map with the core feature map to obtain a second coordinate of a second target detection object in the adjacent frame.
The second pooled feature map is convolved with the kernel feature map, and a correlation-filter-like response yields the second coordinates of the second target detection object in the adjacent frame.
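The following is a hedged sketch of steps 820-824, assuming PyTorch; the pooled sizes, the stand-in for the forward network of step 822, and the function name are assumptions:

```python
import torch
import torch.nn.functional as F

def locate_in_adjacent(key_feat_roi: torch.Tensor, adj_feat_roi: torch.Tensor):
    """Correlation-filter-style localization (steps 820-824, sketched).

    key_feat_roi: (l, 7, 7)  pooled key-frame features of one detection
    adj_feat_roi: (l, h, w)  pooled features of the expanded ROI in the adjacent frame
    Returns the (row, col) peak of the response heat map.
    """
    # Stand-in for the forward network of step 822: reduce the pooled map to a 3x3xl kernel
    kernel = F.adaptive_avg_pool2d(key_feat_roi, 3).unsqueeze(0)   # (1, l, 3, 3)
    # Step 824: convolve the adjacent-frame ROI features with the kernel feature map
    heat = F.conv2d(adj_feat_roi.unsqueeze(0), kernel, padding=1)  # (1, 1, h, w)
    # A response below some threshold would indicate a false detection in the key frame
    flat_idx = heat.flatten().argmax()
    w = heat.shape[-1]
    return divmod(int(flat_idx), w)

key_roi = torch.randn(8, 7, 7)
adj_roi = torch.randn(8, 15, 15)
print(locate_in_adjacent(key_roi, adj_roi))  # peak position within the expanded ROI
```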
Specifically, taking the impurity floating in the liquid medicine bag as an example:
(1) Each frame in the video is detected with the target detection algorithm. If a first target detection object is detected, the frame is taken as the key frame k_t, the number of detected first target detection objects is recorded as n_t, and the first target detection objects are added to the detection sequence;
(2) For the key frame at time t, if no second target detection object is detected in an adjacent frame, or the number detected is smaller than n_t, ROI pooling is applied, using the first coordinates of the first target detection objects in the key frame, to the feature map obtained by processing the key frame through the forward neural network, yielding a first pooled feature map F_t of fixed scale;
(3) The first pooled feature map F_t is processed by a forward neural network to obtain a kernel feature map conv_key_t of kernel size 3 × 3 × l, and the boundary of the regional pooling (ROI pooling) on the key frame is expanded by a certain proportion. Using the expanded boundary and space-time constraints, ROI pooling is performed on the adjacent frames to obtain a second pooled feature map F_i of size w × h × l, which is convolved with the kernel feature map conv_key_t; correlation-like filtering then yields the position heat map of the second target detection object in the adjacent frames. Two cases should be noted: if the correlation filtering of the adjacent frames produces no response, a false detection in the key frame is confirmed and the detected first target detection object (which was added to the detection sequence earlier) is removed from the detection sequence; if the correlation filtering of the adjacent frames produces a response, the position of the second target detection object in the adjacent frames is confirmed, tracking continues to check whether the second target detection object shares the same characteristics as the first, and the second target detection object is added to the detection sequence;
(4) When the number of consecutive frames in which the target object can be detected exceeds the threshold N, identity association is performed on the second target detection objects of the adjacent frames. Taking the key frame at time t and the adjacent frame at time t+1 as an example: after the processing of the previous three steps, the number of second target detection objects in the adjacent frame matches the number of first target detection objects in the key frame. A first extracted feature map of the first target detection objects at time t and a second extracted feature map of the second target detection objects at time t+1 are extracted through ROI pooling; the features in each extracted feature map are combined with the Kalman-filter-predicted center-point coordinates into a first feature vector and a second feature vector; the distance matrix between the first feature vectors at time t and the second feature vectors at time t+1 is calculated; and the Hungarian algorithm performs minimum-distance matching to generate the motion trajectories of the first target detection objects;
(5) The length of each motion trajectory is calculated; when it exceeds the length threshold, the detection of a floating impurity is confirmed and the position of the impurity is output.
Referring to FIG. 4, in some embodiments, the visual inspection system employs a visual inspection model for visual inspection, step 80: before detecting the target detection object for each frame in the video based on the target detection algorithm, the visual detection method provided by the embodiment of the invention further comprises the following steps:
step 87: video is input to the visual inspection model.
The visual detection method provided by the embodiment of the invention is carried out with a visual detection model, which is a deep learning model comprising a forward neural network. The visual detection model can be trained before visual detection and automatically learns the clustering of similar semantic features.
Referring to fig. 5, in some embodiments, step 87: after the sampled image is input to the visual detection model, the visual detection method provided by the embodiment of the invention further comprises the following steps:
step 30: carrying out feature decomposition on the sampled image based on the common features to obtain a feature sequence;
After feature decomposition, a feature sequence corresponding to the sampled image is obtained. Different sampled images can yield different feature sequences, but the ordering of semantic features within them is consistent. The feature sequence here is a sequence of semantic features.
Aimed at the problems that identifiable target samples in current industrial scenes are difficult to collect, few in number, and unevenly distributed, the visual detection method provided by the embodiment of the invention can improve the precision and robustness of a visual detection system.
According to the visual detection method provided by the embodiment of the invention, feature decomposition of the sampled image means that, after the image features of the target detection object are refined and decomposed into semantic features, the semantic features are converted into a feature sequence serving as a multi-label sequence, distinct from the one-hot labeling currently used in industry. Each semantic feature in the feature sequence represents some common feature, or possibly an individual feature; during model training, the semantic features obtained from each sampled image are ordered in a fixed sequence to form the feature sequence. The visual detection method provided by the embodiment of the invention thereby realizes regression of semantic feature labeling.
Step 31: coding the characteristic sequence to obtain a coding sequence;
After the feature sequence is obtained, each semantic feature in the feature sequence is encoded to obtain a coding sequence corresponding to the feature sequence.
Semantic feature coding is applied to the target detection object; for example, for the target detection objects appearing in a liquid medicine bag:
hair has the following semantic features: black, slender, flexible;
knitting wool has the following semantic features: white, slender, flexible;
the bubbles have the following semantic features: white, round;
The length of the coding sequence of each target detection object is set to 5, and the semantic feature corresponding to each dimension is shown in the following table; the entries in the second row constitute the feature sequence.

TABLE 1

Dimension         1              2              3                  4                 5
Semantic feature  whether black  whether white  whether elongated  whether flexible  whether round
Encoding the semantic features of the three detection targets yields the hair coding sequence [1,0,1,1,0], the knitting wool coding sequence [0,1,1,1,0], and the bubble coding sequence [0,1,0,0,1]. These multi-label coding sequences can be used to train the visual detection model during its deep learning stage. Thus, when a new target detection object such as a black dot appears, its semantic features are black and round, with corresponding coding sequence [1,0,0,0,1]; because the trained visual detection model already has the abstraction capability for multi-element semantic features, it can be used directly for visual detection of the sampled image, or reach the required precision with only a small amount of flaw data for the new target detection object, without modifying the network structure or parameters of the visual detection model.
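A small sketch of this multi-label coding over the shared attribute space of Table 1; the attribute and defect names follow the example above:

```python
# Semantic attributes in the order fixed by Table 1
ATTRS = ["black", "white", "elongated", "flexible", "round"]

DEFECTS = {
    "hair":          {"black", "elongated", "flexible"},
    "knitting_wool": {"white", "elongated", "flexible"},
    "bubble":        {"white", "round"},
    # A new defect type reuses the same attribute space without retraining from scratch
    "black_dot":     {"black", "round"},
}

def encode(defect: str) -> list:
    """Multi-hot coding sequence over the shared semantic attributes."""
    present = DEFECTS[defect]
    return [1 if a in present else 0 for a in ATTRS]

for name in DEFECTS:
    print(name, encode(name))
# hair [1, 0, 1, 1, 0], knitting_wool [0, 1, 1, 1, 0],
# bubble [0, 1, 0, 0, 1], black_dot [1, 0, 0, 0, 1]
```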
Step 32: the coding sequence is added to the output of the visual inspection model.
The coding sequence may be an output of the visual inspection model and may be a fixed-length vector. To enhance the flexibility and transferability of the visual detection model, the network head can share the feature maps or feature semantic vectors output in the middle of the model and output multiple classification results through cascaded multi-task fully connected layers. If a new coding sequence needs to be added, only a newly initialized fully connected layer needs to be cascaded, and only the parameters of that layer need to be fine-tuned; this transferability and flexibility makes the network well suited to visual detection of new small samples that may occur.
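A minimal sketch, assuming PyTorch, of such cascaded heads over a shared feature vector, where adding a new coding sequence appends a freshly initialized fully connected layer and fine-tunes only its parameters:

```python
import torch
import torch.nn as nn

class MultiHead(nn.Module):
    """Cascaded multi-task heads sharing one intermediate semantic vector."""
    def __init__(self, feat_dim: int, code_len: int):
        super().__init__()
        self.heads = nn.ModuleList([nn.Linear(feat_dim, code_len)])

    def add_head(self, code_len: int):
        """Cascade a new, freshly initialized head for a new coding sequence."""
        head = nn.Linear(self.heads[0].in_features, code_len)
        self.heads.append(head)
        # Fine-tune only the new head: freeze everything, then unfreeze it
        for p in self.parameters():
            p.requires_grad = False
        for p in head.parameters():
            p.requires_grad = True
        return head

    def forward(self, shared_feat):
        return [torch.sigmoid(h(shared_feat)) for h in self.heads]

net = MultiHead(feat_dim=128, code_len=5)
net.add_head(code_len=5)          # new small-sample defect family
outs = net(torch.randn(2, 128))
print([o.shape for o in outs])    # [torch.Size([2, 5]), torch.Size([2, 5])]
```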
Referring to FIG. 6, in some embodiments, in the scene where the visual detection model includes a forward neural network, step 87: after the sampled image is input to the visual detection model, the visual detection method provided by the embodiment of the invention further comprises the following steps:
step 10: carrying out initial processing on the sampling image to obtain an initial semantic vector;
Initial processing is performed on the sampled image to obtain an initial semantic vector. For example, a backbone network may perform the initial processing, such as a convolutional network like the residual network ResNet or AlexNet, or the initial semantic vector may be obtained with a traditional algorithm such as a color histogram or the HOG operator. The initial semantic vector is a semantic description of the sampled image and may be a feature map or a feature vector.
The initial processing is to extract the image features to be acquired in the sampled image as corresponding semantic features and classify the semantic features.
Step 11: respectively inputting the initial semantic vectors into a plurality of first forward neural networks to obtain a plurality of intermediate semantic vectors;
Identifiable target samples in current industrial scenes are difficult to collect, few in number, and unevenly distributed across data types, so target detection systems have low accuracy and poor robustness. The visual detection method provided by the embodiment of the invention inputs the initial semantic vector into a plurality of first forward neural networks to obtain a plurality of intermediate semantic vectors [v1, v2, ..., vn]; that is, the first forward neural networks apply further feature extraction to the initial semantic vector to obtain the intermediate semantic vectors.
Step 12: inputting the initial semantic vector into a second forward neural network to obtain an activation vector corresponding to the intermediate semantic vector;
The initial semantic vector is input into a second forward neural network to obtain an activation vector corresponding to the intermediate semantic vectors; that is, the second forward neural network applies further feature activation to the initial semantic vector to obtain the activation vector.
The activation vector may be a feature vector; corresponding to the number n of intermediate semantic vectors, it may be an n-dimensional feature vector. The second forward neural network applies a regression such as softmax or sigmoid to the initial semantic vector to obtain the activation vector W = [w1, w2, ..., wn].
Step 13: taking the activation vector as the weight of the intermediate semantic vector to obtain a final semantic vector;
The activation vector is used as the weights of the intermediate semantic vectors to obtain a final semantic vector vlast, and the recognition result is finally output through another forward neural network, which encodes vlast before outputting. The final semantic vector vlast is calculated as:

vlast = w1·v1 + w2·v2 + ... + wn·vn
The visual detection method provided by the embodiment of the invention can automatically induce semantic features and perform classification and clustering. For example, v1 may correspond to a color semantic feature and v2 to a shape semantic feature; the activation vector W indicates whether the corresponding description operator is needed in the global semantic space of the target detection object. Through the activation vector W and the responses of the intermediate semantic vectors [v1, v2, ..., vn] to the semantic features, automatic analysis and automatic semantic clustering of the semantic features of small and few samples can be realized.
Step 14: and taking the final semantic vector as the output of the visual detection model.
Taking the actual semantics of the intermediate semantic vectors v1 to vn as an example: in detecting flaws in the liquid medicine of a medicine bag, there exists some vi among v1 to vn on which both black dots and hair respond, and wi gives it a higher weight, so that intermediate semantic vector most likely represents a response to black. The intermediate semantic vector vi provides a detailed description of the target detection object, such as what kind of black it is, its shade, its brightness, and so on; the expression of the semantics is thus richer.
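A minimal sketch of steps 10-14, assuming PyTorch; the branch networks are simplified to single linear layers and sigmoid is used as the activation regression:

```python
import torch
import torch.nn as nn

class SemanticAggregator(nn.Module):
    """Steps 10-14 sketched: weight n intermediate semantic vectors by an activation vector."""
    def __init__(self, dim: int, n: int):
        super().__init__()
        # n "first forward networks" producing intermediate semantic vectors v1..vn
        self.branches = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n)])
        # "second forward network" producing the n-dimensional activation vector W
        self.gate = nn.Linear(dim, n)

    def forward(self, v0: torch.Tensor) -> torch.Tensor:
        vs = torch.stack([b(v0) for b in self.branches], dim=1)  # (batch, n, dim)
        w = torch.sigmoid(self.gate(v0)).unsqueeze(-1)           # (batch, n, 1)
        return (w * vs).sum(dim=1)                               # vlast = sum_i wi * vi

agg = SemanticAggregator(dim=64, n=4)
v_last = agg(torch.randn(2, 64))
print(v_last.shape)  # torch.Size([2, 64])
```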
Referring to FIG. 7, in some embodiments, step 80: before detecting the target detection object for each frame in the video based on the target detection algorithm, the visual detection method provided by the embodiment of the invention further comprises the following steps:
Step 40: optimizing super parameters by adopting gradient descent;
According to the visual detection method provided by the embodiment of the invention, sharpening is integrated into the network framework of the visual detection model, with k treated as a differentiable variable. The shape of the sharpening kernel is expanded into a three-channel sharpening kernel of shape (3, 3, 3); the sharpening kernel is a constant matrix that is adjusted through k. Since the hyperparameter k sits inside the neural network and is variable, it can be optimized by gradient descent, e.g. with the standard update k ← k − η·∂L/∂k (η the learning rate, L the training loss). The optimal value of the hyperparameter k is finally obtained through the deep learning of the visual detection model itself.
Step 41: sharpening the sampled image using a sharpening kernel, where the applied kernel is the product of the hyperparameter and the constant sharpening kernel.
Because the amount of target sample data in current industrial scenes is small, and the target detection object is low-frequency in the image gradient with interference from a high-frequency background, the optimal path found on the image gradient by the visual detection model during deep learning is very likely to pick up redundant information and eventually overfit, leaving the visual detection model with poor generalization.
The visual detection method provided by the embodiment of the invention converts the low-frequency information in the original sampled image into high-frequency information before feeding it to the visual detection model for deep learning, which reduces the model's learning burden and improves its generalization. Because traditional algorithms are largely designed by hand and embody much prior knowledge, they are expressed through hyperparameters when carried over to a deep learning visual detection model. This is called prior preprocessing; it introduces hyperparameters, and this patent proposes to tune those hyperparameters during the deep learning phase. A specific example follows:
For black spots as the target detection object in a liquid medicine bag, the use of the sharpening kernel in the visual detection method provided by the embodiment of the invention is as follows:
Before detection with the visual detection model, the sampled image is sharpened with the sharpening kernel. The specific parameters of the sharpening kernel are determined by the type and shape of the flaw serving as the target detection object, and the hyperparameter k controls the sharpening strength. The visual detection model takes the original three-channel picture as input and outputs the three-channel picture processed by channel-separated convolution sharpening, including feature maps and feature vectors; subsequent visual detection is performed on the sharpened feature maps and feature vectors.
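A hedged sketch, assuming PyTorch, of a channel-separated sharpening convolution whose strength k is a learnable parameter; the Laplacian-style base kernel is an assumption, since the patent's kernel depends on the flaw type:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableSharpen(nn.Module):
    """Channel-separated sharpening with a learnable strength k inside the network graph."""
    def __init__(self):
        super().__init__()
        self.k = nn.Parameter(torch.tensor(1.0))  # differentiable sharpening strength
        # Constant Laplacian-style kernel (an assumption; the actual kernel is defect-specific)
        base = torch.tensor([[0., -1., 0.], [-1., 4., -1.], [0., -1., 0.]])
        self.register_buffer("base", base.view(1, 1, 3, 3).repeat(3, 1, 1, 1))

    def forward(self, img):                       # img: (batch, 3, H, W)
        kernel = self.k * self.base               # constant matrix adjusted through k
        edges = F.conv2d(img, kernel, padding=1, groups=3)  # depthwise, one kernel per channel
        return img + edges                        # sharpened three-channel output

sharpen = LearnableSharpen()
out = sharpen(torch.rand(1, 3, 32, 32))
out.mean().backward()
print(sharpen.k.grad is not None)  # True: k is optimized by gradient descent with the model
```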
Referring to FIG. 8, in some embodiments, the target detection object is present in the sampled image, step 80: before detecting the target detection object for each frame in the video based on the target detection algorithm, the visual detection method provided by the embodiment of the invention further comprises the following steps:
Step 60: optimizing super parameters by adopting gradient descent;
For some flaws serving as the target detection object, for example those whose pixel characteristics are highly similar to the background, a region pixel-enhancement module can be used as preprocessing for visual detection. The region pixel-enhancement module contains 4 hyperparameters (x1, x2, y1, y2); a typical form is the piecewise-linear gray-level stretch

g(u, v) = (y1 / x1) · f(u, v),                                if f(u, v) < x1
g(u, v) = ((y2 − y1) / (x2 − x1)) · (f(u, v) − x1) + y1,      if x1 ≤ f(u, v) ≤ x2
g(u, v) = ((255 − y2) / (255 − x2)) · (f(u, v) − x2) + y2,    if f(u, v) > x2

where f(u, v) is the pixel value of the sampled image. The hyperparameters (x1, x2, y1, y2) are differentiable within 0-255, so they can be embedded in the deep learning network framework of the visual detection model and self-optimized by gradient descent; for this, refer to the optimization of the hyperparameter k in the previous embodiment.
Step 61: and enhancing the regional pixels where the target detection object is located based on the super-parameters.
The visual detection method provided by the embodiment of the invention can thus convert the target detection object from low frequency to high frequency through a region pixel-enhancement module with differentiable hyperparameters, the hyperparameters being optimized automatically by gradient descent. The region pixel-enhancement module can be a plug-in used to process the sampled images input for visual detection, and its hyperparameters are optimized automatically during deep learning.
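A minimal sketch of such a region pixel-enhancement module with differentiable hyperparameters (x1, x2, y1, y2), assuming PyTorch and the piecewise-linear stretch given above:

```python
import torch
import torch.nn as nn

class RegionPixelEnhance(nn.Module):
    """Piecewise-linear gray-level stretch with four learnable parameters in [0, 255].

    The exact formula is an assumption reconstructed from the (x1, x2, y1, y2)
    parameterization; the stretch maps [x1, x2] onto [y1, y2] differentiably.
    """
    def __init__(self):
        super().__init__()
        self.p = nn.Parameter(torch.tensor([50., 200., 20., 230.]))  # x1, x2, y1, y2

    def forward(self, f):  # f: pixel values in [0, 255]
        x1, x2, y1, y2 = self.p
        low  = f * (y1 / x1.clamp(min=1e-3))
        mid  = (f - x1) * ((y2 - y1) / (x2 - x1).clamp(min=1e-3)) + y1
        high = (f - x2) * ((255. - y2) / (255. - x2).clamp(min=1e-3)) + y2
        out = torch.where(f < x1, low, torch.where(f <= x2, mid, high))
        return out.clamp(0., 255.)

enhance = RegionPixelEnhance()
g = enhance(torch.rand(1, 1, 8, 8) * 255)
g.mean().backward()           # gradients flow to (x1, x2, y1, y2)
print(enhance.p.grad)
```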
Referring to fig. 9, in some embodiments, step 87: before the sampled image is input into the visual detection model, the visual detection method provided by the embodiment of the invention further comprises the following steps:
step 62: the visual inspection model is trained on a random sample model.
At present, the qualitative standard boundary for flaws serving as the target detection object is unclear in industrial scenes, so the training data annotated with target detection objects before deep learning easily contains mislabels and missing labels, and this data noise seriously affects the accuracy of the trained visual detection model. Aimed at these mislabeling and missing-label problems and the noise they cause, the visual detection method provided by the embodiment of the invention can train the visual detection model according to a random sample model.
The following describes training the visual inspection model with a random sample model based on random sample consensus (RANSAC, Random Sample Consensus).
Let Dr be the original data containing data noise, N the total amount of data, and Dt the verification set; set iteration numbers D1 and D2 and sampling probabilities [p1, p2]. The implementation steps are as follows:
1. Setting an initial discarding proportion coefficient D;
2. Randomly discarding a fraction D of the data in Dr, generating a new data set from the remaining data of Dr, and training a visual detection model;
3. Testing the trained visual detection model with Dr and Dt respectively to obtain the test precision on Dr, and placing the samples whose error in the Dr test results is smaller than a threshold into Dri, the remaining samples forming Dro. The threshold may be chosen according to the actual situation; Dri represents the inlier queue and Dro the outlier queue;
4. Randomly discarding a fraction D of the data from Dri and Dro, with the selection probabilities of the discarded data following [p1, p2], and training a visual detection model on the data of Dri and Dro that were not discarded. If verification shows that the accuracy of the current visual detection model on the verification set Dt exceeds that of the previous model, the current model is retained, Dri is replaced by the samples whose error in the Dr test results is smaller than the threshold, and Dro is replaced by the remaining samples of those test results;
5. Repeating (4) until the iteration number is D1;
6. D = D + step_size (increasing the discard fraction); repeating (4) until the number of iterations reaches D2;
7. Outputting the optimal visual detection model.
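A skeleton of steps 1-7, for illustration only; train, evaluate, and split are hypothetical callables standing in for model training, verification-set evaluation, and the inlier/outlier partition of step 3:

```python
import random

def ransac_train(Dr, Dt, train, evaluate, split,
                 D=0.1, D1=10, D2=30, step_size=0.05, p=(0.5, 0.9)):
    """RANSAC-style training on noisy data Dr with verification set Dt (steps 1-7, sketched).

    train(data) -> model; evaluate(model, data) -> accuracy;
    split(model, Dr) -> (Dri, Dro), the inlier and outlier queues by test error.
    """
    keep = random.sample(Dr, int(len(Dr) * (1 - D)))      # step 2: drop fraction D at random
    model = train(keep)
    Dri, Dro = split(model, Dr)                           # step 3
    best_model, best_acc = model, evaluate(model, Dt)

    for it in range(D2):                                  # steps 4-6
        if it == D1:
            D += step_size                                # step 6: raise the discard ratio
        # step 4: discard fraction D, drawing outliers more aggressively than inliers
        keep = [x for x in Dri if random.random() > D * p[0]] + \
               [x for x in Dro if random.random() > D * p[1]]
        cand = train(keep)
        acc = evaluate(cand, Dt)
        if acc > best_acc:                                # keep only verified improvements
            best_model, best_acc = cand, acc
            Dri, Dro = split(cand, Dr)
    return best_model                                     # step 7: output the best model
```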
Referring to fig. 10, in some embodiments, step 87: before the sampled image is input into the visual detection model, the visual detection method provided by the embodiment of the invention further comprises the following steps:
Step 71: performing iterative training on the visual detection model by using the original dirty data set, wherein training data in the original dirty data set is marked;
At present, the qualitative standard boundary for flaws serving as the target detection object is unclear in industrial scenes, so the training data annotated with target detection objects before deep learning easily contains mislabels and missing labels, and this data noise seriously affects the accuracy of the trained visual detection model. Aimed at these mislabeling and missing-label problems and the noise they cause, the visual detection method provided by the embodiment of the invention can train the visual detection model with a semi-supervised label noise-reduction method, giving the model self-correction capability and reducing noise.
Step 72: after a set number of iterations, outputting prediction results on the training data using the visual detection model;
In the initial state there is an original dirty data set Data_t0. The visual detection model is trained with this data set; after a certain number of iterations, the model is used to predict on the training set and the output results are obtained.
Step 73: if the prediction result is inconsistent with the label of the training data, determining that mislabeled data exists in the training data;
At this point the prediction results are inconsistent with the training data set. Taking Faster R-CNN as an example, the following describes how the visual detection method provided by the embodiment of the invention finds the mislabeled data.
Step 74: comparing the prediction result with a true value, determining the type of the target detection object, and setting a confidence label for the corresponding training data;
For the ground truth and the prediction results, assume decoding has produced the format [xmin, ymin, xmax, ymax, c, c_index], where the first four dimensions are coordinates, c is the confidence vector, and c_index is the predicted class (including the background class), with:
c_index=argmax(c)
Because the detection network introduces coordinate parameters, the training data must first be given labels, comprising the type label and the position label of the target detection object. In the embodiment of the invention, when the intersection over union (IOU, Intersection over Union) between a manual annotation and a prediction result exceeds a set threshold t, the prediction is considered matched to that annotation; each manual annotation is matched only with the prediction box having the largest IOU and the highest confidence. When a prediction has IOU < t with every manual annotation and c_index != 0 (i.e., not the background class), the manually annotated index of that prediction box is taken to be 0. To obtain the positions of the target detection objects later, suppose a training picture contains n target detection objects; then the n regional pooling ROIs (Region of Interest) detected as background class in the first stage of Faster R-CNN are extracted, as follows: the first-stage objectness confidences are sorted, and the n ROIs with the highest background confidence that also satisfy IOU < t with the manual annotations are taken.
Step 75: calculating a distance queue of the confidence level based on the confidence level;
For a class of target detection objects, let the numeric index be c_i and c_t be the confidence threshold. The numeric index may be the code corresponding to that class of target detection object, for example index 1 for breakage and index 2 for fragmentation. The confidence threshold is computed over the c_n training samples labeled c_i (t denoting any one of them), whose predicted confidence vectors are c.

A distance queue is then calculated for the confidence of the training data, where the distance of each manual annotation is:

margin_c_i = C_c_i − c_t[c_i];

with C_c_i the predicted confidence of class c_i.
step 76: sorting the confidence levels based on the distance queues;
The confidence of the training data is ordered based on the distance queue.
Step 77: based on the sorting result of the distance queue, at least selecting training data labels corresponding to the confidence coefficient with the largest distance as error label data.
And selecting the first few confidence degrees with larger distances from the sorting results of the distance queue, and selecting training data with marks corresponding to the confidence degrees as error mark data.
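A small sketch of steps 74-77 in NumPy; the per-class thresholds and the choice to treat the most negative margins as the most suspicious are assumptions consistent with the margin formula above:

```python
import numpy as np

def suspect_mislabeled(pred_conf, labels, thresholds, top_k=5):
    """Rank annotations by confidence margin and flag the most suspicious.

    pred_conf:  (n, num_classes) predicted confidence vectors c
    labels:     (n,) annotated class indices c_i
    thresholds: (num_classes,) per-class confidence thresholds c_t
    Returns indices of the top_k annotations with the largest (most negative) margins.
    """
    margins = pred_conf[np.arange(len(labels)), labels] - thresholds[labels]
    order = np.argsort(margins)       # smallest margin first = most suspicious
    return order[:top_k]

conf = np.array([[0.9, 0.1], [0.2, 0.8], [0.55, 0.45]])
labels = np.array([0, 0, 1])          # sample 1 labeled class 0 despite low confidence
print(suspect_mislabeled(conf, labels, thresholds=np.array([0.5, 0.5]), top_k=2))
```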
Referring to fig. 11, in some embodiments, the visual inspection system includes a product gripping device, step 87: before the sampled image is input into the visual detection model, the visual detection method provided by the embodiment of the invention further comprises the following steps:
step 70: and correcting the grabbing parameters of the grabbing device by adopting a network grabbing model.
In current industrial detection scenes, a visual detection system comprises a product grabbing device and an image acquisition device: the product grabbing device grabs the product, and the image acquisition device captures sampled images of the product. The image acquisition device may use a depth camera to obtain the spatial three-dimensional coordinates of the target detection object.
It should be mentioned that adjusting the grabbing parameters of the grabbing device normally requires many manual experiments and yields poor robustness. The visual detection method provided by the embodiment of the invention uses a network grabbing model to correct the grabbing parameters of the grabbing device. The grabbing parameters may include, for example, the grabbing coordinates or the rotation angles of a multi-degree-of-freedom robot arm. Like the visual detection model, the network grabbing model is a deep learning model and can accurately identify and classify targets.
Referring to fig. 12, in some embodiments the visual inspection system further includes an image acquisition device, and step 70, correcting the grabbing parameters of the grabbing device with the network grabbing model, specifically comprises:
step 700: and training the network grabbing model based on the rewarding information output by the image acquisition device.
The visual detection method provided by the embodiment of the invention can train the network grabbing model using the reward information provided by the image acquisition device. The reward information reflects the interference factors affecting visual detection: the more reward information, the smaller the interference; the less reward information, the larger the interference. The following takes grabbing a medical fluid bag as an example.
In impurity detection for the medical fluid bag, when the motor of the grabbing device rotates to grab the bag, any impurities in the bag are set in motion, but the generation of bubbles must be avoided; hence the fewer the bubbles, the greater the reward. The grabbing objective of the grabbing device is therefore to minimize the number of bubbles B_num while maximizing the motor speed R. The current state quantity s comprises the motor speed v_t used in the previous stage and the number of bubbles B_num_t generated; the network grabbing model outputs the motor speed v_t+1 for the next stage, and the motor speed boundary is set to V_b. The following reward function can thus be set:
Reward = -B_num + α * R
Here α is a scaling factor. The reward function provides the optimization direction for the network grabbing model: the larger the reward, the stronger the optimization drive. As the expression shows, the reward is a weighted combination of the bubble penalty and the reward contributed by the motor rotation speed.
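Transcribed directly into code, the reward function reads as follows; the value of the scaling factor α is an arbitrary assumption for illustration.

```python
def reward(b_num: int, motor_speed: float, alpha: float = 0.01) -> float:
    """Reward = -B_num + alpha * R: fewer bubbles and a higher motor
    rotation speed both increase the reward; alpha weighs the speed
    term against the bubble penalty (0.01 is an assumed value)."""
    return -b_num + alpha * motor_speed
```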
The following objective function of a semi-offline parameter update algorithm can be used to optimize the model parameters of the network grabbing model:
Objective = -output * Reward
The state quantity s ([v_t, B_num_t]) is used as the input of the semi-offline parameter update algorithm of the network grabbing model. Because the motor rotation speed is a continuous action space, the output Vp produced by the forward neural network in the network grabbing model is converted according to the following formula:
V = tanh(Vp) * V_b
Here tanh is an activation function that maps (-∞, +∞) into (-1, 1), so the result is confined within the maximum rotation speed. The objective function of the semi-offline parameter update algorithm is then solved by gradient descent, specifically as follows:
After the network grabbing model outputs the motor rotation speed, it detects the number of bubbles with a visual algorithm and adds the current state quantity to a memory bank. Each time the grabbing parameters of the grabbing device are updated, besides the gradient generated in the previous stage, several samples are drawn from the memory bank as training samples, and their gradients are scaled by manually set sample weights [w1, w2, w3, ..., wn] to balance the correction applied to the agent between the previous stage and past experience.
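The following is a heavily simplified sketch of one such semi-offline update, assuming a two-dimensional state [v_t, B_num_t], a small fully connected policy network, an assumed speed boundary V_B, and hypothetical sample weights; none of the layer sizes or constants are prescribed by this embodiment.

```python
import random
import torch
import torch.nn as nn

V_B = 100.0                                   # motor speed boundary (assumed)
policy = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.SGD(policy.parameters(), lr=1e-3)
memory = []                                   # memory bank of (state, reward)

def semi_offline_update(state, reward_value, sample_weights):
    """Combine the current-stage gradient with weighted gradients from
    samples drawn out of the memory bank (Objective = -output * Reward)."""
    memory.append((state, reward_value))
    batch = random.sample(memory, min(len(memory), len(sample_weights)))
    optimizer.zero_grad()
    vp = policy(torch.tensor(state))
    v = torch.tanh(vp) * V_B                  # V = tanh(Vp) * V_b
    (-v * reward_value).backward()            # gradient of the current stage
    for w, (s, r) in zip(sample_weights, batch):
        vp = policy(torch.tensor(s))
        v = torch.tanh(vp) * V_B
        (w * (-v * r)).backward()             # past experience, re-weighted
    optimizer.step()                          # gradient descent step

# e.g. semi_offline_update([0.8, 3.0], reward(3, 80.0), [0.5, 0.3, 0.2])
```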
Referring to fig. 13, in some embodiments the visual inspection system includes an industrial personal computer and a cloud server; before step 87, in which the sampled image is input into the visual detection model, the visual detection method provided by the embodiment of the invention further comprises:
Step 90: the industrial personal computer collects the sampled image and sends it to the cloud server.
The industrial personal computer collects the sampled images acquired by the camera over a USB interface, stores the image samples, trains the model and identifies target detection objects; all of these operations can be completed locally, and an embedded system in particular allows the industrial personal computer to complete them efficiently.
Correspondingly, after step 85, in which the related information of the first target detection object is determined, the method further comprises:
Step 91: store at least the sample of the sampled image and the related information of the first target detection object to the cloud server.
The sample storage and model training parts can instead be moved to the cloud; combined with the characteristics of the cloud, this improves efficiency and reduces the hardware cost. The front-end industrial personal computer then retains only the camera required to collect the sampled image, a 5G wireless module, and the basic storage and computing capacity required to hold the model.
In this arrangement, the sampled image collected by the industrial personal computer is sent to the cloud server in real time for sample storage, model training, identification of the target detection object, and storage of the final semantic vector. A minimal upload sketch is given below.
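A minimal sketch of the real-time upload, assuming an OpenCV-accessible camera on the industrial personal computer and a purely hypothetical cloud endpoint:

```python
import cv2
import requests

CLOUD_URL = "https://cloud.example.com/upload"   # hypothetical endpoint

cap = cv2.VideoCapture(0)                        # camera on the industrial PC
ok, frame = cap.read()                           # one sampled image
if ok:
    ok, buf = cv2.imencode(".jpg", frame)        # compress before sending
    if ok:
        # the cloud side handles sample storage, model training and
        # identification of the target detection object
        requests.post(CLOUD_URL, files={"image": buf.tobytes()})
cap.release()
```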
Through the above technical scheme, the visual detection method provided by the application is suitable for a visual detection system whose sampled image is a video in which a target detection object appears. The method detects target detection objects in each frame of the video based on a target detection algorithm; if a first target detection object appears, the frame in which it appears is taken as a key frame, and the first coordinates and the first number of the first target detection objects in the key frame are determined; based on these, the coordinates of a second target detection object in frames adjacent to the key frame are acquired; it is judged whether the second target detection object in the adjacent frame is the same as the first target detection object in the key frame; the first target detection object is then identity-associated to obtain its motion track; and when the length of the motion track is determined to be greater than a length threshold, the related information of the first target detection object, including its type and position, is determined. The method can thus process video: after judging that the target detection objects appearing in consecutive frames are the same object, it checks the length of that object's motion track before determining its related information. It is therefore suited to industrial scenarios where target samples are hard to acquire and few in number, and it improves the accuracy and robustness of the visual detection system.
Example II
Referring to fig. 14, a schematic structural diagram of a visual inspection system provided in an embodiment of the present disclosure is shown; the system is suitable for scenarios in which the sampled image to be visually inspected is a video and a target detection object appears in the video. The visual inspection system comprises:
a detection module 10, configured to perform object detection on each frame in the video based on an object detection algorithm;
In current industrial scenes, target samples are difficult to acquire and the spatio-temporal consistency of target detection objects is poor. To address this, after target detection has been performed on each frame in the video, the target detection objects appearing in consecutive frames are tracked, judged and analyzed to obtain the related information of the target detection object, which reduces the false detection rate and improves the detection precision.
A key frame determining module 20, configured to take a frame in which the first target detection object exists as a key frame if the first target detection object appears, and determine a first coordinate and a first number of the first target detection objects in the key frame;
After a first target detection object is detected, the frame in which it appears is taken as a key frame, and the first coordinates and the first number of the first target detection objects in the key frame are determined at the same time. Several first target detection objects may exist in the same key frame, in which case their number and their first coordinates are all determined.
An information obtaining module 30, configured to obtain coordinates of a second target detection object in an adjacent frame of the key frame based on the first coordinates of the first target detection object in the key frame and the first number;
The coordinates of the second target detection object in the adjacent frame of the key frame can be obtained from the first coordinates and the first number of the first target detection objects in the key frame in several ways, for example by correlation filtering. When the adjacent frame produces no correlation-filter response to a first target detection object, false detection is considered to have occurred in the key frame and that first target detection object is removed; when there are several first target detection objects, whether false detection has occurred can of course be judged one by one. When the adjacent frame does produce a correlation-filter response for the first target detection object, the second target detection object is added to the detection sequence (to which the previously detected first target detection object has already been added), and tracking continues to establish whether the second target detection object in the adjacent frame is the same defect as the first target detection object in the key frame. It should be noted that "first target detection object" and "second target detection object" merely distinguish the object found in the key frame from the object found in the adjacent frame of the key frame.
A judging module 40, configured to judge that the second target detection object in the adjacent frame is the same as the first target detection object in the key frame;
When the second target detection object in the adjacent frame is judged to be the same as the first target detection object in the key frame, the first target detection objects across the consecutive frames are associated, yielding the motion track of the first target detection object. The first and second target detection objects can be matched directly on their feature distance using the Hungarian algorithm.
The identity association module 50 is configured to perform identity association on the first target detection object, so as to obtain a motion track of the first target detection object; and
When the number of consecutive frames in which the first target detection object is detected exceeds a threshold N, identity association is carried out on the first target detection objects in those frames. Taking the key frame at time t and the adjacent frame at time t+1 as an example: after the first target detection object in the key frame is judged to be the same as the second target detection object in the adjacent frame, a first feature map of the first target detection object at time t and a second feature map of the second target detection object at time t+1 are extracted by region-of-interest (ROI) pooling; the features in each feature map are combined with the Kalman-filter-predicted centre-point coordinates to form a first feature vector and a second feature vector; the distance matrix between the first feature vectors from the key frame at time t and the second feature vectors from the adjacent frame at time t+1 is computed; and minimum-distance matching with the Hungarian algorithm generates the motion track of the first target detection object. A matching sketch follows.
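A minimal sketch of the minimum-distance matching, assuming each row of key_feats / adj_feats is the feature vector already formed from the ROI-pooled features and the Kalman-predicted centre coordinates:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(key_feats, adj_feats):
    """Hungarian matching between first target detection objects at time t
    and second target detection objects at time t+1 on the Euclidean
    distance matrix of their feature vectors."""
    a = np.asarray(key_feats, dtype=float)
    b = np.asarray(adj_feats, dtype=float)
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(dist)   # minimum total distance
    return list(zip(rows.tolist(), cols.tolist()))
```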
The information acquisition module 60 is configured to determine the related information of the first target detection object when it is determined that the length of the motion track is greater than the length threshold.
After the motion track of the first target detection object is obtained, its length is calculated; when the length exceeds the length threshold, the related information of the first target detection object, comprising its type and position, is determined, and the flaw is confirmed as detected. The length threshold may be set according to target detection objects obtained in actual detection.
Referring to fig. 15, in some embodiments, the visual inspection system provided in the present invention further includes a video classification module 70, wherein before the object detection is performed on each frame in the video based on the object detection algorithm, the video classification module 70 is configured to:
Carry out video classification on the sampled image using a 3D-convolution dual-stream network model.
The visual detection method provided by the embodiment of the invention adopts end-to-end 3D convolution as a component of the dual-stream network model to classify the sampled video. The classification may be based on the type of the target detection object or on other principles, which are not limited here. A minimal sketch of such a model follows.
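For illustration, a minimal dual-stream classifier with 3D convolutions might look as follows; the split into RGB and optical-flow clips, the score-averaging fusion, and all layer sizes are assumptions, since this embodiment does not fix the architecture.

```python
import torch.nn as nn

class TwoStream3D(nn.Module):
    """One 3D-convolutional stream over RGB clips, one over stacked
    optical flow, fused by averaging class scores (assumed fusion rule)."""
    def __init__(self, num_classes: int):
        super().__init__()
        def stream(in_ch):
            return nn.Sequential(
                nn.Conv3d(in_ch, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool3d(1), nn.Flatten(),
                nn.Linear(16, num_classes))
        self.rgb, self.flow = stream(3), stream(2)

    def forward(self, rgb_clip, flow_clip):
        # inputs: (batch, channels, frames, height, width)
        return (self.rgb(rgb_clip) + self.flow(flow_clip)) / 2
```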
In some embodiments of the visual inspection system provided by the present invention, the information obtaining module 30 is further configured to:
step 820: performing forward neural network processing on the key frame to obtain an initial feature map of the key frame;
Forward neural network processing is performed on the key frame image to obtain an initial feature map of the key frame; the purpose is to subsequently obtain the first pooled feature map of the first target detection object.
Step 821: if the second number of second target detection objects appearing in the adjacent frame of the key frame is smaller than the first number, carry out regional pooling on the initial feature map using the coordinates of the first target detection objects in the key frame to obtain a first pooled feature map;
If no target detection object is detected in the adjacent frame of the key frame, or the number detected there is smaller than the first number, regional pooling (ROI pooling) is carried out, based on the first coordinates of the first target detection object in the key frame, on the initial feature map obtained after the key frame has been processed by the forward neural network, yielding the first pooled feature map. The first pooled feature map has a set size, which may be chosen when configuring the regional pooling.

Step 822: process the first pooled feature map through a forward neural network to obtain a kernel feature map;
The first pooled feature map is input into a forward neural network to obtain a kernel feature map, for example with kernel size 3×3×l.
Step 823: expand the boundary of the regional pooling on the key frame, and carry out regional pooling on the adjacent frame of the key frame to obtain a second pooled feature map;
After the boundary of the regional (ROI) pooling on the key frame is expanded by a certain ratio, regional pooling is carried out on the adjacent frame under a spatio-temporal constraint, obtaining the second pooled feature map. The second pooled feature map also has a set size, the size of the first pooled feature map being larger than that of the second pooled feature map; the expansion ratio may be set according to the actual situation.
Step 824: convolve the second pooled feature map with the kernel feature map to obtain the second coordinate of the second target detection object in the adjacent frame.
The second pooled feature map is convolved with the kernel feature map, and the second coordinate of the second target detection object in the adjacent frame is obtained by correlation-like filtering. A sketch of steps 820 to 824 follows.
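A sketch of steps 820 to 824 using torchvision's roi_align; the pooled output sizes, the boundary-expansion ratio, and the use of average pooling as a stand-in for the forward network that produces the 3×3×l kernel are all assumptions:

```python
import torch
import torch.nn.functional as F
from torchvision.ops import roi_align

def locate_in_adjacent(init_feat, adj_feat, key_box, expand=1.5):
    """init_feat / adj_feat: (1, C, H, W) feature maps of the key frame
    and the adjacent frame; key_box: (1, 4) box (x1, y1, x2, y2) of the
    first target detection object in the key frame."""
    # step 821: first pooled feature map from the key frame (7x7 assumed)
    f_t = roi_align(init_feat, [key_box], output_size=(7, 7))
    # step 822: stand-in for the forward network yielding a 3x3 kernel
    kernel = F.adaptive_avg_pool2d(f_t, (3, 3))
    # step 823: expand the ROI boundary around its centre, pool the
    # adjacent frame under the spatio-temporal constraint
    cx = (key_box[:, 0] + key_box[:, 2]) / 2
    cy = (key_box[:, 1] + key_box[:, 3]) / 2
    w = (key_box[:, 2] - key_box[:, 0]) * expand
    h = (key_box[:, 3] - key_box[:, 1]) * expand
    expanded = torch.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], 1)
    f_i = roi_align(adj_feat, [expanded], output_size=(15, 15))
    # step 824: correlation-like filtering; the heat-map peak gives the
    # second coordinate in the adjacent frame
    heat = F.conv2d(f_i, kernel)
    peak = (heat == heat.max()).nonzero()
    return heat, peak
```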
Specifically, taking an impurity floating in the medical fluid bag as an example:
(1) Detect each frame in the video with the target detection algorithm; if a first target detection object is detected, take that frame as the key frame k_t, record the number of detected first target detection objects as n_t, and add the first target detection objects to the detection sequence;
(2) For the key frame at time t, if no second target detection object is detected in the adjacent frame, or the number detected is smaller than n_t, perform ROI pooling, using the first coordinates of the first target detection objects in the key frame, on the feature map obtained by processing the key frame through the forward neural network, yielding a first pooled feature map F_t of fixed scale;
(3) Process the first pooled feature map F_t through the forward neural network to obtain a kernel feature map conv_key_t of kernel size 3×3×l, and expand the boundary of the regional (ROI) pooling on the key frame by a certain ratio. Using the expanded boundary and the spatio-temporal constraint, perform ROI pooling on the adjacent frame to obtain a second pooled feature map F_i of size w×h×l, then convolve F_i with the kernel feature map conv_key_t; this correlation-like filtering yields the position heat map of the second target detection object in the adjacent frame. Two cases arise: if the correlation filtering of the adjacent frame produces no response, false detection is confirmed in the key frame, and the detected first target detection object (which was added to the detection sequence earlier) is removed from the detection sequence; if the correlation filtering produces a response, the position of the second target detection object in the adjacent frame is confirmed, tracking continues to check whether the second target detection object shares the same features as the first target detection object, and the second target detection object is added to the detection sequence;
(4) When the number of consecutive frames in which the target detection object can be detected exceeds the threshold N, carry out identity association on the second target detection objects of the adjacent frames. Taking the key frame at time t and the adjacent frame at time t+1 as an example: after the preceding three steps, the number of second target detection objects in the adjacent frame matches the number of first target detection objects in the key frame. Extract, by ROI pooling, a first feature map of the first target detection objects in the key frame at time t and a second feature map of the second target detection objects in the adjacent frame at time t+1; combine the features of each with the Kalman-filter-predicted centre-point coordinates into first and second feature vectors; compute the distance matrix between the first feature vectors from time t and the second feature vectors from time t+1; and perform minimum-distance matching with the Hungarian algorithm to generate the motion track of the first target detection object;
(5) Calculate the length of the motion track; when it exceeds the length threshold, confirm that a floating impurity has been detected and output its position. A sketch of this length check follows.
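The length check in step (5) reduces to summing the inter-frame displacements of the tracked centre points; the threshold value below is an assumed placeholder, to be set from actually detected targets.

```python
import numpy as np

def track_length(centers):
    """Sum of inter-frame displacements of the tracked centre points."""
    pts = np.asarray(centers, dtype=float)
    return float(np.linalg.norm(np.diff(pts, axis=0), axis=1).sum())

LENGTH_THRESHOLD = 5.0    # assumed value, set from actual detections
if track_length([(10, 12), (11, 14), (13, 17)]) > LENGTH_THRESHOLD:
    print("floating impurity confirmed")
```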
Through the above technical scheme, the visual inspection system provided by this embodiment achieves the beneficial effects already described above for the visual detection method, and details are not repeated here.
Example III
Fig. 16 is a schematic structural view of an electronic device according to an embodiment of the present specification. At the hardware level, the electronic device comprises a processor and optionally an internal bus, a network interface and a memory. The memory may include volatile memory, such as random-access memory (RAM), and may further include non-volatile memory, such as at least one disk memory. Of course, the electronic device may also include hardware required by other services.
The processor, network interface and memory may be interconnected by an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus or an EISA (Extended Industry Standard Architecture) bus, among others. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one bi-directional arrow is shown in fig. 16, but this does not mean there is only one bus or one type of bus.
The memory is used to store programs. Specifically, a program may include program code comprising computer operating instructions. The memory may include volatile memory and non-volatile storage, and provides instructions and data to the processor.
The processor reads the corresponding computer program from the non-volatile memory into the memory and then runs it, forming the visual detection apparatus at the logical level. The processor executes the program stored in the memory and is specifically configured to execute the method steps of each execution subject in the embodiments of the present disclosure.
The methods disclosed in the embodiments shown in fig. 1 to 13 of the present specification may be applied to, or implemented by, a processor. The processor may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The methods, steps and logic blocks disclosed in one or more embodiments of the present specification may thereby be implemented or executed. A general-purpose processor may be a microprocessor, or any conventional processor. The steps of a method disclosed in connection with one or more embodiments of the present specification may be embodied directly as executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may reside in a storage medium well known in the art, such as random-access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
The electronic device may also execute the method of the embodiment shown in fig. 1 to 13 and implement the functions of the corresponding apparatus in the embodiment shown in fig. 14 to 15, which are not described herein again.
Of course, in addition to the software implementation, the electronic device of the embodiments of the present disclosure does not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the execution subject of the processing flow is not limited to logic units, and may also be hardware or a logic device.
Through the above technical scheme, the electronic device provided by this embodiment achieves the beneficial effects already described above for the visual detection method, and details are not repeated here.
Example IV
The present description also proposes a computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by an electronic device comprising a plurality of application programs, enable the electronic device to perform the methods of the embodiments shown in fig. 1-13.
Through the above technical scheme, the computer-readable storage medium provided by this embodiment likewise achieves the beneficial effects described above for the visual detection method, and details are not repeated here.
In summary, the foregoing description is only a preferred embodiment of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present specification should be included in the protection scope of the present specification.
The systems, devices, modules, or units illustrated in one or more of the embodiments described above may be implemented in particular by a computer chip or entity, or by a product having some function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises", "comprising" or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article or apparatus. Without further limitation, an element preceded by the phrase "comprising a …" does not exclude the presence of additional identical elements in the process, method, article or apparatus that comprises the element.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for system embodiments, since they are substantially similar to method embodiments, the description is relatively simple, as relevant to see a section of the description of method embodiments.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

Claims (9)

1. A visual inspection method, suitable for a visual inspection system whose sampled image for visual inspection is a video in which a target detection object appears, the method comprising:
detecting a target detection object for each frame in the video based on a target detection algorithm;
If a first target detection object appears, taking a frame in which the first target detection object exists as a key frame, and determining a first coordinate and a first quantity of the first target detection object in the key frame;
acquiring coordinates of a second target detection object in an adjacent frame of the key frame based on the first coordinates and the first number of the first target detection objects in the key frame, which specifically comprises:
performing forward neural network processing on the key frame to obtain an initial feature map of the key frame;
If the second number of second target detection objects appears in adjacent frames of the key frame is smaller than the first number, carrying out regional pooling on the initial feature map by utilizing the coordinates of the first target detection objects in the key frame to obtain a first pooled feature map;
processing the first pooled feature map through a forward neural network to obtain a kernel feature map;
expanding the boundary of the regional pooling on the key frame, and carrying out regional pooling on the adjacent frame of the key frame to obtain a second pooled feature map;
Convolving the second pooled feature map with the kernel feature map to obtain a second coordinate of the second target detection object in the adjacent frame;
judging that the second target detection object in the adjacent frame is the same as the first target detection object in the key frame;
Carrying out identity association on the first target detection object to obtain a motion track of the first target detection object;
and under the condition that the length of the motion trail is larger than a length threshold value, determining the related information of the first target detection object.
2. The visual inspection method of claim 1, wherein the visual inspection system performs visual inspection using a visual inspection model, the method further comprising, prior to performing object detection on each frame in the video based on an object detection algorithm:
Video is input to the visual inspection model.
3. The visual inspection method of claim 2, after inputting the sampled image into the visual inspection model, the method further comprising:
Performing feature decomposition on the sampling image based on the common features to obtain a feature sequence;
coding the characteristic sequence to obtain a coding sequence;
adding the coding sequence to the output of the visual inspection model.
4. The visual inspection method of claim 2, the visual inspection model comprising a forward neural network, the method further comprising, after inputting a sampled image into the visual inspection model:
carrying out initial processing on the sampling image to obtain an initial semantic vector;
Respectively inputting the initial semantic vectors into a plurality of first forward neural networks to obtain a plurality of intermediate semantic vectors;
Inputting the initial semantic vector into a second forward neural network to obtain an activation vector corresponding to the intermediate semantic vector;
Taking the activation vector as the weight of the intermediate semantic vector to obtain a final semantic vector;
and taking the final semantic vector as an output of the visual detection model.
5. The visual inspection method of claim 1, prior to object detection for each frame in the video based on an object detection algorithm, the method further comprising:
optimizing hyperparameters by gradient descent;
and sharpening the sampled image with a sharpening kernel, wherein the sharpening kernel comprises the product of the hyperparameter and the sharpening kernel, or enhancing pixels of the region where the target detection object is located based on the hyperparameter.
6. The visual inspection method of claim 2, prior to inputting a sampled image into the visual inspection model, the method further comprising:
Iteratively training the visual inspection model using an original set of dirty data, wherein training data in the original set of dirty data has been annotated;
after a set number of iterations, outputting a prediction result for the training data using the visual inspection model;
if the prediction result is inconsistent with the label of the training data, determining that error label data exists in the training data;
comparing the training data with a true value, determining the type of the target detection object, and setting a confidence for the corresponding training data;
calculating a distance queue of the confidences based on the confidences;
ranking the confidences based on the distance queue;
and based on the sorting result of the distance queue, selecting at least the labeled training data corresponding to the confidence with the largest distance as error label data.
7. The visual inspection method of claim 2, the visual inspection system including a product gripping device, the method further comprising, prior to inputting a sampled image into the visual inspection model:
correcting grabbing parameters of the grabbing device by adopting a network grabbing model;
the method for correcting the grabbing parameters of the grabbing device by adopting the network grabbing model specifically comprises the following steps:
training the network grabbing model based on reward information output by an image acquisition device.
8. A visual inspection system, suitable for scenarios in which the sampled image for visual inspection is a video in which a target detection object appears, the system comprising:
The detection module is used for detecting a target detection object for each frame in the video based on a target detection algorithm;
A key frame determining module, configured to take a frame in which a first target detection object exists as a key frame if the first target detection object appears, and determine a first coordinate and a first number of the first target detection objects in the key frame;
The information obtaining module is configured to obtain, based on the first coordinates and the first number of the first target detection objects in the key frame, coordinates of the second target detection objects in adjacent frames of the key frame, and specifically includes:
performing forward neural network processing on the key frame to obtain an initial feature map of the key frame;
If the second number of second target detection objects appears in adjacent frames of the key frame is smaller than the first number, carrying out regional pooling on the initial feature map by utilizing the coordinates of the first target detection objects in the key frame to obtain a first pooled feature map;
processing the first pooled feature map through a forward neural network to obtain a kernel feature map;
expanding the boundary of the regional pooling on the key frame, and carrying out regional pooling on the adjacent frame of the key frame to obtain a second pooled feature map;
Convolving the second pooled feature map with the kernel feature map to obtain a second coordinate of the second target detection object in the adjacent frame;
the judging module is used for judging that the second target detection object in the adjacent frame is the same as the first target detection object in the key frame;
the identity association module is used for carrying out identity association on the first target detection object to obtain a motion track of the first target detection object; and
The information acquisition module is used for determining the related information of the first target detection object under the condition that the length of the motion trail is determined to be larger than a length threshold value.
9. An electronic device, comprising:
A processor; and
A memory arranged to store computer executable instructions which, when executed, cause the processor to perform the visual detection method of any one of claims 1 to 7.
CN202110235680.5A 2021-03-03 2021-03-03 Visual detection method, visual detection system and electronic equipment Active CN112991280B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110235680.5A CN112991280B (en) 2021-03-03 2021-03-03 Visual detection method, visual detection system and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110235680.5A CN112991280B (en) 2021-03-03 2021-03-03 Visual detection method, visual detection system and electronic equipment

Publications (2)

Publication Number Publication Date
CN112991280A CN112991280A (en) 2021-06-18
CN112991280B true CN112991280B (en) 2024-05-28

Family

ID=76352380

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110235680.5A Active CN112991280B (en) 2021-03-03 2021-03-03 Visual detection method, visual detection system and electronic equipment

Country Status (1)

Country Link
CN (1) CN112991280B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113808123B (en) * 2021-09-27 2024-03-29 杭州跨视科技有限公司 Dynamic detection method for liquid medicine bag based on machine vision
CN114897929B (en) * 2022-05-31 2024-06-04 工业云制造(四川)创新中心有限公司 Robot movement method based on visual noise reduction

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107221005A (en) * 2017-05-04 2017-09-29 美的集团股份有限公司 Object detecting method and device
WO2018107488A1 (en) * 2016-12-16 2018-06-21 深圳大学 Boosted intuitionistic fuzzy tree-based method and device for target tracking
WO2018227491A1 (en) * 2017-06-15 2018-12-20 深圳大学 Method and device for association of fuzzy data of multiple targets in video
WO2019045586A1 (en) * 2017-08-30 2019-03-07 Артем Андреевич ЯКОВЛЕВ Method for monitoring moving objects
CN110264493A (en) * 2019-06-17 2019-09-20 北京影谱科技股份有限公司 A kind of multiple target object tracking method and device under motion state
CN110675418A (en) * 2019-09-26 2020-01-10 深圳市唯特视科技有限公司 Target track optimization method based on DS evidence theory
CN110689562A (en) * 2019-09-26 2020-01-14 深圳市唯特视科技有限公司 Trajectory loop detection optimization method based on generation of countermeasure network
CN111553915A (en) * 2020-05-08 2020-08-18 深圳前海微众银行股份有限公司 Article identification detection method, device, equipment and readable storage medium
KR102163108B1 (en) * 2019-11-28 2020-10-08 가천대학교 산학협력단 Method and system for detecting in real time an object of interest in image
CN112036253A (en) * 2020-08-06 2020-12-04 海纳致远数字科技(上海)有限公司 Face key point positioning method based on deep learning
WO2020248386A1 (en) * 2019-06-14 2020-12-17 平安科技(深圳)有限公司 Video analysis method and apparatus, computer device and storage medium
CN112132776A (en) * 2020-08-11 2020-12-25 苏州跨视科技有限公司 Visual inspection method and system based on federal learning, storage medium and equipment
CN112183517A (en) * 2020-09-22 2021-01-05 平安科技(深圳)有限公司 Certificate card edge detection method, equipment and storage medium
WO2021034211A1 (en) * 2019-08-16 2021-02-25 Станислав Игоревич АШМАНОВ Method and system of transfer of motion of subject from video onto animated character

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8345984B2 (en) * 2010-01-28 2013-01-01 Nec Laboratories America, Inc. 3D convolutional neural networks for automatic human action recognition
CN109272509B (en) * 2018-09-06 2021-10-29 郑州云海信息技术有限公司 Target detection method, device and equipment for continuous images and storage medium

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018107488A1 (en) * 2016-12-16 2018-06-21 深圳大学 Boosted intuitionistic fuzzy tree-based method and device for target tracking
CN107221005A (en) * 2017-05-04 2017-09-29 美的集团股份有限公司 Object detecting method and device
WO2018227491A1 (en) * 2017-06-15 2018-12-20 深圳大学 Method and device for association of fuzzy data of multiple targets in video
WO2019045586A1 (en) * 2017-08-30 2019-03-07 Артем Андреевич ЯКОВЛЕВ Method for monitoring moving objects
WO2020248386A1 (en) * 2019-06-14 2020-12-17 平安科技(深圳)有限公司 Video analysis method and apparatus, computer device and storage medium
CN110264493A (en) * 2019-06-17 2019-09-20 北京影谱科技股份有限公司 A kind of multiple target object tracking method and device under motion state
WO2021034211A1 (en) * 2019-08-16 2021-02-25 Станислав Игоревич АШМАНОВ Method and system of transfer of motion of subject from video onto animated character
CN110689562A (en) * 2019-09-26 2020-01-14 深圳市唯特视科技有限公司 Trajectory loop detection optimization method based on generation of countermeasure network
CN110675418A (en) * 2019-09-26 2020-01-10 深圳市唯特视科技有限公司 Target track optimization method based on DS evidence theory
KR102163108B1 (en) * 2019-11-28 2020-10-08 가천대학교 산학협력단 Method and system for detecting in real time an object of interest in image
CN111553915A (en) * 2020-05-08 2020-08-18 深圳前海微众银行股份有限公司 Article identification detection method, device, equipment and readable storage medium
CN112036253A (en) * 2020-08-06 2020-12-04 海纳致远数字科技(上海)有限公司 Face key point positioning method based on deep learning
CN112132776A (en) * 2020-08-11 2020-12-25 苏州跨视科技有限公司 Visual inspection method and system based on federal learning, storage medium and equipment
CN112183517A (en) * 2020-09-22 2021-01-05 平安科技(深圳)有限公司 Certificate card edge detection method, equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Novel method on features tracking for multi-image matching;Jia Kaikai;Journal of System Simulation;20170501;1147-52 *
Automatic detection algorithm for ship sailing tracks based on machine vision technology; Wang Tong; Ship Science and Technology (No. 16); 56-58 *
Research on key technologies of a fully automatic lamp-inspection machine based on machine vision; Yang Fugang; Sun Tongjing; Song Songlin; Chinese Journal of Scientific Instrument (No. 03); 116-120 *

Also Published As

Publication number Publication date
CN112991280A (en) 2021-06-18

Similar Documents

Publication Publication Date Title
CN111259930B (en) General target detection method of self-adaptive attention guidance mechanism
CN110084292B (en) Target detection method based on DenseNet and multi-scale feature fusion
US9652694B2 (en) Object detection method, object detection device, and image pickup device
US10373019B2 (en) Low- and high-fidelity classifiers applied to road-scene images
CN111783576B (en) Pedestrian re-identification method based on improved YOLOv3 network and feature fusion
CN109086811B (en) Multi-label image classification method and device and electronic equipment
CN106846362B (en) Target detection tracking method and device
US20170206434A1 (en) Low- and high-fidelity classifiers applied to road-scene images
CN107633226B (en) Human body motion tracking feature processing method
CN112614119B (en) Medical image region of interest visualization method, device, storage medium and equipment
CN109492706B (en) Chromosome classification prediction device based on recurrent neural network
CN111310622A (en) Fish swarm target identification method for intelligent operation of underwater robot
CN110765833A (en) Crowd density estimation method based on deep learning
CN112991280B (en) Visual detection method, visual detection system and electronic equipment
CN113487610B (en) Herpes image recognition method and device, computer equipment and storage medium
CN112991281B (en) Visual detection method, system, electronic equipment and medium
CN111582126A (en) Pedestrian re-identification method based on multi-scale pedestrian contour segmentation fusion
CN109271848A (en) A kind of method for detecting human face and human face detection device, storage medium
CN111008576A (en) Pedestrian detection and model training and updating method, device and readable storage medium thereof
AU2020272936A1 (en) Methods and systems for crack detection using a fully convolutional network
CN111127400A (en) Method and device for detecting breast lesions
CN113780145A (en) Sperm morphology detection method, sperm morphology detection device, computer equipment and storage medium
CN114581709A (en) Model training, method, apparatus, and medium for recognizing target in medical image
CN113706579A (en) Prawn multi-target tracking system and method based on industrial culture
CN114549462A (en) Focus detection method, device, equipment and medium based on visual angle decoupling Transformer model

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant