CN113591654A - Zinc flotation working condition identification method based on long-term depth characteristics - Google Patents

Zinc flotation working condition identification method based on long-term depth characteristics

Info

Publication number
CN113591654A
Authority
CN
China
Prior art keywords
network
rgb
working condition
stream
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110833450.9A
Other languages
Chinese (zh)
Other versions
CN113591654B (en)
Inventor
唐朝晖
袁鹤
张虎
戴智恩
田灿
郑锶
刘嘉鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central South University
Original Assignee
Central South University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central South University
Priority to CN202110833450.9A
Publication of CN113591654A
Application granted
Publication of CN113591654B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30 Computing systems specially adapted for manufacturing

Abstract

The invention discloses a zinc flotation working condition identification method based on long-term depth characteristics, which comprises the following steps: first, a 3D convolutional network is used as the base network, and part of the structure of an RGB stream network is made to imitate an optical flow network by a knowledge distillation method, so that the RGB stream network learns the motion information of the optical flow without the optical flow having to be extracted at test time; then each video is segmented, the frame-level features of each segment are extracted with the trained RGB stream network, and the extracted frame-level features of each segment are input into an LSTM network for further extraction to obtain video-level global spatio-temporal features; finally, a 2D convolutional network is added to the network to extract complementary appearance features, the global spatio-temporal features and the enhanced appearance features are spliced together, and the spliced features are input into a multilayer perceptron for final working condition identification. The invention combines the advantages of convolutional neural networks and recurrent neural networks, and can identify the zinc flotation working condition quickly and accurately so as to effectively guide reagent dosing.

Description

Zinc flotation working condition identification method based on long-term depth characteristics
Technical Field
The invention relates to the technical field of froth flotation, and in particular to a zinc flotation working condition identification method based on long-term depth characteristics.
Background
Froth flotation is a mineral separation technology widely used in the nonferrous metal, coal and petrochemical industries; it exploits the difference in hydrophilicity and hydrophobicity between minerals to separate the target mineral effectively from worthless gangue. The process comprises the following steps: raw ore is fed into a ball mill and ground into particles of suitable size; the mineral particles are fed into a flotation tank, and the corresponding flotation reagents are added; air is introduced from the bottom under continuous stirring, so that target mineral particles attach to the froth surface and are scraped out by a scraper blade, while the worthless minerals sink into the pulp for further processing.
At first, manual inspection and manual operation were adopted; however, manual observation is highly subjective and arbitrary, and cannot distinguish working conditions accurately. Machine vision was later introduced to flotation sites, making it possible to describe the flotation working condition objectively and pushing the flotation process toward production automation. However, current machine-vision methods for working condition identification mainly operate on single-frame images, and relying only on the appearance information of one frame carries a high risk of misjudgment. Judging from the features of the whole video overcomes the shortcomings of a single frame, and researchers have applied the two-stream method to zinc flotation; but the original two-stream method requires optical flow to be extracted at inference time, which introduces a large delay when applied on an industrial site, so the two-stream method needs improvement. In addition, 3D convolution is commonly used for video data, but the temporal depth of a 3D convolution kernel is far smaller than the number of frames extracted from a single video, so 3D convolution can only capture short-term image changes; it cannot extract the long-time-range change features of the whole video well, and its appearance feature extraction ability is weak. These problems need to be solved.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a zinc flotation working condition identification method based on long-term depth characteristics. First, an RGB stream network is made to imitate an optical flow network through knowledge distillation, which exploits optical flow information while solving the problem of the large delay of optical flow extraction; then the video is segmented, the frame-level features of each segment are extracted with the RGB stream network and input into an LSTM network to further extract video-level global spatio-temporal features; finally, a 2D CNN is added to the network, which solves the problem that the RGB stream network loses appearance information while imitating the optical flow network.
The technical scheme adopted by the invention comprises the following specific steps:
the method comprises the following steps: collecting foam videos of zinc flotation by using a field image acquisition system, extracting frames of the videos, extracting a fixed frame number from each video, forming an RBG (radial basis group) stream data set by using images extracted from the videos and corresponding working condition categories, and obtaining an optical flow image and the corresponding working condition categories to form an optical flow data set by using a total variation L1 norm TVL1 method.
Step two: 3D ResNeXt-101 is selected as the base network structure for both the RGB stream network and the optical flow network. First, the optical flow data set is used as input and the optical flow network is trained with the cross-entropy loss between predicted and true values; all weights of the optical flow network are frozen once training is complete. Then, with a knowledge distillation method, the optical flow network is taken as the teacher network and the RGB stream network as the student network to train the RGB stream network. Assuming the optical flow network has n layers, the mean squared error between the layer n-1 features of the optical flow network and of the RGB stream network is used as a loss function to train the RGB stream network to learn motion features. At the same time, the RGB stream data set is used as input and the cross-entropy loss between predicted and true values trains the RGB stream network to learn appearance features; the learning of motion features and appearance features proceeds simultaneously, and the RGB stream network is trained with the following loss function:
L = L_CE(y_RGB, ŷ_RGB) + α · ||fc_RGB - fc_Flow||^2

where y_RGB denotes the predicted label of the RGB stream network, ŷ_RGB denotes its true label, α is the weight parameter that balances the two tasks, fc_RGB denotes the layer n-1 features of the RGB stream network, and fc_Flow denotes the layer n-1 features of the optical flow network.
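A minimal PyTorch sketch of this joint loss follows, for illustration only; the tensor names and shapes are assumptions, and the patent publishes no reference code:

```python
# Sketch of the step-two loss: cross-entropy on the RGB stream's predictions plus
# an alpha-weighted MSE that pulls its layer n-1 features toward those of the
# frozen optical flow teacher. Assumed shapes: logits [B, 4], features [B, D].
import torch.nn.functional as F

def rgb_stream_loss(rgb_logits, labels, rgb_feats, flow_feats, alpha=30.0):
    ce = F.cross_entropy(rgb_logits, labels)           # appearance learning task
    mse = F.mse_loss(rgb_feats, flow_feats.detach())   # motion imitation task
    return ce + alpha * mse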
Step three: the depth of a convolution kernel of the 3D convolution network is shallow, only motion information in a short time range can be captured, and long-time-range dependent information of a video cannot be captured; therefore, segmentation is carried out in each video, and long-time sequence characteristics among all the video segments are extracted; firstly, respectively using trained RGB stream network to extract frame-level depth features of each video segment, wherein time sequence relation exists between different video segments in the same video, therefore, inputting each extracted frame-level depth feature into LSTM network for further extraction to obtain video-level global space-time feature vector FgSelecting the first 100 depth features with the importance ranking larger than 50 to form a global space-time feature vector Fg=[Fg1,Fg2,Fg3,...,Fg100](ii) a The 3D convolution network has strong extraction capability on the motion characteristics but poor extraction capability on the appearance characteristics, and a 2D convolution network is supplemented in the network structure to extract the appearance characteristics of the same video and take the mean value to obtain a supplemented appearance characteristic vector FaSelecting the first 49 depth features with the importance ranking larger than 50 to form an enhanced appearance feature vector, Fa=[Fa1,Fa2,Fa3,...,Fa49](ii) a Finally, combining the global space-time characteristics and the enhanced appearance characteristics together to be recorded as combined characteristics F, wherein the calculation formula is F-Fg+βFaAnd beta is a weight adjusting parameter, and the combined characteristic F is input into the multilayer perceptron to carry out final working condition identification.
Compared with the prior art, the beneficial effects of the invention are as follows: during training, knowledge distillation makes the RGB stream network imitate the optical flow network, so that optical flow extraction is avoided in on-site use, solving the prior-art problem that extracting optical flow consumes a large amount of time and resources; the video is segmented, the frame-level depth features of each segment are extracted with the trained RGB stream network and input into an LSTM network to further extract video-level global spatio-temporal features, solving the problem that 3D convolution can only extract local frame-level spatio-temporal features between adjacent video frames; finally, a 2D convolutional network is added to the network, solving the problem that appearance information is lost because the RGB stream network learns it insufficiently while imitating the optical flow network. Whereas the identification time of the traditional two-stream method is usually more than 30 seconds, this method completes working condition identification within 20 seconds with an accuracy above 94%, meeting the requirements of industrial field use.
Drawings
FIG. 1 is a schematic diagram of the RGB stream network imitating the optical flow network;
fig. 2 is a diagram of the overall network architecture of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
The method comprises the following steps: collecting foam video of zinc flotation by using a field image acquisition system, dividing the foam flotation into 4 types of working conditions according to data of a fluorescence analyzer and expert experience, intercepting 96 frames of images at fixed intervals of each video of a video frame, forming an RBG (radial basis group) flow data set by the images extracted from the videos and the corresponding working condition categories, and obtaining an optical flow image and the corresponding working condition categories by using a TVL1 method to form an optical flow data set.
Step two: 3D ResNeXt-101 is selected as the base network structure for both the RGB stream network and the optical flow network. First, the optical flow data set is used as input and the optical flow network is trained with the cross-entropy loss between predicted and true values; all weights of the optical flow network are frozen once training is complete. Then, with a knowledge distillation method, the optical flow network is taken as the teacher network and the RGB stream network as the student network. Assuming the optical flow network and the RGB stream network both have n layers, the mean squared error between their layer n-1 features is used as a loss function to train the RGB stream network to learn motion features. At the same time, the RGB stream data set is used as input and the cross-entropy loss between predicted and true values trains the RGB stream network to learn appearance features; the two learning tasks proceed simultaneously, and the RGB stream network is trained with the following loss function:
L = L_CE(y_RGB, ŷ_RGB) + α · ||fc_RGB - fc_Flow||^2

where y_RGB denotes the predicted label of the RGB stream network, ŷ_RGB denotes its true label, α is the weight parameter that balances the two tasks, fc_RGB denotes the layer n-1 features of the RGB stream network, and fc_Flow denotes the layer n-1 features of the optical flow network.
During training, an SGD optimizer is adopted, with an initial learning rate of 0.1, a momentum parameter of 0.9, a weight decay parameter of 0.005, and the task adjustment parameter α set to 30.
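A sketch of these optimizer settings only; the model variable is a placeholder, not the patent's actual 3D ResNeXt-101 implementation:

```python
# Sketch of the embodiment's training configuration: SGD with lr=0.1,
# momentum=0.9, weight decay=0.005; alpha=30 is passed to the loss sketched above.
import torch
import torch.nn as nn

rgb_stream_net = nn.Linear(2048, 4)  # placeholder for the 3D ResNeXt-101 student network
optimizer = torch.optim.SGD(rgb_stream_net.parameters(),
                            lr=0.1, momentum=0.9, weight_decay=0.005)
```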
Step three: the depth of a convolution kernel of the 3D convolution network is shallow, motion information in a short time can be captured only, and long-term dependence information of videos cannot be captured, so that segmentation is performed inside each video, and long-term time sequence characteristics among the videos are extracted. Firstly, 96 frames of images of each video are divided into 6 segments according to the time sequence, the trained RGB stream network is used for extracting the frame-level depth features of each video segment, and the time sequence relation exists between different video segments in the same video, so that the extracted frame-level depth features of each segment are input into an LSTM network for further extraction to obtain a video-level global space-time feature vector FgSelecting the first 100 depth features with the importance ranking larger than 50 to form a global space-time feature vector Fg=[Fg1,Fg2,Fg3,...,Fg100]. The 3D convolution network has stronger extraction capability on the motion characteristics but poorer extraction capability on the appearance characteristics, a 2D convolution network is supplemented in the network structure, and Efficii is selectedThe entNet network extracts the appearance characteristics of 96 frames of images of the same video and averages the appearance characteristics to obtain an enhanced appearance characteristic vector FaSelecting the first 49 depth features with the importance ranking larger than 50 to form an enhanced appearance feature vector, Fa=[Fa1,Fa2,Fa3,...,Fa49](ii) a Finally, combining the global space-time characteristics and the enhanced appearance characteristics together to be recorded as combined characteristics F, wherein the calculation formula is F-Fg+βFaAnd beta is a weight adjusting parameter which is set to be 0.6, and the combined characteristic F is input into the multilayer perceptron to carry out final working condition identification.
The identification accuracy and identification time of this method and of other methods on the froth flotation video data set are shown in the following table:
Table 1. Runtime and identification accuracy of various methods on the froth flotation data set
[Table 1 appears only as an image in the original publication.]
As the data in the table show, this method achieves the highest identification accuracy among the compared methods and completes identification in a relatively short time, giving the best overall performance.

Claims (4)

1. A zinc flotation working condition identification method based on long-term depth characteristics is characterized by comprising the following steps:
the method comprises the following steps: collecting foam videos of zinc flotation by using a flotation field image acquisition system, extracting frames of the videos, extracting a fixed frame number from each video, forming an RBG (radial basis group) stream data set by using images extracted from the videos and working condition category labels corresponding to the images, and obtaining an optical flow image and the working condition category labels corresponding to the optical flow image by using a total variation L1 norm TVL1 method to form an optical flow data set;
step two: selecting 3D ResNeXt-101 as the base network structure of the RGB stream network and the optical flow network:
firstly, training the optical flow network with the optical flow data set as input, and freezing all weights of the optical flow network when training is finished; then, with a knowledge distillation method, taking the optical flow network as the teacher network and the RGB stream network as the student network to train the RGB stream network; the optical flow network and the RGB stream network each having n layers, using the mean squared error between the layer n-1 features of the optical flow network and of the RGB stream network as a loss function to train the RGB stream network to learn motion features; simultaneously, with the RGB stream data set as input, training the RGB stream network with the cross-entropy loss function between predicted and true values to learn appearance features, the learning tasks for motion features and appearance features being performed simultaneously, and training the RGB stream network with the following loss function:
L = L_CE(y_RGB, ŷ_RGB) + α · ||fc_RGB - fc_Flow||^2

wherein y_RGB denotes the predicted label of the RGB stream network, ŷ_RGB denotes its true label, α is the weight parameter that balances the two tasks, fc_RGB denotes the features of the RGB stream network, and fc_Flow denotes the features of the optical flow network;
step three: segmenting the interior of each video and extracting long-time-range temporal features across the segments; firstly, extracting the frame-level depth features of each video segment with the trained RGB stream network; since a temporal order exists among different segments of the same video, inputting the extracted frame-level depth features of each segment into an LSTM network for further extraction to obtain a video-level global spatio-temporal feature vector, and selecting the top 100 depth features ranked by importance to form the global spatio-temporal feature vector F_g = [F_g1, F_g2, F_g3, ..., F_g100]; since the 3D convolutional network extracts appearance features poorly, adding a 2D convolutional network to the network structure to extract the appearance features of the same video and averaging them to obtain a complementary appearance feature vector, and selecting the top 49 depth features ranked by importance to form the complementary appearance feature vector F_a = [F_a1, F_a2, F_a3, ..., F_a49]; finally, splicing the global spatio-temporal features and the enhanced appearance features together into a joint feature F, computed as F = F_g + βF_a, where β is a weight adjustment parameter, and inputting the joint feature F into a multilayer perceptron for final working condition identification.
2. The zinc flotation working condition identification method based on long-term depth characteristics according to claim 1, wherein in step one: 96 frames of images are extracted from each video at fixed intervals.
3. The zinc flotation working condition identification method based on long-term depth characteristics according to claim 1, wherein in step two: the 3D ResNeXt-101 is trained with an SGD optimizer, the initial learning rate is set to 0.1, the momentum parameter to 0.9, the weight decay parameter to 0.005, and the weight parameter α to 30.
4. The zinc flotation working condition identification method based on long-term depth characteristics according to claim 1, wherein in step three: each video is divided into 6 segments of 16 frames each, an EfficientNet network is selected as the 2D convolutional network, and the weight parameter β is set to 0.6.
CN202110833450.9A 2021-07-22 2021-07-22 Zinc flotation working condition identification method based on long-time-range depth characteristics Active CN113591654B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110833450.9A CN113591654B (en) 2021-07-22 2021-07-22 Zinc flotation working condition identification method based on long-time-range depth characteristics


Publications (2)

Publication Number Publication Date
CN113591654A 2021-11-02
CN113591654B 2023-09-01

Family

ID=78249413

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110833450.9A Active CN113591654B (en) 2021-07-22 2021-07-22 Zinc flotation working condition identification method based on long-time-range depth characteristics

Country Status (1)

Country Link
CN (1) CN113591654B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104408724A (en) * 2014-11-27 2015-03-11 中南大学 Depth information method and system for monitoring liquid level and recognizing working condition of foam flotation
CN108985376A (en) * 2018-07-17 2018-12-11 东北大学 It is a kind of based on convolution-Recognition with Recurrent Neural Network rotary kiln sequence operating mode's switch method
CN109685733A (en) * 2018-12-20 2019-04-26 湖南师范大学 A kind of lead zinc floatation foam image space-time joint denoising method based on bubble motion stability analysis
US20200051203A1 (en) * 2017-04-09 2020-02-13 Intel Corporation Machine Learning Sparse Computation Mechanism
US20200082264A1 (en) * 2017-05-23 2020-03-12 Intel Corporation Methods and apparatus for enhancing a neural network using binary tensor and scale factor pairs
CN112365052A (en) * 2020-11-11 2021-02-12 安徽在水一方工程技术有限公司 Gate station and culvert combined scheduling method
CN112446348A (en) * 2020-12-08 2021-03-05 电子科技大学 Behavior identification method based on characteristic spectrum flow
CN112577744A (en) * 2020-11-02 2021-03-30 西南交通大学 Rolling bearing fault mode identification method based on combination of SPA-map and ResBet
CN112861912A (en) * 2021-01-08 2021-05-28 中国石油大学(北京) Deep learning-based method and system for identifying indicator diagram of complex working condition of pumping well
CN112906813A (en) * 2021-03-09 2021-06-04 中南大学 Flotation condition identification method based on density clustering and capsule neural network

Also Published As

Publication number Publication date
CN113591654B (en) 2023-09-01

Similar Documents

Publication Publication Date Title
CN107392232B (en) Flotation working condition classification method and system
CN107016413B (en) A kind of online stage division of tobacco leaf based on deep learning algorithm
CN112950606B (en) Mobile phone screen defect segmentation method based on small samples
CN106845374A (en) Pedestrian detection method and detection means based on deep learning
CN107481231A (en) A kind of handware defect classifying identification method based on depth convolutional neural networks
CN112465790A (en) Surface defect detection method based on multi-scale convolution and trilinear global attention
CN111415329A (en) Workpiece surface defect detection method based on deep learning
CN102053563A (en) Flight training data acquisition and quality evaluation system of analog machine
CN110009015A (en) EO-1 hyperion small sample classification method based on lightweight network and semi-supervised clustering
CN110009622B (en) Display panel appearance defect detection network and defect detection method thereof
CN111626993A (en) Image automatic detection counting method and system based on embedded FEFnet network
CN113160246A (en) Image semantic segmentation method based on depth supervision
CN115131561A (en) Potassium salt flotation froth image segmentation method based on multi-scale feature extraction and fusion
CN112991271A (en) Aluminum profile surface defect visual detection method based on improved yolov3
CN114612664A (en) Cell nucleus segmentation method based on bilateral segmentation network
CN113610035A (en) Rice tillering stage weed segmentation and identification method based on improved coding and decoding network
CN109859199B (en) Method for detecting quality of freshwater seedless pearls through SD-OCT image
CN116029979A (en) Cloth flaw visual detection method based on improved Yolov4
CN113095479B (en) Multi-scale attention mechanism-based extraction method for ice underlying structure
CN113591654B (en) Zinc flotation working condition identification method based on long-time-range depth characteristics
CN116883393A (en) Metal surface defect detection method based on anchor frame-free target detection algorithm
CN110728677A (en) Texture roughness defining method based on sliding window algorithm
CN110728253A (en) Texture feature measurement method based on particle roughness
CN111028253B (en) Method and device for dividing fine iron powder
CN112686105B (en) Fog concentration grade identification method based on video image multi-feature fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant