CN116604151A - System and method for monitoring MIG welding seam state based on audio-visual dual mode

Info

Publication number
CN116604151A
Authority
CN
China
Prior art keywords
visual
audio
sound
sound pressure
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310575668.8A
Other languages
Chinese (zh)
Inventor
马成 (Ma Cheng)
褚健 (Chu Jian)
庄开宇 (Zhuang Kaiyu)
杨根科 (Yang Genke)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ningbo Institute Of Artificial Intelligence Shanghai Jiaotong University
Original Assignee
Ningbo Institute Of Artificial Intelligence Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Ningbo Institute Of Artificial Intelligence Shanghai Jiaotong University filed Critical Ningbo Institute Of Artificial Intelligence Shanghai Jiaotong University
Priority to CN202310575668.8A
Publication of CN116604151A
Legal status: Pending

Classifications

    • G01D 21/02: Measuring two or more variables by means not covered by a single other subclass
    • B23K 9/173: Arc welding making use of shielding gas and of a consumable electrode
    • B23K 9/32: Arc welding or cutting; accessories
    • G06F 18/10: Pattern recognition; pre-processing; data cleansing
    • G06F 18/253: Fusion techniques of extracted features
    • G06F 18/256: Fusion of classification results relating to different input data, e.g. multimodal recognition
    • G06N 3/045: Neural networks; combinations of networks
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G06N 3/096: Neural network learning methods; transfer learning
    • G06F 2218/04: Signal-processing pattern recognition; preprocessing; denoising
    • G06F 2218/08: Signal-processing pattern recognition; feature extraction
    • G06F 2218/12: Signal-processing pattern recognition; classification; matching
    • Y02P 90/02: Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Plasma & Fusion (AREA)
  • Mechanical Engineering (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

The invention discloses a system and method for monitoring the MIG weld seam state based on audio-visual dual modes, relating to the technical field of intelligent welding. The system comprises an audio-visual data acquisition module, an audio-visual data processing module and a weld penetration state identification module. The method comprises the following steps: step 1, installing a synchronous audio-visual data acquisition system and acquiring audio-visual data; step 2, preprocessing the data, namely performing noise reduction, modality unification and data standardization on the acquired audio-visual data; step 3, building a weld penetration state identification model based on the audio-visual dual modes; step 4, training the weld penetration state identification model and adjusting and optimizing the network structure; and step 5, deploying the weld penetration state identification model to the industrial site to realize real-time detection. The technical scheme provided by the invention fuses the multi-source information of the welding process, extracts high-quality modal features, realizes better weld penetration state identification, and greatly improves the efficiency and accuracy of welding quality monitoring.

Description

System and method for monitoring MIG welding seam state based on audio-visual dual mode
Technical Field
The invention relates to the technical field of intelligent welding, in particular to an audio-visual dual-mode-based MIG welding seam state monitoring system and method.
Background
In recent years, with the emergence of intelligent manufacturing concepts and the rapid development of technologies such as artificial intelligence, the industrial internet and sensors, more and more industrial sites have begun to use intelligent systems to monitor the production state during production, so as to avoid product quality problems and the substantial economic losses they cause. The traditional welding quality inspection mode is post-weld inspection, in which the welded workpiece undergoes tightness testing and magnetic particle, ultrasonic and radiographic inspection. However, conventional post-weld inspection suffers from complicated procedures, high cost and poor real-time performance, and can hardly meet the demand of present-stage intelligent manufacturing for automated quality inspection.
Welding is a complex process involving electrical, optical, acoustic and thermal physical quantities: the intense heat generated between the electrodes during welding melts the metals, which are joined through a solid-liquid-solid transition. Metal inert gas welding (MIG) is a welding method commonly used in today's aluminum welding; it uses a consumable wire as the electrode, an inert gas (Ar or He) as the shielding gas, and the arc between the wire and the weldment as the heat source. During welding, the shielding gas is delivered to the welding position through a pipeline to isolate the ambient air, the welding wire melts into droplets that enter the weld pool, and the weld seam forms after cooling. Depending on the working conditions, the penetration state of the weld seam produced during welding falls mainly into four types: partial penetration, full penetration, excessive penetration and burn-through. The penetration state of a weld has long served as one of the most important indicators of welding process quality.
For many years, many researchers have conducted intensive studies in the field of weld penetration state monitoring, and traditional research has often focused on analyzing the characteristics of a single modality during welding. Wu et al., in Chinese patent application CN201610119705.4, "A method for judging the penetration state based on the characteristics of the back holes", realized effective recognition of the weld penetration state through weld pool back-hole images and an extreme learning machine model. Gao Yanfeng et al., in Chinese patent application CN202210291248.2, "A method for assisting welder to judge weld penetration state on line by using arc sound", studied the frequency domain of the welding arc sound with regression analysis and deep learning, recognized the weld penetration state and fed it back to the welder through different vibration patterns. These studies mainly focus on a single modality, such as the electrical, acoustic or visual signal of the welding process, and achieve a certain effect with machine learning and deep learning methods. However, on a complex industrial production site, a single modality is often disturbed by various factors: the signal-to-noise ratio drops, the feature information related to the weld penetration state decreases, and the final recognition performance is often suboptimal.
Therefore, those skilled in the art are working to develop a new weld penetration state identification system and method that solve the prior-art problem that single-modality features in weld penetration state monitoring cannot meet the requirement of high-precision identification.
Disclosure of Invention
In view of the above-mentioned drawbacks of the prior art, the technical problem to be solved by the present invention is that single-modality features in weld penetration state monitoring cannot meet the requirement of high-precision identification.
In order to achieve the above object, the present invention provides a MIG welding seam state monitoring system based on audio-visual dual mode, comprising:
the audio-visual data acquisition module comprises a visual data acquisition unit, a sound data acquisition unit and a synchronous acquisition communication unit, and acquires data generated in the welding process to obtain audio-visual data;
the audio-visual data processing module is connected with the audio-visual data acquisition module and comprises a noise reduction unit, a unified modality unit and a data standardization unit, which respectively perform noise reduction, modality unification and data standardization on the acquired audio-visual data;
the welding seam penetration state recognition module is connected with the audio-visual data processing module and comprises a visual feature extraction unit, a sound feature extraction unit, an audio-visual feature fusion unit and a weld penetration state classification unit; the visual feature extraction unit and the sound feature extraction unit extract features from the processed audio-visual data, the audio-visual feature fusion unit performs feature fusion, and the fused features enter the weld penetration state classification unit to realize recognition of the weld penetration state.
Further, in the audio-visual data acquisition module,
the visual data acquisition unit communicates between the welding machine and an industrial CCD camera through the GigE Vision communication protocol, configures the industrial CCD camera and transmits the data stream through the GVCP and GVSP protocols, respectively, at the application layer of the communication protocol, and a filter of a specific wavelength is added in front of the lens of the industrial CCD camera;
the sound data acquisition unit controls, through LabVIEW graphical programming, an NI DAQExpress data acquisition module matched with the sound pressure sensor to acquire and store sound signals in real time; the sound signals are picked up by a microphone, pass through a signal conditioner to be converted into sound pressure signals, and are saved as csv files in a designated folder;
the synchronous acquisition communication unit adopts the TCP transmission protocol to realize synchronous communication between the welding machine and the data acquisition equipment, and controls the start and end of the work of the industrial CCD camera and the sound pressure sensor by monitoring the presence of welding current at the welding machine.
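For illustration, this trigger logic can be sketched in a few lines of Python; the welder's host address, port and plain-text current format below are assumptions, since the patent specifies only TCP transport and a welding-current presence check.

```python
import socket

# Minimal sketch of the synchronous-acquisition trigger (assumed host/port
# and message format; the patent only specifies TCP transport and a
# welding-current presence check).
WELDER_ADDR = ("192.168.1.10", 5020)   # hypothetical welder host

def monitor_welding_current(start_capture, stop_capture, threshold=10.0):
    """Start/stop camera and sound-pressure capture based on welding current."""
    with socket.create_connection(WELDER_ADDR) as sock:
        capturing = False
        while True:
            raw = sock.recv(64)
            if not raw:
                break
            current = float(raw.decode().strip())  # assumed plain-text amps
            if current > threshold and not capturing:
                start_capture()        # begin CCD + sound-pressure sampling
                capturing = True
            elif current <= threshold and capturing:
                stop_capture()         # end of weld: stop both sensors
                capturing = False
```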
Further, in the audio-visual data processing module,
the noise reduction unit uses a median filter for the visual images, and the complete ensemble empirical mode decomposition with adaptive noise combined with wavelet packet threshold noise reduction to remove the noise in the sound signal;
the unified modality unit obtains a time-frequency diagram of the time series through the short-time Fourier transform, converting the one-dimensional sound pressure time series and the two-dimensional grayscale visual signal into the same modality for subsequent processing.
Further, in the bead penetration state recognition module,
the visual feature extraction unit comprises a CNN network and a spatial attention mechanism; the CNN network comprises an input layer, convolutional layers, activation functions, pooling layers and fully connected layers; the CNN updates its weights through back propagation; the attention mechanism applies channel attention and spatial attention to the feature layers of the CNN network;
the sound feature extraction unit comprises the CNN network and a multi-scale convolution kernel combination convolutional layer; the multi-scale convolution kernel combination layer extracts richer time-domain and frequency-domain feature information from the sound signal by adjusting the length and width of the convolution kernels and combining kernels of different scales; the multi-scale convolution kernel combination layer is arranged after the input layer;
the audio-visual feature fusion unit flattens the deep visual features extracted by the visual feature extraction unit and the deep sound features extracted by the sound feature extraction unit into feature vectors of size N×1 and M×1, respectively, and concatenates them into a fusion feature vector of size (N+M)×1;
the weld penetration state classification unit comprises two fully connected layers, a softmax layer and an output layer; the two fully connected layers reduce the dimension of the fusion feature vector to the number of categories to be classified, the softmax layer converts these values into probabilities of the corresponding categories, and finally the output layer outputs the obtained weld penetration state.
The invention also provides a monitoring method of the state of the MIG welding seam based on audio-visual double modes, which comprises the following steps:
step 1, installing a synchronous audio-visual data acquisition system, and acquiring data to obtain audio-visual data;
step 2, data preprocessing, namely carrying out noise reduction, unified mode and data standardization processing on the acquired audio-visual data;
step 3, building a welding seam penetration state identification model based on audio-visual double modes;
step 4, training the weld penetration state recognition model, and adjusting and optimizing a network structure;
and 5, deploying the weld penetration state identification model to an industrial site to realize real-time detection.
Further, the step 1 includes the following substeps:
step 1.1, fixing a CCD camera and a sound pressure sensor on the welding machine arm, aimed at the contact point between the welding wire and the workpiece, ensuring an unobstructed view so that complete molten pool images can be acquired;
Step 1.2, setting sampling frequencies of the CCD camera and the sound pressure sensor, setting the sampling frequency of the CCD camera to be 50 frames per second, setting the sampling frequency of the sound pressure sensor to be 50kHz, and starting a LabVIEW program for acquisition initiation control through TCP communication;
step 1.3, setting working parameters of the welding machine, including: arc starting current, welding current and wire feeding speed; and starting the welding machine to start working, starting the CCD camera and the sound pressure sensor to work, collecting the molten pool image and the sound pressure signal in the welding process, obtaining the audio-visual data and storing the audio-visual data.
Further, the step 2 includes the following substeps:
step 2.1, median filtering is carried out on the acquired molten pool image so as to filter redundant image noise and improve the signal-to-noise ratio:
$$g(x,y)=\mathrm{Med}\{\,f(a_i,b_j)\mid(a_i,b_j)\in A\,\}$$
where $f$ and $g$ are the pixel values before and after median filtering, and $A$ is the filtering window;
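As a minimal sketch of this step, the median filtering can be realized with OpenCV; the 5×5 window size and the file name are assumptions, since the patent only defines the Med{·} operation over a window A.

```python
import cv2

# Minimal sketch of step 2.1: median filtering of a weld-pool frame.
# The 5x5 window is an assumed size; the patent only defines the
# Med{ f(a_i, b_j) } operation over a filtering window A.
frame = cv2.imread("pool_frame.png", cv2.IMREAD_GRAYSCALE)
denoised = cv2.medianBlur(frame, 5)  # g(x, y) = median of the 5x5 neighbourhood
```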
step 2.2, performing adaptive noise complete set empirical mode decomposition-wavelet packet threshold noise reduction on the acquired sound pressure signal to remove noise in the sound signal;
step 2.3, determining the length L of each sound frame according to the sampling frequency settings of the CCD camera and the sound pressure sensor:
$$L=\frac{f_a}{f_i}$$
where $f_a$ and $f_i$ are the sampling frequencies of the sound pressure sensor and the CCD camera, respectively;
step 2.4, converting the discrete digital sound pressure signal into a two-dimensional time-frequency spectrogram through the STFT:
$$X(m,\omega)=\sum_{n=-\infty}^{\infty} a(n)\,w(n-m)\,e^{-j\omega n}$$
where $a(n)$ is the sound pressure signal and $w(n)$ is a Hamming window function;
and step 2.5, re-adjusting the obtained molten pool image and two-dimensional time-frequency spectrogram into images of size (100, 100, 3) to finish preprocessing the audio-visual data.
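A compact sketch of steps 2.3 to 2.5 follows, using SciPy's STFT; the nperseg/noverlap values are assumptions, while the 1000-sample frame length and the (100, 100, 3) target size follow the settings above.

```python
import numpy as np
import cv2
from scipy.signal import stft

# Steps 2.3-2.5 sketch: frame the sound-pressure signal, take the STFT with
# a Hamming window, and standardize both modalities to (100, 100, 3) images.
# nperseg/noverlap are assumptions; the patent fixes only the 1000-sample
# frame length (L = f_a / f_i = 50 kHz / 50 fps).
F_AUDIO, F_IMAGE = 50_000, 50
L = F_AUDIO // F_IMAGE                      # 1000 sound samples per video frame

def frame_to_spectrogram(sound_frame):
    _, _, Zxx = stft(sound_frame, fs=F_AUDIO, window="hamming",
                     nperseg=256, noverlap=128)
    mag = np.abs(Zxx)
    img = (255 * mag / mag.max()).astype(np.uint8)   # grayscale time-frequency map
    img = cv2.resize(img, (100, 100))
    return np.stack([img] * 3, axis=-1)              # standard 3-channel input

def standardize_pool_image(gray_image):
    img = cv2.resize(gray_image, (100, 100))
    return np.stack([img] * 3, axis=-1)
```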
Further, the step 2.2 comprises the following sub-steps:
step 2.2.1, recording the sound pressure signal as $y(t)$, to be decomposed into a number of modal components through the complete ensemble empirical mode decomposition with adaptive noise;
step 2.2.2, adding Gaussian white noise $v_i(t)$ to the sound pressure signal $y(t)$ to obtain the No. 1 new signal:
$$y(t)+(-1)^s \varepsilon v_i(t)$$
where $s=1,2$; $\varepsilon$ is the standard deviation of the white noise; and $i=1,2,\ldots,N$, with $N$ the number of noise-added realizations in the ensemble;
performing empirical mode decomposition (EMD) on each No. 1 new signal to obtain the first-order intrinsic mode function (IMF) $C_1^i(t)$:
$$y(t)+(-1)^s \varepsilon v_i(t)=C_1^i(t)+r_i(t)$$
where $r_i(t)$ is the residual signal apart from $C_1^i(t)$;
averaging the $N$ obtained IMFs gives the first IMF component of the complete ensemble empirical mode decomposition with adaptive noise:
$$\overline{C_1}(t)=\frac{1}{N}\sum_{i=1}^{N}C_1^i(t)$$
the residual signal after removing the first IMF component is $r_1(t)$:
$$r_1(t)=y(t)-\overline{C_1}(t)$$
adding paired positive and negative Gaussian white noise to the residual signal $r_1(t)$ yields the No. 2 new signal; performing EMD on the No. 2 new signal gives the first-order modal component $D_1$, from which the second IMF component $\overline{C_2}(t)$ and the residual signal $r_2(t)$ after removing the second IMF component are obtained:
$$\overline{C_2}(t)=\frac{1}{N}\sum_{i=1}^{N}D_1^i(t),\qquad r_2(t)=r_1(t)-\overline{C_2}(t)$$
repeating the above operations until the obtained residual signal $r_{K+1}(t)$ is monotonic and cannot be decomposed further;
at this point, K IMF components have been obtained, and the sound pressure signal $y(t)$ is decomposed as:
$$y(t)=\sum_{k=1}^{K}\overline{C_k}(t)+r_{K+1}(t)$$
so far, the complete ensemble empirical mode decomposition with adaptive noise is finished;
step 2.2.3, comparing the correlation coefficient or the variance contribution rate of each IMF component with respect to the sound pressure signal $y(t)$, and screening out the noise components;
the correlation coefficient is:
$$\rho_j=\frac{\sum_t\big(x_j(t)-\bar{x}_j\big)\big(y(t)-\bar{y}\big)}{\sqrt{\sum_t\big(x_j(t)-\bar{x}_j\big)^2\,\sum_t\big(y(t)-\bar{y}\big)^2}}$$
where $x_j$ is the $j$-th IMF component and $y$ is the sound pressure signal;
the variance contribution rate is:
$$P(j)=\frac{D(j)}{\sum_{j=1}^{K}D(j)}$$
where $D(j)$ is the variance of the $j$-th IMF component;
step 2.2.4, performing wavelet packet threshold noise reduction; defining the subspace $U_j^n$ as the closure space of the function $u_n(t)$ and $U_j^{2n}$ as the closure space of $u_{2n}(t)$, where $u_n(t)$ satisfies the two-scale equations:
$$u_{2n}(t)=\sqrt{2}\sum_{k\in\mathbb{Z}}h(k)\,u_n(2t-k),\qquad u_{2n+1}(t)=\sqrt{2}\sum_{k\in\mathbb{Z}}g(k)\,u_n(2t-k)$$
where $h(k)$ and $g(k)$ are low-pass and high-pass filters of length $2N$ satisfying $g(k)=(-1)^k h(1-k)$; the sequence $\{u_n(t)\}$ constructed by the above formulas is the orthogonal wavelet packet determined by the basis $u_0(t)$;
letting $d_j^n(k)$ denote the wavelet packet coefficients of the signal in $U_j^n$, the wavelet packet decomposition is:
$$d_j^{2n}(k)=\sum_l h(l-2k)\,d_{j+1}^{n}(l),\qquad d_j^{2n+1}(k)=\sum_l g(l-2k)\,d_{j+1}^{n}(l)$$
the threshold judgment criterion is set as the soft-threshold function:
$$\hat{d}=\begin{cases}\operatorname{sgn}(d)\,(|d|-\lambda), & |d|\ge\lambda\\ 0, & |d|<\lambda\end{cases}$$
where $\lambda$ is the noise threshold; the signal is then reconstructed from the thresholded coefficients:
$$d_{j+1}^{n}(l)=\sum_k\big[h(l-2k)\,d_j^{2n}(k)+g(l-2k)\,d_j^{2n+1}(k)\big]$$
thus, the noise reduction of the sound pressure signal $y(t)$ is completed.
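The CEEMDAN plus wavelet-packet thresholding chain of step 2.2 can be sketched with the PyEMD and PyWavelets libraries; the wavelet ('db4'), decomposition level, universal threshold and the 0.3 correlation cut-off are assumptions, not values fixed by the patent.

```python
import numpy as np
import pywt
from PyEMD import CEEMDAN   # pip install EMD-signal

# Step 2.2 sketch: CEEMDAN decomposition, noise-component screening by
# correlation coefficient, wavelet-packet soft thresholding of the noisy
# IMFs, and reconstruction. Wavelet choice ('db4'), level 3, the universal
# threshold and the 0.3 correlation cut-off are assumptions.
def ceemdan_wpt_denoise(y, corr_threshold=0.3):
    imfs = CEEMDAN()(y)                            # K IMF components of y(t)
    clean = np.zeros_like(y)
    for imf in imfs:
        rho = np.corrcoef(imf, y)[0, 1]            # correlation with y(t)
        if abs(rho) < corr_threshold:              # treat as noise-dominated
            wp = pywt.WaveletPacket(imf, "db4", maxlevel=3)
            for node in wp.get_level(3, order="natural"):
                thr = np.median(np.abs(node.data)) / 0.6745 * \
                      np.sqrt(2 * np.log(len(node.data)))
                node.data = pywt.threshold(node.data, thr, mode="soft")
            imf = wp.reconstruct(update=True)[: len(y)]
        clean += imf                               # sum of (thresholded) IMFs
    return clean
```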
Further, the step 3 includes the following substeps:
step 3.1, the welding seam penetration state identification model based on the audio-visual double modes comprises a visual feature extraction convolutional neural network; the visual feature extraction convolutional neural network comprises: four convolutional layers, four max pooling layers, and two fully connected layers, the activation function is the ReLU function:
f(x)=max(0,x)
the number of convolution kernels in the four convolutional layers is 8, 16, 32 and 64 in sequence, all convolution kernels are of size 5×5 with stride 1; the pooling kernels of the max pooling layers are of size 3×3 with stride 2; the numbers of nodes of the fully connected layers are 3136 and 256, and the output feature size of the CNN network is 1×256;
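A PyTorch sketch of this visual feature-extraction backbone follows; the padding values are assumptions chosen so that a (3, 100, 100) input flattens to exactly the 3136 nodes stated above.

```python
import torch
import torch.nn as nn

# Step 3.1 sketch: four 5x5 conv layers with 8/16/32/64 kernels (stride 1),
# four 3x3 max-pools (stride 2), and fully connected 3136 -> 256.
# Padding values are assumptions chosen so a (3, 100, 100) input
# flattens to exactly 64 * 7 * 7 = 3136 features.
class VisualCNN(nn.Module):
    def __init__(self):
        super().__init__()
        layers, in_ch = [], 3
        for out_ch in (8, 16, 32, 64):
            layers += [nn.Conv2d(in_ch, out_ch, 5, stride=1, padding=2),
                       nn.ReLU(),
                       nn.MaxPool2d(3, stride=2, padding=1)]
            in_ch = out_ch
        self.features = nn.Sequential(*layers)       # (64, 7, 7) for 100x100 input
        self.fc = nn.Sequential(nn.Flatten(),
                                nn.Linear(64 * 7 * 7, 256), nn.ReLU())

    def forward(self, x):
        return self.fc(self.features(x))             # 1 x 256 visual feature

print(VisualCNN()(torch.zeros(1, 3, 100, 100)).shape)  # torch.Size([1, 256])
```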
step 3.2, adding a visual attention mechanism including a channel attention mechanism and a space attention mechanism into the visual feature extraction convolutional neural network;
the channel attention mechanism performs global average pooling and global max pooling on the input features, feeds both results into a shared network consisting of a multi-layer perceptron (MLP) with one hidden layer, adds the two processed results, obtains the channel attention weight of the input features with a sigmoid function, and multiplies the obtained channel attention weight with the input features, namely:
$$M_c(F)=\sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F))+\mathrm{MLP}(\mathrm{MaxPool}(F))\big)=\sigma\big(W_1(W_0(F^c_{avg}))+W_1(W_0(F^c_{max}))\big)$$
where $M_c$ is the channel attention weight, $F$ is the input feature, $\sigma$ is the sigmoid function, $W_0$ and $W_1$ are the MLP weights, and $F^c_{avg}$, $F^c_{max}$ are the global average pooling and global max pooling results;
the spatial attention mechanism takes the maximum and the average over the channels of each input feature, stacks them into a map with channel number 1 after convolution, obtains the weight corresponding to each spatial position with the sigmoid function, and multiplies the obtained weight with the input features:
$$M_s(F)=\sigma\big(f^{7\times 7}([F^s_{avg};\,F^s_{max}])\big)$$
where $M_s$ is the spatial attention weight, $F$ is the input feature, $\sigma$ is the sigmoid function, $f^{7\times 7}$ is a 7×7 convolution operation, and $F^s_{avg}$, $F^s_{max}$ are the spatial average pooling and max pooling results;
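The two attention steps can be sketched in PyTorch as follows; the channel-reduction ratio of 16 in the shared MLP is an assumption, while the 7×7 spatial convolution follows the formula above.

```python
import torch
import torch.nn as nn

# Step 3.2 sketch: CBAM as described -- channel attention from shared-MLP
# processing of global average/max pooling, then spatial attention from a
# 7x7 convolution over channel-wise mean/max maps. The reduction ratio 16
# is an assumption.
class CBAM(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(                      # shared MLP (W0, W1)
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, f):
        b, c, _, _ = f.shape
        avg = self.mlp(f.mean(dim=(2, 3)))             # global average pooling
        mx = self.mlp(f.amax(dim=(2, 3)))              # global max pooling
        mc = torch.sigmoid(avg + mx).view(b, c, 1, 1)  # channel attention Mc
        f = f * mc
        s = torch.cat([f.mean(1, keepdim=True),        # channel-wise mean/max
                       f.amax(1, keepdim=True)], dim=1)
        ms = torch.sigmoid(self.conv(s))               # spatial attention Ms
        return f * ms
```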
step 3.3, the weld penetration state recognition model based on the audio-visual dual modes further comprises a sound feature extraction convolutional neural network, whose structure is broadly consistent with that of the visual feature extraction convolutional neural network; the sound feature extraction convolutional neural network replaces the 5×5 convolution kernel in the network with a multi-scale convolution kernel combination convolutional layer and performs appropriate padding before convolution; the optimal combination of the parallel channels corresponding to the 3×8, 4×6, 5×5, 6×4 and 8×3 convolution kernels is selected by comparing the classification accuracies obtained with different combinations;
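A sketch of such a multi-scale combination layer is given below; the per-branch channel count and the zero-padding scheme (chosen so all branches keep the input size and can be concatenated) are assumptions.

```python
import torch
import torch.nn as nn

# Step 3.3 sketch: parallel branches with 3x8, 4x6, 5x5, 6x4 and 8x3 kernels,
# padded to a common output size and concatenated along the channel axis.
# The per-branch channel count (8) is an assumption.
class MultiScaleConv(nn.Module):
    def __init__(self, in_ch=3, branch_ch=8):
        super().__init__()
        self.branches = nn.ModuleList()
        for kh, kw in ((3, 8), (4, 6), (5, 5), (6, 4), (8, 3)):
            # (left, right, top, bottom) "same" padding; even kernels split unevenly
            pad = ((kw - 1) // 2, kw // 2, (kh - 1) // 2, kh // 2)
            self.branches.append(nn.Sequential(
                nn.ZeroPad2d(pad),
                nn.Conv2d(in_ch, branch_ch, (kh, kw)), nn.ReLU()))

    def forward(self, x):   # x: batch of time-frequency spectrogram images
        return torch.cat([b(x) for b in self.branches], dim=1)

print(MultiScaleConv()(torch.zeros(1, 3, 100, 100)).shape)  # [1, 40, 100, 100]
```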
and step 3.4, performing feature fusion: the 256-dimensional feature vector extracted by the visual feature extraction convolutional neural network and the 256-dimensional feature vector extracted by the sound feature extraction convolutional neural network are concatenated into a 512-dimensional fusion feature vector, which is input into the classification unit, converted into a 4-dimensional feature vector by two fully connected operations, and passed to the softmax layer to finish the final weld penetration state classification task.
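The fusion and classification head of step 3.4 can be sketched as follows; the hidden width of the first fully connected layer is an assumption, as the patent fixes only the 512-dimensional input and the 4-dimensional output.

```python
import torch
import torch.nn as nn

# Step 3.4 sketch: concatenate the two 256-d modality features into a 512-d
# fusion vector and classify into the four penetration states. The hidden
# width 64 of the first fully connected layer is an assumption.
class FusionClassifier(nn.Module):
    def __init__(self, num_classes=4):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(512, 64), nn.ReLU(),
            nn.Linear(64, num_classes))             # two fully connected layers

    def forward(self, visual_feat, sound_feat):     # each: batch x 256
        fused = torch.cat([visual_feat, sound_feat], dim=1)
        return torch.softmax(self.head(fused), dim=1)
```

If the head is trained with nn.CrossEntropyLoss, the softmax in forward should be omitted during training, since that loss already applies log-softmax internally.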
Further, training the weld penetration state recognition model in step 4 includes setting the corresponding loss function and optimizer, as well as the learning rate, batch size, epoch, dropout rate and num_class parameters.
The system and the method for monitoring the state of the MIG welding seam based on the audio-visual double modes have the following technical effects:
1. In conventional machine learning, feature engineering is often complex for large-scale data sets: feature extraction and feature selection rely heavily on prior knowledge and require manual effort. In addition, machine learning algorithms generalize poorly and cannot adapt to the weld penetration state recognition task under different working conditions. The technical scheme provided by the invention builds a deep learning network and uses methods such as CNN, attention mechanisms and multi-scale convolution kernel combination convolution to efficiently extract effective weld features; on this basis, transfer learning can improve the model's performance under other working conditions, giving stronger generalization;
2. In single-modality deep learning, constrained by the harsh factory environment, the signal-to-noise ratio of the collected single-modality data is not high, the data quality is low, and the recognition effect is not ideal. The visual-auditory modal fusion deep learning provided by the technical scheme of the invention synchronously collects visual and auditory information and processes each accordingly, so that the two modalities complement each other's strengths and weaknesses during feature extraction. This greatly improves the quality of the feature quantities and benefits the recognition effect; it integrates the multi-source information of the welding process, realizes high-quality extraction of modal features and better weld penetration state recognition, and greatly improves the efficiency and accuracy of welding quality monitoring.
The conception, specific structure, and technical effects of the present invention will be further described with reference to the accompanying drawings to fully understand the objects, features, and effects of the present invention.
Drawings
FIG. 1 is a flow chart of weld penetration status identification in accordance with a preferred embodiment of the present invention;
FIG. 2 is a CEEMDAN-wavelet packet thresholding joint noise reduction flowchart of the embodiment shown in FIG. 1;
FIG. 3 is a schematic diagram of a weld penetration state identification model of the embodiment of FIG. 1;
FIG. 4 is a schematic diagram of the channel attention mechanism and spatial attention mechanism of the CBAM of the embodiment of FIG. 1;
fig. 5 is a multi-scale convolution kernel combination convolution schematic of the embodiment shown in fig. 1.
Detailed Description
The following description of the preferred embodiments of the present invention refers to the accompanying drawings, which make the technical contents thereof more clear and easy to understand. The present invention may be embodied in many different forms of embodiments and the scope of the present invention is not limited to only the embodiments described herein.
Aiming at the problem that the single mode characteristic in the existing weld penetration state monitoring cannot meet the requirement of high-precision identification, the invention provides a practical and effective identification method for monitoring the quality of a welding process, and the accurate identification of the weld penetration state can be effectively realized by constructing an end-to-end weld penetration state identification model, so that the cost reduction and efficiency improvement of welding quality detection are realized.
For processes in which multiple modalities of information coexist, the concept of multi-modal deep learning (Multi-Modal Deep Learning) has been proposed in recent years; it has achieved success in fields such as natural language processing and audio-video recognition, and has been applied to remote sensing images, medical images, human gesture recognition and the like. Therefore, the embodiments of the invention solve the recognition problem of the weld penetration state with the idea of multi-modal fusion. Synchronous visual and auditory data are acquired through the audio-visual data acquisition module and fused at the feature level, greatly enriching the feature quantity. An attention mechanism is introduced when extracting MIG welding visual features, realizing accurate extraction of the weld-region features in the visual image; when extracting MIG welding auditory features, multi-scale convolution kernel combination convolution is applied to the time-frequency spectrum obtained through the STFT, realizing richer feature extraction from the sound signal.
Example 1
The embodiment of the invention provides an audio-visual dual-mode-based MIG welding seam state monitoring system, which comprises an audio-visual data acquisition module, an audio-visual data processing module and a welding seam penetration state identification module. The audiovisual data acquisition module acquires data generated in the welding process in real time and synchronously, and mainly acquires molten pool images and arc sound signals. The audio-visual data processing module carries out corresponding processing on the acquired original data, including noise reduction, unified mode and data standardization. The weld penetration state identification module respectively performs feature extraction on the processed audio-visual data by using a convolution neural network (Convolutional Neural Networks, CNN), an attention mechanism and a multi-scale convolution kernel combination convolution method, fuses the audio-visual features in a feature fusion layer, and finally enters a classification network to realize effective identification of the weld penetration state.
Wherein:
the audio-visual data acquisition module comprises a visual data acquisition unit, a sound data acquisition unit and a synchronous acquisition communication unit, and acquires data generated in the welding process to obtain audio-visual data;
the audio-visual data processing module is connected with the audio-visual data acquisition module and comprises a noise reduction unit, a unified mode unit and a data standardization unit, and noise reduction, unified mode and data standardization are respectively carried out on the acquired audio-visual data;
The welding seam penetration state recognition module is connected with the audio-visual data processing module and comprises a visual feature extraction unit, a sound feature extraction unit, an audio-visual feature fusion unit and a welding seam penetration state classification unit, wherein the visual feature extraction unit and the sound feature extraction unit are used for extracting features of the processed audio-visual data, the audio-visual feature fusion unit is used for carrying out feature fusion, and the welding seam penetration state classification unit is entered to recognize the welding seam penetration state.
Example 2
On the basis of Example 1, the visual data acquisition unit: in the visual data acquisition process, the welding machine host communicates with an industrial CCD camera through the GigE Vision communication protocol, and the CCD camera is configured and the data stream is transmitted through the GVCP and GVSP protocols, respectively, at the application layer of the communication protocol. In addition, a filter of a specific wavelength is added in front of the lens to reduce the influence of the strong arc light on the molten pool image during welding.
Sound data acquisition unit: in the sound data acquisition process, an NI DAQExpress data acquisition module matched with the sound pressure sensor is controlled through LabVIEW graphical programming to acquire and store sound signals in real time; the sound signals are picked up by a microphone, pass through a signal conditioner to be converted into sound pressure signals, and are saved as csv files in a designated folder;
Synchronous acquisition communication unit: the synchronous communication between the welding machine and the data acquisition equipment is realized by adopting a TCP transmission protocol, and the starting and ending of the work of the industrial CCD camera and the sound pressure sensor are controlled by monitoring the existence of welding current of the welding machine.
The main function of the audiovisual data processing is to process the collected original data to obtain standard and high-quality data to be input into a welding seam penetration state recognition module for final recognition.
Noise reduction unit: welding is a complex process involving electricity, light, sound and heat, and noise is inevitably generated in this process. Visual noise is mainly caused by arc light and metal spatter; the embodiment of the invention uses a nonlinear median filter, which removes noise points while protecting the edge information of the image. Noise in the sound signal is mainly caused by surrounding machine operation, weld pool oscillation and the like; the embodiment of the invention uses the complete ensemble empirical mode decomposition with adaptive noise (Complete Ensemble Empirical Mode Decomposition with Adaptive Noise, CEEMDAN) combined with wavelet packet threshold noise reduction (shown in fig. 2) to effectively remove noise from the sound signal. The noise reduction module greatly raises the signal-to-noise ratio of the raw data and thereby the final recognition accuracy.
Unified modality unit: because the sound pressure signal is a one-dimensional time series and the visual signal is a two-dimensional grayscale image, in order to eliminate the cross-modal difference, the embodiment of the invention obtains the time-frequency diagram of the time series through the short-time Fourier transform (Short Time Fourier Transform, STFT), thereby converting the two modalities into the same modality for subsequent processing. Meanwhile, sound is short-time stationary: the sound signal can be considered approximately unchanged within 10-30 ms, so a short-time analysis method is adopted for the sound signal. By formulating an audio-visual data matching rule, the number of sound pressure samples corresponding to each picture is determined by the sampling frequency of the visual images, so that a one-to-one correspondence between visual images and sound pressure data segments is realized.
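The matching rule can be sketched as follows, pairing each weld-pool frame with the 1000 sound-pressure samples recorded during its frame interval; the values follow the 50 fps / 50 kHz settings above.

```python
import numpy as np

# Sketch of the audio-visual matching rule: with a 50 fps camera and a
# 50 kHz microphone, each weld-pool frame is paired with the 1000
# sound-pressure samples recorded during its frame interval.
def pair_frames_with_sound(frames, sound, f_audio=50_000, f_image=50):
    L = f_audio // f_image                     # 1000 samples per frame
    pairs = []
    for k, frame in enumerate(frames):
        segment = sound[k * L:(k + 1) * L]
        if len(segment) == L:
            pairs.append((frame, segment))     # one-to-one frame/segment pair
    return pairs
```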
Data normalization unit: considering that the size or number of channels of the images collected by the data acquisition equipment and of the time-frequency spectrograms obtained after the STFT may be inconsistent, which would greatly reduce the training speed, the visual images and time-frequency spectrograms undergo standardized processing: the image size is adjusted before input into the weld penetration state identification module, and the images are converted into a standard RGB format. After data normalization, the audio-visual images can be used as input of the deep learning network for subsequent weld penetration state identification.
In the weld penetration state recognition module, the visual feature extraction unit consists of a CNN and an attention mechanism. The CNN consists of an input layer, convolutional layers, activation functions, pooling layers and fully connected layers; it updates its weights through back propagation and has been verified to achieve good high-dimensional feature extraction on two-dimensional images. However, the molten pool occupies only part of the visual image entering the weld penetration state recognition module; the remaining information, including the background and the not-yet-welded part of the weldment beside the seam, contributes little to the final penetration state recognition, yet consumes the same computing resources and weights as the effective molten pool information during learning, which tends to reduce recognition efficiency and accuracy. Introducing an attention mechanism into the CNN lets the network focus on the more informative molten pool area. The embodiment of the invention adds a convolutional block attention module (Convolutional Block Attention Module, CBAM) to the CNN network, which applies channel attention and spatial attention to the input feature layer in turn (as shown in fig. 4).
Sound feature extraction unit: the sound feature extraction unit consists of a CNN and multi-scale convolution kernel combination convolution. Unlike the actual molten pool image captured by the CCD camera, the time-frequency spectrum obtained via the STFT represents the variation trends in the time domain and the frequency domain on its horizontal and vertical axes, respectively, and a conventional square convolution kernel would miss much of this time-frequency information. The multi-scale convolution kernel combination convolution proposed in the embodiment of the invention extracts richer time-frequency feature information from the sound signal by adjusting the length and width of the convolution kernels and combining them in parallel. The multi-scale convolution kernel combination convolutional layer is disposed after the input layer.
Audio-visual feature fusion unit: the deep visual features extracted by the visual feature extraction unit and the deep sound features extracted by the sound feature extraction unit are flattened into feature vectors of size N×1 and M×1, concatenated into a fusion feature vector of size (N+M)×1, and sent to the weld penetration state classification unit for classification.
Weld penetration state classification unit: the weld penetration state classification unit consists of two fully connected layers, a softmax layer and an output layer. The two fully connected layers reduce the dimension of the high-dimensional fusion feature to the number of categories to be classified, the softmax layer converts these values into probabilities of the corresponding categories, and the penetration state of the weld seam is finally obtained at the output.
Example 3
The embodiment of the invention provides a method for monitoring the MIG weld seam state based on audio-visual dual modes. First, the audio-visual data acquisition device is set up: a CCD camera and a sound pressure sensor that move together with the welding machine arm are installed above the weld pool, with their relative position kept unchanged. The acquired raw data are then fed through database technology into the audio-visual data processing module for noise reduction, modality unification and data standardization; the raw weld pool images and sound pressure data are processed into images of standard size and channel number, the audio-visual data are labelled with the four penetration states, and the data are divided into a training set and a test set at a ratio of 8:2. Finally, the data are sent to the weld penetration state recognition module for training, the network structure is adjusted and optimized, and the final weld penetration state recognition result is obtained through the feature extraction of the deep learning network and the classification of the classification unit. On this basis, the MIG weld penetration state recognition system and method based on visual-auditory bimodal deep learning are deployed to the industrial site to realize real-time monitoring of the audio-visual data stream.
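The labelling and 8:2 split could, for instance, be done with scikit-learn; the dummy arrays below merely stand in for the preprocessed audio-visual sample pairs and their four penetration-state labels.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Sketch of the 8:2 train/test split over labelled audio-visual pairs; the
# dummy arrays stand in for the preprocessed samples and their four
# penetration-state labels.
samples = np.random.rand(200, 2, 100, 100, 3)      # (pool image, spectrogram) pairs
labels = np.random.randint(0, 4, size=200)         # 4 penetration-state classes
train_x, test_x, train_y, test_y = train_test_split(
    samples, labels, test_size=0.2, stratify=labels, random_state=42)
```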
The method specifically comprises the following steps, as shown in fig. 1:
step 1, installing a synchronous audio-visual data acquisition system, and acquiring data to obtain audio-visual data;
Step 2, preprocessing data, namely carrying out noise reduction, unified mode and data standardization processing on the acquired audio-visual data;
step 3, building a welding seam penetration state identification model based on audio-visual double modes;
step 4, training a weld penetration state recognition model, and adjusting and optimizing a network structure;
and 5, deploying the weld penetration state identification model to an industrial site to realize real-time detection.
Example 4
On the basis of example 3, step 1 comprises the following sub-steps:
step 1.1, fixing a CCD camera and a sound pressure sensor on the welding machine arm, aimed at the contact point between the welding wire and the workpiece, ensuring an unobstructed view so that complete molten pool images can be acquired;
step 1.2, setting sampling frequencies of a CCD camera and a sound pressure sensor, setting the sampling frequency of the CCD camera to be 50 frames per second, setting the sampling frequency of the sound pressure sensor to be 50kHz, and starting a LabVIEW program for acquisition initiation control through TCP communication;
step 1.3, setting working parameters of a welding machine, including: arc starting current, welding current and wire feeding speed; starting the welder to start working, starting the CCD camera and the sound pressure sensor to work, collecting molten pool images and sound pressure signals in the welding process, obtaining audio-visual data and storing the audio-visual data.
Example 5
On the basis of example 4, step 2 comprises the following sub-steps:
step 2.1, median filtering is carried out on the acquired molten pool image so as to filter redundant image noise and improve the signal-to-noise ratio:
$$g(x,y)=\mathrm{Med}\{\,f(a_i,b_j)\mid(a_i,b_j)\in A\,\}$$
where $f$ and $g$ are the pixel values before and after median filtering, and $A$ is the filtering window;
step 2.2, performing CEEMDAN-wavelet packet threshold noise reduction on the collected sound pressure signals to remove noise in the sound signals;
step 2.3, determining the length L of each sound frame according to the sampling frequency settings of the CCD camera and the sound pressure sensor:
$$L=\frac{f_a}{f_i}$$
where $f_a$ and $f_i$ are the sampling frequencies of the sound pressure sensor and the CCD camera, respectively;
step 2.4, converting the discrete digital sound pressure signal into a two-dimensional time-frequency spectrogram through the STFT:
$$X(m,\omega)=\sum_{n=-\infty}^{\infty} a(n)\,w(n-m)\,e^{-j\omega n}$$
where $a(n)$ is the sound pressure signal and $w(n)$ is a Hamming window function;
and step 2.5, re-adjusting the obtained molten pool image and two-dimensional time-frequency spectrogram into images of size (100, 100, 3) to finish preprocessing the audio-visual data.
Wherein step 2.2 comprises the following sub-steps (as shown in fig. 2):
step 2.2.1, recording the sound pressure signal as $y(t)$, to be decomposed into a number of modal components through the complete ensemble empirical mode decomposition with adaptive noise;
step 2.2.2, adding Gaussian white noise $v_i(t)$ to the sound pressure signal $y(t)$ to obtain the No. 1 new signal:
$$y(t)+(-1)^s \varepsilon v_i(t)$$
where $s=1,2$; $\varepsilon$ is the standard deviation of the white noise; and $i=1,2,\ldots,N$, with $N$ the number of noise-added realizations in the ensemble;
performing empirical mode decomposition (EMD) on each No. 1 new signal to obtain the first-order intrinsic mode function (IMF) $C_1^i(t)$:
$$y(t)+(-1)^s \varepsilon v_i(t)=C_1^i(t)+r_i(t)$$
where $r_i(t)$ is the residual signal apart from $C_1^i(t)$;
averaging the $N$ obtained IMFs gives the first IMF component of the complete ensemble empirical mode decomposition with adaptive noise:
$$\overline{C_1}(t)=\frac{1}{N}\sum_{i=1}^{N}C_1^i(t)$$
the residual signal after removing the first IMF component is $r_1(t)$:
$$r_1(t)=y(t)-\overline{C_1}(t)$$
adding paired positive and negative Gaussian white noise to the residual signal $r_1(t)$ yields the No. 2 new signal; performing EMD on the No. 2 new signal gives the first-order modal component $D_1$, from which the second IMF component $\overline{C_2}(t)$ and the residual signal $r_2(t)$ after removing the second IMF component are obtained:
$$\overline{C_2}(t)=\frac{1}{N}\sum_{i=1}^{N}D_1^i(t),\qquad r_2(t)=r_1(t)-\overline{C_2}(t)$$
repeating the above operations until the obtained residual signal $r_{K+1}(t)$ is monotonic and cannot be decomposed further;
at this point, K IMF components have been obtained, and the sound pressure signal $y(t)$ is decomposed as:
$$y(t)=\sum_{k=1}^{K}\overline{C_k}(t)+r_{K+1}(t)$$
so far, the complete ensemble empirical mode decomposition with adaptive noise is finished;
step 2.2.3, comparing the correlation coefficient or the variance contribution rate of each IMF component with respect to the sound pressure signal $y(t)$, and screening out the noise components;
the correlation coefficient is:
$$\rho_j=\frac{\sum_t\big(x_j(t)-\bar{x}_j\big)\big(y(t)-\bar{y}\big)}{\sqrt{\sum_t\big(x_j(t)-\bar{x}_j\big)^2\,\sum_t\big(y(t)-\bar{y}\big)^2}}$$
where $x_j$ is the $j$-th IMF component and $y$ is the sound pressure signal;
the variance contribution rate is:
$$P(j)=\frac{D(j)}{\sum_{j=1}^{K}D(j)}$$
where $D(j)$ is the variance of the $j$-th IMF component;
step 2.2.4, performing wavelet packet threshold noise reduction; defining the subspace $U_j^n$ as the closure space of the function $u_n(t)$ and $U_j^{2n}$ as the closure space of $u_{2n}(t)$, where $u_n(t)$ satisfies the two-scale equations:
$$u_{2n}(t)=\sqrt{2}\sum_{k\in\mathbb{Z}}h(k)\,u_n(2t-k),\qquad u_{2n+1}(t)=\sqrt{2}\sum_{k\in\mathbb{Z}}g(k)\,u_n(2t-k)$$
where $h(k)$ and $g(k)$ are low-pass and high-pass filters of length $2N$ satisfying $g(k)=(-1)^k h(1-k)$; the sequence $\{u_n(t)\}$ constructed by the above formulas is the orthogonal wavelet packet determined by the basis $u_0(t)$;
letting $d_j^n(k)$ denote the wavelet packet coefficients of the signal in $U_j^n$, the wavelet packet decomposition is:
$$d_j^{2n}(k)=\sum_l h(l-2k)\,d_{j+1}^{n}(l),\qquad d_j^{2n+1}(k)=\sum_l g(l-2k)\,d_{j+1}^{n}(l)$$
the threshold judgment criterion is set as the soft-threshold function:
$$\hat{d}=\begin{cases}\operatorname{sgn}(d)\,(|d|-\lambda), & |d|\ge\lambda\\ 0, & |d|<\lambda\end{cases}$$
where $\lambda$ is the noise threshold; the signal is then reconstructed from the thresholded coefficients:
$$d_{j+1}^{n}(l)=\sum_k\big[h(l-2k)\,d_j^{2n}(k)+g(l-2k)\,d_j^{2n+1}(k)\big]$$
thus, the noise reduction of the sound pressure signal $y(t)$ is completed.
Example 6
On the basis of Example 5, step 3 comprises the following sub-steps, as shown in fig. 3:
step 3.1, a welding seam penetration state identification model based on audio-visual double modes comprises a visual feature extraction convolutional neural network; the visual feature extraction convolutional neural network comprises: four convolutional layers, four max pooling layers, and two fully connected layers, the activation function is the ReLU function:
f(x)=max(0,x)
the number of convolution kernels in the four convolutional layers is 8, 16, 32 and 64 in sequence, all convolution kernels are of size 5×5 with stride 1; the pooling kernels of the max pooling layers are of size 3×3 with stride 2; the numbers of nodes of the fully connected layers are 3136 and 256, and the output feature size of the CNN network is 1×256;
Step 3.2, adding a visual attention mechanism CBAM in the visual feature extraction convolutional neural network, wherein the visual attention mechanism CBAM comprises a channel attention mechanism and a spatial attention mechanism;
the channel attention mechanism performs global average pooling and global max pooling on the input features, feeds both results into a shared network consisting of a multi-layer perceptron (Multi Layer Perceptron, MLP) with one hidden layer, adds the two processed results, obtains the channel attention weight of the input features with a sigmoid function, and multiplies the obtained channel attention weight with the input features, namely:
$$M_c(F)=\sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F))+\mathrm{MLP}(\mathrm{MaxPool}(F))\big)=\sigma\big(W_1(W_0(F^c_{avg}))+W_1(W_0(F^c_{max}))\big)$$
where $M_c$ is the channel attention weight, $F$ is the input feature, $\sigma$ is the sigmoid function, $W_0$ and $W_1$ are the MLP weights, and $F^c_{avg}$, $F^c_{max}$ are the global average pooling and global max pooling results;
the spatial attention mechanism takes the maximum and the average over the channels of each input feature, stacks them into a map with channel number 1 after convolution, obtains the weight corresponding to each spatial position with the sigmoid function, and multiplies the obtained weight with the input features:
$$M_s(F)=\sigma\big(f^{7\times 7}([F^s_{avg};\,F^s_{max}])\big)$$
where $M_s$ is the spatial attention weight, $F$ is the input feature, $\sigma$ is the sigmoid function, $f^{7\times 7}$ is a 7×7 convolution operation, and $F^s_{avg}$, $F^s_{max}$ are the spatial average pooling and max pooling results;
step 3.3, the weld penetration state recognition model based on the audio-visual dual modes further comprises a sound feature extraction convolutional neural network, whose structure is broadly consistent with that of the visual feature extraction convolutional neural network; the sound feature extraction convolutional neural network replaces the 5×5 convolution kernel in the network with a multi-scale convolution kernel combination convolutional layer and performs appropriate padding before convolution (as shown in fig. 5); the optimal combination of the parallel channels corresponding to the 3×8, 4×6, 5×5, 6×4 and 8×3 convolution kernels is selected by comparing the classification accuracies obtained with different combinations;
and step 3.4, performing feature fusion: the 256-dimensional feature vector extracted by the visual feature extraction convolutional neural network and the 256-dimensional feature vector extracted by the sound feature extraction convolutional neural network are concatenated into a 512-dimensional fusion feature vector, which is input into the classification unit, converted into a 4-dimensional feature vector by two fully connected operations, and passed to the softmax layer to finish the final weld penetration state classification task.
Training the weld penetration state identification model in step 4 comprises setting the corresponding loss function and optimizer, as well as the learning rate, batch size, epoch, dropout rate and num_class parameters.
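As an illustration, a representative training configuration is sketched below; the concrete values (Adam, learning rate 1e-3, batch size 32, 50 epochs, dropout 0.5) and the stand-in model are assumptions, since the patent only names the parameter types to be set.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Step 4 sketch: loss function, optimizer and loop hyper-parameters. The
# concrete values (Adam, lr 1e-3, batch 32, 50 epochs, dropout 0.5) are
# assumptions; the stand-in model is a placeholder for the assembled
# audio-visual recognition network.
NUM_CLASS, BATCH_SIZE, EPOCHS, LR = 4, 32, 50, 1e-3
model = nn.Sequential(nn.Flatten(), nn.Dropout(0.5),
                      nn.Linear(3 * 100 * 100, NUM_CLASS))
criterion = nn.CrossEntropyLoss()                        # loss function
optimizer = torch.optim.Adam(model.parameters(), lr=LR)  # optimizer

data = TensorDataset(torch.randn(64, 3, 100, 100),          # dummy samples
                     torch.randint(0, NUM_CLASS, (64,)))    # dummy labels
for epoch in range(EPOCHS):
    for x, y in DataLoader(data, batch_size=BATCH_SIZE, shuffle=True):
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
```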
The trained model is deployed on a host computer at the welding site, finally realizing real-time monitoring of the audio-visual data stream.
The foregoing describes in detail preferred embodiments of the present invention. It should be understood that numerous modifications and variations can be made in accordance with the concepts of the invention without requiring creative effort by one of ordinary skill in the art. Therefore, all technical solutions which can be obtained by logic analysis, reasoning or limited experiments based on the prior art by the person skilled in the art according to the inventive concept shall be within the scope of protection defined by the claims.

Claims (10)

1. An audio-visual dual-mode-based MIG welding seam state monitoring system, comprising:
the audio-visual data acquisition module comprises a visual data acquisition unit, a sound data acquisition unit and a synchronous acquisition communication unit, and acquires data generated in the welding process to obtain audio-visual data;
the audio-visual data processing module is connected with the audio-visual data acquisition module and comprises a noise reduction unit, a unified modality unit and a data standardization unit, which respectively perform noise reduction, modality unification and data standardization on the acquired audio-visual data;
the welding seam penetration state recognition module is connected with the audio-visual data processing module and comprises a visual feature extraction unit, a sound feature extraction unit, an audio-visual feature fusion unit and a weld penetration state classification unit; the visual feature extraction unit and the sound feature extraction unit extract features from the processed audio-visual data, the audio-visual feature fusion unit performs feature fusion, and the fused features enter the weld penetration state classification unit to realize recognition of the weld penetration state.
2. The audio-visual bi-modal based MIG weld condition monitoring system of claim 1, wherein in the audio-visual data collection module,
the visual data acquisition unit communicates between the welding machine and an industrial CCD camera through the GigE Vision communication protocol, configures the industrial CCD camera and transmits the data stream through the GVCP and GVSP protocols, respectively, at the application layer of the communication protocol, and a filter of a specific wavelength is added in front of the lens of the industrial CCD camera;
the sound data acquisition unit controls, through LabVIEW graphical programming, an NI DAQExpress data acquisition module matched with the sound pressure sensor to acquire and store sound signals in real time; the sound signals are picked up by a microphone, pass through a signal conditioner to be converted into sound pressure signals, and are saved as csv files in a designated folder;
The synchronous acquisition communication unit adopts a TCP transmission protocol to realize synchronous communication between the welding machine and the data acquisition equipment, and controls the start and the end of the work of the industrial CCD camera and the sound pressure sensor by monitoring the existence of welding current of the welding machine.
3. The audio-visual dual-mode-based MIG welding seam state monitoring system of claim 2, wherein in the audio-visual data processing module,
the noise reduction unit uses a median filter to remove noise from the molten pool image, and a complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN)-wavelet packet threshold noise reduction method to remove noise from the sound signal;
the unified-modality unit obtains a time-frequency diagram of the time series through the short-time Fourier transform, bringing the one-dimensional time-series sound pressure signal and the two-dimensional grayscale visual signal into the same modality for subsequent processing.
4. The audio-visual dual-mode-based MIG welding seam state monitoring system according to claim 2, wherein in the weld penetration state recognition module,
the visual feature extraction unit comprises a CNN network and a spatial attention mechanism; the CNN network comprises an input layer, convolution layers, activation functions, pooling layers and fully connected layers, and updates its weights through backpropagation; the attention mechanism applies channel attention and spatial attention to the feature layers of the CNN network;
the sound feature extraction unit comprises the CNN network and a multi-scale convolution kernel combination convolution layer; the multi-scale convolution kernel combination convolution layer extracts more time-domain and frequency-domain feature information from the sound signal by adjusting and combining the lengths and widths of the convolution kernels, and is arranged after the input layer;
the audio-visual feature fusion unit flattens the deep visual features extracted by the visual feature extraction unit and the deep sound features extracted by the sound feature extraction unit into feature vectors of sizes N×1 and M×1 respectively, and concatenates them into a fused feature vector of size (N+M)×1;
the weld penetration state classification unit comprises two fully connected layers, a softmax layer and an output layer; the two fully connected layers reduce the dimension of the fused feature vector to the number of classes, the softmax layer converts these values into class probabilities, and the output layer outputs the resulting weld penetration state.
5. An audio-visual dual-mode-based MIG welding seam state monitoring method, characterized by comprising the following steps:
step 1, installing a synchronous audio-visual data acquisition system, and acquiring data to obtain audio-visual data;
step 2, data preprocessing: performing noise reduction, modality unification and data standardization on the acquired audio-visual data;
step 3, building a welding seam penetration state identification model based on audio-visual double modes;
step 4, training the weld penetration state recognition model, and adjusting and optimizing a network structure;
step 5, deploying the weld penetration state recognition model to the industrial site to realize real-time detection.
6. The audio-visual dual-mode-based MIG welding seam state monitoring method according to claim 5, wherein step 1 comprises the following sub-steps:
step 1.1, fixing a CCD camera and a sound pressure sensor on the welding machine arm, facing the contact point between the welding wire and the workpiece, ensuring an unobstructed line of sight so that complete molten pool images can be acquired;
step 1.2, setting the sampling frequencies of the CCD camera and the sound pressure sensor: the sampling frequency of the CCD camera is set to 50 frames per second and that of the sound pressure sensor to 50 kHz, and starting the LabVIEW program that controls acquisition start via TCP communication;
step 1.3, setting the working parameters of the welding machine, including the arc starting current, welding current and wire feeding speed; starting the welding machine, upon which the CCD camera and the sound pressure sensor start working, collecting the molten pool images and sound pressure signals during welding to obtain and store the audio-visual data.
7. The audio-visual dual-mode-based MIG welding seam state monitoring method according to claim 6, wherein step 2 comprises the following sub-steps:
step 2.1, median filtering is carried out on the acquired molten pool image so as to filter redundant image noise and improve the signal-to-noise ratio:
g(x, y) = Med{ f(a_i, b_j) | (a_i, b_j) ∈ A }
where f and g are the pixel values before and after median filtering, and A is the filtering window, as sketched below;
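A one-line equivalent of this filter is available in SciPy; a minimal sketch (the 3×3 window size is an assumption, since the specification does not state the dimensions of A):

```python
import numpy as np
from scipy.ndimage import median_filter

# frame: molten pool image as a 2-D grayscale array (placeholder data here)
frame = np.random.rand(100, 100)
denoised = median_filter(frame, size=3)  # each pixel replaced by the median of its 3x3 window A
```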
step 2.2, performing complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN)-wavelet packet threshold noise reduction on the acquired sound pressure signal to remove noise from the sound signal;
step 2.3, determining the length L of each sound frame from the sampling frequency settings of the CCD camera and the sound pressure sensor:
L = f_a / f_i
where f_a and f_i are the sampling frequencies of the sound pressure sensor and the CCD camera respectively (with the settings of step 1.2, L = 50000/50 = 1000 samples per frame);
step 2.4, converting each discrete sound pressure frame into a two-dimensional time-frequency spectrogram through the STFT:
X(m, ω) = Σ_n a(n) w(n − m) e^(−jωn)
where a(n) is the sound pressure signal and w(n) is the Hamming window function;
step 2.5, resizing the obtained molten pool image and two-dimensional time-frequency spectrogram to images of size (100, 100, 3), completing the preprocessing of the audio-visual data, as sketched below.
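A minimal sketch of steps 2.3–2.5 (frame length, STFT with a Hamming window, resizing) using NumPy/SciPy/OpenCV; the STFT segment length and the placeholder input arrays are assumptions:

```python
import numpy as np
import cv2
from scipy.signal import stft

f_a, f_i = 50_000, 50          # sound and camera sampling frequencies (step 1.2)
L = f_a // f_i                 # step 2.3: L = f_a / f_i = 1000 samples per video frame

sound = np.random.randn(L)     # placeholder for one sound frame
# Step 2.4: STFT with a Hamming window; nperseg=128 is an assumed value.
f, t, Zxx = stft(sound, fs=f_a, window="hamming", nperseg=128)
spectrogram = np.abs(Zxx).astype(np.float32)

# Step 2.5: resize both modalities to 100x100 and expand to 3 channels.
pool_image = cv2.resize(np.random.rand(480, 640).astype(np.float32), (100, 100))
spec_image = cv2.resize(spectrogram, (100, 100))
rgb_spec = np.repeat(spec_image[..., None], 3, axis=2)  # shape (100, 100, 3)
```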
8. The audio-visual dual-mode-based MIG welding seam state monitoring method according to claim 7, wherein step 2.2 comprises the following sub-steps:
step 2.2.1, denoting the sound pressure signal to be decomposed into several modal components by CEEMDAN as y(t);
step 2.2.2, adding Gaussian white noise v_i(t) to the sound pressure signal y(t) to obtain the No. 1 new signal:
y(t) + (−1)^s ε v_i(t)
where s = 1, 2; ε is the standard deviation of the white noise; i = 1, 2, …, N; and N is the number of noise realizations averaged in the decomposition;
performing empirical mode decomposition (EMD) on the No. 1 new signal to obtain the first-order intrinsic mode function (IMF) C_1^i:
y(t) + (−1)^s ε v_i(t) = C_1^i(t) + r_i(t)
where r_i(t) is the residual signal;
averaging the N first-order IMFs gives the first IMF component of the CEEMDAN:
C̄_1(t) = (1/N) Σ_{i=1}^{N} C_1^i(t)
the residual signal after removing the first IMF component is r_1(t):
r_1(t) = y(t) − C̄_1(t)
adding paired positive and negative Gaussian white noise to the residual signal r_1(t) gives the No. 2 new signal; performing EMD on the No. 2 new signal gives the first-order modal component D_1^i, from which the second CEEMDAN IMF component C̄_2(t) = (1/N) Σ_{i=1}^{N} D_1^i(t) and the residual signal after removing the second IMF component r_2(t) = r_1(t) − C̄_2(t) are obtained;
repeating the above operation until the residual signal r_{K+1}(t) is monotonic and cannot be decomposed further;
at this point K IMF components have been obtained, and the sound pressure signal y(t) is decomposed as:
y(t) = Σ_{k=1}^{K} C̄_k(t) + r_{K+1}(t)
with this, the CEEMDAN process is complete;
step 2.2.3, comparing the correlation coefficient or variance contribution rate of each IMF component with respect to the sound pressure signal y(t) to screen out the noise components;
the correlation coefficient is:
ρ_i = Σ_t (x_i(t) − x̄_i)(y(t) − ȳ) / sqrt( Σ_t (x_i(t) − x̄_i)² · Σ_t (y(t) − ȳ)² )
where x_i is each IMF component and y is the sound pressure signal;
the variance contribution rate is:
P(j) = D(j) / Σ_j D(j)
where D(j) is the variance of the j-th IMF component;
step 2.2.4, performing wavelet packet threshold noise reduction; defining the subspace U_j^n as the closure space of the function u_n(t) and U_j^{2n} as the closure space of the function u_{2n}(t), where u_n(t) satisfies the two-scale equations:
u_{2n}(t) = √2 Σ_k h(k) u_n(2t − k)
u_{2n+1}(t) = √2 Σ_k g(k) u_n(2t − k)
where h(k) and g(k) are low-pass and high-pass filters of length 2N satisfying g(k) = (−1)^k h(1 − k); the sequence {u_n(t)} constructed by the above formulas is the orthogonal wavelet packet determined by h(k);
letting g_j^n(t) ∈ U_j^n, it can be expressed as:
g_j^n(t) = Σ_l d_l^{j,n} u_n(2^j t − l)
and the wavelet packet decomposition result is obtained as:
d_l^{j,2n} = Σ_k h(k − 2l) d_k^{j+1,n}
d_l^{j,2n+1} = Σ_k g(k − 2l) d_k^{j+1,n}
the threshold criterion is set as the soft threshold:
d̂ = sign(d)(|d| − λ) for |d| ≥ λ, and d̂ = 0 for |d| < λ
where λ is the threshold;
reconstructing the signal from the thresholded coefficients:
d_l^{j+1,n} = Σ_k [ h(l − 2k) d_k^{j,2n} + g(l − 2k) d_k^{j,2n+1} ]
thus, the sound pressure signal y(t) is noise reduced.
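A minimal sketch of this CEEMDAN–wavelet-packet pipeline, assuming the third-party PyEMD (pip package EMD-signal) and PyWavelets libraries; the correlation cut-off, wavelet, decomposition level, threshold value, and the recombination strategy (denoising only the noise-dominated IMFs before summing) are all assumptions of one common variant:

```python
import numpy as np
import pywt
from PyEMD import CEEMDAN   # pip install EMD-signal

y = np.random.randn(5000)   # placeholder sound pressure signal y(t)

# Steps 2.2.1-2.2.2: CEEMDAN decomposition of y(t) into IMF components.
imfs = CEEMDAN()(y)

# Step 2.2.3: screen components by correlation with y(t); 0.1 is an assumed cut-off.
signal_imfs = [imf for imf in imfs if abs(np.corrcoef(imf, y)[0, 1]) > 0.1]
noise_imfs = [imf for imf in imfs if abs(np.corrcoef(imf, y)[0, 1]) <= 0.1]

# Step 2.2.4: wavelet packet soft-threshold denoising of noise-dominated IMFs.
def wp_denoise(x, wavelet="db4", level=3, lam=0.1):  # all assumed parameters
    wp = pywt.WaveletPacket(x, wavelet=wavelet, maxlevel=level)
    for node in wp.get_level(level):
        node.data = pywt.threshold(node.data, lam, mode="soft")
    return wp.reconstruct(update=True)[: len(x)]

denoised = sum(signal_imfs) + sum(wp_denoise(imf) for imf in noise_imfs)
```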
9. The audio-visual dual-mode-based MIG welding seam state monitoring method according to claim 8, wherein step 3 comprises the following sub-steps:
step 3.1, the audio-visual dual-mode-based weld penetration state recognition model comprises a visual feature extraction convolutional neural network, which consists of four convolution layers, four max pooling layers and two fully connected layers; the activation function is the ReLU function:
f(x)=max(0,x)
the numbers of convolution kernels in the convolution layers are 8, 16, 32 and 64 in sequence; all convolution kernels are 5×5 with stride 1; the pooling kernels of the max pooling layers are 3×3 with stride 2; the two fully connected layers have 3136 and 256 nodes, so the output feature size of the CNN network is 1×256, as sketched below;
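A minimal PyTorch sketch of a network with these layer sizes; the convolution padding of 2 and pooling padding of 1 are assumptions chosen so that a 100×100×3 input flattens to exactly the claimed 3136 features:

```python
import torch
import torch.nn as nn

class VisualCNN(nn.Module):
    """Four 5x5 conv layers (8/16/32/64 kernels, stride 1), four 3x3/stride-2
    max pooling layers, and two fully connected layers (3136 -> 256)."""
    def __init__(self):
        super().__init__()
        layers, in_ch = [], 3
        for out_ch in (8, 16, 32, 64):
            layers += [nn.Conv2d(in_ch, out_ch, 5, stride=1, padding=2),
                       nn.ReLU(),
                       nn.MaxPool2d(3, stride=2, padding=1)]
            in_ch = out_ch
        self.features = nn.Sequential(*layers)  # spatial size: 100 -> 50 -> 25 -> 13 -> 7
        self.fc = nn.Sequential(nn.Flatten(), nn.Linear(64 * 7 * 7, 256), nn.ReLU())

    def forward(self, x):
        return self.fc(self.features(x))        # output feature size 1 x 256

feat = VisualCNN()(torch.randn(1, 3, 100, 100))  # -> torch.Size([1, 256])
```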
step 3.2, adding a visual attention mechanism, comprising a channel attention mechanism and a spatial attention mechanism, to the visual feature extraction convolutional neural network;
the channel attention mechanism applies global average pooling and global max pooling to the input feature, feeds both results into a shared network consisting of a multi-layer perceptron (MLP) with one hidden layer, adds the two processed results, obtains the channel attention weight of the input feature with a sigmoid function, and multiplies the obtained channel attention weight with the input feature, namely:
M_c(F) = σ( W_1(W_0(F_avg^c)) + W_1(W_0(F_max^c)) )
where M_c is the channel attention weight, F is the input feature, σ is the sigmoid function, W_0 and W_1 are the MLP weights, and F_avg^c and F_max^c are the global average pooling and global max pooling results;
the spatial attention mechanism takes the channel-wise maximum and average of the input feature, stacks them and applies a convolution layer with one output channel, obtains the weight of each spatial position with the sigmoid function, and multiplies the obtained weights with the input feature:
M_s(F) = σ( f^{7×7}([F_avg^s; F_max^s]) )
where M_s is the spatial attention weight, F is the input feature, σ is the sigmoid function, f^{7×7} is a 7×7 convolution operation, and F_avg^s and F_max^s are the spatial average pooling and max pooling results (both attention blocks are sketched below);
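A minimal PyTorch sketch of these two attention blocks, which match the well-known CBAM formulation; the channel-reduction ratio of 16 is an assumed hyperparameter:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, ratio=16):       # ratio is an assumption
        super().__init__()
        self.mlp = nn.Sequential(                  # shared MLP with one hidden layer
            nn.Linear(channels, channels // ratio), nn.ReLU(),
            nn.Linear(channels // ratio, channels))

    def forward(self, x):
        avg = self.mlp(x.mean(dim=(2, 3)))         # global average pooling branch
        mx = self.mlp(x.amax(dim=(2, 3)))          # global max pooling branch
        mc = torch.sigmoid(avg + mx)[..., None, None]  # channel weights M_c
        return x * mc

class SpatialAttention(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)  # f^{7x7}

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)          # channel-wise average
        mx = x.amax(dim=1, keepdim=True)           # channel-wise maximum
        ms = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))  # M_s
        return x * ms

x = torch.randn(1, 64, 7, 7)
out = SpatialAttention()(ChannelAttention(64)(x))  # channel then spatial attention
```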
step 3.3, the audio-visual dual-mode-based weld penetration state recognition model further comprises a sound feature extraction convolutional neural network whose structure is broadly consistent with that of the visual feature extraction convolutional neural network; the sound feature extraction convolutional neural network replaces the 5×5 convolution kernels with a multi-scale convolution kernel combination convolution layer, applying appropriate padding before convolution; the optimal combination of the parallel branches corresponding to the 3×8, 4×6, 5×5, 6×4 and 8×3 convolution kernels is selected by comparing the classification accuracy obtained with different combinations, as sketched below;
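A minimal sketch of such a multi-scale combination layer; the three-branch combination 3×8 / 5×5 / 8×3 and the per-branch channel count are illustrative choices, and padding="same" (which requires stride 1 in PyTorch) keeps the branch outputs aligned for concatenation:

```python
import torch
import torch.nn as nn

class MultiScaleConv(nn.Module):
    """Parallel conv branches with differently shaped kernels, concatenated
    along the channel axis; tall vs wide kernels trade frequency- for
    time-resolution on the spectrogram input."""
    def __init__(self, in_ch=3, out_ch_per_branch=8,
                 kernels=((3, 8), (5, 5), (8, 3))):  # an illustrative combination
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch_per_branch, k, stride=1, padding="same")
            for k in kernels)

    def forward(self, x):
        return torch.cat([b(x) for b in self.branches], dim=1)

y = MultiScaleConv()(torch.randn(1, 3, 100, 100))  # -> torch.Size([1, 24, 100, 100])
```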
step 3.4, performing feature fusion: the 256-dimensional feature vector extracted by the visual feature extraction convolutional neural network and the 256-dimensional feature vector extracted by the sound feature extraction convolutional neural network are concatenated into a 512-dimensional fused feature vector, which is input into the classification unit; two fully connected operations convert the fused feature vector into a 4-dimensional feature vector, which enters the softmax layer to complete the final weld penetration state classification task, as sketched below.
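A minimal sketch of the fusion and classification head; the 256+256 → 512 → 4 dimensions follow the claim, while the hidden width of 128 and the dropout placement are assumptions:

```python
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    """Concatenate 256-d visual and 256-d sound features, then two fully
    connected layers down to the 4 weld penetration classes."""
    def __init__(self, dropout=0.5):               # dropout rate is an assumption
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(512, 128), nn.ReLU(), nn.Dropout(dropout),  # hidden width assumed
            nn.Linear(128, 4))                     # 4-dimensional output

    def forward(self, visual_feat, sound_feat):
        fused = torch.cat([visual_feat, sound_feat], dim=1)  # 256 + 256 -> 512
        return torch.softmax(self.head(fused), dim=1)        # class probabilities

probs = FusionClassifier()(torch.randn(1, 256), torch.randn(1, 256))
```

During training one would typically feed the pre-softmax logits to a cross-entropy loss rather than the probabilities, applying the softmax only at inference.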
10. The method of claim 9, wherein training the weld penetration state recognition model in step 4 includes setting the corresponding loss function and optimizer as well as the learning rate, batch size, epoch, dropout rate and num_class parameters.
CN202310575668.8A 2023-05-19 2023-05-19 System and method for monitoring MIG welding seam state based on audio-visual dual mode Pending CN116604151A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310575668.8A CN116604151A (en) 2023-05-19 2023-05-19 System and method for monitoring MIG welding seam state based on audio-visual dual mode

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310575668.8A CN116604151A (en) 2023-05-19 2023-05-19 System and method for monitoring MIG welding seam state based on audio-visual dual mode

Publications (1)

Publication Number Publication Date
CN116604151A true CN116604151A (en) 2023-08-18

Family

ID=87679508

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310575668.8A Pending CN116604151A (en) 2023-05-19 2023-05-19 System and method for monitoring MIG welding seam state based on audio-visual dual mode

Country Status (1)

Country Link
CN (1) CN116604151A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117428291A (en) * 2023-12-18 2024-01-23 南京理工大学 Weld bead fusion width quantification method based on sonogram characteristic analysis


Similar Documents

Publication Publication Date Title
CN109719368B (en) Multi-information acquisition monitoring system and method for robot welding process
CN116604151A (en) System and method for monitoring MIG welding seam state based on audio-visual dual mode
CN102528225B (en) Sound signal transduction and prediction method of GTAW (gas tungsten arc welding) welding fusion penetration state
Wu et al. Visual-acoustic penetration recognition in variable polarity plasma arc welding process using hybrid deep learning approach
CN112906624B (en) Video data feature extraction method based on audio and video multi-mode time sequence prediction
CN111299763A (en) Anti-noise-interference laser visual welding seam automatic tracking method and system
CN113379740A (en) VPPAW fusion in-situ real-time monitoring system based on perforation molten pool image and deep learning
CN116226715A (en) Multi-mode feature fusion-based online polymorphic identification system for operators
Liu et al. Seam tracking system based on laser vision and CGAN for robotic multi-layer and multi-pass MAG welding
CN110837760B (en) Target detection method, training method and device for target detection
CN115311111A (en) Classroom participation evaluation method and system
CN113253850A (en) Multitask cooperative operation method based on eye movement tracking and electroencephalogram signals
CN106488197A (en) A kind of intelligent person recognition robot
KR20210048172A (en) Gap recognition system and method of fillet welding using deep learning algorithm
CN114706338B (en) Interaction control method and system based on digital twin model
CN110909603A (en) Intelligent monitoring system based on support vector machine
CN114821174B (en) Content perception-based transmission line aerial image data cleaning method
CN114120634B (en) Dangerous driving behavior identification method, device, equipment and storage medium based on WiFi
CN105867625A (en) Long-distance gesture control method
CN113254713B (en) Multi-source emotion calculation system and method for generating emotion curve based on video content
CN113658188B (en) Solution crystallization process image semantic segmentation method based on improved Unet model
CN115482166A (en) Underwater thermal disturbance image recovery method based on deep learning
Grabs et al. Supervised Machine Learning based Classification of Video Traffic Types
CN204524507U (en) Arc welding robot intelligence coupled system
CN112801984A (en) Weld joint positioning method based on countermeasure learning under laser vision system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination