CN117238313A

CN117238313A - Watermelon maturity nondestructive detection method and system based on Mel spectrum and deep learning

Info

Publication number: CN117238313A
Application number: CN202311132110.9A
Authority: CN
Inventors: 李金屏; 刘军; 夏英杰; 董子昊; 厉广伟; 毛英宇; 陈艺博
Original assignee: Jinan University Industrial Technology Research Institute Co ltd
Current assignee: Jinan University Industrial Technology Research Institute Co ltd
Priority date: 2023-09-04
Filing date: 2023-09-04
Publication date: 2023-12-15

Abstract

The application belongs to the technical field of audio signal processing, and provides a watermelon maturity nondestructive testing method and system based on Mel spectrum and deep learning.

Description

Watermelon maturity nondestructive detection method and system based on Mel spectrum and deep learning

Technical Field

The application belongs to the technical field of audio signal processing, and particularly relates to a watermelon maturity nondestructive testing method and system based on Mel spectrum and deep learning.

Background

The knocking sound of a watermelon arises mainly from internal resonance. The internal physiological structure of the watermelon changes as it grows and matures, and the knocking sound changes accordingly. Acoustic resonance testing uses vibration to infer defects in the object under test; the acoustic resonance can be regarded as the voiceprint of the fruit, whose resonant frequency changes over the fruit's growth cycle. However, current acoustic nondestructive detection methods for watermelon maturity generally use features that carry little information, are carried out in a laboratory environment, need professional equipment to acquire the knocking audio, and generalize poorly.

The inventors found that the key to acoustic nondestructive testing is to find audio features closely related to watermelon maturity and then obtain a corresponding classification model. Current research methods fall broadly into two approaches. The first relies on the fact that the features of different audio signals are distributed over different intervals, and judges watermelon maturity by dividing those intervals manually. This avoids model training and does not need a large amount of data, but it depends on manually selected features and can only solve a specific problem; if the watermelon variety or the detection equipment changes, the method performs poorly. The second approach extracts features from the audio signal and learns the rules from the data with machine learning. Its advantage is that the algorithm can generate a classification model automatically, without complex feature extraction: features are extracted in the frequency domain, the spectral features of the audio are obtained by Fourier transform, their dimension is reduced with PCA (Principal Component Analysis) or similar methods, and a classification model is then trained with machine learning. However, dimension reduction with PCA leaves features that carry little information and is not suited to processing big data.

Disclosure of Invention

To solve the above problems, the application provides a watermelon maturity nondestructive testing method and system based on Mel spectrum and deep learning. The application first converts the audio information from the time domain to the frequency domain by short-time Fourier transform to obtain the spectral features of the audio. The spectral features are then converted into Mel spectrum features with a Mel filter bank, which reduces the feature dimension and thereby the complexity and computational cost of the model. A data set is constructed, and the trained network model is used to detect watermelon maturity accurately, which solves the problem that manually selected or dimension-reduced features carry little information.

In order to achieve the above object, the present application is realized by the following technical scheme:

in a first aspect, the application provides a watermelon maturity nondestructive testing method based on mel spectrum and deep learning, comprising the following steps:

acquiring audio information of watermelons to be detected;

performing pre-emphasis processing, framing processing, windowing processing and short-time Fourier transformation on the audio information, and converting the audio information from a time domain to a frequency domain to obtain frequency spectrum characteristics of the audio;

converting the obtained spectrum characteristics into Mel spectrum characteristics, and reducing the dimension;

and obtaining a watermelon maturity detection result by utilizing the Mel spectrum characteristics and the trained deep learning detection model.

Further, a plurality of watermelon knocking audios are collected by using a smart phone.

Further, preprocessing the audio information includes converting the audio into mono, unifying the sampling rate to a preset value, saving to a preset format, denoising and unifying the audio duration.

Further, when training the deep learning detection model, data augmentation is applied to the training set by adjusting the volume.

Further, pre-emphasis is performed on the audio signal with a high-pass filter having a positive gain characteristic; when framing, the number of frames equals the product of the sampling rate and the audio duration, divided by the frame shift, plus the constant 1; a Hamming window is used, whose function slope decreases gradually toward the two ends of the window and whose middle part approximates a rectangular window, and the Hamming window is multiplied with the audio information point by point.

Further, the Fourier-transformed audio information is scale-converted and filtered, and the spectral features are multiplied and accumulated in the frequency domain with each triangular filter of the Mel filter bank.

Further, the deep learning detection model is an ECAPA-TDNN network model.

In a second aspect, the application also provides a watermelon maturity nondestructive testing system based on mel spectrum and deep learning, comprising:

a data acquisition module configured to: acquiring audio information of watermelons to be detected;

a feature extraction module configured to: performing pre-emphasis processing, framing processing, windowing processing and short-time Fourier transformation on the audio information, and converting the audio information from a time domain to a frequency domain to obtain frequency spectrum characteristics of the audio;

a dimension reduction module configured to: converting the obtained spectrum characteristics into Mel spectrum characteristics, and reducing the dimension;

a detection module configured to: and obtaining a watermelon maturity detection result by utilizing the Mel spectrum characteristics and the trained deep learning detection model.

In a third aspect, the present application also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the watermelon maturity nondestructive detection method based on mel spectrum and deep learning of the first aspect.

In a fourth aspect, the present application also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor implements the steps of the watermelon maturity nondestructive testing method based on mel spectrum and deep learning according to the first aspect when executing the program.

Compared with the prior art, the application has the beneficial effects that:

According to the application, pre-emphasis, framing, windowing and short-time Fourier transform are performed on the audio information. Because the frequency domain of audio carries richer feature information than the time domain, the audio information is converted from the time domain to the frequency domain to obtain the spectral features of the audio. The spectral features are then converted into Mel spectrum features to reduce the dimension. Because the Mel spectrum retains the main spectral features, this avoids the problem that dimension reduction leaves features carrying little information, while the reduced dimension lowers the complexity and computational cost of the detection model. It also avoids the poor detection performance that manual feature selection suffers when the watermelon variety or the detection equipment changes.

Drawings

The accompanying drawings, which are included to provide a further understanding of the embodiments and are incorporated in and constitute a part of this specification, illustrate and explain the embodiments and together with the description serve to explain the embodiments.

FIG. 1 is a flow chart of the method of embodiment 1 of the present application;

FIG. 2 is a diagram of an original audio waveform of embodiment 1 of the present application;

FIG. 3 is a diagram of an audio waveform after noise reduction according to embodiment 1 of the present application;

FIG. 4 is a chart showing the Fourier transformed spectrum of example 1 of the present application;

FIG. 5 is the converted Mel spectrogram of embodiment 1 of the present application.

Detailed Description

The application will be further described with reference to the drawings and examples.

It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.

Example 1:

Existing acoustic nondestructive detection methods for watermelon maturity generally use features that carry little information, are carried out in a laboratory environment, need professional equipment to acquire the knocking audio, and generalize poorly. The key to acoustic nondestructive testing is to find audio features closely related to watermelon maturity and then obtain a corresponding classification model. Current research methods fall broadly into two approaches. The first relies on the fact that the features of different audio signals are distributed over different intervals, and judges watermelon maturity by dividing those intervals manually; it depends on manually selected features and can only solve a specific problem, so it performs poorly if the watermelon variety or the detection equipment changes. The second extracts features from the audio signal and learns the rules from the data with machine learning; with dimension reduction techniques such as PCA, the resulting features carry little information, and the approach is not suited to processing big data. To solve the above problems, this embodiment provides a watermelon maturity nondestructive testing method based on Mel spectrum and deep learning, which comprises the following steps:

acquiring audio information of watermelons to be detected; optionally, collecting knocking audio of a plurality of watermelons by using a smart phone;

This embodiment exploits the fact that the frequency domain of audio carries richer feature information than the time domain: the audio information is converted from the time domain to the frequency domain to obtain the spectral features of the audio, and the spectral features are converted into Mel spectrum features. The reduced dimension lowers the complexity and computational cost of the detection model, avoids the loss of feature information caused by dimension reduction methods such as PCA (principal component analysis), and avoids the poor detection performance that manual feature selection suffers when the watermelon variety or the detection equipment changes. A smartphone is used to collect the knocking audio of a plurality of watermelons, so no professional equipment is needed for collection.

Optionally, the method in this embodiment includes the following specific steps:

s1, constructing a training set and a testing set:

S1.1, a device such as a smartphone can be used to collect a large amount of knocking audio in environments such as a watermelon greenhouse, and a data set is constructed from the collected audio; it can be understood that, besides a smartphone, other terminal devices such as a tablet can be used to collect the audio signals, and the audio can also be collected in environments other than a greenhouse, such as a shopping mall, a watermelon storage warehouse or an open-air planting field;

S1.2, the audio data set is divided into four maturity grades, namely fully ripe, 80% ripe, 50% ripe and unripe, according to the different dates on which melon farmers planted the different batches of watermelons, combined with the taste scores given by researchers;

S1.3, audio is randomly selected from the constructed data set and split in an 8:2 ratio to construct the training set and the test set.
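The random 8:2 split of step S1.3 can be sketched as follows. This is a minimal illustration only: the function name `split_dataset`, the fixed seed and the clip file names are assumptions introduced here, and the embodiment does not say whether the split is stratified by maturity grade.

```python
import random

def split_dataset(items, train_ratio=0.8, seed=0):
    """Shuffle a copy of `items` and split it into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)   # seeded only to make the sketch repeatable
    cut = int(round(len(shuffled) * train_ratio))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical clips: (file name, maturity grade index 0..3).
clips = [(f"clip_{i:03d}.wav", i % 4) for i in range(100)]
train_set, test_set = split_dataset(clips)
print(len(train_set), len(test_set))
```

Every clip ends up in exactly one of the two sets, so no knocking recording is shared between training and testing.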

S2, preprocessing the audio information, including converting the audio to a single channel, unifying the sampling rate to a preset value, saving in a preset format, denoising, and unifying the audio duration:

s2.1, converting the collected audio signals from two channels to a single channel;

s2.2, the uniform sampling rate is 16000Hz;

s2.3, storing the audio in a wav format;

S2.4, performing noise reduction on the mono audio to filter out environmental noise such as the friction of plant leaves and vehicles;

S2.5, unifying the audio duration to 4 seconds by padding silence at the head and tail.
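Two of the preprocessing steps, the two-channel to single-channel conversion (S2.1) and the head-tail silence padding to 4 seconds (S2.5), can be sketched as below. The sketch rests on assumptions not stated in the embodiment: the channels are averaged, the silence is split evenly between head and tail, and clips longer than 4 seconds are cropped symmetrically. Resampling, file I/O and noise reduction are left out.

```python
def to_mono(left, right):
    """Average two channels sample by sample (assumed mixing rule)."""
    return [(l + r) / 2.0 for l, r in zip(left, right)]

def fix_duration(samples, sample_rate=16000, seconds=4.0):
    """Pad silence at head and tail, or center-crop, to a fixed length."""
    target = int(sample_rate * seconds)
    n = len(samples)
    if n >= target:
        start = (n - target) // 2          # assumption: crop symmetrically
        return list(samples[start:start + target])
    head = (target - n) // 2               # silence before the clip
    tail = target - n - head               # silence after the clip
    return [0.0] * head + list(samples) + [0.0] * tail

mono = to_mono([0.2, 0.4, -0.6], [0.0, 0.2, -0.2])
clip = fix_duration(mono)
print(len(clip))
```

At 16000 Hz a 4-second clip always holds 64000 samples, which gives every example the same input length for the later framing step.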

S3, data enhancement is carried out on the training set:

When training the deep learning detection model, data augmentation is applied to the training set by adjusting the volume; optionally, the volume is randomly raised or lowered within the range of -15 dB to +15 dB. The training set can also be augmented by adding white noise;
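The volume augmentation of step S3 can be sketched as follows. A change of g dB corresponds to multiplying the amplitude by 10^(g/20). The uniform draw over the range, the noise standard deviation `noise_std` and the function names are assumptions made for illustration; the embodiment states only the -15 dB to +15 dB range and the option of adding white noise.

```python
import random

def random_gain(samples, min_db=-15.0, max_db=15.0, rng=random):
    """Scale the waveform by a gain drawn uniformly in decibels."""
    gain_db = rng.uniform(min_db, max_db)
    factor = 10.0 ** (gain_db / 20.0)      # amplitude ratio for a dB change
    return [x * factor for x in samples], gain_db

def add_white_noise(samples, noise_std=0.005, rng=random):
    """Add zero-mean Gaussian noise; noise_std is an assumed value."""
    return [x + rng.gauss(0.0, noise_std) for x in samples]

rng = random.Random(1)
louder_or_quieter, gain_db = random_gain([1.0, -0.5, 0.25], rng=rng)
noisy = add_white_noise([1.0, -0.5, 0.25], rng=rng)
```

The sketch does not clip the result, so a positive gain can push samples outside the range of the storage format; a real pipeline would need to handle that.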

s4, extracting frequency spectrum characteristics of the audio by short-time Fourier transform:

S4.1, because the collected audio data has more energy in the low-frequency band and less in the high-frequency band, this embodiment pre-emphasizes the audio information with a high-pass filter having a positive gain characteristic, which boosts the high-frequency part of the signal and flattens the spectrum of the audio;

S4.2, the purpose of framing is to divide the signal into a plurality of segments. A non-stationary signal can be regarded as approximately stationary over a very short time, so the Fourier transform can be applied to each short segment; the frame shift can be 320 sampling points. The number of frames equals the product of the sampling rate and the audio duration, divided by the frame shift, plus the constant 1:

$$\text{number of frames} = \frac{s \times t}{h} + 1$$

where s is the sampling rate; t is the duration of the audio; and h is the frame shift.
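As a worked example of the frame-count rule, the values used in this embodiment (sampling rate 16000 Hz, duration 4 seconds, frame shift 320 samples) can be plugged in directly. The function name `frame_count` is illustrative, and rounding down when the sample count is not a multiple of the frame shift is an assumption.

```python
def frame_count(sample_rate, duration_s, frame_shift):
    """Number of frames = (sample_rate * duration) / frame_shift + 1."""
    total_samples = int(sample_rate * duration_s)
    return total_samples // frame_shift + 1   # assumption: round down

n_frames = frame_count(16000, 4, 320)
print(n_frames)  # 201
```

So each 4-second clip yields 201 frames, one every 20 ms.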

S4.3, frequency spectrum leakage may occur due to the truncation of the signal in the time domain, resulting in errors in the frequency spectrum analysis, and windowing is the multiplication of the signal by a window function to reduce the frequency spectrum leakage. In this embodiment, a hamming window with smooth transition characteristics is selected, and the window length can take 1024 sampling points. The slope of the hamming window function gradually decreases at both ends of the window, the middle part approximates to a rectangular window, and the hamming window function is multiplied with the original signal point by point, so that strong changes can be avoided. The hamming window function is:

where n = 1, 2, …, N-1; N represents the total length of the window function; m represents the effective length of the window function; and h(n) represents the value of the window function at the n-th sample point.
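For illustration, the widely used textbook Hamming window, w(n) = 0.54 - 0.46 cos(2πn / (N - 1)) for n = 0, …, N-1, can be sketched as below. This is the common form and may differ in indexing or notation from the expression of this embodiment; the function name `hamming` is an assumption.

```python
import math

def hamming(length):
    """Textbook Hamming window of the given length."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]

window = hamming(1024)                      # window length from the embodiment
frame = [1.0] * 1024                        # placeholder frame of samples
windowed = [x * w for x, w in zip(frame, window)]  # point-by-point product
```

The window falls to 0.08 at both ends and stays close to 1 in the middle, which is the smooth-transition behavior described above.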

S4.4, short-time Fourier transform, whose purpose is to convert the audio signal from the time domain to the frequency domain; because watermelon knocking audio contains abrupt (transient) signals, the short-time Fourier transform is better suited to this conversion.
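Steps S4.1 to S4.4 can be chained into a small pipeline as sketched below, using only the Python standard library. Several choices are assumptions rather than details from the embodiment: the pre-emphasis filter is taken to be the common first-order form y[n] = x[n] - αx[n-1] with α = 0.97, the window is the textbook Hamming window, frames are centered by zero-padding half a window at each end (which reproduces the frame count s × t / h + 1), and the FFT is a plain recursive radix-2 routine written for readability, not speed.

```python
import cmath
import math

def pre_emphasis(x, alpha=0.97):
    """First-order high-pass: y[n] = x[n] - alpha * x[n-1] (alpha assumed)."""
    if not x:
        return []
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]

def fft(a):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def stft_power(x, win_len=1024, hop=320):
    """Power spectrogram: one list of win_len // 2 + 1 bins per frame."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (win_len - 1))
              for n in range(win_len)]
    pad = win_len // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad   # assumption: centered frames
    frames = []
    for start in range(0, len(padded) - win_len + 1, hop):
        seg = [padded[start + n] * window[n] for n in range(win_len)]
        spec = fft(seg)[: win_len // 2 + 1]        # keep non-negative frequencies
        frames.append([abs(c) ** 2 for c in spec])
    return frames

# Short 1000 Hz test tone at 16000 Hz; a smaller window keeps the demo fast.
tone = [math.sin(2 * math.pi * 1000 * n / 16000) for n in range(1600)]
spec = stft_power(pre_emphasis(tone), win_len=256, hop=80)
```

With the embodiment's settings (window length 1024, frame shift 320) each frame yields 513 frequency bins, which is the dimension that the Mel filter bank of step S5 then reduces.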

S5, converting the spectrum characteristics into Mel spectrum characteristics through a Mel filter bank:

The Fourier-transformed signal is scale-converted and filtered according to the relation between linear frequency and Mel frequency. The spectral features are multiplied and accumulated in the frequency domain with each triangular filter, and the resulting value is the energy of the frame in the frequency band corresponding to that triangular filter.

The mel filter bank consists of a K-order triangular filter, and the expression is:

where the expression gives the triangular filter; f(m) is the center frequency of the filter, and f(m) is defined as:

where N is the number of sample points in one frame of the signal; $F_s$ is the sampling frequency of the signal; $f_l$ and $f_h$ are the lowest and highest frequencies of the triangular filter bank, respectively; M is the number of filters; and $B^{-1}(x) = 700\,(e^{x/1125} - 1)$.
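For reference, one common textbook form of the triangular Mel filter bank that is consistent with the symbols defined above is given below. It is offered as an assumption about the general shape of the expressions, not as the exact formulas of this embodiment; the symbol $H_m(k)$ for the m-th filter evaluated at frequency bin k, and the forward mapping B, are introduced here for illustration.

```latex
% B is the Hz-to-Mel mapping whose inverse is stated in the text
B(f) = 1125 \ln\!\left(1 + \frac{f}{700}\right), \qquad
B^{-1}(x) = 700\left(e^{x/1125} - 1\right)

% Center (and edge) positions, expressed in frequency-bin units
f(m) = \frac{N}{F_s}\, B^{-1}\!\left( B(f_l) + m\,\frac{B(f_h) - B(f_l)}{M + 1} \right),
\qquad m = 0, 1, \dots, M + 1

% m-th triangular filter, m = 1, ..., M
H_m(k) =
\begin{cases}
0, & k < f(m-1) \\[4pt]
\dfrac{k - f(m-1)}{f(m) - f(m-1)}, & f(m-1) \le k \le f(m) \\[8pt]
\dfrac{f(m+1) - k}{f(m+1) - f(m)}, & f(m) < k \le f(m+1) \\[8pt]
0, & k > f(m+1)
\end{cases}
```

The filter centers are equally spaced on the Mel scale between $B(f_l)$ and $B(f_h)$, so the filters are narrow at low frequencies and wide at high frequencies.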

Optionally, the number M of Mel filters is set to 64, the lowest frequency $f_l$ of the triangular filter bank is 50 Hz, and the highest frequency $f_h$ is 14000 Hz.
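A filter bank with these settings can be sketched as follows, using the stated mapping B^-1(x) = 700(e^(x/1125) - 1) and its inverse. The sketch makes several assumptions: the unnormalized triangular form is used, filter edges are kept at fractional bin positions, and the upper frequency is clipped to half the sampling rate, because at 16000 Hz no frequency above 8000 Hz can be represented. All function names are illustrative.

```python
import math

def hz_to_mel(f):
    """B(f): inverse of the stated mapping B^-1(x) = 700 (e^(x/1125) - 1)."""
    return 1125.0 * math.log(1.0 + f / 700.0)

def mel_to_hz(x):
    return 700.0 * (math.exp(x / 1125.0) - 1.0)

def mel_filterbank(n_filters=64, n_fft=1024, sample_rate=16000,
                   f_low=50.0, f_high=14000.0):
    """Return n_filters rows of n_fft // 2 + 1 triangular weights."""
    f_high = min(f_high, sample_rate / 2.0)   # assumption: clip to Nyquist
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    # Bin positions f(0) .. f(M+1), equally spaced on the Mel scale.
    pts = [n_fft / sample_rate * mel_to_hz(lo + m * (hi - lo) / (n_filters + 1))
           for m in range(n_filters + 2)]
    n_bins = n_fft // 2 + 1
    bank = []
    for m in range(1, n_filters + 1):
        left, center, right = pts[m - 1], pts[m], pts[m + 1]
        row = []
        for k in range(n_bins):
            if left <= k <= center:
                row.append((k - left) / (center - left))      # rising edge
            elif center < k <= right:
                row.append((right - k) / (right - center))    # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def apply_filterbank(power_frame, bank):
    """Multiply-accumulate one power-spectrum frame with every filter."""
    return [sum(p * w for p, w in zip(power_frame, row)) for row in bank]

bank = mel_filterbank()
mel_frame = apply_filterbank([1.0] * 513, bank)   # flat spectrum as a placeholder
```

Each frame of 513 spectral bins is reduced to 64 Mel band energies, which is the dimension reduction this step provides.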

S6, training and testing by using an ECAPA-TDNN network:

s6.1, inputting a training set into an ECAPA-TDNN network for training;

s6.2, testing the accuracy of classification on the test set after each training; in this embodiment, the effect of identifying the maturity of watermelons is shown in table 1:

Table 1. Test results on the test set

Example 2:

the embodiment provides a watermelon maturity nondestructive testing system based on mel spectrum and deep learning, which comprises:

The working method of the system is the same as that of the watermelon maturity nondestructive testing method based on mel spectrum and deep learning in embodiment 1, and is not repeated here.

Example 3:

the present embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the watermelon maturity nondestructive detection method based on mel spectrum and deep learning described in embodiment 1.

Example 4:

the embodiment provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the steps of the watermelon maturity nondestructive testing method based on mel spectrum and deep learning in embodiment 1 when executing the program.

The above description is only a preferred embodiment of the present embodiment, and is not intended to limit the present embodiment, and various modifications and variations can be made to the present embodiment by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present embodiment should be included in the protection scope of the present embodiment.

Claims

1. The watermelon maturity nondestructive testing method based on mel spectrum and deep learning is characterized by comprising the following steps of:

acquiring audio information of watermelons to be detected;

2. The non-destructive testing method for watermelon maturity based on mel spectrum and deep learning of claim 1, wherein a smartphone is used to collect the tapping audio of a plurality of watermelons.

3. The method for non-destructive testing of watermelon maturity based on mel-spectrum and deep learning according to claim 1, wherein preprocessing the audio information comprises converting the audio into mono, unifying the sampling rate to a preset value, saving as a preset format, denoising, and unifying the audio duration.

4. The non-destructive testing method for the maturity of watermelons based on mel spectrum and deep learning according to claim 1, wherein the training set used for training the deep learning test model is subjected to data enhancement by adjusting the volume.

5. The non-destructive testing method of watermelon maturity based on mel spectrum and deep learning of claim 1, wherein the audio signal is pre-emphasized with a high pass filter having positive gain characteristics; when framing, the number of frames equals the product of the sampling rate and the audio duration, divided by the frame shift, plus the constant 1; a Hamming window is used, the slope of the function of the Hamming window gradually decreases at the two ends of the window, the middle of the window is a rectangular window, and the Hamming window is multiplied with the audio information point by point.

6. The non-destructive testing method for watermelon maturity based on mel spectrum and deep learning of claim 1, wherein the Fourier-transformed audio information is scale-converted and filtered, and the spectral features are multiplied and accumulated in the frequency domain with each triangular filter of the mel filter bank, respectively.

7. The non-destructive testing method for the maturity of watermelons based on mel spectrum and deep learning as claimed in claim 1, wherein the deep learning testing model is an ECAPA-TDNN network model.

8. Watermelon maturity nondestructive testing system based on mel spectrum and deep learning, which is characterized by comprising:

9. A computer readable storage medium having stored thereon a computer program, which when executed by a processor, implements the steps of the mel-spectrum and deep learning based watermelon maturity non-destructive testing method according to any one of claims 1-7.

10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the mel-spectrum and deep learning based watermelon maturity nondestructive testing method of any one of claims 1-7 when the program is executed.