CN115662395B - Method for intelligently eliminating unhealthy sound of learning earphone based on air conduction - Google Patents

Method for intelligently eliminating unhealthy sound of learning earphone based on air conduction

Info

Publication number
CN115662395B
CN115662395B (application CN202211291791.9A)
Authority
CN
China
Prior art keywords
wireless communication
unhealthy
sparse
signal
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211291791.9A
Other languages
Chinese (zh)
Other versions
CN115662395A (en)
Inventor
余安云
张美群
黄珍英
Current Assignee
Dongguan Jiexun Electronic Technology Co ltd
Original Assignee
Dongguan Jiexun Electronic Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Dongguan Jiexun Electronic Technology Co ltd filed Critical Dongguan Jiexun Electronic Technology Co ltd
Priority to CN202211291791.9A
Publication of CN115662395A
Application granted
Publication of CN115662395B
Legal status: Active

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 30/00: Reducing energy consumption in communication networks
    • Y02D 30/70: Reducing energy consumption in communication networks in wireless communication networks

Abstract

The invention relates to the technical field of sound filtering and elimination, and discloses a method for intelligently eliminating unhealthy sound by a learning earphone based on air conduction, comprising the following steps: receiving a wireless communication signal and converting it into a time-frequency waterfall diagram; extracting the structural feature vector of the wireless communication signal on the time-frequency waterfall diagram by using a multi-scale feature extraction method; performing sparse representation on the extracted structural feature vector; and constructing an unhealthy sound recognition model based on the VC dimension theory, into which the sparse feature vector of the wireless communication signal is input, whereupon the model outputs the detection and recognition result of the wireless communication signal and the unhealthy sound is eliminated. The extracted multi-scale structural features have geometric invariance, which avoids the influence of signal strength changes in geometric local areas on the extracted features; based on the VC dimension theory, the sparse feature vector of the wireless communication signal is mapped into a high-dimensional space, in which the detection and recognition of unhealthy-sound wireless communication signals are carried out more accurately.

Description

Method for intelligently eliminating unhealthy sound of learning earphone based on air conduction
Technical Field
The invention relates to the technical field of sound filtering and eliminating, in particular to a method for intelligently eliminating unhealthy sound based on a learning earphone under air conduction.
Background
The problem of signal interference exists between wireless Bluetooth headsets and Wi-Fi because both use the 2.4 GHz frequency band. When a wireless Bluetooth headset and Wi-Fi are active at the same time, the data throughput of Bluetooth drops drastically, device pairing becomes difficult, and unhealthy noise appears, affecting user health and user experience; when a wireless communication signal carrying unhealthy sound is detected, filtering and elimination processing of that signal is needed. CN102780938B provides a Bluetooth headset circuit board comprising, from top to bottom, a signal layer S1, a ground layer G, a power layer P, and a signal layer S2; the radio-frequency signal layer and the audio signal layer of the circuit board are arranged on the signal layers S1 and S2 respectively, and the ground layer G and power layer P isolate the audio signal layer from the radio-frequency signal layer. A differential wiring principle is implemented on the audio signal layer, a differential mode is also adopted for the power supply of the audio circuit, and the differential wires are of equal length, equal thickness, equidistant, and parallel, so that signal interference can be reduced in all directions from the signal-layer layout, the microphone end, and the loudspeaker end, improving the sound quality of the Bluetooth headset.
CN109587714A provides a method, an apparatus, a storage medium, and a Bluetooth headset for transmitting audio data. The method comprises: before the master earphone transmits audio data to the slave earphone, or before the master earphone exchanges audio data with the intelligent device, determining the transmission time advance of a WIFI CTS signal, the advance being at least the duration of one CTS data packet plus the SIFS time; sending the WIFI CTS signal on a frequency band meeting a preset condition according to the transmission time advance, the CTS signal carrying the time for which WIFI devices are requested to keep silent, which covers at least the duration of one or more Bluetooth packets between the master and slave earphones or between the master earphone and the intelligent device; and then sending the audio data to the slave earphone or exchanging it with the intelligent device, so that the Bluetooth signals suffer less interference from WIFI signals. Although existing earphones can reduce WIFI-signal interference and improve earphone sound quality, two problems remain: first, unhealthy sounds are not identified, and the intensity of all wireless communication signals is weakened, so that useful communication signals are weakened as well; second, the hardware configuration of the earphone and the audio-data receiving mode need to be adjusted, and the scope of adjustment is too large. Aiming at these problems, this patent provides a method for intelligently eliminating unhealthy sound by a learning earphone based on air conduction, which identifies unhealthy sounds from the characteristics of wireless communication signals propagating in the air and eliminates the identified unhealthy-sound wireless communication signals.
Disclosure of Invention
In view of the above, the invention provides a method for intelligently eliminating unhealthy sound by a learning earphone based on air conduction, which aims to: (1) construct a time-frequency waterfall diagram of a wireless communication signal that contains the time-domain information, frequency-domain information, and signal strength of the communication signal and thus represents its complete information, and obtain the structural features of the waterfall diagram with a multi-scale image structural feature extraction method; the extracted structural features effectively represent the signal features of the communication signal on different scales, and the multi-scale structural features have geometric and optical transformation invariance, which avoids the influence of signal strength changes in geometric local areas on the extracted features and gives better global robustness; (2) reduce the dimension of the structural feature vector with a sparse representation method, lowering the computation of subsequent detection and recognition, and obtain an accurate detection and recognition model for unhealthy-sound wireless communication signals based on the VC dimension theory.
The invention provides a method for intelligently eliminating unhealthy sound by learning headphones based on air conduction, which comprises the following steps:
S1: receiving communication signals of indoor Bluetooth wireless communication and WIFI wireless communication respectively, and converting the received communication signals into a time-frequency waterfall diagram, wherein the time-frequency waterfall diagram comprises time domain information, frequency domain information and signal strength of the communication signals;
s2: extracting structural feature vectors of the wireless communication signals on the time-frequency waterfall diagram by using a multi-scale feature extraction method, with the histogram of oriented gradients feature extraction algorithm as the main multi-scale feature extraction method;
s3: performing sparse representation processing on the extracted structural feature vector to obtain a wireless communication signal sparse feature vector, wherein the sparse representation based on the overcomplete dictionary is a main method for the structural feature vector sparse processing;
s4: constructing an unhealthy sound recognition model based on the VC dimension theory, inputting the sparse feature vector of the wireless communication signal into the model, having the model output the detection and recognition result of the wireless communication signal, and eliminating the unhealthy-sound wireless communication signals obtained by detection; the unhealthy sound recognition model maps the sparsely represented sparse feature vectors of the wireless communication signals into a high-dimensional space and performs the detection and recognition of unhealthy-sound wireless communication signals in that high-dimensional space.
As a further improvement of the present invention:
optionally, in the step S1, receiving communication signals of indoor bluetooth wireless communication and WIFI wireless communication respectively, including:
the learning earphone receives wireless communication signals respectively by utilizing a built-in signal receiving device, wherein the wireless communication signals comprise Bluetooth wireless communication signals and WIFI wireless communication signals, the indoor Bluetooth wireless communication signals comprise wireless communication signals of indoor Internet of things equipment and mobile communication equipment, and the WIFI wireless communication signals comprise wireless communication signals of various terminal equipment under a local area network;
the wireless communication signal set S received by the signal receiving device is:
S = {s_1(t_1), s_2(t_2), ..., s_i(t_i), ..., s_m(t_m)}
where:
s_i(t_i) denotes the i-th wireless communication signal received by the signal receiving apparatus, and t_i denotes the time-domain information of s_i(t_i), with t_i ∈ [t_{i,0}, t_{i,end}], where t_{i,0} denotes the time at which s_i(t_i) is received and t_{i,end} denotes the vanishing time of s_i(t_i);
A_i denotes the signal amplitude of s_i(t_i), f_i denotes the frequency of s_i(t_i), and φ_i denotes the initial phase of s_i(t_i);
m denotes the number of wireless communication signals received by the signal receiving device;
The digital-to-analog converter in the signal receiving device converts the received wireless communication signals into analog signals; the analog signal conversion flow is as follows:
S11: Construct an impulse sequence of frequency f_s; the impulse sequence of any wireless communication signal s_i(t_i) in the set S is:
ŝ_i(t_i) = Σ_{n=0}^{N_i − 1} s_i(n / f_s) δ(t_i − n / f_s)
where δ(·) denotes the unit impulse and N_i the number of sampling points;
S12: Based on the impulse sequence of the wireless communication signal, convert any wireless communication signal s_i(t_i) in the set S into an analog signal s'_i(t_i);
S13: The digital-to-analog converter converts the wireless communication signals in the set S into analog signals according to this analog signal conversion flow, obtaining the converted set S' = {s'_1(t_1), s'_2(t_2), ..., s'_m(t_m)}.
Optionally, the converting of the received wireless communication signals into time-frequency waterfall diagrams in the step S1 includes:
The time-frequency converter built into the learning earphone converts the analog signals in the set S' into time-frequency waterfall diagrams; the conversion flow of the time-frequency waterfall diagram is as follows:
Perform a time-frequency conversion on any analog signal s'_i(t_i) in the set S':
F_i(τ, f) = Σ_{n=0}^{N_i − 1} s'_i(n) z*(n − τ) e^{−j2πfn/L}
where:
s'_i(0) denotes the value of the 1st sampled signal point of the analog signal, s'_i(N_i − 1) denotes the value of the last sampled signal point, and N_i denotes the number of sampled signal points in the analog signal;
j denotes the imaginary unit, j² = −1;
z(·) denotes the window function, L denotes the length of the window function, and z*(·) denotes the complex conjugate of the window function;
F_i denotes the time-frequency conversion result of the analog signal s'_i(t_i) in the set S';
Calculate the signal strength En_i of the analog signal:
En_i = |F_i|²
Construct the time-frequency waterfall diagram of the analog signal based on its time-domain information, frequency-domain information, and signal strength; for any analog signal s'_i(t_i), the resulting time-frequency waterfall diagram is G_i(t_i, f_i, En_i). The diagram G_i(t_i, f_i, En_i) is a two-dimensional color image: a rectangular coordinate system is established with the lower-left corner of the image as the origin, the horizontal axis of the coordinate system represents the frequency-domain range f_i of the analog signal, and the vertical axis represents its time-domain range t_i; any pixel (x, y) in the image corresponds to the signal component at frequency x and time y, and the pixel color represents the signal strength at that time and frequency point: the higher the signal strength, the closer the color to red.
The set of time-frequency waterfall diagrams of the analog signals in the set S' is {G_1(t_1, f_1, En_1), ..., G_m(t_m, f_m, En_m)}; this set is transmitted to the processor in the learning earphone.
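The time-frequency conversion above is a short-time Fourier transform whose squared magnitude gives the signal strength En of each time-frequency point. A minimal numpy sketch, assuming a Hann window and a hop length of half the window (neither is fixed by the text):

```python
import numpy as np

def waterfall(s, win_len=32, hop=16):
    """Short-time Fourier transform magnitude squared: the
    signal-strength map En = |F|^2 behind the time-frequency
    waterfall diagram (Hann window assumed)."""
    z = np.hanning(win_len)
    frames = []
    for start in range(0, len(s) - win_len + 1, hop):
        seg = s[start:start + win_len] * z
        frames.append(np.abs(np.fft.rfft(seg)) ** 2)
    # rows = time, columns = frequency, values = signal strength En
    return np.array(frames)

# Example: a 5 Hz tone sampled at 100 Hz for 2 seconds
t = np.arange(0, 2, 0.01)
s = np.sin(2 * np.pi * 5 * t)
G = waterfall(s)
print(G.shape)   # (time frames, frequency bins)
```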
Optionally, in the step S2, a multi-scale feature extraction method is used to extract structural feature vectors of the wireless communication signals on the time-frequency waterfall diagrams, including:
The processor in the learning earphone extracts the structural feature vector of each wireless communication signal on its time-frequency waterfall diagram by using a multi-scale feature extraction method, with the histogram of oriented gradients feature extraction algorithm as the main method. For any time-frequency waterfall diagram G_i(t_i, f_i, En_i) in the time-frequency waterfall diagram set, the multi-scale feature extraction flow is as follows:
S21: Convert the pixel value of any pixel (x, y) in the time-frequency waterfall diagram into a gray value:
g(x, y) = 0.299 × R(x, y) + 0.587 × G(x, y) + 0.114 × B(x, y)
where:
R(x, y), G(x, y), and B(x, y) denote the values of pixel (x, y) in the red, green, and blue color channels respectively;
g(x, y) denotes the gray value of pixel (x, y);
S22: Calculate the gradient value and gradient direction of any pixel (x, y):
g_x(x, y) = g(x + 1, y) − g(x − 1, y), g_y(x, y) = g(x, y + 1) − g(x, y − 1)
α(x, y) = √(g_x(x, y)² + g_y(x, y)²)
β(x, y) = arctan(g_y(x, y) / g_x(x, y))
where:
α(x, y) denotes the gradient value of pixel (x, y);
β(x, y) denotes the gradient direction of pixel (x, y);
S23: Slide a window of 16 × 16 pixels over the time-frequency waterfall diagram G_i(t_i, f_i, En_i), in order from left to right and from top to bottom; with the image specification of the time-frequency waterfall diagram being 64 × 128, 32 pixel blocks are obtained in total;
S24: Calculate the gradient value and gradient direction of the pixels in each pixel block, divide the gradient directions into the statistical intervals (0°, 90°], (90°, 180°], (180°, 270°], (270°, 360°], and compute the gradient distribution histogram of each pixel block, whose horizontal axis is the gradient-direction statistical interval and whose vertical axis is the sum of the gradient values of the pixels in the block belonging to each interval;
S25: combining gradient distribution histograms of 32 pixel blocks, the combined result H i The method comprises the following steps:
wherein:
H 1,i representing a time-frequency waterfall graph G i (t i ,f i ,En i ) Gradient distribution histograms of the 1 st pixel block in 4 directions;
H i representing a time-frequency waterfall graph G i (t i ,f i ,En i ) A combination result of the medium gradient distribution histogram;
the combination result comprises 128 characteristic points of the time-frequency waterfall diagram in 4 directions, corresponding to H i And carrying out normalization processing, wherein the normalization processing formula is as follows:
wherein:
represents an L2 norm;
ε=0.01;
H′ i for a time-frequency waterfall graph G i (t i ,f i ,En i ) Is a structural feature vector of (1);
and extracting structural feature vectors of all the time-frequency waterfall graphs in the time-frequency waterfall graph set based on a multi-scale feature extraction method.
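Steps S21 to S25 amount to a HOG-style pipeline. A compact numpy sketch follows; the 64 × 128 image size, the four 90° direction bins, and ε = 0.01 come from the text, while the central-difference gradient operator is an assumption:

```python
import numpy as np

def structural_features(img):
    """HOG-style structural feature vector of a 64x128 RGB
    time-frequency waterfall image (values in [0, 255]):
    16x16 blocks, 4 orientation bins, L2 normalisation."""
    # S21: grayscale conversion
    g = 0.299 * img[..., 0] + 0.587 * img[..., 1] + 0.114 * img[..., 2]
    # S22: gradients (central differences; assumed operator)
    gx = np.zeros_like(g)
    gy = np.zeros_like(g)
    gx[:, 1:-1] = g[:, 2:] - g[:, :-2]
    gy[1:-1, :] = g[2:, :] - g[:-2, :]
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx)) % 360        # direction in [0, 360)
    # S23-S25: 16x16 blocks, 4 bins of 90 degrees each
    feats = []
    for by in range(0, g.shape[0], 16):
        for bx in range(0, g.shape[1], 16):
            m = mag[by:by + 16, bx:bx + 16].ravel()
            a = ang[by:by + 16, bx:bx + 16].ravel()
            bins = np.minimum((a // 90).astype(int), 3)
            feats.extend(np.bincount(bins, weights=m, minlength=4))
    H = np.asarray(feats)                              # 32 blocks x 4 = 128
    return H / np.sqrt(np.sum(H ** 2) + 0.01 ** 2)     # normalisation, eps=0.01

img = np.random.default_rng(0).uniform(0, 255, (128, 64, 3))
H = structural_features(img)
print(H.shape)
```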
Optionally, in the step S3, performing sparse representation processing on the extracted structural feature vectors includes:
Perform sparse representation processing on the extracted structural feature vectors to obtain the sparse feature vectors of the wireless communication signals, with sparse representation based on an overcomplete dictionary as the main method for the structural feature vector sparse processing; the sparse representation processing flow is as follows:
S31: Construct a dictionary for sparse representation and initialize it to D_0; the dimension of the dictionary is 64 × 128;
S32: Let d denote the current dictionary iteration number, with initial value d = 0;
S33: Collect a training set data1 for dictionary training; the samples in the training set data1 are image structural feature vectors;
S34: Construct the sparse coefficient representation of each sample in the training set data1; the sparse coefficient representation of the q-th sample is:
τ_q = argmin_τ ‖data1_q − D_d τ‖₂²
where:
τ_q is the sparse coefficient representation of the q-th sample data1_q in the training set data1;
D_d denotes the dictionary at the d-th dictionary iteration;
S35: Form the sparse coefficient representations of all samples in the training set data1 into a sparse coefficient representation matrix τ, each column of which is the sparse coefficient representation of one sample, and perform the (d + 1)-th dictionary update:
D_{d+1} = X τ^T (τ τ^T)^{−1}
where X is the matrix whose columns are the samples of data1. Judge whether the reconstruction error ‖X − D_{d+1} τ‖₂ is smaller than the threshold: if it is, D_{d+1} is the overcomplete dictionary D obtained by training and τ is the corresponding sparse coefficient representation matrix; if not, set d = d + 1 and return to step S34;
S36: Construct the objective function of the structural feature vector sparse processing:
H''_i = argmin_{H''} ‖H'_i − D H''‖₂²
where:
H''_i denotes the sparse feature vector of the structural feature vector H'_i after sparse processing;
The constraint condition of the objective function is:
‖τ_r‖₀ ≤ ε₀
where:
τ_r denotes a row vector of the sparse coefficient representation matrix, ε₀ is the sparsity constraint threshold, and ‖·‖₀ denotes the L0 norm;
S37: Convert the objective function into a Lagrangian function:
L(H'') = ‖H'_i − D H''‖₂² + λ‖H''‖₀
where:
λ is the Lagrange multiplier;
Take the H''_i that minimizes the Lagrangian function as the sparse feature vector after sparse representation processing; the dimension of the sparse feature vector is 64. The sparse feature vector set of all wireless communication signals received by the learning earphone is {H''_1, H''_2, ..., H''_i, ..., H''_m}.
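The L0-constrained objective in steps S36 and S37 is typically solved greedily. The text names no solver, so the sketch below uses orthogonal matching pursuit over a 64 × 128 dictionary; note that a code vector over such a dictionary has one entry per atom, i.e. 128 entries:

```python
import numpy as np

def omp(D, x, sparsity):
    """Orthogonal matching pursuit: a sparse code tau with at most
    `sparsity` non-zeros such that D @ tau approximates x (a stand-in
    solver for the L0-constrained objective; an assumption, not the
    patent's stated method)."""
    residual = x.copy()
    support = []
    tau = np.zeros(D.shape[1])
    for _ in range(sparsity):
        k = int(np.argmax(np.abs(D.T @ residual)))   # best-matching atom
        if k not in support:
            support.append(k)
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    tau[support] = coef
    return tau

rng = np.random.default_rng(1)
D = rng.normal(size=(64, 128))
D /= np.linalg.norm(D, axis=0)          # unit-norm dictionary atoms
x = 2.0 * D[:, 7] - 1.5 * D[:, 42]      # signal built from two atoms
tau = omp(D, x, sparsity=2)
print(sorted(np.nonzero(tau)[0]))
```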
Optionally, in the step S4, constructing the unhealthy sound recognition model based on the VC dimension theory, where the model maps the sparsely represented sparse feature vectors of the wireless communication signals into a high-dimensional space and performs the detection and recognition of unhealthy-sound wireless communication signals in the high-dimensional space, includes:
Construct an unhealthy sound recognition model based on the VC dimension theory; the model maps the sparsely represented sparse feature vectors of the wireless communication signals into a high-dimensional space and performs the detection and recognition of unhealthy-sound wireless communication signals in that space.
Based on the VC dimension theory, the sparse feature vectors, whose dimension is 64, can be binary-classified by a hypothesis space with VC dimension 65, and healthy-sound and unhealthy-sound wireless communication signals are obtained through recognition and detection.
The unhealthy sound recognition model is a hypothesis space of VC dimension 65, and its recognition flow is as follows:
Input the sparse feature vector H''_i into the unhealthy sound recognition model:
u(H''_i) = 1 if w·φ(H''_i) + b > 0, and u(H''_i) = 0 otherwise
where:
φ(·) denotes a high-dimensional nonlinear mapping;
w denotes a weight vector and b denotes an offset;
u(H''_i) ∈ {0, 1} is the unhealthy sound recognition result of the wireless communication signal s_i(t_i) corresponding to the sparse feature vector H''_i: u(H''_i) = 1 denotes that s_i(t_i) is an unhealthy-sound wireless communication signal, and u(H''_i) = 0 denotes that s_i(t_i) is a healthy-sound wireless communication signal.
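A linear threshold unit over the 64-dimensional sparse vector has VC dimension 65, which matches the hypothesis space described above. A minimal decision-rule sketch, with the mapping φ taken as the identity for brevity; the weights, offset, and test vectors are illustrative assumptions:

```python
import numpy as np

def recognise(H2, w, b):
    """Decision rule u(H'') in {0, 1}; 1 flags an unhealthy-sound
    wireless communication signal."""
    return int(w @ H2 + b > 0)

# Toy parameters (assumed, for illustration only)
rng = np.random.default_rng(2)
w = rng.normal(size=64)
b = -0.1
H_unhealthy = w / np.linalg.norm(w)   # vector strongly aligned with w
H_healthy = -H_unhealthy
print(recognise(H_unhealthy, w, b), recognise(H_healthy, w, b))
```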
Optionally, the training process of the unhealthy sound recognition model includes:
S41: Construct the regression equation of the unhealthy sound recognition model:
u(H) = w·φ(H) + b
where:
H denotes an input value of the unhealthy sound recognition model;
S42: Collect training data for training the unhealthy sound recognition model to form a training data set data2; each group of training data in data2 comprises a sparse feature vector and the corresponding unhealthy sound recognition result;
S43: Obtain the model parameters w and b from the training data set data2 by minimizing the model parameter calculation formula, which is defined over:
Ω, the set of sparse feature vectors in the training data set data2, and v, a sparse feature vector in Ω;
N₊, the number of healthy-sound wireless communication signals in the training data set, and N₋, the number of unhealthy-sound wireless communication signals in the training data set;
y_v, the unhealthy sound recognition result corresponding to each sparse feature vector;
The w and b that make the model parameter calculation formula reach its minimum are taken as the model parameters obtained by training.
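Since the exact parameter calculation formula is not given here, the sketch below fits w and b by ordinary least squares on {0, 1}-labelled toy data: one plausible training rule for the regression equation u(H) = w·φ(H) + b with φ taken as the identity, not the patent's own formula.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares for (w, b) in u(H) = w . H + b
    (phi = identity; an assumed stand-in training rule)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])   # append a bias column
    theta, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return theta[:-1], theta[-1]

# Toy data2: sparse feature vectors with labels from a hidden linear rule
rng = np.random.default_rng(3)
w_true = rng.normal(size=64)
X = rng.normal(size=(200, 64))
y = (X @ w_true > 0).astype(float)            # 1 = unhealthy, 0 = healthy
w, b = fit_linear(X, y)
acc = np.mean(((X @ w + b) > 0.5) == (y == 1))
print(round(float(acc), 2))
```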
Optionally, in the step S4, inputting the sparse feature vectors of the wireless communication signals into the unhealthy sound recognition model, having the model output the detection and recognition results of the wireless communication signals, and eliminating the unhealthy sound obtained by detection includes:
All sparse feature vectors of the wireless communication signals received by the learning earphone are input into the unhealthy sound recognition model; the model outputs the detection and recognition result of each wireless communication signal, and the unhealthy-sound wireless communication signals obtained by detection are filtered and eliminated. In the embodiment of the invention, the filtering and elimination method for unhealthy-sound wireless communication signals is: the learning earphone sets the cut-off frequency of the signal receiving device to the frequency of the unhealthy-sound wireless communication signal, so that wireless communication signals at the cut-off frequency cannot be received by the signal receiving device.
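The cut-off described above is set in the receiver hardware. As an illustration only, a software equivalent is to zero the FFT bins around the identified unhealthy-sound frequency:

```python
import numpy as np

def suppress_band(s, f_s, f_bad, width):
    """Zero the spectrum of s (sampled at f_s Hz) within `width` Hz of
    the identified unhealthy-sound frequency f_bad: a frequency-domain
    sketch of the cut-off, not the patent's hardware mechanism."""
    S = np.fft.rfft(s)
    freqs = np.fft.rfftfreq(len(s), d=1.0 / f_s)
    S[np.abs(freqs - f_bad) <= width] = 0.0
    return np.fft.irfft(S, n=len(s))

f_s = 1000.0
t = np.arange(0, 1, 1 / f_s)
s = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 200 * t)
clean = suppress_band(s, f_s, f_bad=200.0, width=5.0)   # drop the 200 Hz tone
```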
In order to solve the above problems, the present invention further provides a device for intelligently eliminating unhealthy sounds based on a learning earphone under air conduction, which is characterized in that the device comprises:
the signal receiving device is used for receiving communication signals of indoor Bluetooth wireless communication and WIFI wireless communication and performing digital-to-analog conversion on the received wireless communication signals;
the signal feature extraction module is used for converting the wireless communication signals into a time-frequency waterfall diagram, extracting structural feature vectors of the wireless communication signals on the time-frequency waterfall diagram by using a multi-scale feature extraction method, and performing sparse representation processing on the extracted structural feature vectors to obtain wireless communication signal sparse feature vectors;
the unhealthy sound eliminating device is used for constructing an unhealthy sound identification model based on the VC dimension theory, inputting the sparse feature vector of the wireless communication signal into the model, outputting the detection and identification result of the wireless communication signal by the model, and eliminating the unhealthy sound wireless communication signal obtained by detection.
In order to solve the above-mentioned problems, the present invention also provides an electronic apparatus, comprising:
a memory storing at least one instruction; and
a processor executing the instructions stored in the memory to implement the above method for intelligently eliminating unhealthy sound by a learning earphone based on air conduction.
In order to solve the above problems, the present invention further provides a computer readable storage medium having at least one instruction stored therein, the at least one instruction being executed by a processor in an electronic device to implement the method for intelligently eliminating unhealthy sounds based on learning headphones under air conduction as described above.
Compared with the prior art, the invention provides a method for intelligently eliminating unhealthy sounds by learning headphones based on air conduction, which has the following advantages:
firstly, the scheme provides a time-frequency waterfall diagram for describing time domain information, frequency domain information and intensity of signals and corresponding multi-scale structural features, and the conversion flow of the time-frequency waterfall diagram is as follows: for any analog signal in set SPerforming time-frequency conversion:
wherein: s is(s) i (0) Representing analog signalsThe value of the 1 st sampling signal point, s i (N i -1) represents an analog signal ∈ ->The value of the last sampling signal point, N i Representing analog signal +.>The number of mid-sample signal points; j represents an imaginary unit, j 2 -1; z (·) represents the window function, L represents the length of the window function, z * (. Cndot.) represents the complex conjugate of the window function; f (F) i Represents any analog signal +. >Time-frequency conversion results of (2); calculation ofArbitrary analog signal +.>Signal strength En of (2) i
En i =(F i ) 2
Constructing a time-frequency waterfall diagram of the analog signal based on the time domain information, the frequency domain information and the signal strength of the analog signal, wherein the time-frequency waterfall diagram of any analog signal is G_i(t_i, f_i, En_i). The time-frequency waterfall diagram G_i(t_i, f_i, En_i) is a two-dimensional color image: a rectangular coordinate system is established with the lower left corner of the image as the origin, the horizontal axis of the coordinate system represents the frequency domain range f_i of the analog signal, and the vertical axis represents the time domain range t_i of the analog signal. Any pixel point (x, y) in the image corresponds to the signal whose frequency is x and whose time point is y, and the pixel color represents the signal intensity of the signal at the corresponding time point and frequency point; the higher the signal intensity, the closer the color is to red. For any time-frequency waterfall diagram G_i(t_i, f_i, En_i) in the time-frequency waterfall diagram set, the multi-scale feature extraction method comprises the following steps: converting the pixel value of any pixel (x, y) in the time-frequency waterfall diagram into a gray value, wherein the conversion formula of the pixel value is:
g(x, y) = 0.299 × R(x, y) + 0.587 × G(x, y) + 0.114 × B(x, y)
wherein: R(x, y) represents the value of pixel (x, y) in the red color channel, G(x, y) represents the value of pixel (x, y) in the green color channel, and B(x, y) represents the value of pixel (x, y) in the blue color channel; g(x, y) represents the gray value of the pixel (x, y). The gradient value and gradient direction of any pixel (x, y) are then calculated:
α(x, y) = √[(g(x+1, y) − g(x−1, y))² + (g(x, y+1) − g(x, y−1))²]
β(x, y) = arctan[(g(x, y+1) − g(x, y−1)) / (g(x+1, y) − g(x−1, y))]
wherein: α(x, y) represents the gradient value of the pixel (x, y); β(x, y) represents the gradient direction of the pixel (x, y). The time-frequency waterfall diagram G_i(t_i, f_i, En_i) is divided with a sliding window of 16×16 pixels, sliding from left to right and from top to bottom; since the image specification of the time-frequency waterfall diagram is 64×128, 32 pixel blocks are obtained in total. The gradient value and gradient direction of the pixels in each pixel block are calculated, the gradient direction statistical intervals are divided into (0°, 90°], (90°, 180°], (180°, 270°] and (270°, 360°], and the gradient distribution histogram of each pixel block is calculated, the horizontal axis of the histogram being the gradient direction statistical interval and the vertical axis the sum of the gradients of the pixels of the block belonging to that interval. The gradient distribution histograms of the 32 pixel blocks are combined, the combined result H_i being:
H_i = [H_{1,i}, H_{2,i}, …, H_{32,i}]
wherein: H_{1,i} represents the gradient distribution histogram, over the 4 directions, of the 1st pixel block of the time-frequency waterfall diagram G_i(t_i, f_i, En_i); H_i represents the combined result of the gradient distribution histograms of G_i(t_i, f_i, En_i). The combined result contains 128 feature points of the time-frequency waterfall diagram over the 4 directions. H_i is correspondingly normalized, the normalization formula being:
H′_i = H_i / (‖H_i‖₂ + ε)
wherein: ‖·‖₂ denotes the L2 norm; ε = 0.01; H′_i is the structural feature vector of the time-frequency waterfall diagram G_i(t_i, f_i, En_i). Compared with the prior art, this scheme constructs a time-frequency waterfall diagram of the wireless communication signal that contains the time domain information, frequency domain information and signal strength of the communication signal, so that the complete information of the communication signal is represented; the structural features of the waterfall diagram obtained with the multi-scale image structural feature extraction method effectively characterize the communication signal on different scales, the multi-scale structural features are invariant under geometric and optical transformations, the extracted features are unaffected by signal-strength changes in local geometric areas, and the representation has better global validity.
Meanwhile, the scheme provides a feature vector sparsification method and a high-dimensional-space recognition and detection method. The sparse feature vector of the wireless communication signal is obtained by performing sparse representation processing on the extracted structural feature vector, and the sparse representation processing flow is as follows: a dictionary for sparse representation is constructed and initialized to D_0, the dimension of the dictionary being 64×128; the current iteration number of the dictionary is d, with initial value 0; a training set data1 for dictionary training is collected, the samples in the training set data1 being image structure feature vectors; a sparse coefficient representation of each sample in the training set data1 is constructed, the sparse coefficient representation of the q-th sample in the training set data1 being:
τ_q = argmin_τ ‖data1_q − D_d τ‖₂²
wherein: τ_q is the sparse coefficient representation of the q-th sample data1_q in the training set data1; D_d represents the dictionary after the d-th dictionary iteration. The sparse coefficient representations of all samples in the training set data1 form a sparse coefficient representation matrix τ, each column of which is the sparse coefficient representation of one sample, and the dictionary is updated for the (d+1)-th time:
D_{d+1} = τ^T (τ τ^T)^{−1}
Whether ‖data1 − D_{d+1} τ‖₂ is smaller than the threshold value is then judged; if ‖data1 − D_{d+1} τ‖₂ is less than the threshold value, D_{d+1} is the overcomplete dictionary D* obtained from training and τ is the corresponding sparse coefficient representation matrix. An objective function for the sparse processing of the structural feature vector is constructed:
H″_i = argmin_τ ‖H′_i − D* τ‖₂²
wherein: H″_i represents the sparse feature vector obtained from the structural feature vector H′_i after sparse processing. The constraint condition of the objective function is:
‖τ_r‖₀ ≤ ε₀
wherein: τ_r represents a row vector of the sparse coefficient representation matrix, ε₀ is the sparsity constraint threshold, and ‖·‖₀ represents the L0 norm. The objective function is converted into a Lagrangian function:
L(τ, λ) = ‖H′_i − D* τ‖₂² + λ(‖τ_r‖₀ − ε₀)
wherein: λ is the Lagrange multiplier. The H″_i that minimizes the Lagrangian function is taken as the sparse feature vector after sparse representation processing; the dimension of the sparse feature vector is 64. An unhealthy voice recognition model based on the VC (Vapnik–Chervonenkis) dimension theory is constructed; the unhealthy voice recognition model maps the sparsely represented sparse feature vectors of the wireless communication signals to a high-dimensional space, and detection and recognition of unhealthy-sound wireless communication signals are carried out in the high-dimensional space. Based on the VC dimension theory, since the dimension of the sparse feature vector is 64, the sparse feature vectors can be binary-classified by a hypothesis space whose VC dimension is 65, and healthy-sound and unhealthy-sound wireless communication signals are obtained through recognition and detection. The unhealthy voice recognition model is a 65-dimensional hypothesis space, and its recognition flow is as follows: the sparse feature vector H″_i is input into the unhealthy voice recognition model:
u(H″_i) = 1 if w^T φ(H″_i) + b > 0, and u(H″_i) = 0 otherwise
wherein: φ(·) represents the high-dimensional nonlinear mapping; w represents a weight vector, and b represents an offset; u(H″_i) represents the unhealthy voice recognition result of the wireless communication signal s_i(t_i) corresponding to the sparse feature vector H″_i, u(H″_i) ∈ {0, 1}, where u(H″_i) = 1 denotes that the wireless communication signal s_i(t_i) is an unhealthy-sound wireless communication signal and u(H″_i) = 0 denotes that the wireless communication signal s_i(t_i) is a healthy-sound wireless communication signal. Compared with the traditional scheme, this scheme reduces the dimension of the structural feature vector with a sparse representation method, reducing the amount of computation in the subsequent detection and recognition of unhealthy-sound wireless communication signals, and the unhealthy voice recognition model based on the VC dimension theory accurately detects and recognizes unhealthy-sound wireless communication signals in the high-dimensional space.
Drawings
Fig. 1 is a schematic flow chart of a method for intelligently eliminating unhealthy sounds based on a learning earphone under air conduction according to an embodiment of the present invention;
FIG. 2 is a flow chart illustrating one of the steps in the embodiment of FIG. 1;
FIG. 3 is a functional block diagram of an apparatus for intelligently eliminating unhealthy sounds based on a learning earphone under air conduction according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an electronic device for implementing a method for intelligently eliminating unhealthy sounds based on a learning earphone under air conduction according to an embodiment of the present application.
The achievement of the objects, functional features and advantages of the present application will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The embodiment of the application provides a method for intelligently eliminating unhealthy sounds by learning headphones based on air conduction. The execution subject of the method for intelligently eliminating unhealthy sounds by learning headphones based on air conduction includes, but is not limited to, at least one of a server, a terminal and the like, which can be configured to execute the method provided by the embodiment of the application. In other words, the method for intelligently eliminating unhealthy sounds based on the learning headphones under air conduction may be performed by software or hardware installed in a terminal device or a server device, where the software may be a blockchain platform. The service end includes but is not limited to: a single server, a server cluster, a cloud server or a cloud server cluster, and the like.
Example 1:
s1: and respectively receiving communication signals of indoor Bluetooth wireless communication and WIFI wireless communication, and converting the received communication signals into a time-frequency waterfall diagram, wherein the time-frequency waterfall diagram comprises time domain information, frequency domain information and signal strength of the communication signals.
In the step S1, communication signals of indoor bluetooth wireless communication and WIFI wireless communication are received respectively, and the method includes:
the learning earphone receives wireless communication signals respectively by utilizing a built-in signal receiving device, wherein the wireless communication signals comprise Bluetooth wireless communication signals and WIFI wireless communication signals, the indoor Bluetooth wireless communication signals comprise wireless communication signals of indoor Internet of things equipment and mobile communication equipment, and the WIFI wireless communication signals comprise wireless communication signals of various terminal equipment under a local area network;
the wireless communication signal set S received by the signal receiving device is:
S={s 1 (t 1 ),s 2 (t 2 ),...,s i (t i ),...,s m (t m )}
wherein:
s_i(t_i) represents the i-th wireless communication signal received by the signal receiving apparatus; t_i represents the time domain information of the wireless communication signal s_i(t_i), t_i ∈ [t_{i,0}, t_{i,end}], where t_{i,0} indicates the time at which the wireless communication signal s_i(t_i) is received and t_{i,end} represents the vanishing time of the wireless communication signal s_i(t_i);
A_i represents the signal amplitude of the wireless communication signal s_i(t_i), f_i represents the frequency of the wireless communication signal s_i(t_i), and φ_i represents the initial phase of the wireless communication signal s_i(t_i);
m represents the number of wireless communication signals received by the signal receiving device;
the digital-to-analog converter in the signal receiving device converts the received wireless communication signal into an analog signal, and in detail, referring to fig. 2, the conversion flow of the analog signal is as follows:
S11: constructing an impulse sequence of frequency f_s:
p(t) = Σ_{n=−∞}^{+∞} δ(t − n/f_s)
which, for any wireless communication signal s_i(t_i) in the wireless communication signal set S, picks out the signal values at the instants t = n/f_s;
S12: based on the impulse sequence of the wireless communication signal, converting any wireless communication signal s_i(t_i) in the wireless communication signal set S into the analog signal s_i(t_i) · p(t_i);
S13: the digital-to-analog converter converts the wireless communication signals in the wireless communication signal set S into analog signals, obtaining the converted set S′.
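The impulse-train step of S11–S12 can be sketched in a few lines of numpy. This is an illustrative sketch under assumed values (the 5 Hz test tone, the 100 Hz impulse frequency and the uniform instant grid t = n/f_s are not from the patent):

```python
import numpy as np

def impulse_sample(signal_fn, f_s, duration):
    # impulse instants t = n / f_s, n = 0 .. duration * f_s - 1
    t = np.arange(int(duration * f_s)) / f_s
    # the impulse train of frequency f_s picks out the signal values at t
    return t, signal_fn(t)

# hypothetical example: a 5 Hz tone observed through a 100 Hz impulse train
t, v = impulse_sample(lambda t: np.sin(2 * np.pi * 5 * t), f_s=100.0, duration=1.0)
print(len(v))  # -> 100
```

The impulse frequency f_s must exceed twice the signal bandwidth for the later time-frequency analysis to be faithful.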
The step S1 is to convert the received wireless communication signal into a time-frequency waterfall diagram, and comprises the following steps:
the time-frequency converter built in the learning earphone converts the analog signals in the set S' into a time-frequency waterfall diagram, and the conversion flow of the time-frequency waterfall diagram is as follows:
for any analog signal in the set S′, time-frequency conversion is performed:
F_i(t, f) = Σ_{n=0}^{N_i−1} s_i(n) · z*(n − t) · e^{−j2πfn/N_i}
wherein:
s_i(0) represents the value of the 1st sampling signal point of the analog signal, s_i(N_i − 1) represents the value of the last sampling signal point, and N_i represents the number of sampling signal points in the analog signal;
j represents the imaginary unit, j² = −1;
z(·) represents the window function, L represents the length of the window function, and z*(·) represents the complex conjugate of the window function;
F_i represents the time-frequency conversion result of any analog signal in the set S′;
calculating the signal strength En_i of any analog signal:
En_i = |F_i|²
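The time-frequency chain (windowed FFT frames, squared magnitude as signal strength) can be sketched with plain numpy. The Hann window, the 50% overlap and the 128 Hz test tone are assumptions for illustration, not values fixed by the patent:

```python
import numpy as np

def waterfall(s, win_len=128, hop=64):
    """Time-frequency waterfall En = |STFT|^2 (sketch; Hann window and
    50% overlap are assumed choices for the window z of length L)."""
    z = np.hanning(win_len)
    frames = []
    for start in range(0, len(s) - win_len + 1, hop):
        seg = s[start:start + win_len] * z
        frames.append(np.abs(np.fft.rfft(seg)) ** 2)   # En = |F|^2 per frame
    return np.array(frames).T          # rows = frequency bins, cols = time frames

f_s = 1024.0
t = np.arange(0, 1.0, 1 / f_s)
En = waterfall(np.sin(2 * np.pi * 128 * t))            # 128 Hz test tone
freqs = np.fft.rfftfreq(128, d=1 / f_s)
print(freqs[np.argmax(En.sum(axis=1))])                # -> 128.0
```

The strongest row of the waterfall lands on the bin of the test tone, which is what the later color-coding of signal intensity visualizes.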
Constructing a time-frequency waterfall diagram of the analog signal based on the time domain information, the frequency domain information and the signal strength of the analog signal, wherein the time-frequency waterfall diagram of any analog signal is G_i(t_i, f_i, En_i). The time-frequency waterfall diagram G_i(t_i, f_i, En_i) is a two-dimensional color image: a rectangular coordinate system is established with the lower left corner of the image as the origin, the horizontal axis of the coordinate system represents the frequency domain range f_i of the analog signal, and the vertical axis represents the time domain range t_i of the analog signal. Any pixel point (x, y) in the image corresponds to the signal whose frequency is x and whose time point is y, and the pixel color represents the signal intensity of the signal at the corresponding time point and frequency point; the higher the signal intensity, the closer the color is to red;
the set of time-frequency waterfall diagrams of the analog signals in the set S′ is {G_1(t_1, f_1, En_1), …, G_m(t_m, f_m, En_m)}, and the time-frequency waterfall diagram set is transmitted to a processor in the learning earphone.
S2: extracting structural feature vectors of the wireless communication signals on the time-frequency waterfall diagram by using a multi-scale feature extraction method, wherein the histogram of oriented gradients (HOG) feature extraction algorithm is the main multi-scale feature extraction method.
In the step S2, a multi-scale feature extraction method is used to extract a structural feature vector of the wireless communication signal on a time-frequency waterfall diagram, including:
the processor in the learning earphone utilizes a multi-scale feature extraction method to extract structural feature vectors of wireless communication signals on a time-frequency waterfall diagram, wherein a direction gradient histogram feature extraction algorithm is a multi-scale feature extraction main method, and for any time-frequency waterfall diagram G in a time-frequency waterfall diagram set i (t i ,f i ,En i ) The multi-scale feature extraction method comprises the following steps:
s21: converting pixel values of any pixel (x, y) in the time-frequency waterfall diagram into gray values, wherein a conversion formula of the pixel values is as follows:
g(x,y)=0.299×R(x,y)+0.587×G(x,y)+0.114×B(x,y)
wherein:
r (x, y) represents the value of pixel (x, y) in the red color channel, G (x, y) represents the value of pixel (x, y) in the green color channel, and B (x, y) represents the value of pixel (x, y) in the blue color channel;
g (x, y) represents the gray value of the pixel (x, y);
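The luma conversion of S21 is vectorizable over the whole waterfall image at once; a small sketch (the all-red 64×128 test image is an illustrative assumption):

```python
import numpy as np

def to_gray(rgb):
    """g = 0.299 R + 0.587 G + 0.114 B applied to an H x W x 3 array."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.299 * r + 0.587 * g + 0.114 * b

img = np.zeros((64, 128, 3))
img[..., 0] = 255            # pure red image (strong signal color)
gray = to_gray(img)
print(gray[0, 0])            # about 76.245 (= 0.299 * 255)
```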
s22: calculating the gradient value and the gradient direction of any pixel (x, y), wherein the calculation formulas of the gradient value and the gradient direction of the pixel are as follows:
α(x, y) = √[(g(x+1, y) − g(x−1, y))² + (g(x, y+1) − g(x, y−1))²]
β(x, y) = arctan[(g(x, y+1) − g(x, y−1)) / (g(x+1, y) − g(x−1, y))]
wherein:
α (x, y) represents the gradient value of the pixel (x, y);
beta (x, y) represents the gradient direction of the pixel (x, y);
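S22 in code, using central differences for the partial derivatives (the patent's exact finite-difference scheme is not reproduced in the text, so the central-difference form is an assumption):

```python
import numpy as np

def gradients(gray):
    """Per-pixel gradient value alpha and direction beta (degrees)."""
    gx = np.zeros_like(gray)
    gy = np.zeros_like(gray)
    gx[:, 1:-1] = gray[:, 2:] - gray[:, :-2]     # horizontal central difference
    gy[1:-1, :] = gray[2:, :] - gray[:-2, :]     # vertical central difference
    alpha = np.hypot(gx, gy)                     # gradient value
    beta = np.degrees(np.arctan2(gy, gx)) % 360  # gradient direction
    return alpha, beta

gray = np.tile(np.arange(8.0), (8, 1))   # brightness rises left to right
alpha, beta = gradients(gray)
print(alpha[4, 4], beta[4, 4])           # -> 2.0 0.0 (purely horizontal gradient)
```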
s23: time-frequency waterfall graph G using a sliding window of 16 x 16 pixel size i (t i ,f i ,En i ) Performing sliding segmentation, wherein the sliding segmentation adopts a sequence from left to right and from top to bottom, the image specification of the time-frequency waterfall is 64 multiplied by 128, and 32 pixel blocks are obtained by total segmentation;
s24: calculating the gradient value and gradient direction of the pixels in each pixel block, dividing the gradient direction statistical intervals into (0°, 90°], (90°, 180°], (180°, 270°] and (270°, 360°], and calculating the gradient distribution histogram of each pixel block, wherein the horizontal axis of the gradient distribution histogram is the gradient direction statistical interval and the vertical axis is the sum of the gradients of the pixels of the block belonging to that interval;
s25: combining the gradient distribution histograms of the 32 pixel blocks, the combined result H_i being:
H_i = [H_{1,i}, H_{2,i}, …, H_{32,i}]
wherein:
H_{1,i} represents the gradient distribution histogram, over the 4 directions, of the 1st pixel block of the time-frequency waterfall diagram G_i(t_i, f_i, En_i);
H_i represents the combined result of the gradient distribution histograms of the time-frequency waterfall diagram G_i(t_i, f_i, En_i);
the combined result contains 128 feature points of the time-frequency waterfall diagram over the 4 directions; H_i is correspondingly normalized, the normalization formula being:
H′_i = H_i / (‖H_i‖₂ + ε)
wherein:
‖·‖₂ represents the L2 norm;
ε=0.01;
H′_i is the structural feature vector of the time-frequency waterfall diagram G_i(t_i, f_i, En_i);
and extracting structural feature vectors of all the time-frequency waterfall graphs in the time-frequency waterfall graph set based on a multi-scale feature extraction method.
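Steps S23–S25 can be assembled into one descriptor function. The 16×16 blocks, the four 90° bins and ε = 0.01 follow the text; the magnitude-weighted voting and the exact normalization form H/(‖H‖₂ + ε) are assumptions:

```python
import numpy as np

def hog_descriptor(alpha, beta, block=16, eps=0.01):
    h, w = alpha.shape                       # expected 64 x 128 -> 32 blocks
    feats = []
    for by in range(0, h, block):            # top to bottom
        for bx in range(0, w, block):        # left to right
            a = alpha[by:by + block, bx:bx + block].ravel()
            b = beta[by:by + block, bx:bx + block].ravel()
            # map a direction in (0, 360] to one of 4 bins of 90 degrees
            bins = np.clip(np.ceil(b / 90.0).astype(int) - 1, 0, 3)
            feats.append(np.bincount(bins, weights=a, minlength=4))
    H = np.concatenate(feats)                # 32 blocks x 4 bins = 128 features
    return H / (np.linalg.norm(H) + eps)     # L2 normalization with epsilon

alpha = np.ones((64, 128))                   # toy gradient field
beta = np.full((64, 128), 45.0)              # every gradient points at 45 degrees
H = hog_descriptor(alpha, beta)
print(H.shape)   # -> (128,)
```

With every gradient in the first interval, only bin 0 of each block receives votes, which is easy to check by eye.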
S3: and carrying out sparse representation processing on the extracted structural feature vector to obtain a wireless communication signal sparse feature vector, wherein the sparse representation based on the overcomplete dictionary is a main method for the structural feature vector sparse processing.
And in the step S3, sparse representation processing is carried out on the extracted structural feature vector, and the method comprises the following steps:
performing sparse representation processing on the extracted structural feature vectors to obtain wireless communication signal sparse feature vectors, wherein the sparse representation based on an overcomplete dictionary is a main method for the structural feature vector sparse processing, and the sparse representation processing flow is as follows:
s31: constructing a dictionary for sparse representation, and initializing the dictionary to D 0 The dimension of the dictionary is 64×128 dimensions;
s32: setting the current iteration number of the dictionary as d, and setting the initial value of d as 0;
s33: collecting a training set data1 for dictionary training, wherein samples in the training set data1 are image structure feature vectors;
S34: constructing sparse coefficient representation of each sample in the training set datal, and then the sparse coefficient of the q-th sample in the training set data1 is represented as follows:
τ_q = argmin_τ ‖data1_q − D_d τ‖₂²
wherein:
τ_q is the sparse coefficient representation of the q-th sample data1_q in the training set data1;
D_d represents the dictionary after the d-th dictionary iteration;
s35: the sparse coefficient representation of all samples in the training set data1 is formed into a sparse coefficient representation matrix tau, each column in the matrix tau represents the sparse coefficient representation of the sample, and the dictionary is updated for the (d+1) th time:
D_{d+1} = τ^T (τ τ^T)^{−1}
judging whether ‖data1 − D_{d+1} τ‖₂ is smaller than the threshold value; if ‖data1 − D_{d+1} τ‖₂ is less than the threshold value, D_{d+1} is the overcomplete dictionary D* obtained from training and τ is the corresponding sparse coefficient representation matrix; if not, d = d + 1 and the process returns to step S34;
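Steps S31–S35 amount to an alternating loop: sparse-code every training sample against the current dictionary, then solve a least-squares dictionary update. In this sketch the OMP coder, the random initialization and the column renormalization are assumed choices, and the update includes the data factor X, i.e. D = X τᵀ(ττᵀ)⁻¹, which the reconstruction objective implies:

```python
import numpy as np

def omp(D, x, k):
    """Greedy sparse coder (orthogonal matching pursuit), an assumed choice."""
    idx, coef = [], np.zeros(D.shape[1])
    r = x.copy()
    for _ in range(k):
        idx.append(int(np.argmax(np.abs(D.T @ r))))    # most correlated atom
        c, *_ = np.linalg.lstsq(D[:, idx], x, rcond=None)
        r = x - D[:, idx] @ c                          # new residual
    coef[idx] = c
    return coef

def train_dictionary(X, n_atoms=8, sparsity=2, iters=5):
    rng = np.random.default_rng(0)
    D = rng.standard_normal((X.shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0)
    for _ in range(iters):
        tau = np.column_stack([omp(D, x, sparsity) for x in X.T])
        D = X @ tau.T @ np.linalg.pinv(tau @ tau.T)    # least-squares update
        D /= np.linalg.norm(D, axis=0) + 1e-12         # renormalize atoms
    return D, tau

X = np.random.default_rng(1).standard_normal((16, 40))  # toy training set
D, tau = train_dictionary(X)
print(D.shape, tau.shape)   # -> (16, 8) (8, 40)
```

Each column of tau is the sparse coefficient representation of one sample, exactly as the text describes for the matrix τ.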
s36: constructing an objective function of structural feature vector sparse processing:
H″_i = argmin_τ ‖H′_i − D* τ‖₂²
wherein:
H″_i represents the sparse feature vector obtained from the structural feature vector H′_i after sparse processing;
the constraint conditions of the objective function are as follows:
‖τ_r‖₀ ≤ ε₀
wherein:
τ_r represents a row vector of the sparse coefficient representation matrix, ε₀ is the sparsity constraint threshold, and ‖·‖₀ represents the L0 norm;
s37: converting the training objective function into a Lagrangian function:
L(τ, λ) = ‖H′_i − D* τ‖₂² + λ(‖τ_r‖₀ − ε₀)
wherein:
lambda is the Lagrangian multiplier;
the H″_i that minimizes the Lagrangian function is taken as the sparse feature vector after the sparse representation processing, the dimension of the sparse feature vector being 64; the set of sparse feature vectors of all the wireless communication signals received by the learning earphone is {H″_1, H″_2, …, H″_i, …, H″_m}.
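The L0-constrained objective of S36–S37 can be solved exactly only by searching atom subsets, which is feasible for toy sizes; a brute-force sketch (in practice greedy or relaxed solvers replace this, and the 4-atom identity dictionary is purely illustrative):

```python
import numpy as np
from itertools import combinations

def l0_sparse_code(D, h, k):
    """Exhaustive min ||h - D tau||_2 subject to ||tau||_0 <= k."""
    best, best_err = np.zeros(D.shape[1]), np.inf
    for idx in combinations(range(D.shape[1]), k):
        sub = D[:, idx]
        c, *_ = np.linalg.lstsq(sub, h, rcond=None)
        err = np.linalg.norm(h - sub @ c)
        if err < best_err:
            best_err = err
            best = np.zeros(D.shape[1])
            best[list(idx)] = c
    return best

D = np.eye(4)                      # toy 4-atom dictionary
tau = l0_sparse_code(D, np.array([0.0, 3.0, 0.0, 4.0]), k=2)
print(np.count_nonzero(tau))       # -> 2
```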
S4: an unhealthy voice recognition model based on a VC (video coding) dimension theory is constructed, a wireless communication signal sparse feature vector is input into the model, the model outputs a detection and recognition result of a wireless communication signal, and the unhealthy voice wireless communication signal obtained by detection is eliminated, wherein the unhealthy voice recognition model maps the sparse feature vector of the wireless communication signal which is sparsely represented to a high-dimension space, and detection and recognition of the unhealthy voice wireless communication signal are carried out in the high-dimension space.
In the step S4, an unhealthy voice recognition model based on VC dimension theory is constructed, where the unhealthy voice recognition model maps sparse feature vectors of wireless communication signals with sparse representation to a high-dimensional space, and performs detection and recognition of unhealthy voice wireless communication signals in the high-dimensional space, and the method includes:
an unhealthy voice recognition model based on a VC (video coding) dimension theory is constructed, wherein the unhealthy voice recognition model maps sparse feature vectors of wireless communication signals which are sparsely represented to a high-dimension space, and detection and recognition of unhealthy voice wireless communication signals are carried out in the high-dimension space;
based on the VC dimension theory, since the dimension of the sparse feature vector is 64, the sparse feature vectors can be binary-classified by a hypothesis space whose VC dimension is 65, and healthy-sound and unhealthy-sound wireless communication signals are obtained through recognition and detection;
The unhealthy voice recognition model is 65-dimensional hypothesis space, and the recognition flow of the unhealthy voice recognition model is as follows:
the sparse feature vector H″_i is input into the unhealthy voice recognition model:
u(H″_i) = 1 if w^T φ(H″_i) + b > 0, and u(H″_i) = 0 otherwise
wherein:
φ(·) represents the high-dimensional nonlinear mapping;
w represents a weight vector, b represents an offset;
u(H″_i) represents the unhealthy voice recognition result of the wireless communication signal s_i(t_i) corresponding to the sparse feature vector H″_i, u(H″_i) ∈ {0, 1}; u(H″_i) = 1 denotes that the wireless communication signal s_i(t_i) is an unhealthy-sound wireless communication signal, and u(H″_i) = 0 denotes that the wireless communication signal s_i(t_i) is a healthy-sound wireless communication signal.
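The recognition flow can be sketched with an assumed explicit high-dimensional map φ(·) (RBF similarities to reference vectors; the patent does not fix the mapping, so the centers, gamma, w and b below are all hypothetical):

```python
import numpy as np

def rbf_features(H, centers, gamma=0.5):
    """An assumed map phi(.): similarities to a set of reference vectors."""
    d = np.linalg.norm(centers - H, axis=1)
    return np.exp(-gamma * d ** 2)

def recognize(H, centers, w, b):
    """u(H) = 1 (unhealthy) if w . phi(H) + b > 0, else 0 (healthy)."""
    return int(w @ rbf_features(H, centers) + b > 0)

centers = np.eye(2)                      # two toy reference vectors
print(recognize(np.array([1.0, 0.0]), centers, np.array([1.0, -1.0]), 0.0))  # -> 1
```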
The training process of the unhealthy voice recognition model comprises the following steps:
s41: constructing a regression equation of the unhealthy voice recognition model:
u(h) = w^T φ(h) + b
wherein:
h represents an input value of the unhealthy voice recognition model;
s42: collecting training data for training an unhealthy voice recognition model to form a training data set data2, wherein each group of training data in the training data set data2 comprises sparse feature vectors and corresponding unhealthy voice recognition results;
s43: obtaining model parameters w and b based on a training data set data2, wherein the calculation formula of the model parameters is as follows:
J(w, b) = ½‖w‖₂² + (1/N₊) Σ_{v∈Ω, ŷ_v=0} max(0, 1 + w^T φ(v) + b) + (1/N₋) Σ_{v∈Ω, ŷ_v=1} max(0, 1 − w^T φ(v) − b)
wherein:
omega represents a sparse feature vector set in the training data set data2, and v represents a sparse feature vector in omega;
N + Indicating the number of healthy acoustic wireless communication signals in the training data set, N - Representing a number of unhealthy acoustic wireless communication signals in the training data set;
ŷ_v represents the unhealthy voice recognition result corresponding to the sparse feature vector v;
the w and b that minimize the model parameter calculation formula are taken as the model parameters obtained through training.
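S41–S43 as a runnable sketch: class-balanced hinge-loss training by stochastic subgradient descent. The ±1 label encoding, the learning rate and the identity feature map stand in for the patent's high-dimensional mapping and exact objective, so this is an assumption-laden illustration:

```python
import numpy as np

def train(dim, X_pos, X_neg, lr=0.1, iters=200):
    w, b = np.zeros(dim), 0.0
    # samples weighted by 1/N+ and 1/N-, mirroring the class counts in the text
    data = [(x, 1.0, 1.0 / len(X_pos)) for x in X_pos] + \
           [(x, -1.0, 1.0 / len(X_neg)) for x in X_neg]
    for _ in range(iters):
        for x, y, c in data:
            if y * (w @ x + b) < 1:      # hinge margin violated -> update
                w += lr * c * y * x
                b += lr * c * y
    return w, b

X_pos = np.array([[2.0, 0.0], [3.0, 1.0]])     # e.g. unhealthy-sound vectors
X_neg = np.array([[-2.0, 0.0], [-3.0, -1.0]])  # e.g. healthy-sound vectors
w, b = train(2, X_pos, X_neg)
print(int(w @ np.array([2.5, 0.5]) + b > 0))   # -> 1
```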
In the step S4, the sparse feature vector of the wireless communication signal is input into an unhealthy voice recognition model, the model outputs a detection recognition result of the wireless communication signal, and eliminates unhealthy voice obtained by detection, including:
all the wireless communication signal sparse feature vectors received by the learning earphone are input into the unhealthy voice recognition model, the model outputs the detection and recognition results of the wireless communication signals, and the unhealthy-sound wireless communication signals obtained by detection are filtered out. In the embodiment of the invention, the unhealthy-sound wireless communication signal filtering and elimination method is as follows: the learning earphone sets the cut-off frequency of the signal receiving device to the frequency of the unhealthy-sound wireless communication signal, so that a wireless communication signal at the cut-off frequency cannot be received by the signal receiving device.
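The elimination step in the text tunes the receiver cut-off frequency; where the unhealthy-sound frequencies are known, an equivalent DSP-side suppression can be sketched as a frequency-domain notch (this notch formulation, the 2 Hz width and the test tones are illustrative assumptions, not the patent's mechanism):

```python
import numpy as np

def suppress(signal, f_s, bad_freqs, width=2.0):
    """Zero FFT bins within `width` Hz of each detected unhealthy frequency."""
    spec = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1 / f_s)
    for f0 in bad_freqs:
        spec[np.abs(freqs - f0) <= width] = 0.0
    return np.fft.irfft(spec, n=len(signal))

f_s = 1000.0
t = np.arange(0, 1.0, 1 / f_s)
x = np.sin(2 * np.pi * 50 * t) + np.sin(2 * np.pi * 200 * t)
clean = suppress(x, f_s, bad_freqs=[200.0])
# the 200 Hz component is removed while the 50 Hz component survives
```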
Example 2:
Fig. 3 is a functional block diagram of an apparatus for intelligently eliminating unhealthy sounds based on a learning earphone under air conduction according to an embodiment of the present invention, which can implement the method for intelligently eliminating unhealthy sounds based on the learning earphone in embodiment 1.
The learning earphone intelligent unhealthy sound eliminating device 100 based on air conduction can be installed in an electronic device. Depending on the functions implemented, the learning earphone intelligent unhealthy sound eliminating device based on air conduction may include a signal receiving device 101, a signal feature extraction module 102, and an unhealthy sound eliminating device 103. The module of the invention, which may also be referred to as a unit, refers to a series of computer program segments, which are stored in the memory of the electronic device, capable of being executed by the processor of the electronic device and of performing a fixed function.
A signal receiving device 101, configured to receive communication signals of indoor bluetooth wireless communication and WIFI wireless communication, and perform digital-to-analog conversion on the received wireless communication signals;
the signal feature extraction module 102 is configured to convert a wireless communication signal into a time-frequency waterfall graph, extract a structural feature vector of the wireless communication signal on the time-frequency waterfall graph by using a multi-scale feature extraction method, and perform sparse representation processing on the extracted structural feature vector to obtain a sparse feature vector of the wireless communication signal;
The unhealthy sound eliminating device 103 is configured to construct an unhealthy sound identification model based on VC dimension theory, input a sparse feature vector of a wireless communication signal into the model, output a detection and identification result of the wireless communication signal by the model, and eliminate the unhealthy sound wireless communication signal obtained by detection.
In detail, the modules in the learning earphone intelligent canceling unhealthy sound device 100 based on air conduction in the embodiment of the present invention use the same technical means as the method for canceling unhealthy sound based on air conduction as described in fig. 1, and can produce the same technical effects, which are not described herein.
Example 3:
fig. 4 is a schematic structural diagram of an electronic device for implementing a method for intelligently eliminating unhealthy sounds based on learning headphones under air conduction according to an embodiment of the present invention.
The electronic device 1 may comprise a processor 10, a memory 11 and a bus, and may further comprise a computer program, such as program 12, stored in the memory 11 and executable on the processor 10.
The memory 11 includes at least one type of readable storage medium, including flash memory, a mobile hard disk, a multimedia card, a card memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a removable hard disk of the electronic device 1. The memory 11 may in other embodiments also be an external storage device of the electronic device 1, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the electronic device 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only for storing application software installed in the electronic device 1 and various types of data, such as codes of the program 12, but also for temporarily storing data that has been output or is to be output.
The processor 10 may be comprised of integrated circuits in some embodiments, for example, a single packaged integrated circuit, or may be comprised of multiple integrated circuits packaged with the same or different functions, including one or more central processing units (Central Processing unit, CPU), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like. The processor 10 is a Control Unit (Control Unit) of the electronic device, connects respective parts of the entire electronic device using various interfaces and lines, executes or executes programs or modules (a program 12 for intelligently eliminating unhealthy sounds, etc.) stored in the memory 11, and invokes data stored in the memory 11 to perform various functions of the electronic device 1 and process the data.
The bus may be a peripheral component interconnect standard (peripheralcomponent interconnect, PCI) bus or an extended industry standard architecture (extended industry standard architecture, EISA) bus, among others. The bus may be classified as an address bus, a data bus, a control bus, etc. The bus is arranged to enable a connection communication between the memory 11 and at least one processor 10 etc.
Fig. 4 shows only an electronic device with components, it being understood by a person skilled in the art that the structure shown in fig. 4 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than shown, or may combine certain components, or may be arranged in different components.
For example, although not shown, the electronic device 1 may further include a power source (such as a battery) for supplying power to each component, and preferably, the power source may be logically connected to the at least one processor 10 through a power management device, so that functions of charge management, discharge management, power consumption management, and the like are implemented through the power management device. The power supply may also include one or more of any of a direct current or alternating current power supply, recharging device, power failure detection circuit, power converter or inverter, power status indicator, etc. The electronic device 1 may further include various sensors, bluetooth modules, wi-Fi modules, etc., which will not be described herein.
Further, the electronic device 1 may also comprise a network interface, optionally the network interface may comprise a wired interface and/or a wireless interface (e.g. WI-FI interface, bluetooth interface, etc.), typically used for establishing a communication connection between the electronic device 1 and other electronic devices.
The electronic device 1 may optionally further comprise a user interface, which may be a Display, an input unit, such as a Keyboard (Keyboard), or a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch, or the like. The display may also be referred to as a display screen or display unit, as appropriate, for displaying information processed in the electronic device 1 and for displaying a visual user interface.
It should be understood that the embodiments described are for illustrative purposes only and are not limited to this configuration in the scope of the patent application.
The program 12 stored in the memory 11 of the electronic device 1 is a combination of instructions that, when executed in the processor 10, may implement:
communication signals of indoor Bluetooth wireless communication and WIFI wireless communication are received respectively, and the received communication signals are converted into a time-frequency waterfall diagram;
extracting structural feature vectors of the wireless communication signals on a time-frequency waterfall graph by using a multi-scale feature extraction method;
Performing sparse representation processing on the extracted structural feature vector to obtain a wireless communication signal sparse feature vector;
an unhealthy sound identification model based on VC dimension theory is constructed, the wireless communication signal sparse feature vector is input into the model, the model outputs a detection and identification result of the wireless communication signal, and the detected unhealthy-sound wireless communication signals are eliminated.
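The four program steps above can be sketched end to end. The sketch below is a minimal, illustrative stand-in (NumPy only): the waterfall, feature, and classification stages are deliberately simplified versions of the patent's stages, and the classifier parameters `w` and `b` are hypothetical rather than trained values.

```python
import numpy as np

def to_waterfall(signal, frame=64, step=32):
    """Short-time spectral 'waterfall': rows are time frames, columns frequency bins."""
    frames = [signal[i:i + frame] * np.hanning(frame)
              for i in range(0, len(signal) - frame + 1, step)]
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2   # signal strength En = |F|^2

def extract_features(waterfall, dim=64):
    """Stand-in for the HOG + sparse-coding stages: a coarse spectral profile."""
    profile = waterfall.mean(axis=0)
    idx = np.linspace(0, len(profile) - 1, dim).astype(int)  # resample to 64-d
    feat = profile[idx]
    n = np.linalg.norm(feat)
    return feat / n if n > 0 else feat

def classify(feat, w, b):
    """Linear decision u in {0, 1}; u = 1 flags an unhealthy-sound signal."""
    return int(w @ feat + b >= 0)

rng = np.random.default_rng(0)
sig = np.sin(2 * np.pi * 0.1 * np.arange(512)) + 0.05 * rng.standard_normal(512)
feat = extract_features(to_waterfall(sig))
u = classify(feat, w=rng.standard_normal(64), b=0.0)
```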
Specifically, the specific implementation method of the above instruction by the processor 10 may refer to descriptions of related steps in the corresponding embodiments of fig. 1 to 4, which are not repeated herein.
It should be noted that the foregoing reference numerals of the embodiments of the present invention are merely for describing the embodiments, and do not represent the advantages and disadvantages of the embodiments. The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, apparatus, article, or method that comprises the element.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, or of course by means of hardware, but in many cases the former is the preferred embodiment. Based on such understanding, the technical solution of the present invention, or the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) as described above, comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, etc.) to perform the method according to the embodiments of the present invention.
The foregoing description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the invention, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.

Claims (8)

1. A method for intelligently eliminating unhealthy sounds based on learning headphones under air conduction, the method comprising:
S1: receiving communication signals of indoor Bluetooth wireless communication and WIFI wireless communication respectively, and converting the received communication signals into a time-frequency waterfall diagram, wherein the time-frequency waterfall diagram comprises time domain information, frequency domain information and signal strength of the communication signals;
s2: extracting structural feature vectors of wireless communication signals on the time-frequency waterfall diagram by using a multi-scale feature extraction method, wherein the multi-scale feature extraction method is a histogram of oriented gradients feature extraction algorithm;
s3: performing sparse representation processing on the extracted structural feature vectors to obtain wireless communication signal sparse feature vectors, wherein sparse representation based on an overcomplete dictionary is the structural feature vector sparse processing method, and the sparse representation processing flow is as follows:
s31: constructing a dictionary for sparse representation and initializing the dictionary to D_0, the dimension of the dictionary being 64×128;
s32: setting the current iteration number of the dictionary as d, and setting the initial value of d as 0;
s33: collecting a training set data1 for dictionary training, wherein samples in the training set data1 are image structure feature vectors;
s34: constructing sparse coefficient representation of each sample in the training set data1, and then the sparse coefficient of the q-th sample in the training set data1 is represented as:
wherein:
τ_q is the sparse coefficient representation of the q-th sample data1_q in the training set data1;
D_d represents the dictionary at the d-th dictionary iteration;
s35: the sparse coefficient representation of all samples in the training set data1 is formed into a sparse coefficient representation matrix tau, each column in the matrix tau represents the sparse coefficient representation of the sample, and the dictionary is updated for the (d+1) th time:
D_(d+1) = τ^T (τ τ^T)^(−1)
judging whether the change between D_(d+1) and D_d is smaller than the threshold value; if it is smaller than the threshold value, D_(d+1) is the overcomplete dictionary D* obtained by training, and τ is the corresponding sparse coefficient representation matrix; if not, d = d + 1 and the process returns to step S34;
s36: constructing an objective function of structural feature vector sparse processing:
wherein:
H″_i represents the sparse feature vector obtained from the structural feature vector H′_i after sparse processing;
the constraint conditions of the objective function are as follows:
wherein:
τ_r represents a row vector of the sparse coefficient representation matrix, ε_0 is the sparsity constraint threshold value, and ‖·‖_0 represents the L0 norm;
s37: converting the training objective function into a Lagrangian function:
wherein:
λ is the Lagrange multiplier;
the H″_i that minimizes the Lagrangian function is taken as the sparse feature vector after sparse representation processing, the dimension of the sparse feature vector being 64; the sparse feature vector set of all wireless communication signals received by the learning earphone is {H″_1, H″_2, …, H″_i, …, H″_m};
S4: an unhealthy sound recognition model based on VC (Vapnik–Chervonenkis) dimension theory is constructed, the wireless communication signal sparse feature vectors are input into the model, the model outputs a detection and recognition result of the wireless communication signal, and the detected unhealthy-sound wireless communication signals are eliminated, wherein the unhealthy sound recognition model maps the sparsely represented sparse feature vectors of the wireless communication signals to a high-dimensional space, and detection and recognition of unhealthy-sound wireless communication signals are carried out in the high-dimensional space.
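As a rough illustration of steps S31 through S35, the sketch below alternates a sparse-coding step and a dictionary update in NumPy. The patent's exact formula for τ_q is not reproduced in the text, so the coding step here is an assumed hard-thresholded projection, and the dictionary update is implemented as the standard least-squares fit given the coefficients; the sizes are reduced from the patent's 64×128 for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)
n_atoms, sample_dim, n_samples = 8, 16, 200   # patent uses 64x128; smaller here

D = rng.standard_normal((n_atoms, sample_dim))        # S31: initial dictionary D_0
D /= np.linalg.norm(D, axis=1, keepdims=True)
X = rng.standard_normal((n_samples, sample_dim))      # S33: training set data1

for d in range(20):                                   # S32: iteration counter d
    # S34: per-sample sparse coefficients (assumed hard-thresholded projection;
    # the patent's exact tau_q formula is not reproduced in the text)
    tau = X @ np.linalg.pinv(D)                       # shape (n_samples, n_atoms)
    k = 3                                             # keep the 3 largest entries per row
    thresh = np.sort(np.abs(tau), axis=1)[:, -k][:, None]
    tau[np.abs(tau) < thresh] = 0.0
    # S35: least-squares dictionary update given the coefficients
    D_next = np.linalg.pinv(tau) @ X
    if np.linalg.norm(D_next - D) < 1e-4:             # convergence threshold check
        D = D_next
        break
    D = D_next
```

The convergence test mirrors the patent's "smaller than the threshold value" criterion on the change between successive dictionaries.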
2. The method for intelligently eliminating unhealthy sounds by using a learning earphone based on air conduction as set forth in claim 1, wherein in step S1, communication signals of indoor bluetooth wireless communication and WIFI wireless communication are received respectively, and the method comprises:
The learning earphone receives wireless communication signals respectively by utilizing a built-in signal receiving device, wherein the wireless communication signals comprise Bluetooth wireless communication signals and WIFI wireless communication signals, the indoor Bluetooth wireless communication signals comprise wireless communication signals of indoor Internet of things equipment and mobile communication equipment, and the WIFI wireless communication signals comprise wireless communication signals of various terminal equipment under a local area network;
the wireless communication signal set S received by the signal receiving device is:
S = {s_1(t_1), s_2(t_2), …, s_i(t_i), …, s_m(t_m)}
wherein:
s_i(t_i) represents the i-th wireless communication signal received by the signal receiving apparatus; t_i represents the time domain information of the wireless communication signal s_i(t_i), t_i ∈ [t_(i,0), t_(i,end)], where t_(i,0) is the time at which the wireless communication signal s_i(t_i) is received and t_(i,end) is the vanishing time of the wireless communication signal s_i(t_i);
A_i represents the signal amplitude of the wireless communication signal s_i(t_i), f_i represents the frequency of the wireless communication signal s_i(t_i), and φ_i represents the initial phase of the wireless communication signal s_i(t_i);
m represents the number of wireless communication signals received by the signal receiving device;
the digital-to-analog converter in the signal receiving device converts the received wireless communication signal into an analog signal, and the conversion flow of the analog signal is as follows:
s11: construction frequency f s Is a sequence of impulses:
the impulse sequence of any wireless communication signal s_i(t_i) in the wireless communication signal set S is:
S12: based on the impulse sequence of the wireless communication signal, converting any wireless communication signal s_i(t_i) in the wireless communication signal set S into an analog signal;
S13: the digital-to-analog converter converts the wireless communication signals in the wireless communication signal set S into analog signals according to the analog signal conversion flow, obtaining the converted set S′.
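The impulse-train conversion in S11 and S12 follows classical sampling theory. The sketch below, with illustrative amplitude, frequency, and phase values (assumptions, not the patent's), samples a sinusoid s_i(t) = A·sin(2π f t + φ) at rate f_s and reconstructs a value between sampling instants by band-limited (sinc) interpolation.

```python
import numpy as np

A, f, phi = 1.0, 5.0, 0.3        # illustrative amplitude, frequency (Hz), phase
fs = 100.0                       # S11: impulse-train (sampling) frequency f_s
n = np.arange(256)
samples = A * np.sin(2 * np.pi * f * n / fs + phi)   # s_i(t) sampled by the impulse train

def reconstruct(t, samples, fs):
    """S12: band-limited reconstruction, s(t) = sum_n s[n] * sinc(fs*t - n)."""
    k = np.arange(len(samples))
    return float(np.sum(samples * np.sinc(fs * t - k)))

t0 = 100.5 / fs                  # a time halfway between two sampling instants
approx = reconstruct(t0, samples, fs)
exact = A * np.sin(2 * np.pi * f * t0 + phi)
```

With f well below f_s/2, the truncated sinc sum recovers the midpoint value to within a small truncation error.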
3. The method for intelligently eliminating unhealthy sounds by using a learning earphone based on air conduction according to claim 2, wherein the converting the received wireless communication signal into a time-frequency waterfall diagram in step S1 comprises:
the time-frequency converter built in the learning earphone converts the analog signals in the set S' into a time-frequency waterfall diagram, and the conversion flow of the time-frequency waterfall diagram is as follows:
performing time-frequency conversion for any analog signal in the set S′:
wherein:
s_i(0) represents the value of the 1st sampled signal point of the analog signal, s_i(N_i − 1) represents the value of the last sampled signal point of the analog signal, and N_i represents the number of sampled signal points in the analog signal;
j represents the imaginary unit, j² = −1;
z(·) represents the window function, L represents the length of the window function, and z*(·) represents the complex conjugate of the window function;
F_i represents the time-frequency conversion result of any analog signal in the set S′;
calculating the signal strength En_i of any analog signal:
En_i = (F_i)²
Constructing a time-frequency waterfall diagram of the analog signal based on the time domain information, the frequency domain information, and the signal strength of the analog signal, where the time-frequency waterfall diagram of any analog signal is G_i(t_i, f_i, En_i);
The set of time-frequency waterfall diagrams of the analog signals in the set S′ is {G_1(t_1, f_1, En_1), …, G_m(t_m, f_m, En_m)}, and the time-frequency waterfall diagram set is transmitted to a processor in the learning earphone.
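The windowed transform above is a short-time Fourier transform whose squared magnitude gives the signal strength En_i. The sketch below is a minimal NumPy version; the Hann window, window length, and hop size are illustrative assumptions, not values fixed by the claim.

```python
import numpy as np

fs = 1024                                 # sampling rate of the analog signal (illustrative)
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 200 * t)           # a 200 Hz test tone

L = 128                                   # window length L
z = np.hanning(L)                         # window function z(.)
hop = L // 2

frames = np.stack([x[i:i + L] * z for i in range(0, len(x) - L + 1, hop)])
F = np.fft.rfft(frames, axis=1)           # short-time spectra F_i
En = np.abs(F) ** 2                       # signal strength En_i = |F_i|^2

# waterfall: rows = time frames, columns = frequency bins of width fs/L Hz
peak_bin = int(En.sum(axis=0).argmax())
peak_hz = peak_bin * fs / L
```

For the 200 Hz tone the dominant column of the waterfall sits at the 200 Hz bin, which is the behavior the time-frequency conversion relies on.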
4. The method for intelligently eliminating unhealthy sounds by using a learning earphone based on air conduction as set forth in claim 3, wherein the step S2 of extracting structural feature vectors of wireless communication signals on a time-frequency waterfall by using a multi-scale feature extraction method comprises:
the processor in the learning earphone extracts structural feature vectors of wireless communication signals on the time-frequency waterfall diagram by using a multi-scale feature extraction method, wherein the multi-scale feature extraction method is the histogram of oriented gradients feature extraction algorithm; for any time-frequency waterfall diagram G_i(t_i, f_i, En_i) in the time-frequency waterfall diagram set, the multi-scale feature extraction method comprises the following steps:
s21: converting pixel values of any pixel (x, y) in the time-frequency waterfall diagram into gray values, wherein a conversion formula of the pixel values is as follows:
g(x,y)=0.299×R(x,y)+0.587×G(x,y)+0.114×B(x,y)
Wherein:
r (x, y) represents the value of pixel (x, y) in the red color channel, G (x, y) represents the value of pixel (x, y) in the green color channel, and B (x, y) represents the value of pixel (x, y) in the blue color channel;
g (x, y) represents the gray value of the pixel (x, y);
s22: calculating the gradient value and the gradient direction of any pixel (x, y), wherein the calculation formulas of the gradient value and the gradient direction of the pixel are as follows:
wherein:
α (x, y) represents the gradient value of the pixel (x, y);
beta (x, y) represents the gradient direction of the pixel (x, y);
s23: time-frequency waterfall graph G using a sliding window of 16 x 16 pixel size i (t i ,f i ,En i ) Performing sliding segmentation, wherein the sliding segmentation adopts a sequence from left to right and from top to bottom, the image specification of the time-frequency waterfall is 64 multiplied by 128, and 32 pixel blocks are obtained by total segmentation;
s24: calculating the gradient value and gradient direction of the pixels in each pixel block, and dividing the statistical interval of the gradient direction into (0) ° ,90 ° ],(90 ° ,180 ° ],(180 ° ,270 ° ],(270 ° ,360 ° ]Calculating a gradient distribution histogram of each pixel block, wherein the horizontal axis of the gradient distribution histogram is a gradient direction statistical interval, and the vertical axis is the sum of pixel gradients belonging to any gradient direction statistical interval in the pixel block;
s25: combining gradient distribution histograms of 32 pixel blocks, the combined result H i The method comprises the following steps:
wherein:
H_(1,i) represents the gradient distribution histogram of the 1st pixel block of the time-frequency waterfall diagram G_i(t_i, f_i, En_i) in the 4 directions;
H_i represents the combination result of the gradient distribution histograms in the time-frequency waterfall diagram G_i(t_i, f_i, En_i);
the combination result comprises 128 feature points of the time-frequency waterfall diagram in the 4 directions, and H_i is normalized, where the normalization formula is:
wherein:
‖·‖_2 represents the L2 norm;
ε = 0.01;
H′_i is the structural feature vector of the time-frequency waterfall diagram G_i(t_i, f_i, En_i);
and extracting structural feature vectors of all the time-frequency waterfall graphs in the time-frequency waterfall graph set based on a multi-scale feature extraction method.
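Steps S21 through S25 can be sketched directly: grayscale conversion, per-pixel gradients, 16×16 blocks over a 64×128 image, 4 direction bins of 90° each, giving 32 × 4 = 128 features, then L2 normalization with ε = 0.01. The gradient scheme (`np.gradient`) and half-open bin boundaries are assumptions where the patent's exact formulas are not reproduced in the text.

```python
import numpy as np

def hog_structural_vector(rgb):
    """HOG as in S21-S25: grayscale, gradients, 16x16 blocks, 4 direction bins."""
    # S21: pixel value -> gray value
    g = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    # S22: gradient magnitude and direction (finite differences; the patent's
    # exact difference scheme is not reproduced in the text)
    gy, gx = np.gradient(g)
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx)) % 360        # direction in [0, 360)
    # S23/S24: 16x16 blocks, 4 direction bins of 90 degrees each
    feats = []
    H, W = g.shape                                    # 128 rows x 64 columns expected
    for by in range(0, H, 16):
        for bx in range(0, W, 16):
            a = ang[by:by + 16, bx:bx + 16]
            m = mag[by:by + 16, bx:bx + 16]
            hist = [m[(a >= lo) & (a < lo + 90)].sum() for lo in (0, 90, 180, 270)]
            feats.extend(hist)
    h = np.asarray(feats)                             # S25: 32 blocks x 4 bins = 128
    return h / (np.linalg.norm(h) + 0.01)             # normalization with eps = 0.01

img = np.random.default_rng(2).random((128, 64, 3))   # a 64x128 waterfall image
Hp = hog_structural_vector(img)
```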
5. The method for intelligently eliminating unhealthy sounds by using a learning earphone based on air conduction according to claim 1, wherein in the step S4, an unhealthy sound recognition model based on VC dimension theory is constructed, the unhealthy sound recognition model maps the sparsely represented sparse feature vectors of the wireless communication signals to a high-dimensional space, and detection and recognition of unhealthy-sound wireless communication signals are performed in the high-dimensional space, comprising:
an unhealthy sound recognition model based on VC dimension theory is constructed, wherein the unhealthy sound recognition model maps the sparsely represented sparse feature vectors of the wireless communication signals to a high-dimensional space, and detection and recognition of unhealthy-sound wireless communication signals are carried out in the high-dimensional space;
based on VC dimension theory, the sparse feature vectors, whose dimension is 64, can be binary-classified by a hypothesis space with VC dimension 65, and healthy-sound and unhealthy-sound wireless communication signals are obtained through recognition and detection;
the unhealthy sound recognition model is a hypothesis space with VC dimension 65, and the recognition flow of the unhealthy sound recognition model is as follows:
the sparse feature vector H″_i is input into the unhealthy sound recognition model:
wherein:
φ(·) represents a high-dimensional nonlinear mapping;
w represents a weight vector, b represents an offset;
u(H″_i) represents the unhealthy sound recognition result of the wireless communication signal s_i(t_i) corresponding to the sparse feature vector H″_i, u(H″_i) ∈ {0, 1}; u(H″_i) = 1 indicates that the wireless communication signal s_i(t_i) is an unhealthy-sound wireless communication signal, and u(H″_i) = 0 indicates that the wireless communication signal s_i(t_i) is a healthy-sound wireless communication signal.
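The recognition flow above amounts to a thresholded linear decision in the mapped space; a linear hypothesis class on 64-dimensional inputs with a bias term has VC dimension 65, which matches the claim. In the sketch below the nonlinear mapping is taken as the identity and the parameters are hypothetical, not trained values.

```python
import numpy as np

def phi(h):
    """Placeholder for the high-dimensional nonlinear mapping; identity here."""
    return h

def recognize(h, w, b):
    """u(H''_i): 1 -> unhealthy-sound signal, 0 -> healthy-sound signal."""
    return 1 if float(w @ phi(h)) + b >= 0 else 0

rng = np.random.default_rng(3)
w, b = rng.standard_normal(64), -0.1   # hypothetical model parameters
h = rng.standard_normal(64)            # a sparse feature vector H''_i
u = recognize(h, w, b)
```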
6. The method for intelligently eliminating unhealthy sounds by learning headphones based on air conduction according to claim 5, wherein the training process of the unhealthy sound recognition model comprises:
s41: constructing a regression equation of the unhealthy voice recognition model:
wherein:
h represents an input value of the unhealthy voice recognition model;
S42: collecting training data for training an unhealthy voice recognition model to form a training data set data2, wherein each group of training data in the training data set data2 comprises sparse feature vectors and corresponding unhealthy voice recognition results;
s43: obtaining model parameters w and b based on a training data set data2, wherein the calculation formula of the model parameters is as follows:
wherein:
Ω represents the sparse feature vector set in the training data set data2, and v represents a sparse feature vector in Ω;
N_+ represents the number of healthy-sound wireless communication signals in the training data set, and N_− represents the number of unhealthy-sound wireless communication signals in the training data set;
represents the unhealthy sound recognition result corresponding to the sparse feature vector;
the w and b that minimize the model parameter calculation formula are taken as the model parameters obtained by training.
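The closed-form parameter formula referenced in S43 is not reproduced in the text. Purely as an illustration of obtaining (w, b) from a labeled data2-style set, the sketch below fits a perceptron on synthetic, linearly separable 64-dimensional feature vectors; the perceptron is an assumed substitute, not the patent's training rule.

```python
import numpy as np

rng = np.random.default_rng(4)
true_w = rng.standard_normal(64)                 # hypothetical ground-truth direction
X = rng.standard_normal((400, 64))
margin = X @ true_w
keep = np.abs(margin) > 2                        # keep well-separated samples only
X, y = X[keep], (margin[keep] > 0).astype(int)   # y = 1: unhealthy-sound signal

# Perceptron fit: one simple way to obtain (w, b) on separable data
w, b = np.zeros(64), 0.0
for _ in range(100):
    errors = 0
    for xi, yi in zip(X, y):
        pred = 1 if xi @ w + b >= 0 else 0
        w += (yi - pred) * xi
        b += float(yi - pred)
        errors += int(pred != yi)
    if errors == 0:                              # converged: zero training mistakes
        break

acc = float(((X @ w + b >= 0).astype(int) == y).mean())
```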
7. The method for intelligently eliminating unhealthy sounds by learning headphones based on air conduction according to claim 1, wherein in step S4, a sparse feature vector of a wireless communication signal is input into an unhealthy sound recognition model, the model outputs a detection recognition result of the wireless communication signal, and eliminates unhealthy sounds obtained by detection, and the method comprises the following steps:
And the sparse feature vectors of all the wireless communication signals received by the learning earphone are input into an unhealthy voice recognition model, the model outputs detection and recognition results of the wireless communication signals, and the unhealthy voice wireless communication signals obtained by detection are filtered and eliminated.
8. An apparatus for intelligently eliminating unhealthy sounds based on learning headphones under air conduction, the apparatus comprising:
the signal receiving device is used for receiving communication signals of indoor Bluetooth wireless communication and WIFI wireless communication and performing digital-to-analog conversion on the received wireless communication signals;
the signal feature extraction module is used for converting the wireless communication signals into a time-frequency waterfall diagram, extracting structural feature vectors of the wireless communication signals on the time-frequency waterfall diagram by using a multi-scale feature extraction method, and performing sparse representation processing on the extracted structural feature vectors to obtain wireless communication signal sparse feature vectors;
the unhealthy sound eliminating device is used for constructing an unhealthy sound recognition model based on VC dimension theory, inputting the sparse feature vectors of the wireless communication signals into the model, outputting the detection and recognition results of the wireless communication signals from the model, and eliminating the detected unhealthy-sound wireless communication signals, so as to implement the method for intelligently eliminating unhealthy sounds based on the learning earphone under air conduction as claimed in any one of claims 1 to 7.
CN202211291791.9A 2022-10-20 2022-10-20 Method for intelligently eliminating unhealthy sound of learning earphone based on air conduction Active CN115662395B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211291791.9A CN115662395B (en) 2022-10-20 2022-10-20 Method for intelligently eliminating unhealthy sound of learning earphone based on air conduction

Publications (2)

Publication Number Publication Date
CN115662395A CN115662395A (en) 2023-01-31
CN115662395B true CN115662395B (en) 2023-11-10

Family

ID=84989991


Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113327632A (en) * 2021-05-13 2021-08-31 南京邮电大学 Unsupervised abnormal sound detection method and unsupervised abnormal sound detection device based on dictionary learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9858502B2 (en) * 2014-03-31 2018-01-02 Los Alamos National Security, Llc Classification of multispectral or hyperspectral satellite imagery using clustering of sparse approximations on sparse representations in learned dictionaries obtained using efficient convolutional sparse coding



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant