CN108986834A - Bone conduction voice blind enhancement method based on codec framework and recurrent neural network - Google Patents

Bone conduction voice blind enhancement method based on codec framework and recurrent neural network

Info

Publication number
CN108986834A
CN108986834A
Authority
CN
China
Prior art keywords
voice
training
bone conduction
spectrum
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810960512.0A
Other languages
Chinese (zh)
Other versions
CN108986834B (en)
Inventor
张雄伟
单东晶
郑昌艳
曹铁勇
李莉
杨吉斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Army Engineering University of PLA
Original Assignee
Army Engineering University of PLA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Army Engineering University of PLA
Priority to CN201810960512.0A
Publication of CN108986834A
Application granted
Publication of CN108986834B
Legal status: Active

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/005 Correction of errors induced by the transmission channel, if related to the coding algorithm
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks


Abstract

The invention discloses a blind enhancement method for bone-conducted speech based on a codec framework and a recurrent neural network. First, air-conducted (AC) and bone-conducted (BC) speech features are extracted and the extracted feature data are aligned; then, with the BC speech features as training input and the AC speech dictionary combination coefficients as training target, the encoder is pre-trained and its parameters serve as the initialization of the encoder in the next step. Next, a decoder model based on a local attention mechanism is constructed; with the encoder output as decoder input and the AC speech features as training target, the codec model is trained jointly and the model parameters are stored. Finally, the BC speech features to be enhanced are extracted, the feature conversion is performed with the codec neural network trained in the above steps, and denormalization and an inverse feature transform are applied to the network output to obtain the enhanced time-domain speech. The invention solves problems such as the recovery of high-frequency components, the restoration of unvoiced segments of bone-conducted speech, and recovery under strong noise backgrounds, improving the enhancement quality of bone-conducted speech.

Description

Bone conduction voice blind enhancement method based on codec framework and recurrent neural network
Technical field
The invention belongs to the technical field of speech signal processing, and relates to a blind enhancement method for bone-conducted speech based on a codec (encoder-decoder) framework and a deep long short-term memory recurrent neural network.
Background technique
A bone-conduction microphone is a non-acoustic sensor device: when a person speaks, vocal-fold vibration is transmitted to the larynx and skull, and this kind of microphone picks up that vibration signal and converts it into an electrical signal to obtain speech. Unlike a conventional air-conduction microphone, such non-acoustic sensors are hardly affected by ambient noise, so bone-conducted speech shields noise at the source and has strong noise immunity; it has found both military and civilian applications. For example, many countries equip military platforms such as armed helicopters and tanks with bone-conduction communication systems, and in the U.S. "Future Soldier" individual-soldier system the bone-conduction headset is an important communication tool. On the civilian side, the U.S. company iASUS has developed throat microphones and bone-conduction headsets for extreme sports such as car and motorcycle racing, and Japanese companies such as Panasonic and Sony have developed various bone-conduction communication products applied in fields such as firefighting, forestry, oil exploration and extraction, mining, emergency rescue, special duty, and engineering construction.
Although bone-conducted speech effectively resists ambient noise, the low-pass nature of conduction through the human body and the inherent characteristics of the vibration signal cause bone-conducted speech to lose its high-frequency part and to exhibit phenomena such as attenuated mid frequencies and missing airflow and nasal sounds; the speech therefore sounds muffled and insufficiently clear, seriously degrading auditory perception. In addition, some non-acoustic physical noises can be mixed into bone-conducted speech, such as friction noise between the device and the skin it touches, strong wind friction noise during extreme sports, and noise introduced by chewing or tooth collisions, which also reduce communication quality. Research on bone-conducted speech enhancement algorithms therefore has important theoretical significance and practical value for further promoting the practical application of bone-conduction microphone products and improving voice communication quality in high-noise environments.
At present there are three typical approaches to blind enhancement of bone-conducted speech: unsupervised spectrum extension, equalization, and spectral envelope conversion.
Unsupervised spectrum extension (Bouserhal R E, Falk T H, Voix J. In-ear microphone speech quality enhancement via adaptive filtering and artificial bandwidth extension [J]. Journal of the Acoustical Society of America, 2017) assumes that bone-conducted speech shares a consistent formant structure with air-conducted speech, or that the low- and high-frequency bands of speech share a consistent harmonic structure. Exploiting this structural property, the low-frequency spectrum can be extended directly to produce enhanced high-frequency formants or harmonic structure, thereby realizing blind enhancement of bone-conducted speech.
The idea of equalization is to find the inverse g(t) of the transmission-channel transfer function h(t) and recover the air-conducted speech signal from the bone-conducted speech signal. Equalization was first proposed by Shimamura (Shimamura T, Tamiya T. A reconstruction filter for bone-conducted speech [C]. Circuits and Systems, 2005. Midwest Symposium on, 2005: 1847-1850), who modeled g(t) and constructed an inverse filter to realize bone-conducted speech enhancement. Equalization preserves the low-frequency harmonic structure of speech and effectively compresses the excessive energy in bone-conducted speech, but it has difficulty recovering the high-frequency components of bone-conducted speech.
Most current blind enhancement of bone-conducted speech uses methods based on spectral envelope conversion (Turan M A T, Erzin E. Source and filter estimation for throat-microphone speech enhancement [J], 2016; Mohammadi S H, Kain A. An overview of voice conversion systems [J], 2017). The basic idea of the spectral envelope conversion approach follows the source-filter model of speech: the speech is decomposed into excitation features and spectral envelope features. In the training stage, bone-conducted and air-conducted speech data are passed through an analysis-synthesis model to extract excitation and spectral envelope features, and a conversion model is trained to establish the mapping between the spectral envelopes. In the enhancement stage, the bone-conducted speech to be enhanced is decomposed into excitation and spectral envelope features, the trained model estimates the air-conduction spectral envelope from the bone-conduction envelope, and the estimated envelope together with the original excitation feature of the bone-conducted speech is used to synthesize the enhanced speech.
The above decomposition-synthesis methods based on the source-filter model have made some progress in bone-conducted speech enhancement, but they generally suffer from difficult feature selection, unsatisfactory performance in high-noise environments, and inaccurate recovery of high-frequency speech components, so the enhanced speech remains muffled at low frequencies and insufficiently clear, with inadequate intelligibility and residual processing noise. Some research has begun to adopt decomposition-synthesis methods based on a signal model, splitting the speech signal into a high-dimensional magnitude spectrum and phase and using deep learning to model the relationship between bone-conducted speech and the clean air-conducted high-dimensional magnitude spectrum; this achieves good results when restoring bone-conducted speech, but because no dictionary is used to introduce structural information, problems such as muffled low frequencies, incomplete high-frequency recovery, and insufficiently clear speech remain.
Summary of the invention
The purpose of the present invention is to provide a blind enhancement method for bone-conducted speech based on a codec framework and a recurrent neural network. The method is data-driven: model parameters are obtained by training, and the trained model is then used to enhance bone-conducted speech, solving problems such as the recovery of high-frequency components, the restoration of unvoiced segments of bone-conducted speech, and recovery under strong noise backgrounds, thereby promoting the clarity and intelligibility of bone-conducted speech and further improving its enhancement quality.
The technical solution for achieving the aim of the invention is as follows: a bone conduction voice blind enhancement method based on a codec framework and a recurrent neural network, comprising the following steps:
Data preprocessing: extract the air-conducted (AC) and bone-conducted (BC) speech features, align the extracted feature data, and compute an AC speech dictionary on the AC feature data using sparse non-negative matrix factorization (Sparse NMF);
Encoder pre-training: with the BC speech features as training input and the AC dictionary combination coefficients as training target, train the encoder model using a non-negative, sparse long short-term memory recurrent neural network (NS-LSTM), and store the trained deep neural network parameters as the initialization of the encoder in the next step;
Joint training of the codec: construct a decoder model based on a local attention mechanism; with the encoder output as decoder input and the AC speech features as training target, jointly train the codec model and store the model parameters;
Speech enhancement: extract the BC speech features to be enhanced, perform the feature conversion with the codec neural network trained in the above steps, then apply denormalization and the inverse feature transform to the network output, finally obtaining the enhanced time-domain speech.
Compared with the prior art, the remarkable advantages of the present invention are that a speech dictionary and a non-negative sparse recurrent neural network are applied to the bone-conducted speech enhancement task, and a codec framework based on a local attention mechanism is constructed. The approach is data-driven: network model parameters are obtained by training, and the trained model effectively improves the enhancement quality of bone-conducted speech. Specifically:
(1) The structural information provided by the speech dictionary computed via sparse non-negative matrix factorization is effectively exploited to better reconstruct the high-frequency speech components;
(2) The encoder output is a vector of linear combination coefficients over the speech dictionary, and the dictionary is extracted by Sparse NMF from truly clean AC speech, so the encoder has good noise robustness and can automatically remove the noise in bone-conducted speech during encoding;
(3) A sparse non-negative recurrent neural network is used to model, on the basis of the dictionary, the complex nonlinear relationship of the BC-to-AC feature conversion; compared with conventional neural networks, the non-negative sparse network, through its specially designed cell structure, can effectively learn long-term dependencies in sequences and establish the mapping to the speech dictionary;
(4) The decoder network based on a local attention mechanism determines the decoder's input content through training, giving it the ability to restore unvoiced segments of bone-conducted speech (the corresponding AC speech is not necessarily silent) and the same robustness against strong noise, further improving the recovery quality of bone-conducted speech.
The present invention is described in further detail below with reference to the accompanying drawings.
Detailed description of the invention
Fig. 1 is a flowchart of the bone-conducted speech blind enhancement method based on a codec framework and a recurrent neural network according to the present invention.
Fig. 2 is a schematic diagram of the codec architecture used by the present invention.
Fig. 3 is a schematic diagram of the non-negative sparse NS-LSTM cell structure.
Fig. 4 shows examples of blind enhancement of bone-conducted speech by the present invention.
Specific embodiment
With reference to Fig. 1 and Fig. 2, the implementation of the bone-conducted speech blind enhancement method based on a codec framework and a recurrent neural network is divided into two stages: a training stage and an enhancement stage. The training stage comprises Steps 1, 2, and 3; the enhancement stage comprises Steps 4 and 5. The speech data of the two stages do not overlap, i.e., no sentences with identical content are shared.
The first stage is the training stage: the neural network model is trained on the training data.
Step 1: extract the air-conduction (Air Conduction, AC) and bone-conduction (Bone Conduction, BC) speech magnitude spectrum features, and preprocess the extracted speech features to meet the input requirements of the neural network. The processing is divided into the following stages, of which the first two are consistent with the data preprocessing steps of the patent "Single-channel unsupervised speech/noise separation method based on low-rank and sparse matrix decomposition" (CN 102915742B); to reduce the dynamic range of the extracted magnitude spectrum features, log-magnitude spectrum features are used. The steps are:
(1) The speech data are AC/BC speech pairs recorded by the same person wearing AC and BC microphone devices simultaneously; the AC speech is denoted A and the BC speech is denoted B. The short-time Fourier transform (STFT) is used to transform the AC and BC time-domain signals y(A) and y(B) to the time-frequency domain, with the following specific steps:
1. Frame and window the time-domain signals y(A), y(B); the window function is a Hamming window, the frame length is N (N is taken as an integer power of 2), and the inter-frame hop length is H;
2. Apply a K-point discrete Fourier transform to the framed speech to obtain the time-frequency spectra Y_A(k,t), Y_B(k,t), computed as:
Y(k,t) = Σ_{n=0}^{N-1} y(n + tH) h(n) e^(-j2πkn/K)
where k = 0, 1, ..., K-1 is the discrete frequency index, K is the number of frequency points of the discrete Fourier transform (K = N), t = 0, 1, ..., T-1 is the frame index, T is the total number of frames, and h(n) is the Hamming window function;
(2) Take the absolute value of the spectrum Y(k,t) to obtain the magnitude spectra M_A, M_B:
M(k,t) = |Y(k,t)|
(3) Take the natural logarithm (ln) of the magnitude spectrum M(k,t) to obtain the log-magnitude spectra L_A, L_B:
L(k,t) = ln M(k,t)
(4) Compute the AC speech dictionary D on the clean AC log-magnitude spectrum feature matrix using sparse non-negative matrix factorization (Sparse Non-negative Matrix Factorization, Sparse NMF).
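To make Step 1 concrete, the following Python/NumPy sketch walks through framing, windowing, the K-point DFT, the magnitude and log-magnitude spectra, and the dictionary learning. It is illustrative only: the function names and default parameters are assumptions, scikit-learn's NMF with an L1 penalty (recent versions expose alpha_H and l1_ratio) stands in for the patent's Sparse NMF solver, and, since NMF requires non-negative input, the sketch factorizes the magnitude spectrum rather than the (possibly negative) log-magnitude spectrum.

    import numpy as np
    from sklearn.decomposition import NMF

    def stft_logmag(y, n_fft=256, hop=80):
        """Hamming-windowed framing + K-point DFT; returns Y(k,t), M(k,t), L(k,t)."""
        win = np.hamming(n_fft)
        n_frames = 1 + (len(y) - n_fft) // hop
        frames = np.stack([y[t * hop:t * hop + n_fft] * win for t in range(n_frames)])
        Y = np.fft.rfft(frames, n=n_fft, axis=1).T   # (K/2+1, T) time-frequency spectrum
        M = np.abs(Y)                                # magnitude spectrum M(k,t) = |Y(k,t)|
        L = np.log(M + 1e-8)                         # log-magnitude L(k,t) = ln M(k,t)
        return Y, M, L

    def learn_ac_dictionary(M_ac, n_atoms=128, sparsity=0.5):
        """Sparse NMF on clean AC spectra: M ≈ D C, with an L1 penalty on the activations C."""
        nmf = NMF(n_components=n_atoms, init="nndsvda", solver="mu",
                  beta_loss="kullback-leibler", alpha_H=sparsity, l1_ratio=1.0,
                  max_iter=500)
        C = nmf.fit_transform(M_ac.T).T   # (n_atoms, T) sparse activations
        D = nmf.components_.T             # (K/2+1, n_atoms) dictionary D
        return D, C

Later sketches in this description reuse stft_logmag.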
Step 2: encoder pre-training. The pre-trained encoder (Encoder) network consists of three layers (see Fig. 2): a linear layer (Linear), a long short-term memory recurrent layer (LSTM), and a non-negative sparse long short-term memory recurrent layer (NS-LSTM). During training, the normalized (Normalization) BC log-magnitude spectrum features are used as training input and the AC log-magnitude spectrum features as training target; the neural network model is trained with back-propagation through time (Back Propagation Through Time, BPTT), and the trained neural network parameters are stored. The NS-LSTM cell structure and the encoder pre-training procedure are as follows:
(1) The non-negative sparse long short-term memory neural network model is a variant of the long short-term memory (LSTM) model; by introducing non-negative and sparse control variables, it can generate output vectors that satisfy the constraints. Its component unit, shown in Fig. 3, is expressed by the following equations:
f_t = σ(W_fx x_t + W_fh h_{t-1} + b_f)
i_t = σ(W_ix x_t + W_ih h_{t-1} + b_i)
g_t = σ(W_gx x_t + W_gh h_{t-1} + b_g)
o_t = σ(W_ox x_t + W_oh h_{t-1} + b_o)
S_t = g_t ⊙ i_t + S_{t-1} ⊙ f_t
h_t = sh_(D,u)(ψ(φ(S_t)) ⊙ o_t)
where σ(x) = 1/(1 + e^(-x)) is the sigmoid function, φ(x) = tanh(x), ψ(x) = ReLU(x) = max(0, x) is the non-negativity constraint, and sh_(D,u)(x) = D(tanh(x + u) + tanh(x - u)) is the sparse activation function;
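As an illustration of the cell equations above, here is a minimal NumPy sketch of one NS-LSTM step; the weight shapes and the treatment of the sparse-activation parameters D (scale) and u (threshold) as per-cell constants are assumptions for illustration.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def ns_lstm_cell(x_t, h_prev, s_prev, p):
        """One NS-LSTM step; p holds the weights W_*x, W_*h, biases b_*, and D, u."""
        f = sigmoid(p["Wfx"] @ x_t + p["Wfh"] @ h_prev + p["bf"])  # forget gate f_t
        i = sigmoid(p["Wix"] @ x_t + p["Wih"] @ h_prev + p["bi"])  # input gate i_t
        g = sigmoid(p["Wgx"] @ x_t + p["Wgh"] @ h_prev + p["bg"])  # candidate g_t (sigmoid, as in the equations)
        o = sigmoid(p["Wox"] @ x_t + p["Woh"] @ h_prev + p["bo"])  # output gate o_t
        s = g * i + s_prev * f                                     # cell state S_t
        z = np.maximum(0.0, np.tanh(s)) * o                        # psi(phi(S_t)) ⊙ o_t, non-negative
        h = p["D"] * (np.tanh(z + p["u"]) + np.tanh(z - p["u"]))   # sparse activation sh_(D,u)
        return h, s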
(2) Dropout regularization: to improve the robustness of the model, dropout regularization is applied in neural network training; the technique improves generalization by randomly removing neural units. The drop ratio is set to p (e.g. p is set to 0.2), and the dropout rule is:
r_j^(l) ~ Bernoulli(1 - p)
ỹ_j^(l) = r_j^(l) · y_j^(l)
z_j^(l+1) = w_j^(l+1) ỹ^(l) + b_j^(l+1)
y_j^(l+1) = f(z_j^(l+1))
where r_j^(l) indicates whether the j-th neuron of layer l is kept, Bernoulli(1 - p) denotes the Bernoulli distribution that takes the value 1 with probability 1 - p and 0 with probability p, y_j^(l) is the output value of the j-th neuron of layer l, ỹ_j^(l) is y_j^(l) multiplied by r_j^(l) (i.e. it equals y_j^(l) or 0), w_j^(l+1) is the network weight, b_j^(l+1) is the bias, f denotes the activation unit, and y_j^(l+1) is the neuron output after the activation function.
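A minimal NumPy sketch of the training-time dropout rule above (the mask keeps a unit with probability 1 - p):

    import numpy as np

    def dropout(y, p=0.2, training=True, rng=None):
        """Zero each activation with probability p during training; identity at test time."""
        if not training:
            return y
        rng = rng or np.random.default_rng()
        r = rng.binomial(1, 1.0 - p, size=y.shape)  # r ~ Bernoulli(1 - p)
        return r * y                                # dropped units contribute 0

In the classical formulation the weights are scaled by the keep probability at test time; the common inverted-dropout variant instead divides by 1 - p during training.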
(3) Encoder network training: the training loss objective is the mean square error between the dictionary reconstruction of the network output and the corresponding AC log-magnitude spectrum:
J = (1/T) Σ_t ‖M c_t - L_t^A‖²
where c is the dictionary coefficient vector output by the network, M = [D, I, -I] is the set consisting of the AC speech dictionary and the compensation dictionaries, and I is the diagonal matrix whose diagonal elements are 1 and remaining elements are 0; the compensation dictionaries act as a correction in the linear combination of dictionary elements and improve representation precision. During training, b% of the training data (e.g. b set between 10 and 20) is used as the validation set and the loss function is minimized; the network weights are randomly initialized in [-0.1, 0.1]. Specifically, RMSProp (Root Mean Square Propagation), a variant of stochastic gradient descent (Stochastic Gradient Descent, SGD), is used: the initial learning rate is lr (e.g. lr set to 0.01); when the validation loss stops decreasing, the learning rate is multiplied by a factor ratio (e.g. ratio set to 0.1); the momentum is momentum (e.g. momentum set to 0.9). Training stops when the validation loss has not decreased for i consecutive epochs (e.g. i set to 3), and the neural network model parameters with the smallest validation loss are saved, denoted S'.
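The training recipe of step (3), RMSProp with plateau-based learning-rate decay and early stopping after i stagnant epochs, could be wired up as in the following PyTorch sketch; the model, data loaders, and the tensor M = [D, I, -I] are assumed to be built elsewhere, and the file name is illustrative.

    import torch

    def train_encoder(model, M, train_loader, val_loader,
                      lr=0.01, ratio=0.1, momentum=0.9, stop_after=3, max_epochs=100):
        opt = torch.optim.RMSprop(model.parameters(), lr=lr, momentum=momentum)
        sched = torch.optim.lr_scheduler.ReduceLROnPlateau(opt, factor=ratio, patience=0)
        best, stagnant = float("inf"), 0
        for epoch in range(max_epochs):
            model.train()
            for x, L_ac in train_loader:               # BC features -> AC log-magnitude target
                opt.zero_grad()
                recon = model(x) @ M.T                 # M c_t: dictionary reconstruction
                loss = torch.mean((recon - L_ac) ** 2) # mean-square loss of step (3)
                loss.backward()
                opt.step()
            model.eval()
            with torch.no_grad():
                val = sum(torch.mean((model(x) @ M.T - L_ac) ** 2).item()
                          for x, L_ac in val_loader)
            sched.step(val)                            # lr *= ratio when val loss stalls
            if val < best:
                best, stagnant = val, 0
                torch.save(model.state_dict(), "S_prime.pt")  # best parameters S'
            else:
                stagnant += 1
                if stagnant >= stop_after:             # stop after i stagnant epochs
                    break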
Step 3: codec joint training. The decoder (Decoder) structure is shown in Fig. 2 and comprises a two-layer network: a recurrent layer (LSTM) and a linear layer (Linear). a_i denotes the input of the decoder network based on the local attention mechanism:
a_i = Σ_{j ∈ N(i)} ω_ij e_j
where e_j is the j-th encoder output and N(i) denotes the neighborhood of the i-th encoder output, which may cover 10-20 output values. ω_ij is the weighted combination coefficient of these neighboring inputs, computed as:
ω_ij = exp(score_ij) / Σ_{j' ∈ N(i)} exp(score_ij')
score_ij is the weighted score of the j-th encoder output e_j with respect to the i-th decoder input a_i; after normalization it gives the weight of the linear combination. W_a is the parameter matrix of a linear layer: the decoder output d_{i-1} at time i-1 is passed through this linear layer and its inner product with e_j is taken to compute the weighted score:
score_ij = (W_a d_{i-1})^T e_j
The role of the decoder, obtained through training, is to bring the speech synthesized on the basis of the dictionary coding closer to the true speech signal. The decoder is optimized jointly with the encoder, epoch by epoch: a mean square error loss is constructed on the log-magnitude spectrum features of the true speech signal, the optimal network parameters are obtained by gradient descent, and they are stored locally, denoted S.
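A NumPy sketch of the local attention rule above; the neighborhood half-width and all array shapes are illustrative assumptions.

    import numpy as np

    def local_attention(E, d_prev, W_a, i, width=7):
        """a_i = sum_j omega_ij e_j over the neighborhood N(i) of encoder outputs.

        E: (T, d_e) encoder outputs; d_prev: (d_h,) decoder output at step i-1;
        W_a: (d_e, d_h) linear-layer parameter matrix."""
        lo, hi = max(0, i - width), min(len(E), i + width + 1)  # N(i): 10-20 outputs
        q = W_a @ d_prev                      # linear layer applied to the step i-1 output
        scores = E[lo:hi] @ q                 # score_ij = <W_a d_{i-1}, e_j>
        w = np.exp(scores - scores.max())
        w /= w.sum()                          # omega_ij: softmax over the neighborhood
        return w @ E[lo:hi]                   # attention input a_i of the decoder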
The second stage is the enhancement stage: the trained codec network model is used to enhance the bone-conducted speech.
Step 4: extract the BC speech features to be enhanced, and normalize the data according to the statistics of the aligned log-magnitude spectrum L_B obtained in Step 1, namely its mean μ_B and variance σ_B²:
First, the BC speech data B_E to be enhanced is transformed from the time-domain waveform to the time-frequency domain Y(B_E) by the Fourier transform. The process of extracting the BC features to be enhanced is shown in the enhancement part of Fig. 1. Compared with the feature extraction in Step 1, this step additionally extracts the phase: after obtaining the time-domain speech spectrum Y(B_E), not only the magnitude spectrum but also the phase must be computed. From the time-frequency spectrum Y(B_E), the magnitude spectrum M(B_E) and the phase P(B_E) are computed as:
M(B_E)(k,t) = |Y(B_E)(k,t)|
P(B_E)(k,t) = atan2(imag(Y(B_E)(k,t)), real(Y(B_E)(k,t)))
where atan2(·) is the four-quadrant arctangent function, and imag(·) and real(·) denote the imaginary and real parts of the time-frequency spectrum. From the magnitude spectrum M(B_E), the log-magnitude spectrum L(B_E) is computed. Then, according to the mean μ_B and variance σ_B² of the BC log-magnitude spectrum obtained in the training stage, L(B_E) is normalized:
L_norm(B_E) = (L(B_E) - μ_B) / σ_B
Step 5: during enhancement, the BC speech features extracted in Step 4 are converted with the codec neural network trained in the first stage; denormalization and the inverse feature transform are then applied to the network output, finally obtaining the enhanced time-domain speech.
First, the normalized L_norm(B_E) is input into the trained codec neural network model S, and the network output, i.e., the enhanced feature L_enh(A_E), is computed.
Second, the enhanced feature L_enh(A_E) is denormalized and inverse-transformed, finally yielding the enhanced time-domain speech, as follows:
(1) According to the mean μ_A and variance σ_A² of the AC log-magnitude spectrum from the training stage, the output L_enh(A_E) obtained from the codec network is denormalized to obtain the log-magnitude spectrum L(A_E):
L(A_E) = L_enh(A_E) · σ_A + μ_A
(2) The log-magnitude spectrum L(A_E) is exponentiated to obtain the magnitude spectrum M(A_E):
M(A_E) = e^(L(A_E))
(3) Using the magnitude spectrum M(A_E) and the phase information P(B_E), the time-frequency spectrum Y(A_E) is computed:
Y(A_E)(k,t) = M(A_E)(k,t) · e^(j·P(B_E)(k,t))
(4) The inverse Fourier transform and the de-overlap-add formula of speech framing are applied to transform the spectrum Y(A_E) to the time domain, finally obtaining the enhanced time-domain speech signal y(B_E).
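A sketch of the enhancement-stage pipeline of Steps 4-5, reusing the stft_logmag helper from the Step 1 sketch; here model stands for the trained codec network S, the mu/sigma arguments are the stored training statistics, and SciPy's istft is used as a stand-in for the inverse transform with de-overlap-add (in practice the scaling conventions of the forward and inverse transforms must be matched).

    import numpy as np
    from scipy.signal import istft

    def enhance(y_bc, model, mu_B, sigma_B, mu_A, sigma_A, fs=8000, n_fft=256, hop=80):
        Y, M, L = stft_logmag(y_bc, n_fft, hop)      # time-frequency spectrum of BC speech
        P = np.angle(Y)                              # phase P(B_E) = atan2(imag Y, real Y)
        L_norm = (L - mu_B) / sigma_B                # normalize with BC training statistics
        L_enh = model(L_norm)                        # feature conversion by the codec network S
        L_out = L_enh * sigma_A + mu_A               # denormalize with AC statistics
        M_out = np.exp(L_out)                        # magnitude spectrum M(A_E) = e^L
        Y_out = M_out * np.exp(1j * P)               # recombine with the BC phase
        _, y_enh = istft(Y_out, fs=fs, window="hamming",
                         nperseg=n_fft, noverlap=n_fft - hop)
        return y_enh                                 # enhanced time-domain speech y(B_E)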
Embodiment
Fig. 4 shows embodiments of the present invention. The example utterances are about 3.5 s and 4 s long, the sampling frequency is 8 kHz, the frame length is set to 32 ms with a 10 ms frame shift, and a discrete Fourier transform with K = 256 frequency points is applied to each frame, giving 129-dimensional log-magnitude spectra. In Fig. 4-1 and Fig. 4-2, (a) is the spectrogram of the bone-conducted speech, (b) is the spectrogram after enhancement with an LSTM deep neural network, and (c) is the spectrogram after enhancement by the present invention. It can be seen that after enhancement the high-frequency signal of the speech and the missing signals such as aspirated and fricative sounds are effectively recovered, with a clear performance improvement over the LSTM algorithm; in addition, subjective test results also indicate that the present invention achieves a good bone-conducted speech enhancement effect.
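The dimensions quoted in the embodiment follow directly from its parameters and can be checked with a few lines:

    fs = 8000                   # sampling frequency 8 kHz
    frame = int(0.032 * fs)     # 32 ms frame length -> 256 samples
    hop = int(0.010 * fs)       # 10 ms frame shift  -> 80 samples
    K = 256                     # DFT points
    dims = K // 2 + 1           # one-sided spectrum -> 129 log-magnitude dimensions
    print(frame, hop, dims)     # 256 80 129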

Claims (6)

1. A bone conduction voice blind enhancement method based on a codec framework and a recurrent neural network, characterized by the following steps:
Data preprocessing: extracting air-conducted (AC) and bone-conducted (BC) speech magnitude spectrum features, aligning the extracted feature data, and computing an AC speech dictionary on the AC feature data using sparse non-negative matrix factorization;
Encoder pre-training: with the BC speech features as training input and the AC dictionary combination coefficients as training target, training the encoder model using a non-negative, sparse long short-term memory recurrent neural network, and storing the trained deep neural network parameters as the initialization of the encoder in the next step;
Joint training of the codec: constructing a decoder model based on a local attention mechanism; with the encoder output as decoder input and the AC speech features as training target, jointly training the codec model and storing the model parameters;
Speech enhancement: extracting the BC speech features to be enhanced, performing the feature conversion with the codec neural network trained in the above steps, then applying denormalization and the inverse feature transform to the network output, finally obtaining the enhanced time-domain speech.
2. The method according to claim 1, characterized in that the AC and BC speech magnitude spectrum features are extracted and the extracted speech features are preprocessed to meet the input requirements of the neural network, wherein, to reduce the dynamic range of the extracted magnitude spectrum features, log-magnitude spectrum features are used:
(1) the speech data are AC/BC speech pairs recorded by the same person wearing AC and BC microphone devices simultaneously, the AC speech being denoted A and the BC speech denoted B; the short-time Fourier transform is used to transform the AC and BC time-domain signals y(A) and y(B) to the time-frequency domain, with the following specific steps:
1. framing and windowing the time-domain signals y(A), y(B), the window function being a Hamming window, the frame length being N (N taken as an integer power of 2), and the inter-frame hop length being H;
2. applying a K-point discrete Fourier transform to the framed speech to obtain the time-frequency spectra Y_A(k,t), Y_B(k,t), computed as:
Y(k,t) = Σ_{n=0}^{N-1} y(n + tH) h(n) e^(-j2πkn/K)
where k = 0, 1, ..., K-1 is the discrete frequency index, K is the number of frequency points of the discrete Fourier transform (K = N), t = 0, 1, ..., T-1 is the frame index, T is the total number of frames, and h(n) is the Hamming window function;
(2) taking the absolute value of the spectrum Y(k,t) to obtain the magnitude spectra M_A, M_B:
M(k,t) = |Y(k,t)|
(3) taking the natural logarithm (ln) of the magnitude spectrum M(k,t) to obtain the log-magnitude spectra L_A, L_B:
L(k,t) = ln M(k,t)
(4) computing the AC speech dictionary D on the clean AC log-magnitude spectrum feature matrix using sparse non-negative matrix factorization.
3. The method according to claim 1, characterized in that in the encoder pre-training the pre-trained encoder network consists of three layers: a linear layer (Linear), a long short-term memory recurrent layer (LSTM), and a non-negative sparse long short-term memory recurrent layer (NS-LSTM); during training, the normalized BC log-magnitude spectrum features are used as training input and the AC log-magnitude spectrum features as training target, the neural network model is trained with back-propagation through time, and the trained neural network parameters are stored; the NS-LSTM cell structure and the encoder pre-training procedure are as follows:
(1) the non-negative sparse long short-term memory neural network model is a variant of the long short-term memory (LSTM) model; by introducing non-negative and sparse control variables it can generate output vectors satisfying the constraints, its component unit being expressed by the following equations:
f_t = σ(W_fx x_t + W_fh h_{t-1} + b_f)
i_t = σ(W_ix x_t + W_ih h_{t-1} + b_i)
g_t = σ(W_gx x_t + W_gh h_{t-1} + b_g)
o_t = σ(W_ox x_t + W_oh h_{t-1} + b_o)
S_t = g_t ⊙ i_t + S_{t-1} ⊙ f_t
h_t = sh_(D,u)(ψ(φ(S_t)) ⊙ o_t)
where σ(x) = 1/(1 + e^(-x)), φ(x) = tanh(x), ψ(x) = ReLU(x) = max(0, x) is the non-negativity constraint, and sh_(D,u)(x) = D(tanh(x + u) + tanh(x - u)) is the sparse activation function;
(2) dropout regularization: the drop ratio is set to p, and the dropout rule is:
r_j^(l) ~ Bernoulli(1 - p)
ỹ_j^(l) = r_j^(l) · y_j^(l)
z_j^(l+1) = w_j^(l+1) ỹ^(l) + b_j^(l+1)
y_j^(l+1) = f(z_j^(l+1))
where r_j^(l) indicates whether the j-th neuron of layer l is kept, Bernoulli(1 - p) denotes the Bernoulli distribution taking the value 1 with probability 1 - p and 0 with probability p, y_j^(l) is the output value of the j-th neuron of layer l, ỹ_j^(l) equals y_j^(l) or 0, w_j^(l+1) is the network weight, b_j^(l+1) is the bias, f denotes the activation unit, and y_j^(l+1) is the neuron output after the activation function;
(3) encoder network training: the training loss objective is the mean square error between the dictionary reconstruction of the network output and the corresponding AC log-magnitude spectrum:
J = (1/T) Σ_t ‖M c_t - L_t^A‖²
where c is the dictionary coefficient vector, M = [D, I, -I] is the set consisting of the AC speech dictionary and the compensation dictionaries, and I is the diagonal matrix whose diagonal elements are 1 and remaining elements are 0, acting as a correction in the linear combination of dictionary elements and improving representation precision; during training, b% of the training data is used as the validation set and the loss function is minimized; the network weights are randomly initialized in [-0.1, 0.1]; specifically, RMSProp, a variant of stochastic gradient descent, is used with initial learning rate lr; when the validation loss does not decrease, the learning rate is multiplied by a factor ratio, the momentum being momentum; training stops when the validation loss has not decreased for i consecutive epochs, and the neural network model parameters with the smallest validation loss are saved, denoted S'.
4. The method according to claim 1, characterized in that in the codec joint training the decoder comprises a two-layer network, namely a recurrent layer (LSTM) and a linear layer (Linear); a_i denotes the input of the decoder network based on the local attention mechanism:
a_i = Σ_{j ∈ N(i)} ω_ij e_j
where e_j is the j-th encoder output and N(i) denotes the neighborhood of the i-th encoder output; ω_ij is the weighted combination coefficient of these neighboring inputs, computed as:
ω_ij = exp(score_ij) / Σ_{j' ∈ N(i)} exp(score_ij')
score_ij is the weighted score of the j-th encoder output e_j with respect to the i-th decoder input a_i, which after normalization gives the weight of the linear combination; W_a is the parameter matrix of a linear layer, and the decoder output d_{i-1} at time i-1 is passed through this linear layer and its inner product with e_j is taken to compute the weighted score:
score_ij = (W_a d_{i-1})^T e_j;
the role of the decoder, obtained through training, is to bring the speech synthesized on the basis of the dictionary coding closer to the true speech signal; the decoder is optimized jointly with the encoder, epoch by epoch, a mean square error loss is constructed on the log-magnitude spectrum features of the true speech signal, and the optimal network parameters are obtained by gradient descent and stored locally, denoted S.
5. The method according to claim 1, characterized in that the BC speech features to be enhanced are extracted and the data are normalized according to the statistics of the aligned log-magnitude spectrum L_B obtained previously, namely its mean μ_B and variance σ_B²:
first, the BC speech data B_E to be enhanced is transformed from the time-domain waveform to the time-frequency domain Y(B_E) by the Fourier transform; compared with the earlier feature extraction, this step additionally extracts the phase, i.e., after obtaining the time-domain speech spectrum Y(B_E), not only the magnitude spectrum but also the phase is computed; from the time-frequency spectrum Y(B_E), the magnitude spectrum M(B_E) and the phase P(B_E) are computed as:
M(B_E)(k,t) = |Y(B_E)(k,t)|
P(B_E)(k,t) = atan2(imag(Y(B_E)(k,t)), real(Y(B_E)(k,t)))
where atan2(·) is the four-quadrant arctangent function, and imag(·) and real(·) denote the imaginary and real parts of the time-frequency spectrum; from the magnitude spectrum M(B_E), the log-magnitude spectrum L(B_E) is computed; then, according to the mean μ_B and variance σ_B² of the BC log-magnitude spectrum obtained in the training stage, L(B_E) is normalized:
L_norm(B_E) = (L(B_E) - μ_B) / σ_B.
6. The method according to claim 1, characterized in that during enhancement the extracted BC speech features are converted with the codec neural network trained in the first stage, and denormalization and the inverse feature transform are then applied to the network output, finally obtaining the enhanced time-domain speech:
first, the normalized L_norm(B_E) is input into the trained codec neural network model S, and the network output, i.e., the enhanced feature L_enh(A_E), is computed;
second, the enhanced feature L_enh(A_E) is denormalized and inverse-transformed, finally yielding the enhanced time-domain speech, as follows:
(1) according to the mean μ_A and variance σ_A² of the AC log-magnitude spectrum from the training stage, the output L_enh(A_E) obtained from the codec network is denormalized to obtain the log-magnitude spectrum L(A_E):
L(A_E) = L_enh(A_E) · σ_A + μ_A
(2) the log-magnitude spectrum L(A_E) is exponentiated to obtain the magnitude spectrum M(A_E):
M(A_E) = e^(L(A_E))
(3) using the magnitude spectrum M(A_E) and the phase information P(B_E), the time-frequency spectrum Y(A_E) is computed:
Y(A_E)(k,t) = M(A_E)(k,t) · e^(j·P(B_E)(k,t))
(4) the inverse Fourier transform and the de-overlap-add formula of speech framing are applied to transform the spectrum Y(A_E) to the time domain, finally obtaining the enhanced time-domain speech signal y(B_E).
CN201810960512.0A 2018-08-22 2018-08-22 Bone conduction voice blind enhancement method based on codec framework and recurrent neural network Active CN108986834B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810960512.0A CN108986834B (en) 2018-08-22 2018-08-22 Bone conduction voice blind enhancement method based on codec framework and recurrent neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810960512.0A CN108986834B (en) 2018-08-22 2018-08-22 Bone conduction voice blind enhancement method based on codec framework and recurrent neural network

Publications (2)

Publication Number Publication Date
CN108986834A (en) 2018-12-11
CN108986834B CN108986834B (en) 2023-04-07

Family

ID=64547287

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810960512.0A Active CN108986834B (en) 2018-08-22 2018-08-22 Bone conduction voice blind enhancement method based on codec framework and recurrent neural network

Country Status (1)

Country Link
CN (1) CN108986834B (en)



Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003264883A (en) * 2002-03-08 2003-09-19 Denso Corp Voice processing apparatus and voice processing method
CN102915742A (en) * 2012-10-30 2013-02-06 中国人民解放军理工大学 Single-channel monitor-free voice and noise separating method based on low-rank and sparse matrix decomposition
CN103559888A (en) * 2013-11-07 2014-02-05 航空电子系统综合技术重点实验室 Speech enhancement method based on non-negative low-rank and sparse matrix decomposition principle
CN106030705A (en) * 2014-02-27 2016-10-12 高通股份有限公司 Systems and methods for speaker dictionary based speech modeling
US20180166066A1 (en) * 2016-12-14 2018-06-14 International Business Machines Corporation Using long short-term memory recurrent neural network for speaker diarization segmentation
CN107886967A (en) * 2017-11-18 2018-04-06 中国人民解放军陆军工程大学 A kind of bone conduction sound enhancement method of depth bidirectional gate recurrent neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LIU Bin et al.: "Speech dereverberation method combining long short-term memory recurrent neural networks and non-negative matrix factorization", Journal of Signal Processing (信号处理) *

Cited By (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109793511A (en) * 2019-01-16 2019-05-24 成都蓝景信息技术有限公司 Electrocardiosignal noise detection algorithm based on depth learning technology
CN109975702A (en) * 2019-03-22 2019-07-05 华南理工大学 A kind of DC gear decelerating motor product examine method based on recirculating network disaggregated model
CN109975702B (en) * 2019-03-22 2021-08-10 华南理工大学 Direct-current gear reduction motor quality inspection method based on circulation network classification model
CN110085249A (en) * 2019-05-09 2019-08-02 南京工程学院 The single-channel voice Enhancement Method of Recognition with Recurrent Neural Network based on attention gate
CN110111803A (en) * 2019-05-09 2019-08-09 南京工程学院 Based on the transfer learning sound enhancement method from attention multicore Largest Mean difference
CN110136731A (en) * 2019-05-13 2019-08-16 天津大学 Empty cause and effect convolution generates the confrontation blind Enhancement Method of network end-to-end bone conduction voice
CN110164465A (en) * 2019-05-15 2019-08-23 上海大学 A kind of sound enhancement method and device based on deep layer Recognition with Recurrent Neural Network
CN110164465B (en) * 2019-05-15 2021-06-29 上海大学 Deep-circulation neural network-based voice enhancement method and device
CN110648684A (en) * 2019-07-02 2020-01-03 中国人民解放军陆军工程大学 Bone conduction voice enhancement waveform generation method based on WaveNet
WO2021012403A1 (en) * 2019-07-25 2021-01-28 华南理工大学 Dual sensor speech enhancement method and implementation device
CN110675888A (en) * 2019-09-25 2020-01-10 电子科技大学 Speech enhancement method based on RefineNet and evaluation loss
KR102429152B1 (en) * 2019-10-09 2022-08-03 엘레복 테크놀로지 컴퍼니 리미티드 Deep learning voice extraction and noise reduction method by fusion of bone vibration sensor and microphone signal
CN110931031A (en) * 2019-10-09 2020-03-27 大象声科(深圳)科技有限公司 Deep learning voice extraction and noise reduction method fusing bone vibration sensor and microphone signals
JP2022505997A (en) * 2019-10-09 2022-01-17 大象声科(深セン)科技有限公司 Deep learning voice extraction and noise reduction method that fuses bone vibration sensor and microphone signal
KR20210043485A (en) * 2019-10-09 2021-04-21 엘레복 테크놀로지 컴퍼니 리미티드 Deep learning speech extraction and noise reduction method that combines bone vibration sensor and microphone signal
WO2021068120A1 (en) * 2019-10-09 2021-04-15 大象声科(深圳)科技有限公司 Deep learning speech extraction and noise reduction method fusing signals of bone vibration sensor and microphone
CN110867192A (en) * 2019-10-23 2020-03-06 北京计算机技术及应用研究所 Speech enhancement method based on gated cyclic coding and decoding network
CN110808063A (en) * 2019-11-29 2020-02-18 北京搜狗科技发展有限公司 Voice processing method and device for processing voice
CN111242976A (en) * 2020-01-08 2020-06-05 北京天睿空间科技股份有限公司 Aircraft detection tracking method using attention mechanism
WO2021159772A1 (en) * 2020-02-10 2021-08-19 腾讯科技(深圳)有限公司 Speech enhancement method and apparatus, electronic device, and computer readable storage medium
US11842722B2 (en) 2020-07-21 2023-12-12 Ai Speech Co., Ltd. Speech synthesis method and system
CN111833843A (en) * 2020-07-21 2020-10-27 苏州思必驰信息科技有限公司 Speech synthesis method and system
CN112185405B (en) * 2020-09-10 2024-02-09 中国科学技术大学 Bone conduction voice enhancement method based on differential operation and combined dictionary learning
CN112185405A (en) * 2020-09-10 2021-01-05 中国科学技术大学 Bone conduction speech enhancement method based on differential operation and joint dictionary learning
CN111899757B (en) * 2020-09-29 2021-01-12 南京蕴智科技有限公司 Single-channel voice separation method and system for target speaker extraction
CN111899757A (en) * 2020-09-29 2020-11-06 南京蕴智科技有限公司 Single-channel voice separation method and system for target speaker extraction
CN112562704B (en) * 2020-11-17 2023-08-18 中国人民解放军陆军工程大学 Frequency division topological anti-noise voice conversion method based on BLSTM
CN112562704A (en) * 2020-11-17 2021-03-26 中国人民解放军陆军工程大学 BLSTM-based frequency division spectrum expansion anti-noise voice conversion method
CN112786064B (en) * 2020-12-30 2023-09-08 西北工业大学 End-to-end bone qi conduction voice joint enhancement method
CN112786064A (en) * 2020-12-30 2021-05-11 西北工业大学 End-to-end bone-qi-conduction speech joint enhancement method
CN113642714A (en) * 2021-08-27 2021-11-12 国网湖南省电力有限公司 Insulator pollution discharge state identification method and system based on small sample learning
CN113642714B (en) * 2021-08-27 2024-02-09 国网湖南省电力有限公司 Insulator pollution discharge state identification method and system based on small sample learning
WO2023102930A1 (en) * 2021-12-10 2023-06-15 清华大学深圳国际研究生院 Speech enhancement method, electronic device, program product, and storage medium
CN114495909A (en) * 2022-02-20 2022-05-13 西北工业大学 End-to-end bone-qi-guide voice joint identification method
CN114495909B (en) * 2022-02-20 2024-04-30 西北工业大学 End-to-end bone-qi guiding voice joint recognition method
CN115033734A (en) * 2022-08-11 2022-09-09 腾讯科技(深圳)有限公司 Audio data processing method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN108986834B (en) 2023-04-07


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant