Automatic start control method and device for an earphone, electronic device, and storage medium
Technical field
The present invention relates to the field of speech recognition, and in particular to an automatic start control method and device for an earphone, an electronic device, and a storage medium.
Background art
In routine fire-fighting operations, timely exchange of information between rescue and security personnel and the command system is a persistent problem. The prior art proposes bone conduction earphones to address it. A bone conduction earphone conveys sound through vibration transmitted by the facial bones, so the wearer can hear more clearly in a noisy environment than with an ordinary earphone. Because of the way it is worn, it does not block the eardrum from receiving external sound waves, so the user still hears surrounding sounds, can react quickly to changes at the fire scene, and can avoid injury.
Many existing fire-fighting intercom devices require the user to press a PTT (push-to-talk) key before the voice call function is enabled. In some situations the user cannot free a hand to press the PTT key, which makes such devices very inconvenient. The prior art also discloses a scheme in which semantic recognition detects that the user is speaking, wakes the earphone automatically, and sends out what the user says, thereby freeing both hands.
A bone conduction earphone with semantic recognition comprises a bone conduction speaker and a bone conduction microphone. Although it suppresses airborne sound well, an external force touching the earphone body during use makes the body vibrate at irregular frequencies. The bone conduction microphone picks up this vibration as noise, which frequently starts the transmitting module of the earphone and wastes battery power. In addition, the sound transmitted by the earphone then carries noise as well as voice, so the receiver cannot hear the voice clearly, or the noise masks the voice entirely. The receiver then cannot correctly interpret the message, instructions cannot be issued and communication cannot take place in time, and rescue opportunities are missed.
Summary of the invention
The purpose of the present invention is to solve the above technical problem by providing an automatic start control method and device for an earphone, an electronic device, and a storage medium. The method, device, electronic device, and storage medium of the invention are based on auditory scene analysis theory (CASA) and AI deep learning technology known in the prior art. They prevent the bone conduction microphone from starting automatically and frequently in response to noise produced when its housing is touched or struck, and so avoid unnecessary power consumption.
To achieve the above purpose, the technical solution of the present invention is as follows:
An automatic start control method for an earphone, the method being based on auditory scene analysis theory (CASA) and deep learning technology and comprising the following steps:
Receiving an original sound signal as input;
Separating a partial sound signal from the original sound signal;
Comparing the partial sound signal, using a neural network model obtained by noise training, with the noise signals the neural network model has learned, and marking the partial sound signal as noise if they are identical or similar, and as voice otherwise;
Only after a partial sound signal marked as voice is recognized, starting the sending function and automatically sending the partial sound signal marked as voice.
In the automatic start control method for an earphone according to the present invention, the sound signal is processed so that housing-impact sound picked up by the bone conduction microphone does not start the sending function of the earphone. This saves battery power and extends the time the user can operate at the fire scene, so that personnel can better carry out work such as guiding evacuations.
Conventional bone conduction earphone signal processing is applied to the sound picked up by the bone conduction microphone and, on the basis of auditory scene analysis theory (CASA), deep learning technology, and semantic recognition technology, an automatic start control method specific to bone conduction microphones is proposed. The bone conduction microphone itself is not disturbed by external airborne sound and has strong noise reduction capability. However, noise produced when the bone conduction earphone touches or collides with external objects is still picked up by the bone conduction microphone. If this noise is sent together with the voice, the person receiving the sound may be unable to hear the transmitted information clearly, miss key information, misjudge the situation on site, and issue wrong instructions. Noise produced by accidental contact with the housing also causes the communication module to start frequently, which drains the battery of the bone conduction earphone.
The method uses a neural network model obtained by long-duration noise training based on auditory scene analysis theory (CASA) and deep learning technology. Partial sound signals are taken from the recorded original sound signal and compared in batches with the noise signals the neural network model has learned. If a partial sound signal is judged identical or similar to a learned noise signal, it is marked as noise and subsequently suppressed or filtered out; otherwise it is marked as voice. After semantic recognition technology recognizes a partial sound signal marked as voice, that signal is sent automatically; when a partial sound signal marked as noise is recognized, the voice sending function is not started.
Further, the method also includes:
Suppressing or filtering out the partial sound signal marked as noise.
Further, the method also includes:
The original sound signal is a sound signal received from a complex mixed sound source.
Further, the method also includes:
The partial sound signal is the sound signal of a single sound source separated from the complex mixed sound source.
The sound signal received from the complex mixed sound source contains noise. It is separated by sound source, and each separated partial sound signal is then judged as to whether it is identical or similar to a noise signal the neural network model has learned, which makes the sound signals easier to mark.
Further, the method also includes:
Inputting the sound signal of the single sound source into the function of the neural network model to obtain a judgment result, and marking the signal as noise if it is identical or similar to learned noise, and as voice otherwise.
The method also includes:
Setting a similarity threshold, comparing the partial sound signal in batches with the noise signals learned by the neural network model to obtain the similarity between the partial sound signal and the noise signal, and judging the partial sound signal to be noise if the similarity is greater than the similarity threshold, and voice otherwise.
Further, after the partial sound signal marked as voice is recognized, a Bluetooth module is started and the partial sound signal marked as voice is sent automatically.
An automatic start control device for an earphone, comprising:
A receiving module, for receiving an original sound signal as input;
A separation module, for separating a partial sound signal from the original sound signal;
A comparison module, for comparing the partial sound signal with the noise signals learned by a neural network model obtained by noise training, and marking the partial sound signal as noise if they are identical or similar, and as voice otherwise;
A judging and sending module, for starting the sending function and automatically sending the partial sound signal marked as voice only after a partial sound signal marked as voice is recognized.
An electronic device comprising a processor, a storage medium, and a computer program, the computer program being stored in the storage medium and, when executed by the processor, implementing the above automatic start control method for an earphone.
A computer-readable storage medium on which a computer program is stored, the computer program implementing the above automatic start control method for an earphone when executed by a processor.
Brief description of the drawings
Fig. 1 is a flow chart of Embodiment one of the automatic start control method for an earphone of the present invention;
Fig. 2 is a flow chart of Embodiment two of the automatic start control method for an earphone of the present invention.
Detailed description of the embodiments
The automatic start control method and device for an earphone, the electronic device, and the storage medium of the present invention are described in detail below with reference to the drawings, in order to explain and illustrate the scope of protection of the invention.
Embodiment one
With reference to Fig. 1, an automatic start control method for an earphone, based on auditory scene analysis theory (CASA) and deep learning technology, comprises the following steps:
Receiving an original sound signal as input.
The bone conduction earphone includes a bone conduction microphone. When the earphone is worn, the bone conduction microphone picks up sound from the vibration of the facial bones as the wearer speaks, and the sound is processed into the original sound signal by conventional signal processing methods.
Auditory scene analysis theory (CASA) builds on prior-art sound processing techniques, in particular the front-end sound processing based on computational auditory scene analysis (CASA) studied at Harbin Institute of Technology. The human auditory system can distinguish and track a sound signal of interest in a noisy environment, and can "listen to" the required content while multiple sounds are present at the same time. Auditory scene analysis is the theory built on this auditory physiological phenomenon. CASA simulates the neural auditory system of the human ear, so its processing of sound signals is close to the human perception of mixed sound signals. It can therefore be used to separate noise from the voice signal and obtain a cleaner voice signal. In practice this adds a front-end processing stage to speech recognition, which improves recognition accuracy for noisy speech. The key to speech enhancement with CASA is selecting suitable features to separate the target voice from the background noise. Usable features include spectrogram energy, pitch (fundamental) frequency, and cross-channel correlation feature thresholds.
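As an illustration of the kinds of features listed above (energy, pitch frequency, and correlation between channels), the following is a minimal sketch rather than the front end used by the invention. It assumes a mono frame of floating-point samples at a hypothetical 8 kHz sample rate; the function names are illustrative, and a plain autocorrelation pitch estimate stands in for a full CASA front end.

```python
import math

def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def estimate_pitch_hz(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Pitch estimate: lag of the autocorrelation peak within [fmin, fmax]."""
    lag_min = int(sample_rate / fmax)
    lag_max = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_value = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        value = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag if best_lag else 0.0

def normalized_cross_correlation(a, b):
    """Zero-lag normalized correlation between two channel signals, in [-1, 1]."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den else 0.0

# Synthetic test frame (assumption): a 200 Hz tone sampled at 8 kHz.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(400)]
tone_energy = frame_energy(tone)
tone_pitch = estimate_pitch_hz(tone, SAMPLE_RATE)
```

A periodic voiced sound yields a clear autocorrelation peak, whereas an impact on the housing tends to be impulsive and aperiodic, which is one reason pitch-related features can help tell the two apart.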
Deep learning is a family of machine learning methods based on learning representations of data. An observation (for example, the frequency or amplitude variation of a noise sound wave) can be represented in many ways, such as a vector of per-pixel intensity values, or more abstractly as a set of edges, regions of a specific shape, and so on. Noise that has passed through the CASA front end is used with deep learning to model, on a computer, the process by which humans handle auditory signals. The noise that the bone conduction microphone may encounter in real environments is recorded and aggregated, and a specific neural network model is formed through long-duration learning and comparison.
Using auditory scene analysis theory (CASA), the various non-voice noises that can arise in the environment of the bone conduction microphone are collected by simulation and converted, by conventional audio data processing, into electronic audio data for learning. Deep learning is then applied to this noise data, and the function it produces is the corresponding neural network model. The function is written into the program; when a noise judgment is required, the sound data to be judged is input and the judgment result is obtained by evaluating the function.
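The idea of a trained model that becomes a function taking sound data and returning a judgment can be sketched as follows. This is a deliberately small stand-in, assuming a single logistic neuron trained by gradient descent instead of a deep network; the two-element feature vectors and their values are invented purely for illustration and are not measured data.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_noise_model(samples, labels, epochs=2000, learning_rate=0.5):
    """Fit a single logistic neuron; label 1 = noise, 0 = voice."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            predicted = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
            error = predicted - label  # gradient of the log loss w.r.t. the pre-activation
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error

    def model(features):
        """The learned function: returns the estimated probability of noise."""
        return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)

    return model

# Toy feature vectors (illustrative values only): [impulsiveness, periodicity].
training_features = [[0.9, 0.1], [0.8, 0.2], [0.95, 0.05],
                     [0.2, 0.9], [0.1, 0.8], [0.15, 0.85]]
training_labels = [1, 1, 1, 0, 0, 0]

noise_probability = train_noise_model(training_features, training_labels)
```

Once training has finished, only the returned function needs to be shipped in the program; calling `noise_probability(features)` on new sound data yields the judgment without any further learning.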
Separating a partial sound signal from the original sound signal.
The original sound signal is separated according to a given rule into multiple partial sound signals, each of which is compared with noise and judged separately.
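The text does not specify the separation rule. One simple possibility, shown here only as an assumption, is to cut the signal into fixed-length segments with a fixed hop; the function name and parameters are hypothetical.

```python
def split_into_segments(samples, segment_length, hop_length):
    """Return consecutive (possibly overlapping) segments of equal length."""
    if segment_length <= 0 or hop_length <= 0:
        raise ValueError("segment_length and hop_length must be positive")
    segments = []
    start = 0
    # Trailing samples that do not fill a whole segment are dropped.
    while start + segment_length <= len(samples):
        segments.append(samples[start:start + segment_length])
        start += hop_length
    return segments

segments = split_into_segments(list(range(10)), segment_length=4, hop_length=2)
```

Each returned segment can then be compared and labelled on its own, which matches the per-portion judgment described above.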
Comparing the partial sound signal, using the neural network model obtained by noise training, with the noise signals the neural network model has learned, and marking the partial sound signal as noise if they are identical or similar, and as voice otherwise.
After a partial sound signal marked as voice is recognized, automatically sending the partial sound signal marked as voice; after a partial sound signal marked as noise is recognized, not starting the voice sending function of the earphone.
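The gating behaviour in the last step can be sketched as follows. This is a minimal illustration under stated assumptions: `is_voice`, `start_transmitter`, and `send` are hypothetical callbacks supplied by the caller, and the amplitude rule used in the demonstration is only a placeholder for the real classifier.

```python
from typing import Callable, Sequence

def forward_voice_segments(
    segments: Sequence[Sequence[float]],
    is_voice: Callable[[Sequence[float]], bool],
    start_transmitter: Callable[[], None],
    send: Callable[[Sequence[float]], None],
) -> int:
    """Send only voice-labelled segments; start the transmitter lazily."""
    transmitter_started = False
    sent = 0
    for segment in segments:
        if not is_voice(segment):
            continue  # noise: nothing is sent and the transmitter is not started
        if not transmitter_started:
            start_transmitter()
            transmitter_started = True
        send(segment)
        sent += 1
    return sent

events = []
count = forward_voice_segments(
    segments=[[0.0, 0.0], [0.4, -0.3], [0.0, 0.1]],
    is_voice=lambda seg: max(abs(x) for x in seg) > 0.2,  # placeholder classifier
    start_transmitter=lambda: events.append("start"),
    send=lambda seg: events.append("send"),
)
```

Starting the transmitter only when the first voice segment appears is what keeps noise-only input from costing any transmit power.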
In the automatic start control method for an earphone according to the present invention, conventional bone conduction earphone signal processing is applied to the sound picked up by the bone conduction microphone and, on the basis of auditory scene analysis theory (CASA), deep learning technology, and semantic recognition technology, an automatic start control method specific to bone conduction microphones is proposed. The bone conduction microphone itself is not disturbed by external airborne sound and has strong noise reduction capability. However, noise produced when the bone conduction earphone touches or collides with external objects is still picked up by the bone conduction microphone. If this noise is sent together with the voice, the person receiving the sound may be unable to hear the transmitted information clearly, miss key information, misjudge the situation on site, and issue wrong instructions. Noise produced by accidental contact with the housing also causes the communication module to start frequently, which drains the battery of the bone conduction earphone.
The method uses a neural network model obtained by long-duration noise training based on auditory scene analysis theory (CASA) and deep learning technology. Partial sound signals are taken from the recorded original sound signal and compared in batches with the noise signals the neural network model has learned. If a partial sound signal is judged identical or similar to a learned noise signal, it is marked as noise and subsequently suppressed or filtered out; otherwise it is marked as voice. After semantic recognition technology recognizes a partial sound signal marked as voice, that signal is sent automatically; when a partial sound signal marked as noise is recognized, the voice sending function is not started.
The method also includes:
After the partial sound signal marked as voice is recognized, a Bluetooth module is started and the partial sound signal marked as voice is sent automatically. The bone conduction earphone is connected through the Bluetooth module to another Bluetooth device, such as a shoulder-microphone intercom, which forwards the signal to the remote command end; alternatively, the bone conduction earphone sends the signal to the remote command end directly.
The method also includes:
Suppressing or filtering out the partial sound signal marked as noise. This allows the bone conduction earphone to eliminate the noise produced when the earphone housing is struck.
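Suppression and filtering can both be expressed as a gain applied to noise-labelled portions: a gain of zero removes them, and a small gain attenuates them. The sketch below assumes the labels have already been produced by the comparison step; the function name and the `attenuation` parameter are illustrative.

```python
def suppress_noise(segments, labels, attenuation=0.0):
    """Scale noise-labelled segments by `attenuation`; leave voice untouched."""
    if len(segments) != len(labels):
        raise ValueError("segments and labels must have the same length")
    if not 0.0 <= attenuation <= 1.0:
        raise ValueError("attenuation must be between 0 and 1")
    cleaned = []
    for segment, label in zip(segments, labels):
        gain = attenuation if label == "noise" else 1.0
        cleaned.append([x * gain for x in segment])
    return cleaned

cleaned = suppress_noise([[0.5, -0.5], [0.2, 0.1]], ["noise", "voice"], attenuation=0.1)
muted = suppress_noise([[0.5, -0.5], [0.2, 0.1]], ["noise", "voice"])
```

With the default `attenuation=0.0` the noise portion is filtered out entirely, while a value such as 0.1 only suppresses it.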
The method also includes:
Setting a similarity threshold, comparing the partial sound signal in batches with the noise signals learned by the neural network model to obtain the similarity between the partial sound signal and the noise signal, and judging the partial sound signal to be noise if the similarity is greater than the similarity threshold, and voice otherwise.
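The threshold rule can be sketched as follows. The text does not say how similarity is measured, so this example assumes cosine similarity between feature vectors; the noise templates, the feature values, and the threshold of 0.9 are hypothetical.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def label_segment(features, noise_templates, similarity_threshold=0.9):
    """Return (label, best similarity) against the learned noise templates."""
    best = max(cosine_similarity(features, template) for template in noise_templates)
    label = "noise" if best > similarity_threshold else "voice"
    return label, best

# Hypothetical learned noise templates and two segments to judge.
noise_templates = [[1.0, 0.0, 0.2], [0.8, 0.1, 0.0]]
label_a, score_a = label_segment([0.9, 0.05, 0.1], noise_templates)
label_b, score_b = label_segment([0.0, 1.0, 0.1], noise_templates)
```

Raising the threshold makes the rule more reluctant to call something noise, so fewer voice segments are dropped at the cost of letting more impact noise through; the right setting would have to be tuned on real recordings.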
Embodiment two
As shown in Fig. 2, building on the automatic start control method for an earphone described in Embodiment one, this embodiment further specifies that the received original sound signal is a sound signal received from a complex mixed sound source. The specific steps are as follows:
An automatic start control method for an earphone, based on auditory scene analysis theory (CASA) and deep learning technology, comprises the following steps:
Receiving the sound signal of a complex mixed sound source as input;
Separating the sound signal of a single sound source from the sound signal of the complex mixed sound source;
Comparing the sound signal of the single sound source in batches, using the neural network model obtained by noise training, with the noise signals the neural network model has learned, and marking it as noise if they are identical or similar, and as voice otherwise;
After a partial sound signal marked as voice is recognized, automatically sending the partial sound signal marked as voice; after a partial sound signal marked as noise is recognized, not starting the voice sending function of the earphone.
The neural network model is formed by long-duration training; the corresponding algorithm has been trained for on the order of ten thousand hours, so it is robust and is not restricted by the direction of the sound source. By extracting and learning from different noise sources over a long period, it separates voice from ambient noise in real time, suppresses both stationary and dynamic noise, and accurately determines whether the sound from each source is noise or voice. It marks each source accordingly, which simplifies the subsequent steps.
That is, using auditory scene analysis theory (CASA) and deep learning technology, the various non-voice noises that can arise in the environment of the bone conduction microphone are obtained in advance and collected for deep learning to form the neural network model. The sound signal of each single sound source separated from the sound signal of the complex mixed sound source is then input into the function of this neural network model to obtain a judgment result: the signal is marked as noise if it is identical or similar to learned noise, and as voice otherwise.
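The per-source judgment of this embodiment can be sketched as below. Source separation itself is outside the scope of the example, so the separated sources are assumed to be given as feature vectors keyed by a source name; the model function passed in is a trivial placeholder, and all names and values are hypothetical.

```python
def label_sources(separated_sources, noise_probability, decision_threshold=0.5):
    """Map each separated source to 'noise' or 'voice' using a model function."""
    return {
        name: "noise" if noise_probability(features) >= decision_threshold else "voice"
        for name, features in separated_sources.items()
    }

def should_transmit(labels):
    """The sending function is started only if some source is voice."""
    return any(label == "voice" for label in labels.values())

# Placeholder inputs: the first feature is treated as the noise score.
separated_sources = {"source_a": [0.92, 0.1], "source_b": [0.08, 0.7]}
labels = label_sources(separated_sources, noise_probability=lambda f: f[0])
transmit = should_transmit(labels)
```

Judging each separated source independently means one noisy source does not prevent a simultaneous voice source from being sent.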
Embodiment three
An automatic start control device for an earphone, comprising:
A receiving module, for receiving an original sound signal as input;
A separation module, for separating a partial sound signal from the original sound signal;
A comparison module, for comparing the partial sound signal with the noise signals learned by a neural network model obtained by noise training, and marking the partial sound signal as noise if they are identical or similar, and as voice otherwise;
A judging and sending module, for starting the sending function and automatically sending the partial sound signal marked as voice only after a partial sound signal marked as voice is recognized.
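One way to picture how the four modules fit together is a small controller that wires them up as interchangeable callables. This is a structural sketch only: the class name, field names, and the placeholder rules in the demonstration are assumptions, not the device's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Signal = Sequence[float]

@dataclass
class AutoStartController:
    receive: Callable[[], Signal]                # receiving module
    separate: Callable[[Signal], List[Signal]]   # separation module
    is_noise: Callable[[Signal], bool]           # comparison module
    transmit: Callable[[Signal], None]           # judging and sending module

    def run_once(self) -> int:
        """Process one captured signal; return how many portions were sent."""
        sent = 0
        for portion in self.separate(self.receive()):
            if not self.is_noise(portion):
                self.transmit(portion)
                sent += 1
        return sent

sent_portions = []
controller = AutoStartController(
    receive=lambda: [0.0, 0.9, 0.1, 0.0],
    separate=lambda signal: [signal[:2], signal[2:]],
    is_noise=lambda portion: max(abs(x) for x in portion) > 0.5,  # placeholder rule
    transmit=sent_portions.append,
)
sent_count = controller.run_once()
```

Because each module is just a callable, the division into modules stays purely functional: any implementation that provides the same behaviour can be substituted.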
Embodiment four
An electronic device comprising a processor, a storage medium, and a computer program, the computer program being stored in the storage medium and, when executed by the processor, implementing the above automatic start control method for an earphone. The electronic device may contain one or more processors; the processor, memory, input device, and output device in the electronic device may be connected by a bus or by other means.
Embodiment five
A computer-readable storage medium on which a computer program is stored, the computer program implementing the above automatic start control method for an earphone when executed by a processor. The method includes the automatic start control method for an earphone described in Embodiment one and Embodiment two above.
Of course, in the storage medium containing computer-executable instructions provided by the embodiments of the present invention, the computer-executable instructions are not limited to the method operations described above, and can also perform related operations of the automatic start control method for an earphone provided by any embodiment of the invention.
From the above description of the embodiments, those skilled in the art will clearly understand that the present invention can be implemented by software together with the necessary general-purpose hardware, and can of course also be implemented by hardware alone, although in many cases the former is the better implementation. On this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as a computer floppy disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), flash memory (FLASH), hard disk, or optical disc, and includes instructions that cause an electronic device (which may be a mobile phone, personal computer, server, network device, or the like) to execute the methods described in the embodiments of the present invention.
It is worth noting that, in the above embodiment of the automatic start control device for an earphone, the units and modules are divided only according to functional logic; the division is not limited to the one described, as long as the corresponding functions can be realized. In addition, the specific names of the functional units are only for ease of distinguishing them from one another and are not intended to limit the scope of protection of the present invention.
In light of the disclosure and teaching of the above specification, those skilled in the art to which the invention pertains can also change and modify the above embodiments. The invention is therefore not limited to the specific embodiments disclosed and described above; modifications and changes to the invention should also fall within the scope of protection of the claims of the present invention. In addition, although certain specific terms are used in this specification, they are used only for convenience of description and do not limit the invention in any way.