CN110191387A

CN110191387A - Automatic start control method and device for an earphone, electronic equipment, and storage medium

Info

Publication number: CN110191387A
Application number: CN201910470506.1A
Authority: CN
Inventors: 廖春生; 吕凯; 胡峰; 苏纯剑
Original assignee: Guangdong Globeez Fire Fighting Technology Co ltd; Shenzhen Rongsheng Intelligent Equipment Co ltd
Current assignee: Guangdong Globeez Fire Fighting Technology Co ltd; Shenzhen Rongsheng Intelligent Equipment Co ltd
Priority date: 2019-05-31
Filing date: 2019-05-31
Publication date: 2019-08-30

Abstract

The present invention provides an automatic start control method for an earphone. The method is based on auditory scene analysis (CASA) theory and deep learning technology and comprises the following steps: receiving an original sound signal input; separating the original sound signal into a plurality of partial sound signals; based on a neural network model obtained by noise training, comparing the partial sound signals in batches with the noise signals learned by the neural network model, and labeling a partial sound signal as noise if it is the same as or similar to a learned noise signal and as voice otherwise; and, only after a partial sound signal labeled as voice is recognized, starting the sending function to automatically send the partial sound signal labeled as voice. Based on existing auditory scene analysis (CASA) theory and AI deep learning technology, the invention prevents noise produced by contact with or impacts on the housing of a bone-conduction microphone from repeatedly triggering automatic start, which would cause unnecessary power consumption.

Description

Automatic start control method and device for an earphone, electronic equipment, and storage medium

Technical field

The present invention relates to the field of speech recognition, and in particular to an automatic start control method and device for an earphone, electronic equipment, and a storage medium.

Background art

In day-to-day firefighting operations, timely exchange of information between rescue and security personnel and the command system is a problem. The prior art proposes using bone-conduction earphones to solve the problem of timely communication between the command system and rescue and security personnel. A bone-conduction earphone transmits sound through vibration of the facial bones, so the wearer can hear sound more clearly than with an ordinary earphone in a noisy environment. Because of the way it is worn, it does not block the eardrum from receiving external sound waves, so the user can still hear surrounding sounds, react quickly to changes in the environment at the fire scene, and avoid injury. Many existing firefighting intercom devices require the user to press a PTT key before the voice call function is enabled; in some situations the user cannot free a hand, and pressing the PTT key is very inconvenient. The prior art also discloses technical solutions based on semantic recognition: the earphone is woken up and started by recognizing that the user is speaking, and what the user says is sent automatically, which better frees the user's hands. However, a bone-conduction earphone with semantic recognition includes a bone-conduction speaker and a bone-conduction microphone. Although a bone-conduction earphone has a good noise-reduction effect against airborne sound waves, when an external force touches the earphone body during use, the resulting vibration has irregular frequencies. The bone-conduction microphone picks up this noise and the earphone's transmitting module is started frequently, causing unnecessary power consumption. In addition, the sound sent by the bone-conduction earphone then contains noise as well as voice, so the receiver cannot hear the voice clearly, or the noise completely masks the original voice. The receiver then cannot properly judge the information conveyed, instructions cannot be issued in time, communication is delayed, and rescue opportunities are missed.

Summary of the invention

The present invention aims to solve the above technical problem and provides an automatic start control method and device for an earphone, electronic equipment, and a storage medium. Based on existing auditory scene analysis (CASA) theory and AI deep learning technology, the method, device, electronic equipment, and storage medium of the invention prevent noise produced by contact with or impacts on the housing of a bone-conduction microphone from repeatedly triggering automatic start, which would cause unnecessary power consumption.

To achieve the above object, the technical solution of the present invention is as follows:

An automatic start control method for an earphone, the method being based on auditory scene analysis (CASA) theory and deep learning technology and comprising the following steps:

Receiving an original sound signal input;

Separating partial sound signals from the original sound signal;

Based on a neural network model obtained by noise training, comparing each partial sound signal with the noise signals learned by the neural network model, and labeling it as noise if it is the same as or similar to a learned noise signal and as voice otherwise;

Only after a partial sound signal labeled as voice is recognized, starting the sending function to automatically send the partial sound signal labeled as voice.

According to the automatic start control method for an earphone of the present invention, processing the sound signal prevents impact sounds on the housing that are picked up by the bone-conduction microphone from starting the sending function of the earphone. This saves earphone power and extends the user's operating time at the fire scene, so that the user can better carry out work such as guiding an evacuation. Conventional bone-conduction earphone signal processing is applied to the sound picked up by the bone-conduction microphone, and, based on auditory scene analysis (CASA) theory, deep learning technology, and semantic recognition technology, an automatic start control method specific to earphones with a bone-conduction microphone is proposed. A bone-conduction microphone is not disturbed by external airborne sound and has strong noise-reduction capability, but noise produced when the bone-conduction earphone touches or collides with external objects is still picked up by the microphone. If that noise is sent along with the voice, the person receiving the sound cannot clearly hear the information conveyed, and may miss key information, misjudge the situation on site, and issue wrong instructions. Noise from accidental contact with the housing also starts the communication module frequently, which drains the power of the bone-conduction earphone. Using a neural network model obtained by long-term noise training based on CASA theory and deep learning technology, partial sound signals are taken from the picked-up original sound signal and compared in batches with the noise signals learned by the neural network model. If a partial sound signal is judged to be the same as or similar to a noise signal previously learned by the model, it is labeled as noise, and the partial sound signal labeled as noise is then suppressed or filtered out; otherwise it is labeled as voice. After a partial sound signal labeled as voice is recognized by semantic recognition technology, it is sent automatically; when a partial sound signal labeled as noise is recognized, the voice sending function is not started.

Further, the method also includes:

Inhibit or filter out the partial sound signal labeled as noise.

Further, the method also includes:

The original sound signal is a sound signal received from a complex mixed sound source.

Further, the method also includes:

The partial sound signal is the sound signal of a single sound source separated from the complex mixed sound source.

The received sound signal of the complex mixed sound source contains noise. It is separated by sound source, and each partial sound signal is then judged as to whether it is the same as or similar to a noise signal previously learned by the neural network model, which makes labeling the sound signals more convenient.

Further, the method also includes:

The sound signal of the single sound source is input into the function of the neural network model to obtain a judgment result; if the signal is judged the same as or similar to learned noise it is labeled as noise, and otherwise it is labeled as voice.

The method also includes:

A similarity threshold is set. The partial sound signals are compared in batches with the noise signals learned by the neural network model to obtain the similarity between each partial sound signal and the noise signals. If the similarity is greater than the similarity threshold, the partial sound signal is judged to be noise; otherwise it is judged to be voice.
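The threshold comparison described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the patent does not state how similarity is computed, so cosine similarity over feature vectors, the function names `cosine_similarity` and `label_partial_signal`, and the default threshold of 0.9 are all assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def label_partial_signal(features, learned_noise, threshold=0.9):
    """Label one partial signal by its best match among learned noise vectors.

    `features` and each entry of `learned_noise` are hypothetical feature
    vectors; the patent does not specify the feature representation.
    """
    best = max(
        (cosine_similarity(features, noise) for noise in learned_noise),
        default=0.0,
    )
    # Similarity above the threshold means noise; anything else is voice.
    return "noise" if best > threshold else "voice"
```

A higher threshold makes the earphone more willing to transmit (fewer signals are rejected as noise), while a lower one saves more power at the risk of dropping speech.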

Further, after the partial sound signal labeled as voice is recognized, a Bluetooth module is started to automatically send the partial sound signal labeled as voice.

An automatic start control device for an earphone, comprising:

A receiving module, for receiving an original sound signal input;

A separation module, for separating partial sound signals from the original sound signal;

A comparison module, for comparing each partial sound signal with the noise signals learned by a neural network model obtained by noise training, and labeling it as noise if it is the same as or similar to a learned noise signal and as voice otherwise;

A judging and sending module, for starting the sending function to automatically send the partial sound signal labeled as voice, only after a partial sound signal labeled as voice is recognized.
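The way the four modules fit together can be sketched as below. This is an illustrative composition under stated assumptions, not the device's actual design: the class name `AutoStartController`, its fields, and the idea of passing the separation and comparison modules in as plain functions are hypothetical, and the `sent` list stands in for a real transmitter.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Signal = List[float]


@dataclass
class AutoStartController:
    # Separation module: splits the original signal into partial signals.
    separate: Callable[[Signal], List[Signal]]
    # Comparison module: True if a partial signal matches learned noise.
    is_noise: Callable[[Signal], bool]
    # Stand-ins for the transmitter: what was sent and how often it started.
    sent: List[Signal] = field(default_factory=list)
    transmitter_starts: int = 0

    def receive(self, original: Signal) -> None:
        """Receiving module: accept the original signal and run the pipeline."""
        voice_parts = [p for p in self.separate(original) if not self.is_noise(p)]
        # Judging and sending module: the transmitter starts only for voice.
        if voice_parts:
            self.transmitter_starts += 1
            self.sent.extend(voice_parts)
```

Because the transmitter counter only increases when at least one partial signal is labeled as voice, an input consisting purely of housing-impact noise leaves the sending function off.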

An electronic device comprising a processor, a storage medium, and a computer program, the computer program being stored in the storage medium and, when executed by the processor, implementing the above automatic start control method for an earphone.

A computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the above automatic start control method for an earphone.

Description of the drawings

Fig. 1 is a flow chart of Embodiment one of the automatic start control method for an earphone of the present invention;

Fig. 2 is a flow chart of Embodiment two of the automatic start control method for an earphone of the present invention.

Detailed description of the embodiments

The automatic start control method and device for an earphone, the electronic equipment, and the storage medium of the present invention are described in detail below with reference to the drawings, in order to explain and illustrate the protection scope of the present invention.

Embodiment one

With reference to Fig. 1, an automatic start control method for an earphone is based on auditory scene analysis (CASA) theory and deep learning technology and comprises the following steps:

Receiving an original sound signal input.

The bone-conduction earphone includes a bone-conduction microphone. When the earphone is worn, the bone-conduction microphone picks up sound from the vibration of the facial bones as the wearer speaks, and the sound is processed into the original sound signal by conventional signal processing methods. Auditory scene analysis (CASA) theory builds on existing sound processing techniques, in particular the front-end sound processing techniques based on computational auditory scene analysis (CASA) studied at Harbin Institute of Technology. The human auditory system can distinguish and track the sound signal of interest in a noisy environment, and can pick out the required content when multiple sounds are present at the same time. Auditory scene analysis (CASA) is the theory built on this auditory physiological phenomenon. CASA simulates the neural auditory system of the human ear, so its processing of sound signals is closer to human auditory perception of mixed sound signals. It can therefore be used to separate noise from the voice signal and obtain a purer voice signal; in effect, a front-end processing stage is added to the speech recognition process to improve recognition accuracy for noisy speech. The key to speech enhancement with CASA is selecting suitable features to separate the target speech from background noise; usable features include spectral energy, pitch frequency, and channel cross-correlation feature thresholds.
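As an illustration of one separation feature used in CASA-style front ends, the pitch (fundamental) frequency of a frame can be estimated from the peak of its autocorrelation. The sketch below is not taken from the patent: the function name `estimate_pitch_hz`, the 50 to 400 Hz search range, and the choice of plain autocorrelation are assumptions made for the example.

```python
import math


def estimate_pitch_hz(samples, sample_rate, min_hz=50.0, max_hz=400.0):
    """Estimate the fundamental frequency from the autocorrelation peak.

    Returns None when no lag in the search range has positive correlation
    (for example, for silence).
    """
    min_lag = int(sample_rate / max_hz)  # shortest period considered
    max_lag = int(sample_rate / min_hz)  # longest period considered
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, min(max_lag, len(samples) - 1) + 1):
        # Unnormalized autocorrelation of the frame at this lag.
        score = sum(
            samples[i] * samples[i + lag] for i in range(len(samples) - lag)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else None


# Usage: a synthetic 200 Hz tone sampled at 8 kHz.
tone = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(800)]
pitch = estimate_pitch_hz(tone, 8000)
```

Voiced speech tends to show a clear periodic peak of this kind, whereas an impact on the housing is typically irregular, which is one reason pitch is a plausible feature for telling the two apart.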

Deep learning is a machine learning approach based on representation learning from data. An observation (such as the frequency and amplitude variation of a noise sound wave) can be represented in many ways, for example as a vector of per-pixel intensity values, or more abstractly as a set of edges, regions of a specific shape, and so on. Using noise that has passed through CASA front-end processing, deep learning is applied to model how a computer simulates human processing of auditory signals. The noises that the bone-conduction microphone may encounter in its actual environment are recorded and aggregated, and a specific neural network model is formed through long-term learning and comparison.

Following auditory scene analysis (CASA) theory, the various non-voice noises that can arise in the bone-conduction microphone's environment are collected by simulation and converted, by conventional audio data processing, into electronic audio data for learning. Deep learning is applied to this noise audio data, and the resulting function is the corresponding neural network model. The function is written into the program; when noise needs to be judged, the sound data to be judged is input and the judgment result is obtained by evaluating the function.
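The idea that training yields a function which is then called on new sound data can be sketched with a deliberately tiny stand-in. The patent does not disclose the network architecture or training procedure, so the single-neuron logistic model below, the name `train_noise_model`, the two-dimensional feature vectors, and the hyperparameters are all assumptions chosen to keep the example short and runnable.

```python
import math


def train_noise_model(noise_examples, voice_examples, epochs=500, lr=0.5):
    """Fit a single-neuron (logistic) model and return it as a plain function.

    Each example is a hypothetical feature vector; noise is the positive class.
    """
    data = [(x, 1.0) for x in noise_examples] + [(x, 0.0) for x in voice_examples]
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in data:
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            # Gradient of the log loss with respect to z.
            error = 1.0 / (1.0 + math.exp(-z)) - target
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error

    def noise_probability(x):
        """The learned function: probability that features `x` are noise."""
        z = bias + sum(w * xi for w, xi in zip(weights, x))
        return 1.0 / (1.0 + math.exp(-z))

    return noise_probability


# Usage: train on toy features, then call the returned function on new data.
model = train_noise_model(
    noise_examples=[[0.9, 0.1], [0.8, 0.2]],
    voice_examples=[[0.2, 0.8], [0.1, 0.9]],
)
looks_like_noise = model([0.85, 0.15]) > 0.5
```

Returning a closure mirrors the text: once training is finished, the program only needs the function itself, not the training data.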

Separating partial sound signals from the original sound signal.

The original sound signal is separated into a plurality of partial sound signals according to a certain rule, and each is compared and judged for noise separately.
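The separation rule is not specified in the text. One simple possibility, shown here purely as an assumption, is to cut the signal into fixed-length frames; the function name `split_into_frames` and its parameters are hypothetical.

```python
def split_into_frames(samples, frame_length, hop_length=None):
    """Split a sample sequence into frames of up to `frame_length` samples.

    `hop_length` is the step between frame starts; it defaults to
    `frame_length`, which gives non-overlapping frames. The final frame
    may be shorter than `frame_length`.
    """
    hop = frame_length if hop_length is None else hop_length
    if frame_length <= 0 or hop <= 0:
        raise ValueError("frame_length and hop_length must be positive")
    return [
        samples[start:start + frame_length]
        for start in range(0, len(samples), hop)
    ]
```

Each frame can then be passed to the noise comparison independently, which is what allows the partial signals to be judged in batches.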

Based on a neural network model obtained by noise training, each partial sound signal is compared with the noise signals learned by the neural network model; it is labeled as noise if it is the same as or similar to a learned noise signal, and as voice otherwise.

After a partial sound signal labeled as voice is recognized, the partial sound signal labeled as voice is sent automatically;

After a partial sound signal labeled as noise is recognized, the voice sending function of the earphone is not started.

According to the automatic start control method for an earphone of the present invention, conventional bone-conduction earphone signal processing is applied to the sound picked up by the bone-conduction microphone, and, based on auditory scene analysis (CASA) theory, deep learning technology, and semantic recognition technology, an automatic start control method specific to earphones with a bone-conduction microphone is proposed. A bone-conduction microphone is not disturbed by external airborne sound and has strong noise-reduction capability, but noise produced when the bone-conduction earphone touches or collides with external objects is still picked up by the microphone. If that noise is sent along with the voice, the person receiving the sound cannot clearly hear the information conveyed, and may miss key information, misjudge the situation on site, and issue wrong instructions. Noise from accidental contact with the housing also starts the communication module frequently, which drains the power of the bone-conduction earphone. Using a neural network model obtained by long-term noise training based on CASA theory and deep learning technology, partial sound signals are taken from the picked-up original sound signal and compared in batches with the noise signals learned by the neural network model. If a partial sound signal is judged to be the same as or similar to a noise signal previously learned by the model, it is labeled as noise, and the partial sound signal labeled as noise is then suppressed or filtered out; otherwise it is labeled as voice. After a partial sound signal labeled as voice is recognized by semantic recognition technology, it is sent automatically; when a partial sound signal labeled as noise is recognized, the voice sending function is not started.

The method also includes:

After the partial sound signal labeled as voice is recognized, a Bluetooth module is started to automatically send the partial sound signal labeled as voice. The bone-conduction earphone is connected through the Bluetooth module to another Bluetooth device, such as a shoulder-microphone intercom, and the signal is sent to the remote command end by the intercom, or is sent to the remote command end directly by the bone-conduction earphone.

The method also includes:

Suppressing or filtering out the partial sound signal labeled as noise. This gives the bone-conduction earphone the function of eliminating noise produced by impacts on the earphone housing.
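Suppression of noise-labeled parts can be sketched as a per-frame gain. The patent does not say how suppression or filtering is carried out, so the function name `suppress_noise`, the "noise" and "voice" label strings, and the `attenuation` parameter are assumptions for illustration.

```python
def suppress_noise(frames, labels, attenuation=0.0):
    """Scale frames labeled "noise" by `attenuation`; copy other frames unchanged.

    With the default attenuation of 0.0, noise frames are silenced entirely,
    which corresponds to filtering them out; a value between 0 and 1
    suppresses them without removing them.
    """
    if len(frames) != len(labels):
        raise ValueError("frames and labels must have the same length")
    return [
        [sample * attenuation for sample in frame] if label == "noise" else list(frame)
        for frame, label in zip(frames, labels)
    ]
```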

The method also includes:

Embodiment two

As shown in Fig. 2, on the basis of the automatic start control method for an earphone described in Embodiment one, it is further specified that the original sound signal received is a sound signal received from a complex mixed sound source. The specific steps are as follows:

Receiving the sound signal input of the complex mixed sound source;

Separating the sound signals of single sound sources from the sound signal of the complex mixed sound source;

Based on a neural network model obtained by noise training, comparing the sound signals of the single sound sources in batches with the noise signals learned by the neural network model, and labeling each as noise if it is the same as or similar to a learned noise signal and as voice otherwise;

After a partial sound signal labeled as voice is recognized, automatically sending the partial sound signal labeled as voice; after a partial sound signal labeled as noise is recognized, not starting the voice sending function of the earphone.

The neural network model is formed through long-term training; the corresponding algorithm has undergone on the order of ten thousand hours of training, so it is highly robust and is not constrained by sound source direction. By extracting and learning different noise sources over long periods of learning and comparison, it separates voice from ambient noise in real time, suppresses steady-state and dynamic noise, accurately determines whether the sound of each sound source is noise or voice, and labels it accordingly, which facilitates the subsequent steps.

That is, using auditory scene analysis (CASA) theory and deep learning technology, the various non-voice noises that can arise in the bone-conduction microphone's environment are obtained in advance, collected, and used for deep learning to form the neural network model. The sound signal of each single sound source separated from the sound signal of the complex mixed sound source is then input into the function of this neural network model to obtain a judgment result: if it is the same as or similar to learned noise it is labeled as noise, and otherwise it is labeled as voice.

Embodiment three

An automatic start control device for an earphone, comprising:

A receiving module, for receiving an original sound signal input;

Embodiment four

An electronic device comprising a processor, a storage medium, and a computer program, the computer program being stored in the storage medium and, when executed by the processor, implementing the above automatic start control method for an earphone. The electronic device may have one or more processors; the processor, memory, input device, and output device of the electronic device may be connected by a bus or in other ways.

Embodiment five

A computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the above automatic start control method for an earphone. The method includes the automatic start control method for an earphone described in Embodiments one and two above.

Of course, in the storage medium containing computer-executable instructions provided by the embodiment of the present invention, the computer-executable instructions are not limited to the method operations described above and can also perform related operations in the automatic start control method for an earphone provided by any embodiment of the present invention.

From the above description of the embodiments, those skilled in the art will clearly understand that the present invention can be implemented by software plus the necessary general-purpose hardware, or of course by hardware alone, although in many cases the former is the better implementation. On this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied as a software product. The computer software product can be stored in a computer-readable storage medium, such as a computer floppy disk, read-only memory (ROM), random access memory (RAM), flash memory (FLASH), hard disk, or optical disc, and includes instructions that cause an electronic device (which may be a mobile phone, personal computer, server, network device, or the like) to execute the methods described in the embodiments of the present invention.

It is worth noting that, in the above embodiment of the automatic start control device for an earphone, the units and modules are divided only according to functional logic; the division is not limited to the one above, as long as the corresponding functions can be realized. In addition, the specific names of the functional units are only for ease of distinguishing them from one another and are not intended to limit the protection scope of the present invention.

Based on the disclosure and teaching of the above specification, those skilled in the art to which the invention belongs may also change and modify the above embodiments. The invention is therefore not limited to the specific embodiments disclosed and described above, and modifications and changes to the invention should also fall within the protection scope of the claims of the invention. In addition, although some specific terms are used in this specification, they are used only for convenience of description and do not limit the invention in any way.

Claims

1. An automatic start control method for an earphone, characterized in that the method is based on auditory scene analysis (CASA) theory and deep learning technology and comprises the following steps:

Receiving an original sound signal input;

Separating the original sound signal into a plurality of partial sound signals;

Based on a neural network model obtained by noise training, comparing the partial sound signals in batches with the noise signals learned by the neural network model, and labeling a partial sound signal as noise if it is the same as or similar to a learned noise signal and as voice otherwise;

Only after a partial sound signal labeled as voice is recognized, starting the sending function to automatically send the partial sound signal labeled as voice.

2. The automatic start control method for an earphone according to claim 1, characterized in that the method further comprises:

Suppressing or filtering out the partial sound signal labeled as noise.

3. The automatic start control method for an earphone according to claim 1, characterized in that the method further comprises:

4. The automatic start control method for an earphone according to claim 3, characterized in that the method further comprises:

The partial sound signal is the sound signal of a single sound source separated from the received sound signal of the complex mixed sound source.

5. The automatic start control method for an earphone according to claim 4, characterized in that the method further comprises:

Inputting the sound signal of the single sound source into the function of the neural network model to obtain a judgment result, and labeling the signal as noise if it is judged the same as or similar to learned noise and as voice otherwise.

6. The automatic start control method for an earphone according to any one of claims 1 to 5, characterized in that the method further comprises:

Setting a similarity threshold, comparing the partial sound signals in batches with the noise signals learned by the neural network model to obtain the similarity between each partial sound signal and the noise signals, and judging the partial sound signal to be noise if the similarity is greater than the similarity threshold and to be voice otherwise.

7. The automatic start control method for an earphone according to any one of claims 1 to 5, characterized in that, after the partial sound signal labeled as voice is recognized, a Bluetooth module is started to automatically send the partial sound signal labeled as voice.

8. An automatic start control device for an earphone, characterized by comprising:

A receiving module, for receiving an original sound signal input;

A comparison module, for comparing each partial sound signal with the noise signals learned by a neural network model obtained by noise training, and labeling it as noise if it is the same as or similar to a learned noise signal and as voice otherwise;

A judging and sending module, for starting the sending function to automatically send the partial sound signal labeled as voice, only after a partial sound signal labeled as voice is recognized.

9. An electronic device comprising a processor, a storage medium, and a computer program, the computer program being stored in the storage medium, characterized in that the computer program, when executed by the processor, implements the automatic start control method for an earphone according to any one of claims 1 to 7.

10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the automatic start control method for an earphone according to any one of claims 1 to 7.