CN107799124A - A VAD detection method applied to an intelligent voice mouse

Info

Publication number: CN107799124A
Application number: CN201710948479.5A
Authority: CN
Inventors: 冯海洪; 朱国冉; 许成亮
Original assignee: Anhui Mic Technology Co Ltd
Current assignee: Anhui Mic Technology Co Ltd
Priority date: 2017-10-12
Filing date: 2017-10-12
Publication date: 2018-03-13

Abstract

The invention discloses a VAD (voice activity detection) method applied to an intelligent voice mouse. Each audio block of an audio segment collected by the intelligent voice mouse is divided into frames, and the average frame energy of each audio block is calculated and compared with a preset threshold. If the average energy is greater than or equal to the preset threshold, the VAD result for that block is set to 1; if it is less than the preset threshold, the VAD result is set to 0. The volume and continuity of the collected sound are then judged from the number of consecutive 1s in the audio segment, or from how many blocks of consecutive 1s it contains. By detecting the volume and continuity of the speech collected by the intelligent voice mouse, the invention can correctly judge whether the user is interacting with the machine in the proper way and give the user a reasonable prompt, which improves the accuracy of the intelligent voice mouse.

Description

A VAD detection method applied to an intelligent voice mouse

Technical field

The invention belongs to the technical field of speech recognition and relates to a VAD detection method, specifically a VAD detection method applied to an intelligent voice mouse.

Background art

When an intelligent voice mouse is used for human-machine interaction and the volume of the audio it collects is too low, the audio cannot be recognized correctly after it is uploaded to the server. The intelligent voice mouse therefore needs to give an appropriate prompt, reminding the speaker to speak louder.

VAD detection examines the collected audio and, by processing it, judges whether the audio meets the requirements for recognition. Existing VAD detection mainly calculates the energy and the short-time zero-crossing rate of a collected audio segment and judges whether they exceed thresholds in order to decide whether a speech endpoint has been reached. It can only determine the endpoints of an audio segment. It cannot judge the volume of that segment, and so cannot prompt the user to interact with the machine by voice correctly.
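For orientation, the conventional energy and zero-crossing-rate check described above might be sketched as follows. This is only an illustration, not code from the patent: the function names, the sum-of-squares energy definition, and the rule for combining the two thresholds are all assumptions.

```python
def short_time_energy(frame):
    """Short-time energy, taken here as the sum of squared sample values."""
    return sum(s * s for s in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def is_speech_frame(frame, energy_threshold, zcr_threshold):
    """Hypothetical decision rule: a frame counts as speech if either
    measure reaches its threshold. The source does not specify how the
    two measures are combined."""
    return (
        short_time_energy(frame) >= energy_threshold
        or zero_crossing_rate(frame) >= zcr_threshold
    )
```

A check of this kind marks where speech starts and stops, but its yes/no output carries no information about how loud the speech was, which is the limitation the background section points out.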

Summary of the invention

The invention provides a VAD detection method applied to an intelligent voice mouse. By detecting the volume and continuity of the speech, it judges whether the user is interacting with the machine in the correct way, and thereby promotes correct human-machine interaction.

The object of the invention can be achieved by the following technical solution:

A VAD detection method applied to an intelligent voice mouse, comprising the following steps:

Step S1: the intelligent voice mouse collects the audio data of the user's speech through its built-in high-performance microphone and divides it into several audio blocks;

Step S2: divide any one audio block of the audio segment collected by the intelligent voice mouse into frames;

Step S3: calculate the short-time energy of each frame of the framed audio block;

Step S4: calculate the sum of the energies of all frames of the framed audio block;

Step S5: calculate the average energy per frame of the framed audio block;

Step S6: compare the average energy calculated in step S5 with a preset threshold. If the average energy is greater than or equal to the preset threshold, the VAD result is set to 1; if the average energy is less than the preset threshold, the VAD result is set to 0;

Step S7: perform VAD detection on the other audio blocks of the audio segment collected by the intelligent voice mouse in turn, repeating steps S2 to S6;

Step S8: post-process the VAD results of all audio blocks in the audio segment collected by the intelligent voice mouse. List the VAD results of all audio blocks in the segment, judge the volume and continuity of the collected sound from the number of consecutive 1s, or from how many blocks of consecutive 1s there are, output the corresponding feedback information, and prompt the user to interact with the machine by voice correctly.
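Steps S6 to S8 can be sketched as below, starting from a list of per-block average energies. The source does not say how the count of consecutive 1s maps to feedback, so the `min_run` parameter, the decision rule, and the returned labels are hypothetical; the function names are likewise illustrative.

```python
from itertools import groupby


def flags_from_energies(avg_energies, threshold):
    """S6 and S7: one VAD flag per block, 1 if average energy >= threshold."""
    return [1 if e >= threshold else 0 for e in avg_energies]


def runs_of_ones(flags):
    """Lengths of each maximal run of consecutive 1s, in order."""
    return [
        sum(1 for _ in group)
        for value, group in groupby(flags)
        if value == 1
    ]


def feedback(flags, min_run=3):
    """S8, under an assumed rule: no 1s means the speech was too quiet;
    only short runs means it was intermittent; otherwise it is acceptable.
    min_run and the labels are hypothetical, not taken from the source."""
    runs = runs_of_ones(flags)
    if not runs:
        return "too quiet"
    if max(runs) < min_run:
        return "intermittent"
    return "ok"
```

For example, the flags `[0, 1, 1, 0, 1, 1, 1, 0]` contain two runs of 1s, of lengths 2 and 3. Both quantities named in step S8 (the number of consecutive 1s and how many blocks of consecutive 1s there are) can be read off the list returned by `runs_of_ones`.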

Beneficial effects of the invention: by detecting the volume and continuity of the speech collected by the intelligent voice mouse, the VAD detection method provided by the invention can correctly judge whether the user is interacting with the machine in the proper way and give the user a reasonable prompt, which improves the accuracy of the intelligent voice mouse.

Brief description of the drawings

The invention is described in further detail below with reference to the accompanying drawing and a specific embodiment.

Fig. 1 is a flow chart of the method of the invention.

Embodiment

The technical solution in the embodiment of the invention is described clearly and completely below with reference to the accompanying drawing. The described embodiment is obviously only one of the embodiments of the invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of this embodiment, without creative work, fall within the protection scope of the invention.

As shown in Fig. 1, the invention provides a VAD detection method applied to an intelligent voice mouse, comprising the following steps:

Step S1: the intelligent voice mouse collects the audio data of the user's speech through its built-in high-performance microphone and divides it into several audio blocks.

Step S2: divide any one audio block of the audio data collected by the intelligent voice mouse into frames. At a sampling rate of 16000 Hz with 16-bit samples, one frame is 512 B.

Step S4: calculate the sum of the energies of all frames of the framed audio block. An audio block of 4096 B can be divided into 8 frames; the short-time energies of these 8 frames are added to give their energy sum.

Step S5: calculate the average energy per frame of the framed audio block. For an audio block of 4096 B, which can be divided into 8 frames, the energy sum of the 8 frames is divided by 8 to give the average.

Step S6: compare the average energy calculated in step S5 with a preset threshold. If the average energy is greater than or equal to the preset threshold, the VAD result is set to 1; if the average energy is less than the preset threshold, the VAD result is set to 0.

Step S7: perform VAD detection on the other audio blocks of the audio segment collected by the intelligent voice mouse in turn, repeating steps S2 to S6.
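The figures in the embodiment (16000 Hz, 16-bit samples, 512 B frames, 4096 B blocks) can be worked through in a short sketch of steps S2 to S5. Several points here are assumptions not stated in the source: the audio is taken to be mono, little-endian, signed 16-bit PCM; frames are non-overlapping; and short-time energy is taken as the sum of squared samples. All names are illustrative.

```python
import struct

SAMPLE_RATE_HZ = 16000   # from the embodiment
BYTES_PER_SAMPLE = 2     # 16-bit samples
FRAME_BYTES = 512        # from the embodiment
BLOCK_BYTES = 4096       # from the embodiment

SAMPLES_PER_FRAME = FRAME_BYTES // BYTES_PER_SAMPLE    # 256 samples
FRAME_MS = 1000 * SAMPLES_PER_FRAME / SAMPLE_RATE_HZ   # 16.0 ms per frame


def split_into_frames(block, frame_bytes=FRAME_BYTES):
    """S2: cut a block into consecutive, non-overlapping frames.
    A trailing partial frame is dropped (assumption)."""
    whole = len(block) // frame_bytes
    return [block[i * frame_bytes:(i + 1) * frame_bytes] for i in range(whole)]


def frame_energy(frame):
    """S3: short-time energy as the sum of squared samples.
    Assumes little-endian signed 16-bit mono PCM."""
    count = len(frame) // BYTES_PER_SAMPLE
    samples = struct.unpack("<%dh" % count, frame[:count * BYTES_PER_SAMPLE])
    return sum(s * s for s in samples)


def block_average_energy(block):
    """S4 and S5: sum the frame energies, then divide by the frame count."""
    energies = [frame_energy(f) for f in split_into_frames(block)]
    if not energies:
        return 0.0
    return sum(energies) / len(energies)
```

Under these assumptions a 512 B frame holds 256 samples, or 16 ms of audio, and a 4096 B block splits into 8 frames. For a block whose samples all have amplitude 100, each frame has energy 256 × 100² = 2,560,000, so the average per frame is also 2,560,000.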

By detecting the volume and continuity of the speech collected by the intelligent voice mouse, the VAD detection method provided by the invention can correctly judge whether the user is interacting with the machine in the proper way and give the user a reasonable prompt, which improves the accuracy of the intelligent voice mouse.

In this specification, reference to the terms "one embodiment", "example", "specific example" and the like means that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. Such terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

The above is only an example and explanation of the structure of the invention. Those skilled in the art may make various modifications or additions to the specific embodiment described, or substitute it in a similar manner. Provided these do not depart from the structure of the invention or go beyond the scope defined by the claims, they all fall within the protection scope of the invention.

Claims

1. A VAD detection method applied to an intelligent voice mouse, characterized in that it comprises the following steps:

Step S1: the intelligent voice mouse collects the audio data of the user's speech through its built-in high-performance microphone and divides it into several audio blocks;

Step S6: compare the average energy calculated in step S5 with a preset threshold. If the average energy is greater than or equal to the preset threshold, the VAD result is set to 1; if the average energy is less than the preset threshold, the VAD result is set to 0;

Step S7: perform VAD detection on the other audio blocks of the audio segment collected by the intelligent voice mouse in turn, repeating steps S2 to S6;

Step S8: post-process the VAD results of all audio blocks in the audio segment collected by the intelligent voice mouse. List the VAD results of all audio blocks in the segment, judge the volume and continuity of the collected sound from the number of consecutive 1s, or from how many blocks of consecutive 1s there are, output the corresponding feedback information, and prompt the user to interact with the machine by voice correctly.