CN109377982A - Effective voice obtaining method - Google Patents
Effective voice obtaining method
- Publication number
- CN109377982A
- Authority
- CN
- China
- Prior art keywords
- voice
- frequency
- sampling
- point
- energy value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
- G10L15/05—Word boundary detection
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Signal Processing (AREA)
- Telephone Function (AREA)
Abstract
The invention discloses an effective voice obtaining method, comprising the following steps: obtaining the starting point and end point of the voice to be recognized; sampling the voice to be recognized successively according to a preset sampling frequency and sample size, the sampled audio data corresponding to a plurality of sampling points of the voice to be recognized; passing all the sampled audio data in turn through a fast Fourier transform (FFT) to obtain a plurality of sampled frequency points; when the energy value obtained for the sampled frequency points in the 300~1000Hz band is greater than a preset energy value n1 and the obtained energy variance is greater than a preset energy value n2, judging that the sampling point corresponding to the sampled frequency points lies within the range of effective voice; otherwise, judging that it lies within the range of noise; taking the first sampling point in the sampling point sequence of the effective voice as the starting point of the effective voice; and taking the first sampling point in the sampling point sequence of the noise as the end point of the effective voice. The method can accurately obtain the effective voice from the voice to be recognized.
Description
Technical field
The present invention relates to the field of voice signals, and in particular to a method for obtaining effective voice.
Background
Over the past decade, key advances have been made in the design of refined models, in parameter extraction and optimization, and in system adaptation techniques. Speech recognition technology has become increasingly mature, its accuracy has steadily improved, and corresponding voice products are now on the market.
In an intelligent recording and broadcasting system, continually improving the human-computer interaction experience frees the teacher from having to manage the system. Voice command word recognition can control the common functions of the recording and broadcasting system, so the teacher can forget that the system is there and concentrate on teaching. At the start of class the teacher only needs to say "start recording" and the system begins recording video; saying "stop recording" at the end of class completes the recording of the lesson.
Command word recognition modules are currently available on the market, but most of them must be connected to a network to recognize command words. This hinders the use of command word recognition in embedded recording and broadcasting systems, where small, efficient command word recognition is very promising.
A small, efficient command word recognition system first needs to detect and process a segment of speech spoken by the teacher and extract the effective voice from it, so that the effective voice can then be recognized.
Summary of the invention
In view of the above technical problem, the purpose of the present invention is to provide a method for obtaining effective voice that can accurately obtain the effective voice from the voice to be recognized.
The present invention adopts the following technical solution:
An effective voice obtaining method, comprising the following steps:
Obtaining the starting point and end point of the voice to be recognized;
Obtaining the effective voice of the voice to be recognized, the effective voice being the complete speech that starts at the starting point and ends at the end point;
Obtaining the starting point and end point of the voice to be recognized comprises the following steps:
Sampling the voice to be recognized successively according to a preset sampling frequency and sample size to obtain a plurality of sampled audio data, the sampled audio data corresponding to a plurality of sampling points of the voice to be recognized; passing all the sampled audio data in turn through a fast Fourier transform (FFT) to obtain a plurality of sampled frequency points;
Obtaining the energy values of all sampled frequency points whose frequency lies within 100~1000Hz, and comparing the energy values in turn with a preset energy value n1;
Obtaining the energy variance of all sampled frequency points whose frequency lies within the 300~1000Hz band, and comparing the energy variance in turn with a preset energy value n2;
When the energy value obtained for the sampled frequency points in the 300~1000Hz band is greater than the preset energy value n1 and the obtained energy variance is greater than the preset energy value n2, judging that the sampling point corresponding to the sampled frequency points lies within the range of effective voice;
When the energy value obtained for the sampled frequency points in the 300~1000Hz band is not greater than the preset energy value n1, or the obtained energy variance is not greater than the preset energy value n2, judging that the sampling point corresponding to the sampled frequency points lies within the range of noise;
Arranging all sampling points that lie within the range of effective voice in time order to obtain a time-ordered sampling point sequence of the effective voice, and taking the first sampling point in that sequence as the starting point of the effective voice;
Arranging in time order all sampling points that lie within the range of noise and whose sampling time is after the starting point of the effective voice to obtain a time-ordered sampling point sequence of the noise, and taking the first sampling point in that sequence as the end point of the effective voice.
Further, the preset sample size is 2048 audio data points.
Further, the preset energy value n1 is 38000-60000J.
Further, the preset energy value n2 is 30-70J.
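The selection of the starting point and end point described above can be sketched as follows. This is a minimal illustration, assuming each sampling point (frame) has already been judged as effective voice or noise; the function name `find_endpoints` and the boolean list representation are hypothetical and do not come from the patent text.

```python
from typing import List, Optional, Tuple


def find_endpoints(is_voice: List[bool]) -> Optional[Tuple[int, int]]:
    """Return (start, end) indices, or None if nothing was judged as voice.

    start: first sampling point judged to be effective voice.
    end:   first sampling point after start judged to be noise
           (len(is_voice) if the voice runs to the end of the input).
    """
    try:
        start = is_voice.index(True)
    except ValueError:
        return None
    for i in range(start + 1, len(is_voice)):
        if not is_voice[i]:
            return start, i
    return start, len(is_voice)


# Time-ordered judgments for eight consecutive sampling points (made-up data).
labels = [False, False, True, True, True, False, True, False]
print(find_endpoints(labels))  # (2, 5)
```

Because the end point is the first noise point after the starting point, a short pause inside an utterance ends the effective voice under this reading; anything after index 5 in the example is ignored.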
Compared with the prior art, the beneficial effects of the present invention are as follows:
The present invention obtains the starting point and end point of the voice to be recognized and takes the complete speech that starts at the starting point and ends at the end point as the effective voice. The voice to be recognized is thereby detected and processed and the effective voice is extracted from it so that it can be recognized. Further, comparing the energy variance of the frequency band with the preset energy value n2 improves the accuracy of judging the starting point and end point of the voice to be recognized.
Brief description of the drawings
Fig. 1 is a flow diagram of the effective voice obtaining method of the present invention.
Specific embodiment
The present invention is further described below with reference to the accompanying drawing and specific embodiments. It should be noted that, provided they do not conflict, the embodiments or technical features described below may be combined in any way to form new embodiments:
Embodiment:
Referring to Fig. 1, the effective voice obtaining method comprises the following steps:
Step S100: obtaining the starting point and end point of the voice to be recognized;
Step S200: obtaining the effective voice of the voice to be recognized, the effective voice being the complete speech that starts at the starting point and ends at the end point;
Obtaining the starting point and end point of the voice to be recognized comprises the following steps:
Step S1001: sampling the voice to be recognized successively according to a preset sampling frequency and sample size to obtain a plurality of sampled audio data, the sampled audio data corresponding to a plurality of sampling points of the voice to be recognized; and passing all the sampled audio data in turn through a fast Fourier transform (FFT) to obtain a plurality of sampled frequency points. Specifically: the voice to be recognized is taken as a finite-length discrete signal x(n), n = 0, 1, ..., N-1, where the sample size N is preferably 2048 in the present invention. The finite-length discrete signal x(n) is divided into the sum of an even sequence and an odd sequence: x(n) = x1(n) + x2(n), where x1(n) and x2(n) both have length N/2, x1(n) is the even-indexed sequence and x2(n) is the odd-indexed sequence. By the calculation formula of the FFT, N complex values X(k) in the frequency domain are obtained; taking the modulus of each complex value X(k) yields N amplitudes complx(N) (N = 0, 1, ..., N);
Step S1002: obtaining the energy values of all sampled frequency points whose frequency lies within 100~1000Hz, and comparing the energy values in turn with the preset energy value n1. The energy value is calculated as follows: the frequency-domain result of the FFT is symmetric about N/2, so only N/2 frequency points need to be calculated, according to the formula fs = i*(FS/N), where fs is the frequency to be calculated, i = 0, 1, ..., N/2, N is the number of samples, and FS is the sampling frequency of the audio segment. The N/2 frequency points of the spectrum correspond one-to-one with the amplitudes complx(N), which gives the amplitude (energy) corresponding to each frequency point;
Step S1003: obtaining the energy variance of all sampled frequency points whose frequency lies within the 300~1000Hz band, and comparing the energy variance in turn with the preset energy value n2. Specifically, in the energy variance formula, S is the variance value, m is the number of amplitudes exceeding the preset energy value n1, complx(i) is an amplitude exceeding the preset energy value n1, and averageComplx is the mean of all amplitudes exceeding the preset energy value n1;
Step S1004: when the energy value obtained for the sampled frequency points in the 300~1000Hz band is greater than the preset energy value n1 and the obtained energy variance is greater than the preset energy value n2, judging that the sampling point corresponding to the sampled frequency points lies within the range of effective voice;
Step S1005: arranging all sampling points that lie within the range of effective voice in time order to obtain a time-ordered sampling point sequence of the effective voice, and taking the first sampling point in that sequence as the starting point of the effective voice;
Step S1006: when the energy value obtained for the sampled frequency points in the 300~1000Hz band is not greater than the preset energy value n1, or the obtained energy variance is not greater than the preset energy value n2, judging that the sampling point corresponding to the sampled frequency points lies within the range of noise;
Step S1007: arranging in time order all sampling points that lie within the range of noise and whose sampling time is after the starting point of the effective voice to obtain a time-ordered sampling point sequence of the noise, and taking the first sampling point in that sequence as the end point of the effective voice. Arranging in time order means ordering the sampling points by the time at which they occur in the voice to be recognized. Sampling is likewise performed successively in the order in which the sampling points occur in the voice to be recognized.
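The formulas referred to in steps S1001 and S1003 are not reproduced in this text. As a reading aid, the standard forms that match the surrounding definitions are sketched below. These are assumptions rather than the patent's own equations: the twiddle factor W_N, the index r, and the 1/m normalization of the variance are textbook choices and are not taken from the source.

```latex
% Discrete Fourier transform of the finite-length signal x(n), n = 0, ..., N-1
X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad W_N = e^{-j 2\pi / N}, \qquad k = 0, 1, \dots, N-1

% Radix-2 even/odd split with x_1(r) = x(2r), x_2(r) = x(2r+1), r = 0, ..., N/2 - 1,
% where X_1 and X_2 are the N/2-point transforms of x_1 and x_2
X(k) = X_1(k) + W_N^{k} X_2(k), \qquad
X(k + N/2) = X_1(k) - W_N^{k} X_2(k), \qquad k = 0, 1, \dots, N/2 - 1

% Amplitude and frequency of frequency point i
\mathrm{complx}(i) = |X(i)|, \qquad f_i = i \cdot \frac{FS}{N}

% Variance over the m amplitudes that exceed n1 (1/m normalization assumed)
\mathrm{averageComplx} = \frac{1}{m} \sum_{i=1}^{m} \mathrm{complx}(i), \qquad
S = \frac{1}{m} \sum_{i=1}^{m} \bigl( \mathrm{complx}(i) - \mathrm{averageComplx} \bigr)^{2}
```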
Digitized sound data is audio data. Digitized sound has two important parameters: sampling frequency and sample size. The sampling frequency is the number of samples per unit time; the higher the sampling frequency, the smaller the interval between sampling points and the more faithful the digitized sound, but the data volume grows accordingly and processing becomes harder. The sample size is the number of bits used to record the value of each sample; it determines the dynamic range of the sampling, and the more bits, the more finely variations in the sound can be recorded, with a correspondingly larger data volume. Preferably, the preset sample size is 2048 audio data points. If the sample size is too small, the audio segment obtained is inaccurate and the frequency resolution is too low, so zero-padding is needed before the FFT; zero-padding consumes CPU resources and time, and an excessively large sample size is also time-consuming. Using a sample size of 2048 audio data points therefore ensures sufficient resolution without consuming excessive CPU resources.
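The effect of the sample size on frequency resolution can be illustrated with a short calculation based on the formula fs = i*(FS/N). This is a sketch under stated assumptions: the text does not give a sampling frequency, so 16000 Hz is assumed here purely for illustration, and the helper name `band_bins` is hypothetical.

```python
from typing import Tuple


def band_bins(fs_hz: int, n: int, f_lo_hz: int, f_hi_hz: int) -> Tuple[float, int, int]:
    """Return (resolution in Hz, first bin index, last bin index) for a band.

    Frequency point i sits at i * fs_hz / n, as in the formula fs = i*(FS/N).
    """
    resolution = fs_hz / n
    first = -(-f_lo_hz * n // fs_hz)  # ceil(f_lo * n / fs) in integer arithmetic
    last = f_hi_hz * n // fs_hz       # floor(f_hi * n / fs)
    return resolution, first, last


FS_HZ = 16000  # assumed sampling frequency; the text does not specify one

for n in (256, 2048):
    res, first, last = band_bins(FS_HZ, n, 300, 1000)
    # Columns: sample size, Hz per bin, first bin, last bin, bins in the band
    print(n, res, first, last, last - first + 1)
```

Under the assumed 16000 Hz rate, 256 samples give 62.5 Hz per frequency point and only 12 points in the 300~1000Hz band, while 2048 samples give 7.8125 Hz per point and 90 points in the band.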
Once a segment of voice is converted from the time domain to the frequency domain, it has quantifiable parameters: one can judge whether the segment contains frequencies in the voice frequency range and what the corresponding energy values are. The inventive point of the present invention is further that comparing the energy variance of the frequency band with the preset energy value n2 improves the accuracy of judging the starting point and end point of the voice to be recognized: for most noise, the energy values across the frequency points within 100-1000Hz differ little, so the variance of such noise is small.
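The two-threshold judgment can be sketched as below. This is one possible reading of the text, not a definitive implementation: it assumes the amplitude of each frequency point in the band is compared with n1 individually and that the variance is taken over the amplitudes exceeding n1. The 16000 Hz sampling frequency, the sample amplitude scale, and all function names are assumptions. For self-containment the band amplitudes are computed by direct DFT evaluation rather than an FFT; an FFT yields the same values faster.

```python
import cmath
import math
from statistics import pvariance
from typing import List, Sequence


def band_magnitudes(frame: Sequence[float], fs_hz: int,
                    f_lo_hz: int, f_hi_hz: int) -> List[float]:
    """|X(k)| for the frequency points k with f_lo <= k*fs/N <= f_hi."""
    n = len(frame)
    k_lo = -(-f_lo_hz * n // fs_hz)  # ceil
    k_hi = f_hi_hz * n // fs_hz      # floor
    mags = []
    for k in range(k_lo, k_hi + 1):
        w = -2j * math.pi * k / n
        acc = sum(x * cmath.exp(w * i) for i, x in enumerate(frame))
        mags.append(abs(acc))
    return mags


def is_effective_voice(frame: Sequence[float], fs_hz: int, n1: float, n2: float,
                       f_lo_hz: int = 300, f_hi_hz: int = 1000) -> bool:
    """True if some band amplitudes exceed n1 and their variance exceeds n2."""
    strong = [m for m in band_magnitudes(frame, fs_hz, f_lo_hz, f_hi_hz) if m > n1]
    if not strong:
        return False
    return pvariance(strong) > n2


FS_HZ = 16000  # assumed sampling frequency
N = 2048       # preferred sample size from the text


def tone(freq_hz: float, amp: float) -> List[float]:
    return [amp * math.sin(2 * math.pi * freq_hz * i / FS_HZ) for i in range(N)]


uneven = [a + b for a, b in zip(tone(500, 1000.0), tone(750, 400.0))]
flat = [a + b for a, b in zip(tone(500, 1000.0), tone(750, 1000.0))]
silent = [0.0] * N

print(is_effective_voice(uneven, FS_HZ, n1=38000, n2=30))  # True
print(is_effective_voice(flat, FS_HZ, n1=38000, n2=30))    # False
print(is_effective_voice(silent, FS_HZ, n1=38000, n2=30))  # False
```

The `flat` frame has strong energy in the band but nearly equal amplitudes, so it passes the n1 check and fails the n2 check, which is the behavior the variance test is meant to capture for noise with an even spectrum.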
The values of n1 and n2 can be adjusted. The smaller the values, the more sensitive the detection: a segment is more readily judged to be voice rather than noise, but the probability of false triggering also increases. According to the various tests in the project, setting the preset energy value n1 to 38000-60000J and the preset energy value n2 to 30-70J greatly improves the accuracy of starting point and end point detection.
Those skilled in the art can make various other corresponding changes and modifications based on the technical solution and concept described above, and all such changes and modifications shall fall within the protection scope of the claims of the present invention.
Claims (4)
1. An effective voice obtaining method, characterized by comprising the following steps:
Obtaining the starting point and end point of the voice to be recognized;
Obtaining the effective voice of the voice to be recognized, the effective voice being the complete speech that starts at the starting point and ends at the end point;
Obtaining the starting point and end point of the voice to be recognized comprises the following steps:
Sampling the voice to be recognized successively according to a preset sampling frequency and sample size to obtain a plurality of sampled audio data, the sampled audio data corresponding to a plurality of sampling points of the voice to be recognized; passing all the sampled audio data in turn through a fast Fourier transform (FFT) to obtain a plurality of sampled frequency points;
Obtaining the energy values of all sampled frequency points whose frequency lies within 100~1000Hz, and comparing the energy values in turn with a preset energy value n1;
Obtaining the energy variance of all sampled frequency points whose frequency lies within the 300~1000Hz band, and comparing the energy variance in turn with a preset energy value n2;
When the energy value obtained for the sampled frequency points in the 300~1000Hz band is greater than the preset energy value n1 and the obtained energy variance is greater than the preset energy value n2, judging that the sampling point corresponding to the sampled frequency points lies within the range of effective voice;
When the energy value obtained for the sampled frequency points in the 300~1000Hz band is not greater than the preset energy value n1, or the obtained energy variance is not greater than the preset energy value n2, judging that the sampling point corresponding to the sampled frequency points lies within the range of noise;
Arranging all sampling points that lie within the range of effective voice in time order to obtain a time-ordered sampling point sequence of the effective voice, and taking the first sampling point in that sequence as the starting point of the effective voice;
Arranging in time order all sampling points that lie within the range of noise and whose sampling time is after the starting point of the effective voice to obtain a time-ordered sampling point sequence of the noise, and taking the first sampling point in that sequence as the end point of the effective voice.
2. The effective voice obtaining method according to claim 1, characterized in that the preset sample size is 2048 audio data points.
3. The effective voice obtaining method according to claim 1, characterized in that the preset energy value n1 is 38000-60000J.
4. The effective voice obtaining method according to claim 3, characterized in that the preset energy value n2 is 30-70J.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810956017.2A CN109377982B (en) | 2018-08-21 | 2018-08-21 | Effective voice obtaining method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109377982A (en) | 2019-02-22 |
CN109377982B (en) | 2022-07-05 |
Family
ID=65404358
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810956017.2A Active CN109377982B (en) | 2018-08-21 | 2018-08-21 | Effective voice obtaining method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109377982B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN109377982B (en) | 2022-07-05 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||