CN109377982A - Effective voice acquisition method - Google Patents

Effective voice acquisition method

Info

Publication number
CN109377982A
Authority
CN
China
Prior art keywords
voice
frequency
sampling
point
energy value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810956017.2A
Other languages
Chinese (zh)
Other versions
CN109377982B (en)
Inventor
赵定金
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Baolun Electronics Co Ltd
Original Assignee
Guangzhou Baolun Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Baolun Electronics Co Ltd
Priority to CN201810956017.2A
Publication of CN109377982A
Application granted
Publication of CN109377982B
Active
Anticipated expiration

Links

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/04 Segmentation; Word boundary detection
    • G10L15/05 Word boundary detection
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Signal Processing (AREA)
  • Telephone Function (AREA)

Abstract

The invention discloses an effective voice acquisition method comprising the following steps: obtaining the start point and end point of the voice to be recognized; sampling the voice to be recognized successively at a preset sampling frequency and sample size to obtain sampled audio data, which corresponds to several sampling points of the voice to be recognized; passing all sampled audio data through a fast Fourier transform (FFT) in turn to obtain several frequency points; when the energy value of the frequency points lying in the 300~1000 Hz band is greater than a preset energy value n1 and the energy variance obtained is greater than a preset energy value n2, judging that the sampling point corresponding to those frequency points lies within the range of effective voice; otherwise, judging that the sampling point lies within the range of noise; taking the first sampling point in the effective-voice sampling point sequence as the start point of the effective voice; and taking the first sampling point in the noise sampling point sequence as the end point of the effective voice. The method allows effective voice to be obtained accurately from the voice to be recognized.

Description

Effective voice acquisition method
Technical field
The present invention relates to the field of voice signal processing, and in particular to a method for acquiring effective voice.
Background
Over the past decade, key advances have been made in detailed model design, in parameter extraction and optimization, and in system adaptation techniques. Speech recognition technology has become increasingly mature, its accuracy has steadily improved, and corresponding voice products have appeared on the market.
In intelligent recording and broadcasting systems, the human-computer interaction experience keeps improving so that teachers do not have to manage the system themselves. Voice command words are recognized and then used to control the common functions of the recording and broadcasting system, so teachers can forget that the system is there and concentrate on teaching. At the start of class the teacher only needs to say "start recording" and the system begins recording video; saying "stop recording" at the end of class completes the recording of the lesson.
Command word recognition modules are currently available on the market, but most of them require a network connection to recognize command words, which hinders the use of command word recognition in embedded recording and broadcasting systems. Small, efficient command word recognition is therefore very promising for embedded systems.
A small, efficient command word recognition system first needs to perform detection processing on a segment of speech spoken by the teacher and extract the effective voice from it, so that the effective voice can then be recognized.
Summary of the invention
In view of the above technical problem, the object of the present invention is to provide an effective voice acquisition method that can accurately obtain effective voice from the voice to be recognized.
The invention adopts the following technical solution:
An effective voice acquisition method, comprising the following steps:
Obtaining the start point and end point of the voice to be recognized;
Obtaining the effective voice of the voice to be recognized; the effective voice of the voice to be recognized is the complete voice that begins at the start point and ends at the end point;
Obtaining the start point and end point of the voice to be recognized comprises the following steps:
Sampling the voice to be recognized successively at a preset sampling frequency and sample size to obtain several items of sampled audio data, which correspond to several sampling points of the voice to be recognized; passing all sampled audio data through the FFT in turn to obtain several frequency points;
Obtaining the energy values of all frequency points lying within 100~1000 Hz, and comparing each energy value in turn with the preset energy value n1;
Obtaining the energy variance of all frequency points lying within the 300~1000 Hz band, and comparing the energy variance in turn with the preset energy value n2;
When the energy value of the frequency points lying in the 300~1000 Hz band is greater than the preset energy value n1 and the energy variance obtained is greater than the preset energy value n2, judging that the sampling point corresponding to those frequency points lies within the range of effective voice;
When the energy value of the frequency points lying in the 300~1000 Hz band is not greater than the preset energy value n1, or the energy variance obtained is not greater than the preset energy value n2, judging that the sampling point corresponding to those frequency points lies within the range of noise;
Arranging all sampling points lying within the range of complete voice in chronological order to obtain a chronologically ordered sampling point sequence of the complete voice, and taking the first sampling point in the effective-voice sampling point sequence as the start point of the effective voice;
Arranging in chronological order all sampling points that lie within the range of noise and whose sampling time is later than the start point of the effective voice, to obtain a chronologically ordered noise sampling point sequence, and taking the first sampling point in the noise sampling point sequence as the end point of the effective voice.
Further, the preset sample size is 2048 audio samples.
Further, the preset energy value n1 is 38000-60000 J.
Further, the preset energy value n2 is 30-70 J.
Compared with the prior art, the beneficial effects of the present invention are as follows:
The present invention obtains the start point and end point of the voice to be recognized and takes the complete voice that begins at the start point and ends at the end point as the effective voice. In this way the voice to be recognized undergoes detection processing and the effective voice is extracted from it, so that the effective voice can be recognized. Further, comparing the energy variance of the frequency band with the preset energy value n2 improves the accuracy with which the start point and end point of the voice to be recognized are determined.
Brief description of the drawings
Fig. 1 is a flow diagram of the effective voice acquisition method of the present invention.
Detailed description
The present invention is further described below with reference to the drawing and a specific embodiment. It should be noted that, provided they do not conflict, the embodiments or technical features described below may be combined in any way to form new embodiments:
Embodiment:
Referring to Fig. 1, the effective voice acquisition method comprises the following steps:
Step S100: obtain the start point and end point of the voice to be recognized;
Step S200: obtain the effective voice of the voice to be recognized; the effective voice of the voice to be recognized is the complete voice that begins at the start point and ends at the end point;
Obtaining the start point and end point of the voice to be recognized comprises the following steps:
Step S1001: sample the voice to be recognized successively at the preset sampling frequency and sample size to obtain several items of sampled audio data, which correspond to several sampling points of the voice to be recognized; then pass all sampled audio data through the FFT in turn to obtain several frequency points. Specifically: the voice to be recognized is taken as a finite-length discrete signal x(n), n = 0, 1, ..., N-1, where the sample size N is preferably 2048 in the present invention. The finite-length discrete signal x(n) is split into the sum of an even-indexed sequence and an odd-indexed sequence: x(n) = x1(n) + x2(n), where x1(n) and x2(n) both have length N/2, x1(n) is the even-indexed sequence, and x2(n) is the odd-indexed sequence. Applying the FFT to these two sequences yields N complex values X(k) in the frequency domain; taking the modulus of each X(k) gives N amplitudes complx(i), i = 0, 1, ..., N-1.
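The even/odd split described in step S1001 matches the standard radix-2 decimation-in-time FFT. The sketch below is an illustration under that assumption, not the patent's own implementation; the function names `fft_radix2` and `magnitudes` are hypothetical, and the 64-sample test tone is made up for the example.

```python
import cmath
import math


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT.

    Computes X(k) = sum over n of x(n) * exp(-2j*pi*n*k/N) by splitting
    x(n) into its even-indexed part x1(n) and odd-indexed part x2(n).
    The length of x must be a power of two (for example 2048).
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])  # transform of x1(n), length N/2
    odd = fft_radix2(x[1::2])   # transform of x2(n), length N/2
    half = n // 2
    out = [0j] * n
    for k in range(half):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + half] = even[k] - twiddle
    return out


def magnitudes(x):
    """Modulus of every X(k): the amplitudes called complx in the text."""
    return [abs(value) for value in fft_radix2(x)]


# Usage: a cosine that occupies exactly bin 8 of a 64-sample frame.
frame = [math.cos(2 * math.pi * 8 * n / 64) for n in range(64)]
complx = magnitudes(frame)
peak_bin = max(range(33), key=lambda i: complx[i])  # search bins 0..N/2
```

A production system would more likely call an optimized FFT library; the recursion is written out here only because it makes the x1(n)/x2(n) decomposition visible.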
Step S1002: obtain the energy values of all frequency points lying within 100~1000 Hz, and compare each energy value in turn with the preset energy value n1. The energy value is calculated as follows: the frequency-domain output of the FFT is symmetric about N/2, so only N/2 frequency points need to be computed. The frequency of each point follows from fs = i * (FS / N), where fs is the frequency to be calculated, i = 0, 1, ..., N/2, N is the number of samples, and FS is the sampling frequency of the audio segment. The N/2 frequency points of the spectrum correspond one-to-one with the amplitudes complx(i), which gives the amplitude (energy) of each frequency point;
Step S1003: obtain the energy variance of all frequency points lying within the 300~1000 Hz band, and compare the energy variance in turn with the preset energy value n2. Specifically, in the energy variance formula, S is the variance, m is the number of amplitudes exceeding the preset energy value n1, complx(i) is an amplitude exceeding the preset energy value n1, and averageComplx is the mean of all amplitudes exceeding the preset energy value n1;
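Steps S1002 and S1003 can be sketched as follows, under stated assumptions. The bin-to-frequency mapping fs = i * (FS / N) comes from the text; the variance is assumed to be the mean squared deviation over the m amplitudes that exceed n1, since the formula itself is not shown in this text. All function names and the sample values are hypothetical.

```python
def bin_frequency(i, sample_rate, n):
    """Frequency in Hz of FFT bin i: fs = i * (FS / N)."""
    return i * sample_rate / n


def band_amplitudes(complx, sample_rate, lo_hz, hi_hz):
    """Amplitudes of the bins 0..N/2 whose frequency lies in [lo_hz, hi_hz]."""
    n = len(complx)
    return [
        complx[i]
        for i in range(n // 2 + 1)
        if lo_hz <= bin_frequency(i, sample_rate, n) <= hi_hz
    ]


def energy_variance(amplitudes, n1):
    """Return (m, S) for the amplitudes that exceed n1.

    Assumed form: S = (1/m) * sum((complx(i) - averageComplx) ** 2).
    The divisor m is an assumption; the source does not show the formula.
    """
    strong = [a for a in amplitudes if a > n1]
    m = len(strong)
    if m == 0:
        return 0, 0.0  # nothing exceeds n1, so there is no spread to measure
    average_complx = sum(strong) / m
    s = sum((a - average_complx) ** 2 for a in strong) / m
    return m, s
```

For example, with a hypothetical sampling frequency of 16000 Hz and N = 2048, `bin_frequency(1, 16000, 2048)` gives the spacing between adjacent frequency points.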
Step S1004: when the energy value of the frequency points lying in the 300~1000 Hz band is greater than the preset energy value n1 and the energy variance obtained is greater than the preset energy value n2, judge that the sampling point corresponding to those frequency points lies within the range of effective voice;
Step S1005: arrange all sampling points lying within the range of complete voice in chronological order to obtain a chronologically ordered sampling point sequence of the complete voice, and take the first sampling point in the effective-voice sampling point sequence as the start point of the effective voice;
Step S1006: when the energy value of the frequency points lying in the 300~1000 Hz band is not greater than the preset energy value n1, or the energy variance obtained is not greater than the preset energy value n2, judge that the sampling point corresponding to those frequency points lies within the range of noise;
Step S1007: arrange in chronological order all sampling points that lie within the range of noise and whose sampling time is later than the start point of the effective voice, to obtain a chronologically ordered noise sampling point sequence, and take the first sampling point in the noise sampling point sequence as the end point of the effective voice. Chronological order here means the order in which the sampling points occur in the voice to be recognized. The sampling points are likewise sampled one after another in the order in which they occur in the voice to be recognized.
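The decision logic of steps S1004 to S1007 can be sketched as below. This is a minimal illustration, assuming each sampling point has already been reduced to a (band energy, energy variance) pair; the names `is_effective` and `find_endpoints` and the numeric features are hypothetical, while the threshold values are taken from the ranges given in the text.

```python
def is_effective(band_energy, band_variance, n1, n2):
    """S1004/S1006: effective voice only if both thresholds are exceeded."""
    return band_energy > n1 and band_variance > n2


def find_endpoints(features, n1, n2):
    """Return (start, end) indices into a chronological list of features.

    start: first sampling point judged to be effective voice (S1005).
    end:   first sampling point judged to be noise after start (S1007).
    Either value is None when no such point exists.
    """
    flags = [is_effective(e, v, n1, n2) for e, v in features]
    start = next((i for i, flag in enumerate(flags) if flag), None)
    if start is None:
        return None, None
    end = next((i for i in range(start + 1, len(flags)) if not flags[i]), None)
    return start, end


# Usage with made-up features and thresholds from the stated ranges.
features = [(1000, 5), (40000, 10), (50000, 80), (60000, 90), (39000, 20), (70000, 100)]
start, end = find_endpoints(features, n1=38000, n2=30)
```

One consequence visible in the sketch is that the first noise point after the start ends the utterance, so any later effective points (the last entry above) fall outside the extracted segment.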
Digitized sound data is audio data. Digitized sound has two important parameters: sampling frequency and sample size. The sampling frequency is the number of samples taken per unit time. The higher the sampling frequency, the smaller the interval between sampling points and the more faithful the digitized sound, but the data volume grows accordingly and processing becomes harder. The sample size is the number of bits used to record each sample value; it determines the dynamic range of the sampling, and the more bits there are, the more finely variations in the sound can be recorded and the larger the resulting data volume. Preferably, the preset sample size is 2048 audio samples. If the sample size is too small, the audio segment obtained is inaccurate and the frequency resolution is too low, so the FFT input has to be zero-padded, which consumes CPU resources and time; sampling too much is also time-consuming. Using a sample size of 2048 audio samples therefore ensures adequate resolution without consuming excessive CPU resources.
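The trade-off behind N = 2048 can be made concrete with a small calculation. The sketch below assumes a sampling frequency of 16000 Hz, which the text does not specify, and uses a hypothetical helper `frame_stats` to report the spacing between frequency points, the duration of one N-sample frame, and how many frequency points fall in the 300~1000 Hz band.

```python
import math


def frame_stats(sample_rate, n, lo_hz=300.0, hi_hz=1000.0):
    """Return (resolution_hz, duration_s, bins_in_band) for an N-sample frame."""
    resolution = sample_rate / n          # spacing between frequency points
    duration = n / sample_rate            # time covered by one frame
    first = math.ceil(lo_hz / resolution)
    last = min(math.floor(hi_hz / resolution), n // 2)
    return resolution, duration, max(0, last - first + 1)


# Usage: compare a short frame with the preferred 2048-sample frame.
short = frame_stats(16000, 256)
preferred = frame_stats(16000, 2048)
```

Under this assumed sampling frequency, the longer frame places many more frequency points inside the band, which gives the variance in step S1003 more values to work with.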
Once a segment of speech has been converted from the time domain to the frequency domain, it has quantifiable parameters: from the frequency range of the human voice one can judge whether the segment contains voice frequencies and what the energy value at each corresponding frequency is. The inventive point of the present invention is to further compare the energy variance of the frequency band with the preset energy value n2, which improves the accuracy with which the start point and end point of the voice to be recognized are determined. For most noise, the energy values across the frequency points within 100-1000 Hz differ little from one another, so the variance of such noise is small.
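The reasoning about noise variance can be illustrated with two made-up sets of band amplitudes: one nearly flat, as the text describes for noise, and one with pronounced peaks. The values are hypothetical and the population variance from Python's `statistics` module stands in for the patent's variance formula, which is an assumption.

```python
from statistics import pvariance

# Hypothetical amplitudes of five frequency points inside the band.
flat_noise = [100.0, 102.0, 98.0, 101.0, 99.0]   # nearly equal energies
peaked_voice = [20.0, 400.0, 30.0, 900.0, 50.0]  # a few dominant peaks

noise_variance = pvariance(flat_noise)
voice_variance = pvariance(peaked_voice)

# A threshold placed between the two values separates the cases.
separable = noise_variance < voice_variance
```

Real spectra are far less tidy than these numbers, so the sketch shows only why a variance threshold can discriminate, not where that threshold should sit.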
The values of n1 and n2 are adjustable. The smaller the values, the more sensitive the detection: a segment is more readily judged to be voice rather than noise, but the probability of false triggering is higher. According to various tests carried out in the project, setting the preset energy value n1 to 38000-60000 J and the preset energy value n2 to 30-70 J substantially improves the accuracy of start point and end point detection.
Those skilled in the art can make various other corresponding changes and modifications on the basis of the technical solution and concept described above, and all such changes and modifications shall fall within the protection scope of the claims of the present invention.

Claims (4)

1. An effective voice acquisition method, characterized by comprising the following steps:
Obtaining the start point and end point of the voice to be recognized;
Obtaining the effective voice of the voice to be recognized; the effective voice of the voice to be recognized is the complete voice that begins at the start point and ends at the end point;
Obtaining the start point and end point of the voice to be recognized comprises the following steps:
Sampling the voice to be recognized successively at a preset sampling frequency and sample size to obtain several items of sampled audio data, which correspond to several sampling points of the voice to be recognized; passing all sampled audio data through the FFT in turn to obtain several frequency points;
Obtaining the energy values of all frequency points lying within 100~1000 Hz, and comparing each energy value in turn with the preset energy value n1;
Obtaining the energy variance of all frequency points lying within the 300~1000 Hz band, and comparing the energy variance in turn with the preset energy value n2;
When the energy value of the frequency points lying in the 300~1000 Hz band is greater than the preset energy value n1 and the energy variance obtained is greater than the preset energy value n2, judging that the sampling point corresponding to those frequency points lies within the range of effective voice;
When the energy value of the frequency points lying in the 300~1000 Hz band is not greater than the preset energy value n1, or the energy variance obtained is not greater than the preset energy value n2, judging that the sampling point corresponding to those frequency points lies within the range of noise;
Arranging all sampling points lying within the range of complete voice in chronological order to obtain a chronologically ordered sampling point sequence of the complete voice, and taking the first sampling point in the effective-voice sampling point sequence as the start point of the effective voice;
Arranging in chronological order all sampling points that lie within the range of noise and whose sampling time is later than the start point of the effective voice, to obtain a chronologically ordered noise sampling point sequence, and taking the first sampling point in the noise sampling point sequence as the end point of the effective voice.
2. The effective voice acquisition method according to claim 1, characterized in that the preset sample size is 2048 audio samples.
3. The effective voice acquisition method according to claim 1, characterized in that the preset energy value n1 is 38000-60000 J.
4. The effective voice acquisition method according to claim 3, characterized in that the preset energy value n2 is 30-70 J.
CN201810956017.2A 2018-08-21 2018-08-21 Effective voice obtaining method Active CN109377982B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810956017.2A CN109377982B (en) 2018-08-21 2018-08-21 Effective voice obtaining method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810956017.2A CN109377982B (en) 2018-08-21 2018-08-21 Effective voice obtaining method

Publications (2)

Publication Number Publication Date
CN109377982A (en) 2019-02-22
CN109377982B (en) 2022-07-05

Family

ID=65404358

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810956017.2A Active CN109377982B (en) 2018-08-21 2018-08-21 Effective voice obtaining method

Country Status (1)

Country Link
CN (1) CN109377982B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5579431A (en) * 1992-10-05 1996-11-26 Panasonic Technologies, Inc. Speech detection in presence of noise by determining variance over time of frequency band limited energy
CN101625857A (en) * 2008-07-10 2010-01-13 新奥特(北京)视频技术有限公司 Self-adaptive voice endpoint detection method
CN104021789A (en) * 2014-06-25 2014-09-03 厦门大学 Self-adaption endpoint detection method using short-time time-frequency value
US20170004840A1 (en) * 2015-06-30 2017-01-05 Zte Corporation Voice Activity Detection Method and Method Used for Voice Activity Detection and Apparatus Thereof
CN105467428A (en) * 2015-11-17 2016-04-06 南京航空航天大学 Seismic wave warning method based on short-time energy detection and spectrum feature analysis
CN106601230A (en) * 2016-12-19 2017-04-26 苏州金峰物联网技术有限公司 Logistics sorting place name speech recognition method, system and logistics sorting system based on continuous Gaussian mixture HMM

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
刘玉珍等: "基于频谱方差的抗噪声语音端点检测算法", 《计算机仿真》 *
蔡萍: "一种结合短时过零率的快速语音端点检测算法", 《厦门理工学院学报》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110210893A (en) * 2019-05-09 2019-09-06 秒针信息技术有限公司 Generation method, device, storage medium and the electronic device of report
CN110365555A (en) * 2019-08-08 2019-10-22 广州虎牙科技有限公司 Audio delay test method, device, electronic equipment and readable storage medium storing program for executing
CN110428853A (en) * 2019-08-30 2019-11-08 北京太极华保科技股份有限公司 Voice activity detection method, Voice activity detection device and electronic equipment

Also Published As

Publication number Publication date
CN109377982B (en) 2022-07-05

Similar Documents

Publication Publication Date Title
CN108922541B (en) Multi-dimensional characteristic parameter voiceprint recognition method based on DTW and GMM models
CN103280220B (en) A kind of real-time recognition method for baby cry
CN104900235B (en) Method for recognizing sound-groove based on pitch period composite character parameter
CN103310789B (en) A kind of sound event recognition method of the parallel model combination based on improving
CN102968990B (en) Speaker identifying method and system
CN110232933B (en) Audio detection method and device, storage medium and electronic equipment
AU2017341161A1 (en) Voiceprint recognition method, device, storage medium and background server
CN109065043B (en) Command word recognition method and computer storage medium
CN104021789A (en) Self-adaption endpoint detection method using short-time time-frequency value
CN109377982A (en) A kind of efficient voice acquisition methods
CN109243497A (en) The control method and device that voice wakes up
CN110600048B (en) Audio verification method and device, storage medium and electronic equipment
CN109524011A (en) A kind of refrigerator awakening method and device based on Application on Voiceprint Recognition
CN102789779A (en) Speech recognition system and recognition method thereof
CN110428853A (en) Voice activity detection method, Voice activity detection device and electronic equipment
CN110797031A (en) Voice change detection method, system, mobile terminal and storage medium
CN110299141A (en) The acoustic feature extracting method of recording replay attack detection in a kind of Application on Voiceprint Recognition
CN110570870A (en) Text-independent voiceprint recognition method, device and equipment
CN106548786A (en) A kind of detection method and system of voice data
CN110890087A (en) Voice recognition method and device based on cosine similarity
US10522160B2 (en) Methods and apparatus to identify a source of speech captured at a wearable electronic device
CN109920447B (en) Recording fraud detection method based on adaptive filter amplitude phase characteristic extraction
CN110689887A (en) Audio verification method and device, storage medium and electronic equipment
CN109545226A (en) A kind of audio recognition method, equipment and computer readable storage medium
CN110070891B (en) Song identification method and device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant