CN109389993A

CN109389993A - Voice data acquisition method, apparatus, device and storage medium

Info

Publication number: CN109389993A
Application number: CN201811533747.8A
Authority: CN
Inventors: 舒涛; 段旭恒
Original assignee: Guangzhou Speakin Network Technology Co Ltd
Current assignee: Guangzhou Speakin Network Technology Co Ltd
Priority date: 2018-12-14
Filing date: 2018-12-14
Publication date: 2019-02-26

Abstract

The invention discloses a voice data acquisition method comprising the following steps: receiving a voice data acquisition trigger instruction from a user; acquiring voice test data of the user; performing voice activity detection on the voice test data to obtain a detection result for each voice indicator; adjusting the parameter threshold corresponding to each voice indicator whose detection result is a fail; repeating the step of acquiring voice test data of the user until the detection results of all voice indicators pass, and then saving the adjusted parameter thresholds; and, while acquiring voice sample data of the user, performing voice activity detection on each piece of voice sample data using the adjusted parameter thresholds. With the technical solution provided by the embodiments of the invention, adjusting the parameter thresholds lets each terminal reach its best recording result, so that valid voice sample data can be collected on different terminals. The invention also discloses a voice data acquisition apparatus, device and storage medium with corresponding technical effects.

Description

Voice data acquisition method, apparatus, device and storage medium

Technical field

The present invention relates to the technical field of data acquisition, and in particular to a voice data acquisition method, apparatus, device and storage medium.

Background art

As attention to information security has grown, voiceprint recognition technology has gradually developed. Both building a voiceprint database and training a voiceprint recognition model require collecting a large amount of voice sample data.

At present, voice sample data is mostly collected through terminals such as mobile phones, tablet computers and laptops. To collect valid voice sample data and avoid having to screen out and discard invalid speech after collection, voice activity detection (VAD, also known as speech endpoint detection) is added to the collection process. VAD detects whether speech is present in a noisy environment. Different mobile terminals currently use the same voice activity detection standard.

However, microphone quality differs between terminals, so the recorded audio differs as well. When the same voice activity detection standard is applied to all voice sample data, some terminals may have a large amount of their voice sample data rejected, which increases the collection workload and reduces collection efficiency.

Summary of the invention

The object of the present invention is to provide a voice data acquisition method, apparatus, device and storage medium that reduce the collection workload and improve collection efficiency.

To solve the above technical problem, the invention provides the following technical solution:

A voice data acquisition method, comprising:

Receiving a voice data acquisition trigger instruction from a user;

Acquiring voice test data of the user;

Performing voice activity detection on the voice test data to obtain a detection result for each voice indicator;

Adjusting the parameter threshold corresponding to each voice indicator whose detection result is a fail;

Repeating the step of acquiring voice test data of the user until the detection results of all voice indicators pass, and then saving the adjusted parameter thresholds;

While acquiring voice sample data of the user, performing voice activity detection on each piece of voice sample data using the adjusted parameter thresholds.

In a specific embodiment of the invention, before the voice activity detection is performed on the voice test data, the method further comprises:

Determining whether the parameter threshold corresponding to each voice indicator is the default value;

If not, adjusting the parameter threshold corresponding to each voice indicator to the default value.

In a specific embodiment of the invention, after the voice data acquisition trigger instruction of the user is received and before the voice test data of the user is acquired, the method further comprises:

Outputting test prompt information.

In a specific embodiment of the invention, the voice indicators include energy value, signal-to-noise ratio and valid speech.

In a specific embodiment of the invention, adjusting the parameter threshold corresponding to each voice indicator whose detection result is a fail comprises:

If the failed voice indicator is the energy value, reducing the energy parameter threshold.

If the failed voice indicator is the signal-to-noise ratio, reducing the signal-to-noise ratio parameter threshold.

If the failed voice indicator is valid speech, reducing the valid speech parameter threshold.

A voice data acquisition apparatus, comprising:

A trigger instruction receiving module, configured to receive a voice data acquisition trigger instruction from a user;

A test data acquisition module, configured to acquire voice test data of the user;

A detection result obtaining module, configured to perform voice activity detection on the voice test data and obtain a detection result for each voice indicator;

A parameter threshold adjustment module, configured to adjust the parameter threshold corresponding to each voice indicator whose detection result is a fail;

A repeat execution module, configured to repeat the step of acquiring voice test data of the user until the detection results of all voice indicators pass, and then to trigger the parameter threshold saving module;

The parameter threshold saving module, configured to save the adjusted parameter thresholds;

A sample data acquisition module, configured to perform voice activity detection on each piece of voice sample data using the adjusted parameter thresholds while acquiring voice sample data of the user.

A voice data acquisition device, comprising:

A memory, configured to store a computer program;

A processor, configured to implement the steps of any of the voice data acquisition methods described above when executing the computer program.

A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the voice data acquisition methods described above.

With the technical solution provided by the embodiments of the invention, when a voice data acquisition trigger instruction is received from a user, voice activity detection can be performed on the collected voice test data to obtain a detection result for each voice indicator, and the parameter threshold of each failed voice indicator can be adjusted. The acquisition and detection steps are repeated until the detection results of all voice indicators pass, at which point the adjusted parameter thresholds are saved. While voice sample data of the user is being acquired, voice activity detection is performed on each piece of voice sample data using the adjusted parameter thresholds. Adjusting the parameter thresholds lets each terminal reach its best recording result, so that valid voice sample data can be collected on different terminals.

Brief description of the drawings

To explain the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed to describe them are briefly introduced below. The drawings described here show only some embodiments of the invention; those of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is an implementation flowchart of a voice data acquisition method in an embodiment of the invention;

Fig. 2 is a structural schematic diagram of a voice data acquisition apparatus in an embodiment of the invention;

Fig. 3 is a structural schematic diagram of a voice data acquisition device in an embodiment of the invention.

Detailed description of the embodiments

To help those skilled in the art better understand the solution of the invention, the invention is described in further detail below with reference to the drawings and specific embodiments. The described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the invention.

The core of the invention is a voice data acquisition method that can be applied to terminals such as mobile phones, tablet computers, desktop computers and laptops. On receiving a voice data acquisition trigger instruction from a user, the terminal acquires voice test data of the user, performs voice activity detection on it, and adjusts the relevant parameter thresholds according to the detection results. While acquiring voice sample data of the user, the terminal performs voice activity detection on the voice sample data using the adjusted parameter thresholds. Adjusting the parameter thresholds lets each terminal reach its best recording result, so that valid voice sample data can be collected on different terminals.

Specifically, the terminal may perform the above processing on the voice activity detection parameter thresholds automatically, or the processing may be performed by an application on the terminal, such as a recording application. For ease of description, each step in the embodiments of the invention is described from the terminal's point of view.

Fig. 1 shows an implementation flowchart of a voice data acquisition method provided by an embodiment of the invention. The method may include the following steps:

S110: receive a voice data acquisition trigger instruction from a user.

In practical applications, the user can record voice data through the terminal. When the terminal detects that the user has entered the voice activity detection parameter configuration page, this can be treated as receiving the user's voice data acquisition trigger instruction, indicating that the user is about to record voice data with the terminal.

After the voice data acquisition trigger instruction is received, step S120 can be executed.

S120: acquire voice test data of the user.

After the voice data acquisition trigger instruction is received, voice test data of the user can be acquired.

Specifically, test prompt information can be output first, prompting the user to read a passage at normal speed and volume in a quiet environment. Once the user finishes recording, the recorded voice data is used as the voice test data.

S130: perform voice activity detection on the voice test data to obtain a detection result for each voice indicator.

After the voice test data of the user is acquired, voice activity detection can be performed on it. The specific detection method is prior art and is not described further in the embodiments of the invention.

The voice indicators of voice activity detection include energy value, signal-to-noise ratio and valid speech.

The energy value is the quantified loudness of the sound;

The signal-to-noise ratio is the ratio of speech signal energy to noise signal energy in a piece of audio, that is, how clean the speech is;

Valid speech is the portion of a piece of audio that contains a person speaking.
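The three indicators defined above can be illustrated with a minimal Python sketch. The patent treats the detection method itself as prior art, so everything here is an assumption made for illustration: the function names, the use of mean squared amplitude as the energy value, the decibel form of the signal-to-noise ratio, and the frame-based estimate of valid speech duration.

```python
import math
from typing import Sequence


def mean_power(samples: Sequence[float]) -> float:
    """Average squared amplitude: one simple way to quantify loudness."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(speech: Sequence[float], noise: Sequence[float]) -> float:
    """Ratio of speech power to noise power in decibels.

    Assumes the noise segment is not pure digital silence (power > 0).
    """
    return 10.0 * math.log10(mean_power(speech) / mean_power(noise))


def valid_speech_seconds(samples: Sequence[float], sample_rate: int,
                         frame_len: int, power_floor: float) -> float:
    """Total duration of the full frames whose power exceeds power_floor."""
    active = 0
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        if mean_power(samples[start:start + frame_len]) > power_floor:
            active += 1
    return active * frame_len / sample_rate
```

A production detector would typically estimate the noise floor from the recording itself; the sketch takes separate speech and noise segments only to keep the definitions visible.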

After voice activity detection is performed on the voice test data, a detection result is obtained for each voice indicator. Each detection result is either a pass or a fail.
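The pass or fail decision can be sketched as a comparison of each measured indicator against its parameter threshold. The indicator names and the rule "pass when the measured value is at least the threshold" are assumptions; the rule is chosen because it is consistent with the later steps, where lowering a threshold makes a failed indicator easier to pass.

```python
from typing import Dict


def check_indicators(measured: Dict[str, float],
                     thresholds: Dict[str, float]) -> Dict[str, bool]:
    """Return True (pass) or False (fail) for every voice indicator."""
    return {name: measured[name] >= limit for name, limit in thresholds.items()}


# Hypothetical thresholds and measurements for one test recording.
thresholds = {"energy": 10.0, "snr": 10.0, "valid_speech_s": 10.0}
measured = {"energy": 9.0, "snr": 12.5, "valid_speech_s": 10.0}
results = check_indicators(measured, thresholds)
print(results)  # {'energy': False, 'snr': True, 'valid_speech_s': True}
```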

S140: adjust the parameter threshold corresponding to each voice indicator whose detection result is a fail.

After voice activity detection is performed on the voice test data, the detection result of each voice indicator is available. For each voice indicator, a pass indicates that the corresponding parameter threshold is set reasonably; a fail indicates that the corresponding parameter threshold can be adjusted. The initial value of each parameter threshold may be a preset value representing a relatively strict evaluation standard.

Specifically, if the failed voice indicator is the energy value, the energy parameter threshold can be reduced.

The voice test data is a passage read by the user at normal speed and volume in a quiet environment. If voice activity detection determines that the energy value indicator fails on such data, the energy parameter threshold is probably set too high and can be reduced. Prompt information can be output to ask the user to adjust the threshold manually, or the threshold can be reduced automatically by a set interval. For example, if the current energy parameter threshold is 10 and the set interval is 2, the energy parameter threshold can be reduced to 8.

If the failed voice indicator is the signal-to-noise ratio, the signal-to-noise ratio parameter threshold can be reduced.

As described above, the voice test data is a passage read by the user at normal speed and volume in a quiet environment. If voice activity detection determines that the signal-to-noise ratio indicator fails on such data, the signal-to-noise ratio parameter threshold is probably set too high and can be reduced. Prompt information can be output to ask the user to adjust the threshold manually, or the threshold can be reduced automatically by a set interval. For example, if the current signal-to-noise ratio parameter threshold is 10 and the set interval is 2, the signal-to-noise ratio parameter threshold can be reduced to 8.

If the failed voice indicator is valid speech, the valid speech parameter threshold can be reduced.

As described above, the voice test data is a passage read by the user at normal speed and volume in a quiet environment. If voice activity detection determines that the valid speech indicator fails on such data, the valid speech parameter threshold is probably set too high and can be reduced. Prompt information can be output to ask the user to adjust the threshold manually, or the threshold can be reduced automatically by a set interval. For example, if the current valid speech parameter threshold is 10 seconds and the set interval is 2 seconds, the valid speech parameter threshold can be reduced to 8 seconds.

After the parameter threshold of each failed voice indicator is adjusted, step S120 and the steps that follow it can be repeated.
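The automatic variant of step S140 can be sketched as follows: every failed indicator has its threshold lowered by its set interval, and passed indicators are left alone. The function name, the dictionary layout and the lower bound `floor` are assumptions added for illustration; the patent does not state a minimum threshold.

```python
from typing import Dict


def lower_failed_thresholds(thresholds: Dict[str, float],
                            results: Dict[str, bool],
                            steps: Dict[str, float],
                            floor: float = 0.0) -> Dict[str, float]:
    """Return a copy of thresholds with each failed indicator reduced by its step."""
    adjusted = dict(thresholds)
    for name, passed in results.items():
        if not passed:
            # floor is an assumed safeguard so a threshold never goes negative.
            adjusted[name] = max(floor, thresholds[name] - steps[name])
    return adjusted


# The example from the text: threshold 10, interval 2, energy value failed.
thresholds = {"energy": 10.0, "snr": 10.0, "valid_speech_s": 10.0}
steps = {"energy": 2.0, "snr": 2.0, "valid_speech_s": 2.0}
results = {"energy": False, "snr": True, "valid_speech_s": True}
adjusted = lower_failed_thresholds(thresholds, results, steps)
print(adjusted)  # {'energy': 8.0, 'snr': 10.0, 'valid_speech_s': 10.0}
```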

S150: when the detection results of all voice indicators pass, save the adjusted parameter thresholds.

Voice activity detection is performed on the voice test data to obtain a detection result for each voice indicator. If any voice indicator fails, its parameter threshold is adjusted and the acquisition and detection operations are repeated. Once the detection results of all voice indicators pass, the final adjusted parameter thresholds can be saved.
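Steps S120 to S150 together form a calibration loop, sketched below under stated assumptions. `record_test` is a hypothetical callable that stands in for prompting the user, recording, and measuring the indicators; `max_rounds` is an added guard against looping forever and is not part of the patent.

```python
from typing import Callable, Dict

Thresholds = Dict[str, float]


def calibrate(record_test: Callable[[], Dict[str, float]],
              thresholds: Thresholds,
              steps: Dict[str, float],
              max_rounds: int = 20) -> Thresholds:
    """Repeat acquire -> detect -> adjust until every indicator passes."""
    current = dict(thresholds)
    for _ in range(max_rounds):
        measured = record_test()                       # S120: acquire test data
        failed = [name for name, limit in current.items()
                  if measured[name] < limit]           # S130: detect
        if not failed:
            return current                             # S150: thresholds to save
        for name in failed:                            # S140: lower failed ones
            current[name] = max(0.0, current[name] - steps[name])
    raise RuntimeError("thresholds did not converge within max_rounds")


# Simulated terminal whose microphone yields the same measurements each time.
def simulated_recording() -> Dict[str, float]:
    return {"energy": 7.0, "snr": 12.0, "valid_speech_s": 9.5}


saved = calibrate(simulated_recording,
                  {"energy": 10.0, "snr": 10.0, "valid_speech_s": 10.0},
                  {"energy": 2.0, "snr": 2.0, "valid_speech_s": 2.0})
print(saved)  # {'energy': 6.0, 'snr': 10.0, 'valid_speech_s': 8.0}
```

In this simulation the energy threshold drops twice (10, 8, 6) before the measured value of 7 passes, while the signal-to-noise ratio threshold is never touched.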

S160: while acquiring voice sample data of the user, perform voice activity detection on each piece of voice sample data using the adjusted parameter thresholds.

Through the steps above, the parameter threshold of each voice indicator of voice activity detection has been adjusted so that every voice indicator passes when voice activity detection is performed on the voice test data.

While voice sample data of the user is being acquired, the adjusted parameter thresholds can be used to perform voice activity detection on each piece of voice sample data, so that invalid voice data is screened out and rejected.
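Step S160 can be sketched as a filter over the collected samples. The sample representation (an identifier paired with already measured indicators) and the function names are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Sample = Tuple[str, Dict[str, float]]  # (sample id, measured indicators)


def passes_vad(measured: Dict[str, float], thresholds: Dict[str, float]) -> bool:
    """A sample is valid only if every indicator meets its saved threshold."""
    return all(measured[name] >= limit for name, limit in thresholds.items())


def keep_valid_samples(samples: List[Sample],
                       thresholds: Dict[str, float]) -> List[str]:
    """Return the ids of the samples that pass; the rest are rejected."""
    return [sample_id for sample_id, measured in samples
            if passes_vad(measured, thresholds)]


saved = {"energy": 6.0, "snr": 10.0, "valid_speech_s": 8.0}
samples = [
    ("a", {"energy": 7.0, "snr": 12.0, "valid_speech_s": 9.0}),
    ("b", {"energy": 3.0, "snr": 12.0, "valid_speech_s": 9.0}),  # too quiet
    ("c", {"energy": 7.0, "snr": 11.0, "valid_speech_s": 8.0}),
]
print(keep_valid_samples(samples, saved))  # ['a', 'c']
```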

With the method provided by the embodiments of the invention, when a voice data acquisition trigger instruction is received from a user, voice activity detection can be performed on the collected voice test data to obtain a detection result for each voice indicator, and the parameter threshold of each failed voice indicator can be adjusted. The acquisition and detection steps are repeated until the detection results of all voice indicators pass, at which point the adjusted parameter thresholds are saved. While voice sample data of the user is being acquired, voice activity detection is performed on each piece of voice sample data using the adjusted parameter thresholds. Adjusting the parameter thresholds lets each terminal reach its best recording result, so that valid voice sample data can be collected on different terminals.

In one embodiment of the invention, before voice activity detection is performed on the voice test data, the method may further include determining whether the parameter threshold corresponding to each voice indicator is the default value and, if not, adjusting it to the default value.

In this embodiment, when the voice data acquisition trigger instruction of the user is received, the terminal can first determine whether the parameter threshold corresponding to each voice indicator of voice activity detection is the default value. The default value may be an empirical value set on the basis of microphone tests on a large number of terminals, or a preset empirical value corresponding to the model of the current terminal.

If so, the step of performing voice activity detection on the voice test data can be executed directly. If not, the parameter threshold corresponding to each voice indicator is first adjusted to the default value, and the step of performing voice activity detection on the voice test data is executed afterwards. This improves the detection pass rate.
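The default value check can be sketched as a lookup followed by a reset. The table contents, the model name `example-phone` and the function names are hypothetical; the patent only says the default may be a generic empirical value or one preset for the current terminal model.

```python
from typing import Dict, Optional

# Hypothetical empirical defaults; real values would come from device testing.
GENERIC_DEFAULTS = {"energy": 10.0, "snr": 10.0, "valid_speech_s": 10.0}
DEFAULTS_BY_MODEL = {
    "example-phone": {"energy": 12.0, "snr": 9.0, "valid_speech_s": 10.0},
}


def default_thresholds(model: Optional[str] = None) -> Dict[str, float]:
    """Model-specific defaults when known, otherwise the generic defaults."""
    return dict(DEFAULTS_BY_MODEL.get(model, GENERIC_DEFAULTS))


def ensure_defaults(current: Dict[str, float],
                    model: Optional[str] = None) -> Dict[str, float]:
    """Keep thresholds already at their defaults; otherwise reset them."""
    defaults = default_thresholds(model)
    return dict(current) if current == defaults else defaults


stale = {"energy": 6.0, "snr": 10.0, "valid_speech_s": 8.0}
print(ensure_defaults(stale))                   # generic defaults
print(ensure_defaults(stale, "example-phone"))  # model-specific defaults
```

Resetting before each calibration means the loop always starts from the strict preset values rather than from thresholds left over from an earlier session.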

The embodiments of the invention can adjust the voice activity detection parameter thresholds for different terminals, effectively ensuring that the speech recorded by each terminal meets the acquisition standard.

Corresponding to the above method embodiments, the embodiments of the invention also provide a voice data acquisition apparatus. The voice data acquisition apparatus described below and the voice data acquisition method described above may be read with reference to each other.

As shown in Fig. 2, the apparatus comprises the following modules:

A trigger instruction receiving module 210, configured to receive a voice data acquisition trigger instruction from a user;

A test data acquisition module 220, configured to acquire voice test data of the user;

A detection result obtaining module 230, configured to perform voice activity detection on the voice test data and obtain a detection result for each voice indicator;

A parameter threshold adjustment module 240, configured to adjust the parameter threshold corresponding to each voice indicator whose detection result is a fail;

A repeat execution module 250, configured to repeat the step of acquiring voice test data of the user until the detection results of all voice indicators pass, and then to trigger the parameter threshold saving module 260;

A parameter threshold saving module 260, configured to save the adjusted parameter thresholds;

A sample data acquisition module 270, configured to perform voice activity detection on each piece of voice sample data using the adjusted parameter thresholds while acquiring voice sample data of the user.

With the apparatus provided by the embodiments of the invention, when a voice data acquisition trigger instruction is received from a user, voice activity detection can be performed on the collected voice test data to obtain a detection result for each voice indicator, and the parameter threshold of each failed voice indicator can be adjusted. The acquisition and detection steps are repeated until the detection results of all voice indicators pass, at which point the adjusted parameter thresholds are saved. While voice sample data of the user is being acquired, voice activity detection is performed on each piece of voice sample data using the adjusted parameter thresholds. Adjusting the parameter thresholds lets each terminal reach its best recording result, so that valid voice sample data can be collected on different terminals.

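In a specific embodiment of the invention, the apparatus further includes a default value setting module, configured to:

Before voice activity detection is performed on the voice test data, determine whether the parameter threshold corresponding to each voice indicator is the default value;

If not, adjust the parameter threshold corresponding to each voice indicator to the default value.

In a specific embodiment of the invention, the apparatus further includes a prompt information output module, configured to:

After the voice data acquisition trigger instruction of the user is received and before the voice test data of the user is acquired, output test prompt information.

In a specific embodiment of the invention, the voice indicators include energy value, signal-to-noise ratio and valid speech.

In a specific embodiment of the invention, the parameter threshold adjustment module 240 is specifically configured to:

If the failed voice indicator is the energy value, reduce the energy parameter threshold;

If the failed voice indicator is the signal-to-noise ratio, reduce the signal-to-noise ratio parameter threshold;

If the failed voice indicator is valid speech, reduce the valid speech parameter threshold.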

Corresponding to the above method embodiments, the embodiments of the invention also provide a voice data acquisition device which, as shown in Fig. 3, comprises:

A memory 310, configured to store a computer program;

A processor 320, configured to implement the steps of the above voice data acquisition method when executing the computer program.

Corresponding to the above method embodiments, the embodiments of the invention also provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above voice data acquisition method.

The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments may be read with reference to each other.

Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above in general terms according to function. Whether these functions are implemented in hardware or software depends on the specific application and the design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the invention.

The steps of the methods or algorithms described in connection with the embodiments disclosed herein can be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.

Specific examples are used herein to explain the principle and implementation of the invention. The description of the above embodiments is intended only to help in understanding the technical solution of the invention and its core idea. It should be noted that those of ordinary skill in the art can make improvements and modifications to the invention without departing from its principle, and such improvements and modifications also fall within the protection scope of the claims of the invention.

Claims

1. A voice data acquisition method, characterized by comprising:

Receiving a voice data acquisition trigger instruction from a user;

Acquiring voice test data of the user;

Performing voice activity detection on the voice test data to obtain a detection result for each voice indicator;

Adjusting the parameter threshold corresponding to each voice indicator whose detection result is a fail;

Repeating the step of acquiring voice test data of the user until the detection results of all voice indicators pass, and then saving the adjusted parameter thresholds;

While acquiring voice sample data of the user, performing voice activity detection on each piece of voice sample data using the adjusted parameter thresholds.

2. The method according to claim 1, characterized in that, before the voice activity detection is performed on the voice test data, the method further comprises:

Determining whether the parameter threshold corresponding to each voice indicator is the default value;

If not, adjusting the parameter threshold corresponding to each voice indicator to the default value.

3. The method according to claim 1, characterized in that, after the voice data acquisition trigger instruction of the user is received and before the voice test data of the user is acquired, the method further comprises:

Outputting test prompt information.

4. The method according to any one of claims 1 to 3, characterized in that the voice indicators include energy value, signal-to-noise ratio and valid speech.

5. The method according to claim 4, characterized in that adjusting the parameter threshold corresponding to each voice indicator whose detection result is a fail comprises:

If the failed voice indicator is the energy value, reducing the energy parameter threshold.

6. The method according to claim 4, characterized in that adjusting the parameter threshold corresponding to each voice indicator whose detection result is a fail comprises:

If the failed voice indicator is the signal-to-noise ratio, reducing the signal-to-noise ratio parameter threshold.

7. The method according to claim 4, characterized in that adjusting the parameter threshold corresponding to each voice indicator whose detection result is a fail comprises:

If the failed voice indicator is valid speech, reducing the valid speech parameter threshold.

8. A voice data acquisition apparatus, characterized by comprising:

A trigger instruction receiving module, configured to receive a voice data acquisition trigger instruction from a user;

A test data acquisition module, configured to acquire voice test data of the user;

A detection result obtaining module, configured to perform voice activity detection on the voice test data and obtain a detection result for each voice indicator;

A parameter threshold adjustment module, configured to adjust the parameter threshold corresponding to each voice indicator whose detection result is a fail;

A repeat execution module, configured to repeat the step of acquiring voice test data of the user until the detection results of all voice indicators pass, and then to trigger a parameter threshold saving module;

The parameter threshold saving module, configured to save the adjusted parameter thresholds;

A sample data acquisition module, configured to perform voice activity detection on each piece of voice sample data using the adjusted parameter thresholds while acquiring voice sample data of the user.

9. A voice data acquisition device, characterized by comprising:

A memory, configured to store a computer program;

A processor, configured to implement the steps of the voice data acquisition method according to any one of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of the voice data acquisition method according to any one of claims 1 to 7.