CN108460081A

CN108460081A - Speech database creation method, voiceprint registration method, apparatus, device and medium

Info

Publication number: CN108460081A
Application number: CN201810031164.9A
Authority: CN
Inventors: 张丝潆; 王健宗; 肖京
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2018-01-12
Filing date: 2018-01-12
Publication date: 2018-08-28
Anticipated expiration: 2038-01-12
Also published as: WO2019136801A1; CN108460081B

Abstract

The invention discloses a speech database creation method, a voiceprint registration method, and a corresponding apparatus, device and medium. The speech database creation method includes: obtaining original voice data, the original voice data including an original user identifier and a voice collection time; preprocessing the original voice data to obtain effective voice data; obtaining the signal-to-noise ratio corresponding to the effective voice data; and storing the effective voice data in a speech database and building an index for the effective voice data in the speech database, the index including the original user identifier, the voice collection time and the signal-to-noise ratio. The method preprocesses the original voice data, calculates the signal-to-noise ratio of the effective voice data, and, once the speech database has been created, builds an index containing the user identifier, the voice collection time and the signal-to-noise ratio. This improves the efficiency of data processing in the database.

Description

Speech database creation method, voiceprint registration method, apparatus, device and medium

Technical field

The present invention relates to the field of data processing, and in particular to a speech database creation method, a voiceprint registration method, and a corresponding apparatus, device and medium.

Background

With the development of artificial intelligence, technologies based on human biometric characteristics such as the face, voice and fingerprints are increasingly being applied in everyday life. A voiceprint is the spectrum of sound waves carrying verbal information, as displayed by electro-acoustic instruments; it is both distinctive to the speaker and relatively stable. Human speech production is a complex physiological and physical process involving the language centres of the brain and the vocal organs. The organs a person uses when speaking (tongue, teeth, larynx, lungs and nasal cavity) differ greatly between individuals in size and shape, so the voiceprints of any two people differ, and a user's identity can therefore be verified by voiceprint. Voiceprint recognition requires the voiceprint to be registered in advance. Current voiceprint registration typically records voice data in real time and then extracts the voiceprint from it. Both recording and voiceprint extraction take a long time, so the whole registration process is lengthy and inefficient. Moreover, when a voiceprint is registered from voice recorded in real time, the recording environment and the user's physical condition at that moment can make the recorded voice differ considerably from voice collected at other times, which reduces the accuracy of the extracted voiceprint in voiceprint recognition.

Summary of the invention

Embodiments of the present invention provide a speech database creation method, apparatus, device and medium, to solve the problem of low database processing efficiency.

Embodiments of the present invention provide a voiceprint registration method, apparatus, device and medium, to solve the problem of low voiceprint feature accuracy.

In a first aspect, an embodiment of the present invention provides a speech database creation method, including:

obtaining original voice data, the original voice data including an original user identifier and a voice collection time;

preprocessing the original voice data to obtain effective voice data;

obtaining the signal-to-noise ratio corresponding to the effective voice data;

storing the effective voice data in a speech database and building an index for the effective voice data in the speech database, the index including the original user identifier, the voice collection time and the signal-to-noise ratio.

In a second aspect, an embodiment of the present invention provides a speech database creation apparatus, including:

an original voice data acquisition module, configured to obtain original voice data, the original voice data including an original user identifier and a voice collection time;

a data preprocessing module, configured to preprocess the original voice data to obtain effective voice data;

a signal-to-noise ratio acquisition module, configured to obtain the signal-to-noise ratio corresponding to the effective voice data;

a speech database index building module, configured to store the effective voice data in a speech database and build an index for the effective voice data in the speech database, the index including the original user identifier, the voice collection time and the signal-to-noise ratio.

In a third aspect, an embodiment of the present invention provides a voiceprint registration method, including:

obtaining a voiceprint registration request, the voiceprint registration request including a registration user identifier and a current time;

querying the speech database based on the registration user identifier to obtain the target indexes corresponding to the original user identifier that corresponds to the registration user identifier, the speech database being a speech database created using the speech database creation method of the first aspect;

obtaining a composite score for each target index according to the voice collection time and signal-to-noise ratio of that target index and the current time;

selecting the effective voice data corresponding to the target index with the highest composite score as the registration voice data;

obtaining the corresponding voiceprint feature based on the registration voice data as the registration voiceprint.

In a fourth aspect, an embodiment of the present invention provides a voiceprint registration apparatus, including:

a voiceprint registration request acquisition module, configured to obtain a voiceprint registration request, the voiceprint registration request including a registration user identifier and a current time;

a target index acquisition module, configured to query the speech database based on the registration user identifier to obtain the target indexes corresponding to the original user identifier that corresponds to the registration user identifier, the speech database being a speech database created using the speech database creation method of the first aspect;

a composite score acquisition module, configured to obtain a composite score for each target index according to the voice collection time and signal-to-noise ratio of that target index and the current time;

a registration voice data acquisition module, configured to select the effective voice data corresponding to the target index with the highest composite score as the registration voice data;

a registration voiceprint acquisition module, configured to obtain the corresponding voiceprint feature based on the registration voice data as the registration voiceprint.

In a fifth aspect, the present invention provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the computer program, the processor implements the steps of the speech database creation method of the first aspect of the present invention; alternatively, when executing the computer program, the processor implements the steps of the voiceprint registration method of the third aspect of the present invention.

In a sixth aspect, the present invention provides a computer-readable storage medium storing a computer program. When executed by a processor, the computer program implements the steps of the speech database creation method of the first aspect of the present invention; alternatively, when executed by a processor, the computer program implements the steps of the voiceprint registration method of the third aspect of the present invention.

In the speech database creation method, apparatus, device and storage medium provided by the embodiments of the present invention, original voice data is obtained to provide the data source for creating the speech database. The original voice data is then preprocessed to obtain effective voice data, which improves the efficiency of subsequent processing and saves processing time. The signal-to-noise ratio of the effective voice data is obtained, giving a direct measure of its noise level and hence of its voice quality. Finally, the effective voice data is stored in the speech database, and an index including the original user identifier, the voice collection time and the signal-to-noise ratio is built for it. By preprocessing the original voice data, calculating the signal-to-noise ratio of the effective voice data, and building this index once the speech database has been created, the method improves database processing efficiency and the accuracy of voiceprint features. It also lets the later voiceprint registration stage quickly locate suitable effective voice data. Organizing the speech database appropriately when it is created thus improves the accuracy of voiceprint feature extraction during voiceprint registration and shortens registration time.

In the voiceprint registration method, apparatus, device and storage medium provided by the embodiments of the present invention, voiceprint registration is performed using a speech database created by the speech database creation method of the first aspect, which improves the accuracy of voiceprint feature extraction in the registration stage and shortens registration time. During voiceprint registration, a composite score is obtained for the effective voice data corresponding to each target index, so that suitable effective voice data can be located quickly and the voiceprint feature that best matches the user is extracted, further improving the accuracy of voiceprint registration.

Brief description of the drawings

To describe the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a flowchart of the speech database creation method provided in Embodiment 1 of the present invention;

Fig. 2 is a flowchart of one specific implementation of step S12 in Fig. 1;

Fig. 3 is a flowchart of another specific implementation of step S12 in Fig. 1;

Fig. 4 is a functional block diagram of the speech database creation apparatus provided in Embodiment 2 of the present invention;

Fig. 5 is a flowchart of the voiceprint registration method provided in Embodiment 3 of the present invention;

Fig. 6 is a flowchart of a specific implementation in Embodiment 3 of the present invention;

Fig. 7 is a functional block diagram of the voiceprint registration apparatus provided in Embodiment 4 of the present invention;

Fig. 8 is a schematic diagram of the terminal device provided in Embodiment 6 of the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

Embodiment 1

Fig. 1 shows the flowchart of the speech database creation method in this embodiment. The method can be applied to various terminal devices or servers to create a speech database, solving the problem of low database processing efficiency. As shown in Fig. 1, the speech database creation method includes the following steps:

S11: Obtain original voice data, the original voice data including an original user identifier and a voice collection time.

Original voice data is voice data that has not been processed since it was collected. The original user identifier distinguishes different users; each original user identifier corresponds to exactly one user. In a specific embodiment, the original user identifier may be the user's phone number, user account, ID card number, or the like. The voice collection time is the time at which the original voice data was collected.

Preferably, the original voice data is obtained from a database that has collected voice data from a large number of users. For example, some enterprises operate customer service hotlines: users call the hotline to resolve problems they encounter when using the enterprise's products or services, and the enterprise also uses the hotline to promote products to customers or to make follow-up calls. Enterprises usually record these calls and store the recordings in a database. Similarly, in some applications where users talk to each other, or to customer service, by voice, the application's database stores the users' voice data.

S12: Preprocess the original voice data to obtain effective voice data.

Because the original voice data has not been processed since collection, it may contain invalid or redundant voice data: for example, recordings whose duration does not meet the requirement, recordings containing voice that does not belong to the user, or recordings whose voice quality is unsatisfactory. In addition, a single recording may contain invalid or redundant speech segments (a speech segment being a portion of the original voice data). Such segments harm subsequent voice data processing and therefore need to be removed. Preprocessing the original voice data to obtain effective voice data improves the efficiency of subsequent processing and saves time.

S13: Obtain the signal-to-noise ratio corresponding to the effective voice data.

The signal-to-noise ratio (SNR) is a parameter describing the ratio between the useful component and the noise component of a signal. A higher SNR means relatively less noise, so the SNR of the effective voice data indicates directly how much noise it contains and hence its voice quality. Specifically, the SNR of the effective voice data can be obtained by calculation.

When the SNR is calculated, it can be expressed in decibels as $\mathrm{SNR} = 10\log_{10}(P_S / P_N)$, where $P_S$ and $P_N$ are the effective powers of the useful component and the noise component respectively. Optionally, because power is proportional to the square of voltage, the SNR can also be expressed as a ratio of voltage amplitudes: $\mathrm{SNR} = 20\log_{10}(V_S / V_N)$, where $V_S$ and $V_N$ are the effective (RMS) values of the useful-component voltage and the noise-component voltage respectively.
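The two SNR formulas can be checked numerically with a minimal sketch. The function names below are illustrative, not from the patent; the point is that the power form and the RMS-amplitude form give the same decibel value because power scales with the square of amplitude.

```python
import math


def snr_db_from_power(p_signal: float, p_noise: float) -> float:
    """SNR in dB from effective powers: 10 * log10(P_S / P_N)."""
    if p_signal <= 0 or p_noise <= 0:
        raise ValueError("powers must be positive")
    return 10.0 * math.log10(p_signal / p_noise)


def snr_db_from_rms(v_signal: float, v_noise: float) -> float:
    """SNR in dB from RMS amplitudes: 20 * log10(V_S / V_N)."""
    if v_signal <= 0 or v_noise <= 0:
        raise ValueError("RMS values must be positive")
    return 20.0 * math.log10(v_signal / v_noise)


# An amplitude ratio of 10 is a power ratio of 100; both forms give the same SNR.
print(snr_db_from_power(100.0, 1.0))  # 20.0
print(snr_db_from_rms(10.0, 1.0))     # 20.0
```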

In one embodiment, obtaining the SNR of the effective voice data specifically includes the following steps:

First, the pitch data in the effective voice data is extracted using the pitch-synchronous overlap-add (PSOLA) algorithm. The pitch data is the normal speech component of the effective voice data, as opposed to the noise data. Preferably, spectral subtraction, Wiener filtering or minimum mean-square error (MMSE) short-time spectral estimation may be used to extract the pitch data from the voice data.

Then, the noise data in the effective voice data is obtained from the pitch data: once the pitch data has been extracted from the voice data, the remaining part of the voice data is the noise data.

Finally, the SNR of the voice data is calculated from the pitch data and the noise data. Specifically, either the effective powers or the voltage amplitudes of the pitch data and the noise data are calculated first, and then the ratio of the two gives the SNR of the effective voice data.
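The last two steps can be sketched in a few lines. This sketch assumes the pitch component has already been extracted (the PSOLA, spectral-subtraction, Wiener or MMSE step is out of scope here) and that the voice is modeled additively as pitch plus noise, so the noise is the sample-by-sample residual. Effective power is taken as the mean square of the samples.

```python
import math
from typing import Sequence


def mean_power(samples: Sequence[float]) -> float:
    """Effective (mean-square) power of a sample sequence."""
    return sum(x * x for x in samples) / len(samples)


def snr_from_components(voice: Sequence[float], pitch: Sequence[float]) -> float:
    """Estimate the SNR (dB) of `voice` from an already-extracted pitch component.

    Assumption: voice = pitch + noise, so the noise is the residual left after
    subtracting the pitch component sample by sample.
    """
    if len(voice) != len(pitch):
        raise ValueError("voice and pitch must have the same length")
    noise = [v - p for v, p in zip(voice, pitch)]
    p_s, p_n = mean_power(pitch), mean_power(noise)
    if p_n == 0:
        return math.inf  # no residual noise at all
    return 10.0 * math.log10(p_s / p_n)


pitch = [1.0, -1.0, 1.0, -1.0]
voice = [1.1, -1.1, 1.1, -1.1]  # pitch plus a residual of amplitude 0.1
print(snr_from_components(voice, pitch))  # roughly 20 dB
```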

In a specific embodiment, after the step of obtaining the SNR of the effective voice data, the method further includes: removing effective voice data whose SNR is below an SNR threshold.

Once the SNR of the effective voice data is known, effective voice data with too low an SNR can be removed, which reduces the data volume and relieves pressure on data processing and storage. Specifically, an SNR threshold can be set; when the SNR of a piece of effective voice data is below this threshold, the data is very noisy and is not suitable for voiceprint extraction. Removing such data reduces the data volume, relieves processing and storage pressure, and also shortens subsequent processing time, improving efficiency.
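The threshold filter amounts to a single comparison per record. In this minimal sketch, the `VoiceRecord` fields, the sample values and the 15 dB threshold are made-up illustrations; the patent does not fix a threshold value.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VoiceRecord:
    user_id: str        # original user identifier
    collected_at: str   # voice collection time (ISO 8601 string, assumed format)
    snr_db: float       # SNR computed in step S13


def drop_low_snr(records: List[VoiceRecord], snr_threshold_db: float) -> List[VoiceRecord]:
    """Keep records whose SNR is at least the threshold; the rest are removed."""
    return [r for r in records if r.snr_db >= snr_threshold_db]


records = [
    VoiceRecord("user-001", "2017-11-02T10:15:00", 24.5),
    VoiceRecord("user-001", "2018-01-05T09:30:00", 6.0),
    VoiceRecord("user-002", "2017-12-20T14:00:00", 15.0),
]
kept = drop_low_snr(records, snr_threshold_db=15.0)  # 15 dB is an arbitrary example
print([(r.user_id, r.snr_db) for r in kept])  # [('user-001', 24.5), ('user-002', 15.0)]
```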

S14: Store the effective voice data in the speech database and build an index for the effective voice data in the speech database, the index including the original user identifier, the voice collection time and the SNR.

The speech database is the database that stores the effective voice data. After preprocessing and SNR calculation, the effective voice data is stored in the speech database, and an index is built for each piece of effective voice data. This makes later data processing on the speech database more efficient. Moreover, during voiceprint registration, suitable effective voice data can be located directly by searching the index and the voiceprint feature extracted from it, which improves the accuracy of the voiceprint feature.

Specifically, the index includes the original user identifier, the voice collection time and the SNR. The original user identifier distinguishes the effective voice data of different users. The voice collection time records when the voice was recorded. A user's voice generally changes slightly over time, so the closer the voice collection time is to the current time, the closer that effective voice data is to the user's current voice, and the better the voiceprint features match. The SNR gives a direct indication of the noise level of the effective voice data, and hence of its voice quality.
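To illustrate the three index fields, here is a sketch using Python's built-in `sqlite3` module. The table and index names, column types and sample rows are hypothetical. SQLite has no BRIN indexes, so this sketch uses an ordinary composite B-tree index; the BRIN variant discussed next is a PostgreSQL feature.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE voice_data (
           id INTEGER PRIMARY KEY,
           user_id TEXT NOT NULL,        -- original user identifier
           collected_at TEXT NOT NULL,   -- voice collection time (ISO 8601)
           snr_db REAL NOT NULL,         -- signal-to-noise ratio
           audio BLOB                    -- the effective voice data
       )"""
)
# Composite index over the three index fields named in step S14.
conn.execute(
    "CREATE INDEX idx_voice_user_time_snr "
    "ON voice_data (user_id, collected_at, snr_db)"
)
conn.executemany(
    "INSERT INTO voice_data (user_id, collected_at, snr_db, audio) VALUES (?, ?, ?, ?)",
    [
        ("user-001", "2017-11-02T10:15:00", 24.5, b""),
        ("user-001", "2018-01-05T09:30:00", 18.2, b""),
        ("user-002", "2017-12-20T14:00:00", 30.1, b""),
    ],
)
# Look up one user's recordings, newest first, using the index prefix.
rows = conn.execute(
    "SELECT collected_at, snr_db FROM voice_data "
    "WHERE user_id = ? ORDER BY collected_at DESC",
    ("user-001",),
).fetchall()
print(rows)  # [('2018-01-05T09:30:00', 18.2), ('2017-11-02T10:15:00', 24.5)]
```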

In one embodiment, the index built for the effective voice data in the speech database is a BRIN (block range index).

A BRIN index stores ranges of consecutive data blocks of a table together with the range of values contained in each block range. BRIN indexes take up very little space. The speech database must store effective voice data for a large number of original user identifiers, which demands a lot of storage, so using BRIN indexes saves a large amount of index space.
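The BRIN idea can be modeled conceptually in a short sketch. This is not PostgreSQL's implementation (in PostgreSQL one would write something like `CREATE INDEX ... ON voice_data USING brin (collected_at);`). It keeps only a (min, max) summary per fixed-size range of consecutive rows, skips ranges whose summary cannot match, and rechecks rows inside candidate ranges. The approach works best when the column correlates with physical row order, as append-only collection times typically do. All names and the range size are illustrative.

```python
from typing import List, Sequence, Tuple


def build_brin(values: Sequence[float], rows_per_range: int) -> List[Tuple[int, float, float]]:
    """Summarise each run of consecutive rows as (start_row, min_value, max_value)."""
    summary = []
    for start in range(0, len(values), rows_per_range):
        chunk = values[start:start + rows_per_range]
        summary.append((start, min(chunk), max(chunk)))
    return summary


def brin_lookup(values: Sequence[float], summary, rows_per_range: int,
                lo: float, hi: float) -> Tuple[List[int], int]:
    """Return (matching_rows, ranges_scanned) for lo <= value <= hi."""
    hits: List[int] = []
    ranges_scanned = 0
    for start, vmin, vmax in summary:
        if vmax < lo or vmin > hi:
            continue  # the whole block range can be skipped
        ranges_scanned += 1
        for row in range(start, min(start + rows_per_range, len(values))):
            if lo <= values[row] <= hi:  # recheck: the summary is lossy
                hits.append(row)
    return hits, ranges_scanned


# Collection times (as day numbers) appended in order, 4 rows per block range.
times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
summary = build_brin(times, 4)
print(brin_lookup(times, summary, 4, lo=6, hi=9))  # ([5, 6, 7, 8], 2)
```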

Building the index for the speech database therefore improves database processing efficiency and also the accuracy of the voiceprint feature. In the voiceprint registration stage, the original user identifier, voice collection time and SNR in the index can be considered together to quickly locate the most suitable effective voice data, and its voiceprint feature is then used for registration. This greatly shortens the time needed to form the voiceprint feature during registration, and selecting the most suitable effective voice data also improves the accuracy of voiceprint registration.

In the speech database creation method provided by this embodiment, obtaining original voice data provides the data source for the speech database. Preprocessing it yields effective voice data, which improves the efficiency of subsequent processing and saves processing time. The SNR of the effective voice data gives a direct measure of its noise level and voice quality. Finally, the effective voice data is stored in the speech database with an index containing the original user identifier, the voice collection time and the SNR. This improves database processing efficiency and the accuracy of voiceprint features, and lets the later voiceprint registration stage quickly locate suitable effective voice data. Organizing the speech database appropriately when it is created thus improves the accuracy of voiceprint feature extraction during registration and greatly shortens registration time.

In a specific embodiment, preprocessing the original voice data to obtain effective voice data specifically includes: performing filtering and silence removal on the original voice data corresponding to each original user identifier to obtain the effective voice data.

The original voice data corresponding to one original user identifier may include a few recordings that do not belong to the user associated with that identifier (for example, when someone else used the account). Such recordings do not contain that user's voice and must be removed, to avoid deviations when voiceprint features are later extracted from the original voice data.

Therefore, filtering the original voice data of each original user identifier means finding the recordings that do not belong to the user associated with that identifier and removing them. Specifically, a clustering algorithm or one-by-one comparison and matching can be used to find the recordings that do not belong to the user.

Within a single recording, the voice data in some periods may be silent, for example during waiting periods in a call. The voice data in these periods is invalid or redundant and must be removed by silence removal.

Preferably, voice activity detection (VAD) can be applied to the original voice data to distinguish speech from non-speech. The non-speech parts are the silent parts, and removing them yields original voice data with the silence removed.

The purpose of voice activity detection is to detect whether the current audio signal contains speech. VAD classifies the input voice data, separating the speech signal from various background noise signals so that each can be processed differently. It identifies the speech parts and silent parts of a recording, and the silent parts are then removed.
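The patent does not say which VAD to use. As a stand-in, here is the simplest kind: a short-time-energy detector that frames the signal and drops frames below an energy threshold. The frame length and threshold are hypothetical tuning values. Practical detectors also use zero-crossing rate, spectral features or statistical models.

```python
from typing import List, Sequence


def remove_silence(samples: Sequence[float], frame_len: int,
                   energy_threshold: float) -> List[float]:
    """Drop frames whose short-time energy falls below `energy_threshold`.

    A deliberately simple energy-threshold VAD: each frame is classified as
    speech (kept) or silence (removed) from its mean-square energy alone.
    """
    voiced: List[float] = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / len(frame)
        if energy >= energy_threshold:
            voiced.extend(frame)
    return voiced


# Silence, then a burst of "speech", then low-level background hiss.
signal = [0.0] * 4 + [0.5, -0.5, 0.5, -0.5] + [0.01] * 4
print(remove_silence(signal, frame_len=4, energy_threshold=0.01))  # [0.5, -0.5, 0.5, -0.5]
```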

It should be understood that filtering and silence removal of the original voice data corresponding to each original user identifier can be performed in either order: filtering first and then silence removal, or silence removal first and then filtering. The voice data that has undergone both filtering and silence removal is called effective voice data.

In this embodiment, removing recordings that do not belong to the user associated with the original user identifier improves the accuracy of the data stored in the speech database. Applying silence removal to the original voice data shortens subsequent processing and improves efficiency.

In a specific embodiment, filtering the original voice data corresponding to each original user identifier specifically includes the following steps, as shown in Fig. 2:

S121: Extract the voiceprint features of the original voice data corresponding to the same original user identifier.

Based on the original user identifier, voiceprint features are extracted from the original voice data corresponding to the same original user identifier. Voiceprint features are the features in the original voice data that characterize a speaker. Examples include the pitch contour, formant frequencies, bandwidths and trajectories, spectral envelope parameters, auditory characteristic parameters, linear prediction cepstral coefficients and their derived parameters, and hybrid parameters. Specifically, voiceprint features can be extracted based on linear predictive coding (LPC) or Mel-frequency cepstral coefficients (MFCC).
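To make the LPC option concrete, here is a sketch that computes the LPC coefficients of one frame using the autocorrelation method and the Levinson-Durbin recursion. The text does not specify framing, windowing, pre-emphasis, the LPC order, or how per-frame coefficients are combined into one voiceprint vector, so all of those are left out and would be assumptions in any real pipeline.

```python
from typing import List, Sequence


def autocorrelation(frame: Sequence[float], max_lag: int) -> List[float]:
    """r[k] = sum_n frame[n] * frame[n + k] for k = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def lpc(frame: Sequence[float], order: int) -> List[float]:
    """LPC coefficients a_1..a_p via the Levinson-Durbin recursion.

    The predictor is x_hat[n] = sum(a_k * x[n - k] for k = 1..p).
    """
    r = autocorrelation(frame, order)
    if r[0] == 0.0:
        return [0.0] * order          # silent frame: nothing to predict
    a = [0.0] * (order + 1)           # a[0] is unused
    err = r[0]                        # prediction error energy
    for i in range(1, order + 1):
        # Reflection coefficient for this order.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:]


# A decaying exponential x[n] = 0.9**n is predicted by x[n] = 0.9 * x[n - 1].
frame = [0.9 ** n for n in range(200)]
print(lpc(frame, 1))  # close to [0.9]
```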

S122: Based on the voiceprint features, perform cluster analysis on the original voice data corresponding to the same original user identifier using the k-means clustering algorithm to obtain a target center point.

Cluster analysis is a statistical method for studying classification problems (of samples or indicators) and an important analysis method in data mining. Here, k-means is applied to the original voice data to obtain the target center point. Specifically, a value of K is set according to the number of recordings corresponding to the same original user identifier, and an initial center point is set for each cluster. Each point (a recording) is then assigned to its nearest center. Once all points have been assigned, the center of each cluster is recalculated from its member points (for example, as their mean). Assignment and center updating are repeated iteratively until the cluster centers change very little or a specified number of iterations is reached. The center of the cluster containing the most points (recordings) is taken as the target center point.
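The S122 procedure can be sketched as a plain k-means loop that returns the center of the most populated cluster. The text leaves both the choice of K and the choice of initial centers open. This sketch assumes K is passed in and that the initial centers are K distinct input points drawn with a seeded random generator. Voiceprint features are represented as equal-length tuples.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _mean(points: Sequence[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def _nearest(p: Point, centers: Sequence[Point]) -> int:
    """Index of the center closest to p (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))


def kmeans_target_center(points: Sequence[Point], k: int, max_iter: int = 100,
                         tol: float = 1e-9, seed: int = 0) -> Point:
    """Run k-means on voiceprint vectors and return the center of the largest cluster."""
    rng = random.Random(seed)
    centers: List[Point] = rng.sample(list(points), k)  # initial centers
    for _ in range(max_iter):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            clusters[_nearest(p, centers)].append(p)
        # Recompute each center as the mean of its members (keep it if the cluster is empty).
        new_centers = [_mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:  # centers have stopped moving
            break
    sizes = [0] * k
    for p in points:
        sizes[_nearest(p, centers)] += 1
    return centers[max(range(k), key=lambda i: sizes[i])]


# Five recordings from the account owner, two from someone else.
features = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10), (10, 11)]
print(kmeans_target_center(features, k=2))
```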

S123: Using a distance algorithm, calculate the distance between the target center point and each recording of the original voice data corresponding to the same original user identifier.

A distance algorithm is an algorithm for measuring the similarity between different samples. In one embodiment, the distance between each recording and the target center point can be calculated using Manhattan distance, Minkowski distance, cosine similarity, Euclidean distance, or similar algorithms.
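For comparison, the measures listed above can be written as small standard-library functions. Function names are illustrative. Note that cosine similarity grows as vectors become more alike, unlike the distances. When it is used against a "greater than" distance threshold, a common choice is to use 1 minus the similarity instead.

```python
import math
from typing import Sequence


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def minkowski(a: Sequence[float], b: Sequence[float], p: float) -> float:
    """L_p distance; p = 1 gives Manhattan, p = 2 gives Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b: 1 = same direction, 0 = orthogonal."""
    norm = math.hypot(*a) * math.hypot(*b)
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return sum(x * y for x, y in zip(a, b)) / norm


print(manhattan((0, 0), (3, 4)), euclidean((0, 0), (3, 4)))  # 7 5.0
```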

In one embodiment, the Euclidean distance between each recording and the target center point is calculated using the Euclidean distance algorithm.

Euclidean distance is the true distance between two points in n-dimensional space, or the natural length of a vector (that is, the distance from the point to the origin). The Euclidean distance between any two n-dimensional vectors $a = (X_{i1}, X_{i2}, \ldots, X_{in})$ and $b = (X_{j1}, X_{j2}, \ldots, X_{jn})$ is $d(a, b) = \sqrt{\sum_{k=1}^{n} (X_{ik} - X_{jk})^2}$. Based on the voiceprint feature of each recording, this formula gives the Euclidean distance between each recording and the target center point.

S124: Among the recordings corresponding to the same original user identifier, remove those whose distance to the target center point exceeds a distance threshold.

After clustering, the recordings under the same original user identifier that belong to the same user cluster around the target center point, so their distances to it are small. Recordings that do not belong to the user lie far from the target center point, so their distances to it are large. Setting a reasonable distance threshold therefore allows the recordings that do not belong to the user to be identified and removed, ensuring the accuracy of the data.
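Step S124 then reduces to a single threshold test per recording. In this sketch, the recording IDs, feature vectors and threshold are made-up illustrations. The target center would come from the k-means step, and `math.dist` (Python 3.8+) computes the Euclidean distance defined above.

```python
import math
from typing import Dict, List, Sequence, Tuple


def split_by_center_distance(features: Dict[str, Sequence[float]],
                             center: Sequence[float],
                             distance_threshold: float) -> Tuple[List[str], List[str]]:
    """Return (kept_ids, removed_ids); recordings farther than the threshold are removed."""
    kept: List[str] = []
    removed: List[str] = []
    for rec_id, vector in features.items():
        if math.dist(vector, center) > distance_threshold:
            removed.append(rec_id)   # likely not the account owner's voice
        else:
            kept.append(rec_id)
    return kept, removed


features = {"rec-1": (0.5, 0.5), "rec-2": (0.6, 0.4), "rec-3": (10.0, 10.0)}
print(split_by_center_distance(features, center=(0.5, 0.5), distance_threshold=2.0))
# (['rec-1', 'rec-2'], ['rec-3'])
```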

In this embodiment, same original user corresponding primary voice data is identified to carry out using clustering algorithm Clustering, and calculate same original user identify corresponding each primary voice data and clustering cluster target's center's point away from From, then remove the primary voice data that distance is more than distance threshold.By the removal of the primary voice data to mistake, ensure that The accuracy of data, while data volume is reduced, also improve data-handling efficiency.

In a specific embodiment, corresponding primary voice data is identified to each original user and is filtered place Reason, as shown in figure 3, specifically including following steps：

S121': Extract the voiceprint features of the raw voice data corresponding to the same original user identifier.

Voiceprint features are extracted, per original user identifier, from the raw voice data corresponding to the same original user identifier. Specifically, the voiceprint features can be extracted based on linear predictive coding (LPC) or Mel-frequency cepstral coefficients (MFCC).
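As an illustration of the LPC option, the sketch below computes linear prediction coefficients for a single frame using the autocorrelation method and the Levinson-Durbin recursion. It is a simplification under stated assumptions: framing, windowing and pre-emphasis, which a practical front end would normally apply, are omitted, and the function names are hypothetical. The coefficients follow the convention x[n] ≈ a_1·x[n-1] + ... + a_p·x[n-p].

```python
from typing import List, Sequence


def autocorrelation(frame: Sequence[float], max_lag: int) -> List[float]:
    """Autocorrelation r[0..max_lag] of one frame (no normalisation)."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def lpc(frame: Sequence[float], order: int) -> List[float]:
    """Predictor coefficients a_1..a_p via the Levinson-Durbin recursion."""
    r = autocorrelation(frame, order)
    if r[0] == 0.0:
        return [0.0] * order  # silent frame: nothing to predict
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
        if err <= 0.0:
            break  # frame is perfectly predictable; stop to avoid dividing by zero
    return a[1:]


# A decaying exponential x[n] = 0.9**n is predicted by a single coefficient near 0.9.
print(lpc([0.9 ** n for n in range(200)], 1))  # close to [0.9]
```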

S122': Compare and match the voiceprint feature of each piece of raw voice data under the same user identifier one by one against the voiceprint features of the remaining raw voice data under the same user identifier, and count, according to the matching results, the number of matching failures of each piece of raw voice data.

The matching result is either a successful match or a failed match. When the raw voice data corresponding to the same user identifier contains raw voice data that does not belong to the user, the voiceprint features of that raw voice data do not match (i.e., fail to match) the voiceprint features of the raw voice data that does belong to the user. Therefore, when the voiceprint feature of each piece of raw voice data under the same user identifier is compared one by one against the voiceprint features of the remaining raw voice data under that identifier, every comparison between raw voice data that does not belong to the user and raw voice data that does belong to the user yields a failed match.

S123': When the number of matching failures of a piece of raw voice data exceeds a matching threshold, remove that raw voice data.

When a piece of raw voice data has many matching failures, its voiceprint features do not match those of most of the other raw voice data, so it can be determined that this piece stores voice that does not belong to the user. Therefore, a matching threshold can be preset, and when the number of matching failures of a piece of raw voice data exceeds the matching threshold, that raw voice data is removed. This ensures data accuracy and reduces the data volume, which also improves data processing efficiency.
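The pairwise comparison of steps S122' and S123' could be sketched as follows. The source does not say how two voiceprint features are judged to "match"; this sketch assumes, purely for illustration, that a pair matches when its cosine similarity is at least a hypothetical `sim_threshold`. Each pair is compared once, and a failed comparison is counted against both members. `match_threshold` is likewise an illustrative value.

```python
import math
from typing import List, Sequence

Vector = List[float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def count_match_failures(features: List[Vector], sim_threshold: float = 0.8) -> List[int]:
    """Number of failed matches of each recording against all other recordings."""
    n = len(features)
    failures = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if cosine_similarity(features[i], features[j]) < sim_threshold:
                failures[i] += 1
                failures[j] += 1
    return failures


def filter_by_matching(
    features: List[Vector], match_threshold: int = 1, sim_threshold: float = 0.8
) -> List[int]:
    """Keep recordings whose failure count does not exceed the matching threshold."""
    failures = count_match_failures(features, sim_threshold)
    return [i for i, f in enumerate(failures) if f <= match_threshold]


feats = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.05], [0.0, 1.0]]
print(count_match_failures(feats))  # [1, 1, 1, 3]
print(filter_by_matching(feats))    # [0, 1, 2]
```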

In a specific embodiment, filtering the raw voice data corresponding to each original user identifier further includes the following specific steps:

Determine whether the amount of raw voice data corresponding to the same original user identifier is greater than or equal to a clustering threshold. If it is greater than or equal to the clustering threshold, perform steps S121-S124; if it is less than the clustering threshold, perform steps S121'-S123'.

For a clustering algorithm, clustering accuracy is positively correlated with data volume. When the data volume is small, clustering accuracy drops, and applying a clustering algorithm to a small data set increases computational complexity. Therefore, a clustering threshold can be set; its specific value can be adjusted according to the characteristics of the algorithm and actual requirements, and is preferably 10. When the amount of raw voice data corresponding to the same original user identifier is greater than or equal to the clustering threshold, the raw voice data is filtered using steps S121-S124; when the data volume is less than the clustering threshold, the raw voice data is filtered using steps S121'-S123'.
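The data-volume switch between the two filtering branches reduces to a single comparison. In this sketch the two branch implementations are passed in as callables (hypothetical parameters), so any implementation of S121-S124 and S121'-S123' can be plugged in; the default threshold of 10 is the preferred value stated above.

```python
from typing import Callable, List

Vector = List[float]
FilterFn = Callable[[List[Vector]], List[int]]

CLUSTER_THRESHOLD = 10  # preferred clustering threshold from the description


def filter_user_recordings(
    features: List[Vector],
    clustering_filter: FilterFn,
    matching_filter: FilterFn,
    cluster_threshold: int = CLUSTER_THRESHOLD,
) -> List[int]:
    """Pick the filtering branch for one original user identifier by data volume."""
    if len(features) >= cluster_threshold:
        return clustering_filter(features)  # enough data: steps S121-S124
    return matching_filter(features)  # small data set: steps S121'-S123'


# Four recordings fall below the threshold, so the matching branch runs.
print(filter_user_recordings([[0.0]] * 4, lambda f: [], lambda f: list(range(len(f)))))  # [0, 1, 2, 3]
```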

In this embodiment, a suitable processing algorithm is selected according to the data volume to filter the raw voice data, which improves the accuracy of data processing.

It should be understood that the step numbers in the above embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and the step numbers shall not constitute any limitation on the implementation of the embodiments of the present invention.

Embodiment 2

Fig. 4 shows a functional block diagram of a speech database creation apparatus that corresponds one-to-one with the speech database creation method of Embodiment 1. As shown in Fig. 4, the speech database creation apparatus includes a raw voice data acquisition module 11, a data preprocessing module 12, a signal-to-noise ratio acquisition module 13 and a speech database index establishment module 14. The functions implemented by the raw voice data acquisition module 11, the data preprocessing module 12, the signal-to-noise ratio acquisition module 13 and the speech database index establishment module 14 correspond one-to-one with the steps of the speech database creation method of Embodiment 1; to avoid repetition, they are not described in detail here.

Raw voice data acquisition module 11, configured to acquire raw voice data, where the raw voice data includes an original user identifier and a voice collection time.

Data preprocessing module 12, configured to preprocess the raw voice data to obtain valid voice data.

Signal-to-noise ratio acquisition module 13, configured to acquire the signal-to-noise ratio corresponding to the valid voice data.

Speech database index establishment module 14, configured to store the valid voice data in a speech database and establish an index for the valid voice data in the speech database, where the index includes the original user identifier, the voice collection time and the signal-to-noise ratio.

Preferably, the data preprocessing module 12 includes a voiceprint feature extraction unit 121, a clustering analysis unit 122, a distance calculation unit 123 and a first data removal unit 124.

Voiceprint feature extraction unit 121, configured to extract the voiceprint features of the raw voice data corresponding to the same original user identifier.

Clustering analysis unit 122, configured to cluster, based on the voiceprint features, the raw voice data corresponding to the same original user identifier using the k-means clustering algorithm to obtain a target center point.

Distance calculation unit 123, configured to calculate, using a distance algorithm, the distance between each piece of raw voice data corresponding to the same original user identifier and the target center point.

First data removal unit 124, configured to remove, from the raw voice data corresponding to the same original user identifier, the raw voice data whose distance from the target center point exceeds the distance threshold.

Preferably, the data preprocessing module 12 further includes a data comparison and matching unit 122' and a second data removal unit 123'.

Data comparison and matching unit 122', configured to compare and match the voiceprint feature of each piece of raw voice data under the same user identifier one by one against the voiceprint features of the remaining raw voice data under the same user identifier, and count the number of matching failures of each piece of raw voice data according to the matching results.

Second data removal unit 123', configured to remove a piece of raw voice data when its number of matching failures exceeds the matching threshold.

Preferably, the data preprocessing module 12 further includes a raw voice data amount judgment unit 120.

Raw voice data amount judgment unit 120, configured to determine whether the amount of raw voice data corresponding to the same original user identifier is greater than or equal to the clustering threshold.

Embodiment 3

Fig. 5 shows a flowchart of the voiceprint registration method of this embodiment. The voiceprint registration method is applied to various terminal devices and servers to perform voiceprint registration, and addresses the problems of long registration time and low voiceprint feature accuracy in the voiceprint registration process. As shown in Fig. 5, the voiceprint registration method includes the following steps:

S21: Acquire a voiceprint registration request, where the voiceprint registration request includes a registration user identifier and a current time.

The voiceprint registration request is a request, made by the user, to register using voiceprint features. The registration user identifier identifies the user who makes the voiceprint registration request. In a specific embodiment, the registration user identifier may be the user's phone number, user account or ID card number. Preferably, the registration user identifier corresponds to the original user identifier; for example, when the original user identifier is a phone number, the registration user identifier is also a phone number. The current time is the system time at which the voiceprint registration request is acquired.

S22: Query the speech database based on the registration user identifier to obtain the target indexes corresponding to the original user identifier that matches the registration user identifier, where the speech database is created using the speech database creation method of Embodiment 1.

The speech database, which was created using the speech database creation method of Embodiment 1, is queried with the registration user identifier in the voiceprint registration request. When the original user identifier in an index matches the registration user identifier, that index is a target index. An original user identifier matches the registration user identifier when the two are identical. Specifically, the indexes established for the valid voice data in the speech database are searched for indexes containing an original user identifier that matches the registration user identifier, yielding the target indexes.
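A minimal in-memory sketch of the index and the target-index query might look like this. The three index fields come from the description; the `voice_ref` field, the class and function names, and the choice to group records by user identifier in a dictionary are all illustrative assumptions, not the disclosed storage layout.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass(frozen=True)
class IndexEntry:
    """One index record: the three fields named in the description plus a data pointer."""
    original_user_id: str
    collection_time: datetime
    snr_db: float
    voice_ref: str  # hypothetical key of the stored valid voice data


def build_lookup(index: List[IndexEntry]) -> Dict[str, List[IndexEntry]]:
    """Group index records by original user identifier for direct lookup."""
    lookup: Dict[str, List[IndexEntry]] = defaultdict(list)
    for entry in index:
        lookup[entry.original_user_id].append(entry)
    return dict(lookup)


def query_target_indexes(
    lookup: Dict[str, List[IndexEntry]], registration_user_id: str
) -> List[IndexEntry]:
    """Target indexes: records whose original user identifier equals the registration identifier."""
    return lookup.get(registration_user_id, [])


t = datetime(2018, 1, 1)
lookup = build_lookup([
    IndexEntry("user-a", t, 20.0, "r1"),
    IndexEntry("user-b", t, 15.0, "r2"),
    IndexEntry("user-a", t, 25.0, "r3"),
])
print([e.voice_ref for e in query_target_indexes(lookup, "user-a")])  # ['r1', 'r3']
```

An empty result corresponds to the case handled by step S221, where no matching original user identifier exists.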

S23: Obtain the composite index corresponding to each target index according to the current time and the voice collection time and signal-to-noise ratio of that target index.

The voice collection time generally represents when the voice was recorded, and a user's voice changes slightly over time. The closer the voice collection time is to the current time, the closer the valid voice data is to the user's current voice, and thus the better its voiceprint features match. The signal-to-noise ratio gives a direct measure of the noise level of the valid voice data: the higher the signal-to-noise ratio, the less noise the valid voice data contains, which indicates its voice quality.

Based on the current time, the composite index corresponding to each target index can be obtained by jointly considering the voice collection time and the signal-to-noise ratio.

S24: Select the valid voice data corresponding to the target index with the highest composite index as the registration voice data.

The registration voice data is the valid voice data whose voiceprint features best match the user. Among the target indexes, the higher the composite index of a target index, the better the voiceprint features obtained from its corresponding valid voice data match the user. Therefore, the valid voice data corresponding to the target index with the highest composite index is selected as the registration voice data, which improves the accuracy of the registered voiceprint.

In a specific embodiment, obtaining the composite index corresponding to each target index according to the current time and the voice collection time and signal-to-noise ratio of the target index specifically includes: calculating the composite index of each target index from the current time and the voice collection time and signal-to-noise ratio of the target index using a composite index formula. The composite index formula is:

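$$\text{composite index} = a \cdot \text{SNR} + (1-a)\cdot\frac{1}{t_{\text{current}} - t_{\text{collection}}}$$

where a is a preset weight with 0 ≤ a ≤ 1, SNR is the signal-to-noise ratio, t_current is the current time and t_collection is the voice collection time.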

For valid voice data, the higher the signal-to-noise ratio, the fewer noise signals the data contains; and the closer the voice collection time is to the current time, the closer the valid voice data is to the user's current voice, and thus the closer its voiceprint features. Based on these two factors, and according to the requirements of the actual application scenario, a preset weight is assigned to each factor, and the composite index of each piece of valid voice data is obtained from the composite index formula. Once the composite index of each piece of valid voice data is obtained, this single intuitive value can be used to compare the pieces of valid voice data and select the most suitable target valid voice data.

For example, the preset weight a can be set to 0.7, in which case the composite index formula becomes: composite index = 0.7 × SNR + 0.3 × [1 / (current time - voice collection time)]. After any voiceprint registration request is received, the valid voice data stored in the speech database is obtained by querying with the registration user identifier in the request, and the composite index of each piece of valid voice data is calculated with this composite index formula.
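The worked example with a = 0.7 can be sketched as follows, covering both the composite index (S23) and the selection of the highest-scoring target index (S24). The source does not specify the unit of the time difference or how the signal-to-noise ratio is scaled; this sketch assumes SNR in dB and elapsed time in days, and the names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class TargetIndex:
    voice_ref: str  # hypothetical key of the stored valid voice data
    collection_time: datetime
    snr_db: float


def composite_index(entry: TargetIndex, now: datetime, a: float = 0.7) -> float:
    """a * SNR + (1 - a) * 1 / (current time - collection time), time measured in days."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("preset weight a must satisfy 0 <= a <= 1")
    elapsed_days = (now - entry.collection_time).total_seconds() / 86400.0
    if elapsed_days <= 0.0:
        raise ValueError("voice collection time must be earlier than the current time")
    return a * entry.snr_db + (1.0 - a) / elapsed_days


def select_registration_voice(
    targets: List[TargetIndex], now: datetime, a: float = 0.7
) -> Optional[TargetIndex]:
    """Return the target index with the highest composite index (None if there are none)."""
    if not targets:
        return None
    return max(targets, key=lambda e: composite_index(e, now, a))


now = datetime(2018, 1, 12)
recent = TargetIndex("rec-1", now - timedelta(days=10), snr_db=20.0)   # 0.7*20 + 0.3/10 = 14.03
older = TargetIndex("rec-2", now - timedelta(days=100), snr_db=25.0)   # 0.7*25 + 0.3/100 = 17.503
print(select_registration_voice([recent, older], now).voice_ref)  # rec-2
```

With SNR in dB and time in days, the recency term is small next to the SNR term, as the example shows; a deployment might normalise both terms before weighting, which is an assumption outside the described method.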

S25: Acquire, based on the registration voice data, the corresponding voiceprint feature as the registration voiceprint.

After the registration voice data is obtained, the corresponding voiceprint feature is acquired based on it and used as the registration voiceprint.

In a specific embodiment, the voiceprint features of the valid voice data can be extracted in advance and associated with the index established in step S14, so that the corresponding voiceprint feature can be retrieved quickly through the index. In the voiceprint registration stage, once the registration voice data is obtained, its corresponding voiceprint feature can be fetched directly as the registration voiceprint, which further shortens voiceprint registration time.

In the voiceprint registration method provided by this embodiment of the present invention, a voiceprint registration request is acquired to trigger voiceprint registration. The speech database, created using the speech database creation method of Embodiment 1, is then queried based on the registration user identifier to obtain the target indexes corresponding to the original user identifier that matches the registration user identifier. The composite index of each target index is obtained according to the current time and the voice collection time and signal-to-noise ratio of the target index, so that the composite index of the corresponding valid voice data is available through the target index. The valid voice data corresponding to the target index with the highest composite index is selected as the registration voice data, which improves the accuracy of the registered voiceprint. After the registration voice data is obtained, the corresponding voiceprint feature is acquired from it as the registration voiceprint. Because this method performs voiceprint registration using the speech database created by the method of Embodiment 1, it improves the accuracy of voiceprint feature extraction in the registration stage and shortens registration time. Obtaining the composite index of the corresponding valid voice data through the target index makes it possible to locate suitable valid voice data quickly, ensuring that the voiceprint feature that best matches the user is obtained and further improving the accuracy of voiceprint registration.

In a specific embodiment, querying the speech database based on the registration user identifier, as shown in Fig. 6, further includes the following steps:

S221: If no original user identifier matching the registration user identifier exists in the speech database, send a voice recording request.

The speech database may contain no valid voice data matching the registration user identifier. In that case, a voice recording request is sent and the registration voiceprint is obtained from voice recording data acquired in real time. Specifically, the indexes in the speech database are queried with the registration user identifier; if no index contains an original user identifier matching the registration user identifier, the speech database contains no valid voice data matching that identifier, and a voice recording request is sent.

S222: Acquire the voice recording data corresponding to the voice recording request.

After the voice recording request is sent, the user records his or her voice as prompted, and once recording is complete, the voice recording data is acquired.

S223: Extract the corresponding voiceprint feature from the voice recording data as the registration voiceprint.

After the voice recording data recorded by the user is acquired, the corresponding voiceprint feature is extracted from it as the registration voiceprint. Here, voiceprint features are the features in the voice data that characterize a person's essential vocal traits, such as the pitch contour, formant frequency bandwidths and trajectories, spectral envelope parameters, auditory characteristic parameters, linear prediction coefficients and their derived parameters, or hybrid parameters. For the extraction method, refer to step S121 in the foregoing embodiment; it is not repeated here.

In this embodiment, when the speech database contains no valid voice data corresponding to the registration user identifier, the registration voiceprint is obtained from voice data recorded in real time. This prevents situations in which a user cannot register, and improves the completeness and soundness of the voiceprint registration method.
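The fallback logic of steps S221 to S223 can be summarised in a short control-flow sketch. All three callables are hypothetical stand-ins: one returning a stored voiceprint (or None when no matching original user identifier exists), one that sends the recording request and returns the recorded samples, and one that extracts a voiceprint feature. It shows only the branching, not any real storage, audio or feature-extraction API.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Voiceprint = List[float]


def register_voiceprint(
    user_id: str,
    lookup_stored_voiceprint: Callable[[str], Optional[Voiceprint]],
    record_voice: Callable[[str], Sequence[float]],
    extract_voiceprint: Callable[[Sequence[float]], Voiceprint],
) -> Tuple[Voiceprint, str]:
    """Return the registration voiceprint and where it came from."""
    stored = lookup_stored_voiceprint(user_id)  # S22-S25; None if no matching original user ID
    if stored is not None:
        return stored, "database"
    samples = record_voice(user_id)  # S221-S222: send recording request, collect audio
    return extract_voiceprint(samples), "recorded"  # S223


db = {"user-a": [0.1, 0.2]}
print(register_voiceprint("user-a", db.get, lambda u: [0.0], lambda s: [1.0]))  # ([0.1, 0.2], 'database')
print(register_voiceprint("user-b", db.get, lambda u: [0.5, 0.5], lambda s: [sum(s)]))  # ([1.0], 'recorded')
```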

Embodiment 4

Fig. 7 shows a functional block diagram of a voiceprint registration apparatus that corresponds one-to-one with the voiceprint registration method of Embodiment 3. As shown in Fig. 7, the voiceprint registration apparatus includes a voiceprint registration request acquisition module 21, a target index acquisition module 22, a composite index acquisition module 23, a registration voice data acquisition module 24 and a registration voiceprint acquisition module 25. The functions implemented by the voiceprint registration request acquisition module 21, the target index acquisition module 22, the composite index acquisition module 23, the registration voice data acquisition module 24 and the registration voiceprint acquisition module 25 correspond one-to-one with the steps of the voiceprint registration method of Embodiment 3; to avoid repetition, they are not described in detail here.

Voiceprint registration request acquisition module 21, configured to acquire a voiceprint registration request, where the voiceprint registration request includes a registration user identifier and a current time.

Target index acquisition module 22, configured to query the speech database based on the registration user identifier to obtain the target indexes corresponding to the original user identifier that matches the registration user identifier, where the speech database is created using the speech database creation method described in Embodiment 1.

Composite index acquisition module 23, configured to obtain the composite index corresponding to each target index according to the current time and the voice collection time and signal-to-noise ratio of the target index.

Registration voice data acquisition module 24, configured to select the valid voice data corresponding to the target index with the highest composite index as the registration voice data.

Registration voiceprint acquisition module 25, configured to acquire, based on the registration voice data, the corresponding voiceprint feature as the registration voiceprint.

Preferably, the target index acquisition module 22 further includes a voice recording request sending unit 221, a voice recording data acquisition unit 222 and a registration voiceprint extraction unit 223.

Voice recording request sending unit 221, configured to send a voice recording request when no original user identifier matching the registration user identifier exists in the speech database.

Voice recording data acquisition unit 222, configured to acquire the voice recording data corresponding to the voice recording request.

Registration voiceprint extraction unit 223, configured to extract the corresponding voiceprint feature from the voice recording data as the registration voiceprint.

Embodiment 5

This embodiment provides a computer-readable storage medium storing a computer program. When executed by a processor, the computer program implements the speech database creation method of Embodiment 1 or the voiceprint registration method of Embodiment 3; to avoid repetition, details are not repeated here. Alternatively, when executed by a processor, the computer program implements the functions of the modules/units of the speech database creation apparatus of Embodiment 2 or of the voiceprint registration apparatus of Embodiment 4; to avoid repetition, details are not repeated here.

Embodiment 6

Fig. 8 is a schematic diagram of a terminal device provided by an embodiment of the present invention. As shown in Fig. 8, the terminal device 80 of this embodiment includes a processor 81, a memory 82, and a computer program 83 stored in the memory 82 and executable on the processor 81. When executing the computer program 83, the processor 81 implements the steps of the speech database creation method of Embodiment 1, such as steps S11 to S14 shown in Fig. 1. Alternatively, when executing the computer program 83, the processor 81 implements the functions of the modules/units of Embodiment 2, such as the raw voice data acquisition module 11, data preprocessing module 12, signal-to-noise ratio acquisition module 13 and speech database index establishment module 14 shown in Fig. 4. Alternatively, when executing the computer program 83, the processor 81 implements the steps of the voiceprint registration method of Embodiment 3, such as steps S21 to S25 shown in Fig. 5. Alternatively, when executing the computer program 83, the processor 81 implements the functions of the modules/units of Embodiment 4, such as the voiceprint registration request acquisition module 21, target index acquisition module 22, composite index acquisition module 23, registration voice data acquisition module 24 and registration voiceprint acquisition module 25 shown in Fig. 7.

By way of example, the computer program 83 may be divided into one or more modules/units, which are stored in the memory 82 and executed by the processor 81 to carry out the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, the instruction segments describing the execution of the computer program 83 in the terminal device 80. For example, the computer program 83 may be divided into the raw voice data acquisition module 11, data preprocessing module 12, signal-to-noise ratio acquisition module 13 and speech database index establishment module 14 shown in Fig. 4, whose specific functions are as described in Embodiment 2 and are not repeated here. Alternatively, the computer program 83 may be divided into the voiceprint registration request acquisition module 21, target index acquisition module 22, composite index acquisition module 23, registration voice data acquisition module 24 and registration voiceprint acquisition module 25 shown in Fig. 7, whose specific functions are as described in Embodiment 4 and are not repeated here.

The terminal device 80 may be a computing device such as a desktop computer, notebook computer, palmtop computer or cloud server. The terminal device may include, but is not limited to, the processor 81 and the memory 82. Those skilled in the art will understand that Fig. 8 is merely an example of the terminal device 80 and does not limit it; the terminal device may include more or fewer components than shown, combine certain components, or use different components. For example, the terminal device may also include input/output devices, network access devices, a bus, and so on.

The processor 81 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.

The memory 82 may be an internal storage unit of the terminal device 80, such as its hard disk or main memory. The memory 82 may also be an external storage device of the terminal device 80, such as a plug-in hard disk, Smart Media Card (SMC), Secure Digital (SD) card or flash card fitted to the terminal device 80. Further, the memory 82 may include both an internal storage unit and an external storage device of the terminal device 80. The memory 82 stores the computer program and other programs and data required by the terminal device, and may also temporarily store data that has been output or is to be output.

Those skilled in the art will clearly understand that, for convenience and brevity of description, only the above division into functional units and modules is used as an example. In practical applications, the above functions may be assigned to different functional units or modules as needed; that is, the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the functions described above.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, may each exist physically alone, or two or more of them may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated module/unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the methods in the above embodiments of the present invention may also be implemented by a computer program instructing relevant hardware. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of each of the above method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and so on. It should be noted that the content of the computer-readable medium may be appropriately expanded or restricted according to the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.

The embodiments described above are intended only to illustrate, not to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of their technical features; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all fall within the protection scope of the present invention.

Claims

1. A speech database creation method, comprising:

preprocessing the raw voice data to obtain valid voice data;

acquiring the signal-to-noise ratio corresponding to the valid voice data;

storing the valid voice data in a speech database, and establishing an index for the valid voice data in the speech database, the index comprising the original user identifier, the voice collection time and the signal-to-noise ratio.

2. The speech database creation method according to claim 1, wherein preprocessing the raw voice data to obtain valid voice data specifically comprises:

performing filtering and silence removal on the raw voice data corresponding to each original user identifier to obtain valid voice data.

3. The speech database creation method according to claim 2, wherein filtering the raw voice data corresponding to each original user identifier specifically comprises:

extracting the voiceprint features of the raw voice data corresponding to the same original user identifier;

clustering, based on the voiceprint features, the raw voice data corresponding to the same original user identifier using a k-means clustering algorithm to obtain a target center point;

calculating, using a distance algorithm, the distance between each piece of raw voice data corresponding to the same original user identifier and the target center point;

removing, from the raw voice data corresponding to the same original user identifier, the raw voice data whose distance from the target center point exceeds a distance threshold.

4. A voiceprint registration method, comprising:

querying a speech database based on the registration user identifier to obtain a target index corresponding to an original user identifier matching the registration user identifier, the speech database being created by the speech database creation method according to any one of claims 1 to 3;

obtaining, according to the current time and the voice collection time and signal-to-noise ratio of the target index, a composite index corresponding to each target index;

5. The voiceprint registration method according to claim 4, wherein obtaining, according to the current time and the voice collection time and signal-to-noise ratio of the target index, the composite index corresponding to each target index specifically comprises:

The voice collecting time indexed according to the current time, the target and signal-to-noise ratio, using composite index calculation formula It calculates each target and indexes corresponding composite index；

the composite index calculation formula being:

where a is a preset weight and 0 ≤ a ≤ 1.

6. The voiceprint registration method according to claim 4, wherein querying the speech database based on the registration user identifier further comprises:

if no original user identifier matching the registration user identifier exists in the speech database, sending a voice recording request;

obtaining the voice recording data corresponding to the voice recording request; and

extracting the corresponding voiceprint feature from the voice recording data as the registration voiceprint.
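The fallback in claim 6 amounts to a simple branch. In the sketch below, the recording request and the voiceprint extraction are hypothetical callbacks, since the claim specifies neither; the database branch also assumes that a voiceprint is extracted from the selected registration voice data.

```python
from typing import Any, Callable, Dict, List


def get_registration_voiceprint(
    index: Dict[str, List[Any]],
    reg_user_id: str,
    pick_best: Callable[[List[Any]], Any],
    request_recording: Callable[[str], Any],
    extract_voiceprint: Callable[[Any], Any],
) -> Any:
    """Return a registration voiceprint, recording fresh audio only if needed."""
    entries = index.get(reg_user_id)
    if entries:
        # A matching original user identifier exists: reuse stored voice data.
        return extract_voiceprint(pick_best(entries))
    # No match: send a voice recording request, then extract the voiceprint
    # from the returned voice recording data.
    recording = request_recording(reg_user_id)
    return extract_voiceprint(recording)
```

Passing the steps in as callables keeps the sketch independent of any particular recording front end or voiceprint model.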

7. A speech database creation device, comprising:

an original voice data acquisition module, configured to obtain original voice data, the original voice data including an original user identifier and a voice collection time;

a speech database index establishment module, configured to store the valid voice data in a speech database and to establish an index for the valid voice data in the speech database, the index including the original user identifier, the voice collection time and the signal-to-noise ratio.
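A minimal in-memory sketch of the index described in claim 7 follows. The signal-to-noise ratio is computed as 10 log10(P_signal / P_noise); how the noise is estimated is not stated in this section, so the code assumes a separate noise segment is supplied and approximates signal power by the mean power of the valid voice samples. `SpeechDatabase`, `VoiceIndex` and `compute_snr_db` are hypothetical names.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Sequence, Tuple


def compute_snr_db(voice: Sequence[float], noise: Sequence[float]) -> float:
    """SNR in dB = 10 * log10(P_signal / P_noise).

    Assumption: P_signal is the mean power of the valid voice samples (noise
    is not subtracted) and P_noise is the mean power of a noise segment.
    """
    p_signal = sum(x * x for x in voice) / len(voice)
    p_noise = sum(x * x for x in noise) / len(noise)
    if p_noise == 0:
        raise ValueError("noise power must be non-zero")
    return 10 * math.log10(p_signal / p_noise)


@dataclass(frozen=True)
class VoiceIndex:
    user_id: str            # original user identifier
    collected_at: datetime  # voice collection time
    snr_db: float           # signal-to-noise ratio in dB


class SpeechDatabase:
    """Hypothetical in-memory store: valid voice data plus a per-user index."""

    def __init__(self) -> None:
        self._voice: Dict[int, List[float]] = {}
        self._index: Dict[str, List[Tuple[VoiceIndex, int]]] = defaultdict(list)

    def add(self, user_id: str, collected_at: datetime,
            voice: Sequence[float], noise: Sequence[float]) -> int:
        """Store valid voice data and index it; returns the record id."""
        record_id = len(self._voice)
        self._voice[record_id] = list(voice)
        entry = VoiceIndex(user_id, collected_at, compute_snr_db(voice, noise))
        self._index[user_id].append((entry, record_id))
        return record_id

    def lookup(self, user_id: str) -> List[Tuple[VoiceIndex, int]]:
        """All (index entry, record id) pairs for a user; empty if none."""
        return list(self._index.get(user_id, []))
```

Keying the index by user identifier makes the registration-time query of claims 4 and 8 a single dictionary lookup, after which only the small per-user list needs to be scored.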

8. A voiceprint registration device, comprising:

a voiceprint registration request acquisition module, configured to obtain a voiceprint registration request, the voiceprint registration request including a registration user identifier and a current time;

a target index acquisition module, configured to query a speech database based on the registration user identifier and to obtain the target index corresponding to the original user identifier that matches the registration user identifier, the speech database being a speech database created by the speech database creation method according to any one of claims 1 to 3;

a composite index acquisition module, configured to obtain, according to the current time and the voice collection time and signal-to-noise ratio of each target index, a composite index corresponding to each target index;

a registration voice data acquisition module, configured to select the valid voice data corresponding to the target index with the highest composite index as registration voice data;

9. A terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the speech database creation method according to any one of claims 1 to 3; or the processor, when executing the computer program, implements the steps of the voiceprint registration method according to any one of claims 4 to 6.

10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the speech database creation method according to any one of claims 1 to 3; or the computer program, when executed by a processor, implements the steps of the voiceprint registration method according to any one of claims 4 to 6.