CN116259335A

CN116259335A - Speech recognition method, apparatus, computer device and storage medium for biometric authentication

Info

Publication number: CN116259335A
Application number: CN202310303316.7A
Authority: CN
Inventors: 王心月; 宁博; 黎明欣
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2023-03-27
Filing date: 2023-03-27
Publication date: 2023-06-13

Abstract

The application relates to a voice recognition method, apparatus, computer device, storage medium and computer program product for biometric authentication, and belongs to the technical field of biometric recognition. The method comprises the following steps: dividing the voice to be recognized, provided by a user during biometric authentication, into a plurality of voice segments to be recognized, and then determining the first voice recognition model in a voice recognition model library that matches the language of the first voice segment; sequentially performing voice recognition on the plurality of voice segments to be recognized by using the first voice recognition model; when the first voice recognition model fails to recognize a voice segment, determining the second voice recognition model in the voice recognition model library that matches the language of that voice segment; and if the second voice recognition model is different from the first voice recognition model, performing voice recognition on the unrecognized voice segments by using all the voice recognition models in the voice recognition model library. The method can meet the voice recognition requirements of multilingual biometric authentication.

Description

Speech recognition method, apparatus, computer device and storage medium for biometric authentication

Technical Field

The present application relates to the field of biometric technology, and in particular, to a method, apparatus, computer device, storage medium, and computer program product for biometric voice recognition.

Background

As one of the biometric authentication technologies, voice recognition plays an important auxiliary role in resource risk prevention and control in the financial field. Artificial-intelligence voice recognition is widely applied in scenarios such as login and payment in mobile banking. It adopts a text-related mode (text-related means that recognition targets the meaning and content of what the speaker says rather than the identity of the speaker), for example an 8-digit random dynamic number string: after the complete voice is acquired, each digit is matched position by position to perform biometric authentication.

However, the speech recognition models provided by the prior art, for example those based on 8-digit random dynamic number strings, typically support speech recognition in only a single language, which limits the usage scenarios of speech recognition.

Disclosure of Invention

Based on this, it is necessary to provide a voice recognition method, apparatus, computer device, computer readable storage medium and computer program product for biometric authentication in view of the above technical problems.

In a first aspect, the present application provides a method of voice recognition for biometric authentication. The method comprises the following steps:

dividing the voice to be recognized provided by the user during biological verification into a plurality of voice segments to be recognized;

determining a first voice recognition model matched with the language of the first voice segment in the voice recognition model library; the voice recognition model library comprises a plurality of voice recognition models which are respectively applicable to different languages;

sequentially carrying out voice recognition on the plurality of voice segments to be recognized by utilizing the first voice recognition model;

when the first voice recognition model fails to recognize a voice segment, determining a second voice recognition model matched with the language of the voice segment in the voice recognition model library;

if the second voice recognition model is the same as the first voice recognition model, continuing to perform voice recognition on the unrecognized voice section by using the first voice recognition model;

and if the second voice recognition model is different from the first voice recognition model, performing voice recognition on the unrecognized voice section by using all voice recognition models in the voice recognition model library.

In one embodiment, the method further comprises:

and when the first voice recognition model successfully recognizes the current voice segment, continuing to use the first voice recognition model to perform voice recognition on the next voice segment.

In one embodiment, the method further comprises:

when the first voice recognition model fails to recognize a voice section, if the voice recognition model matched with the language of the voice section is not successfully obtained from the voice recognition model library, the voice recognition of the voice to be recognized is ended.

In one embodiment, the splitting the speech to be recognized provided by the user during biometric authentication into a plurality of speech segments to be recognized includes:

acquiring a biological verification text corresponding to the biological verification;

determining the segmentation number of the voice segments acting on the voice to be recognized according to the biological verification text;

and dividing the voice to be recognized into a plurality of voice segments to be recognized according to the voice segment dividing number.

In one embodiment, the segmenting the speech to be recognized into a plurality of speech segments to be recognized according to the number of speech segments, includes:

converting the voice to be recognized into a corresponding voice digital signal;

and segmenting the voice digital signal according to the segmentation number of the voice segments to obtain a plurality of voice segments to be recognized.

In one embodiment, the method further comprises:

acquiring a voice training set corresponding to a target language from a pre-established voice library; the target language is a language matched with a speech recognition model to be trained; the voice library comprises voice training sets corresponding to different languages respectively;

extracting each text required for biological verification from the voice training set to obtain a voice training set corresponding to each text;

and training the speech recognition model to be trained according to the speech training set corresponding to each text.

In one embodiment, the method further comprises:

collecting voices generated when different speakers respectively read texts required by the biological verification by using different languages;

and constructing the voice library according to voices generated when different speakers read texts required by the biological verification respectively by using different languages.

In a second aspect, the present application also provides a voice recognition apparatus for biometric authentication. The device comprises:

the voice segmentation module is used for segmenting voice to be recognized provided by a user during biological verification into a plurality of voice segments to be recognized;

the model matching module is used for determining a first voice recognition model matched with the language of the first voice section in the plurality of voice sections to be recognized in the voice recognition model library; the voice recognition model library comprises a plurality of voice recognition models which are respectively applicable to different languages;

the voice recognition module is used for sequentially carrying out voice recognition on the plurality of voice segments to be recognized by utilizing the first voice recognition model;

the first recognition processing module is used for determining a second voice recognition model matched with the language of the voice section in the voice recognition model library when the first voice recognition model fails to recognize the voice section;

the second recognition processing module is used for continuing to perform voice recognition on the unrecognized voice section by using the first voice recognition model if the second voice recognition model is the same as the first voice recognition model;

and the third recognition processing module is used for performing voice recognition on the unrecognized voice section by using all voice recognition models in the voice recognition model library if the second voice recognition model is different from the first voice recognition model.

In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor which, when executing the computer program, performs the steps of the method of the first aspect.

In a fourth aspect, the present application also provides a computer-readable storage medium. The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of the method of the first aspect.

In a fifth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of the method of the first aspect.

In the above voice recognition method, apparatus, computer device, storage medium and computer program product for biometric authentication, the voice to be recognized is divided into a plurality of voice segments, and the recognition result of each voice segment provides a reference for recognizing the subsequent voice segments. A first voice recognition model matching the language of the first voice segment is determined, and the plurality of voice segments are recognized sequentially with the first voice recognition model, which reduces the language matching cost for voice segments in the same language. When the first voice recognition model fails to recognize a voice segment, a second voice recognition model matching the language of that voice segment is determined. If the second voice recognition model is the same as the first voice recognition model, the first voice recognition model continues to be used to complete the voice recognition. If the second voice recognition model is different from the first voice recognition model, the voice recognition models of all languages are used to recognize the unrecognized voice segments. This method not only enables voice recognition when a user performs biometric authentication in multiple languages, but also improves voice recognition efficiency when a user performs biometric authentication in a single language, thereby supporting voice recognition for biometric authentication in multiple scenarios.

Drawings

FIG. 1 is a diagram of an application environment for a voice recognition method of biometric authentication in one embodiment;

FIG. 2 is a flow diagram of a method of voice recognition for biometric authentication in one embodiment;

FIG. 3 is a flow chart of a speech recognition model training step in one embodiment;

FIG. 4 is a schematic diagram of a complete flow of a voice recognition method for biometric authentication in another embodiment;

FIG. 5 is a block diagram of a voice recognition device for biometric authentication in one embodiment;

FIG. 6 is an internal structural diagram of a computer device in one embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.

The prior art uses voice recognition models, but these are usually single-language models. Taking a voice recognition model that supports Mandarin as an example, a user can be correctly recognized only when speaking Mandarin, which is inconvenient for users in non-Mandarin areas and for foreign-language users. The prior art therefore has the technical problem of limited usage scenarios.

To address this technical problem, the present application provides a voice recognition method for biometric authentication, which not only enables voice recognition when a user performs biometric authentication in multiple languages, but also improves voice recognition efficiency when a user performs biometric authentication in a single language, thereby supporting voice recognition for biometric authentication in multiple scenarios. The method can be applied in the application environment shown in FIG. 1, in which the terminal 102 communicates with the server 104 via a network. A data storage system may store the data that the server 104 needs to process; it may be integrated on the server 104 or located on a cloud or other network server. The terminal 102 obtains the voice to be recognized provided by the user during biometric authentication; the server 104 obtains the voice to be recognized from the terminal 102, segments it, and then performs voice recognition on the resulting voice segments using the voice recognition model library in the data storage system. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smart phone, tablet computer, Internet of Things device, or portable wearable device. The Internet of Things device may be a smart speaker, smart television, smart air conditioner, smart vehicle-mounted device, or the like. The portable wearable device may be a smart watch, smart bracelet, headset, or the like. The server 104 may be implemented as a stand-alone server or as a server cluster composed of multiple servers.

The following describes the voice recognition method of the biometric authentication of the present application in detail through the embodiments and the corresponding drawings.

In one embodiment, as shown in fig. 2, a voice recognition method of biometric authentication is provided, which may be applied to the server 104 as in fig. 1, and which may include the steps of:

Step S201, the voice to be recognized provided by the user during the biological verification is segmented into a plurality of voice segments to be recognized.

When the user performs the biometric authentication, a randomly generated text is displayed, the user reads out the content of the text, and the voice read out by the user is used as the voice to be recognized to complete the biometric authentication.

Illustratively, the voice to be recognized read by the user during the biometric authentication is obtained, and the voice to be recognized is preprocessed before being recognized: and detecting the end point of the voice to be recognized based on the short-time energy of the voice, and carrying out segmentation processing on the voice to be recognized according to the detected end point to obtain a plurality of voice segments to be recognized.
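As a non-limiting illustration, the endpoint detection based on short-time energy described above might be sketched in Python as follows. The function names, the frame length, the hop size and the energy threshold are assumptions introduced here for illustration and are not specified in the present application; a practical implementation would tune them and might add smoothing or a minimum segment duration.

```python
from typing import List, Sequence, Tuple


def short_time_energy(samples: Sequence[float], frame_len: int, hop: int) -> List[float]:
    """Return the sum of squared samples for each analysis frame."""
    energies = []
    for start in range(0, max(len(samples) - frame_len + 1, 1), hop):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame))
    return energies


def detect_segments(
    samples: Sequence[float],
    frame_len: int = 4,       # assumed value, for illustration only
    hop: int = 4,             # assumed value, for illustration only
    threshold: float = 0.1,   # assumed value, for illustration only
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of runs of frames whose energy exceeds the threshold."""
    segments = []
    seg_start = None
    for i, energy in enumerate(short_time_energy(samples, frame_len, hop)):
        if energy > threshold and seg_start is None:
            seg_start = i * hop                      # a voiced run begins
        elif energy <= threshold and seg_start is not None:
            segments.append((seg_start, i * hop))    # the voiced run ends
            seg_start = None
    if seg_start is not None:                        # run still open at end of signal
        segments.append((seg_start, len(samples)))
    return segments
```

Each returned pair delimits one candidate voice segment to be recognized; silence between spoken characters falls below the threshold and separates the runs.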

Step S202, determining a first voice recognition model matched with the language of the first voice segment in a plurality of voice segments to be recognized in a voice recognition model library; the speech recognition model library includes a plurality of speech recognition models each adapted for a different language.

The first voice segment is the earliest voice segment, in time order, among the plurality of voice segments to be recognized.

The languages may include Mandarin, English, and local dialects (e.g., Cantonese), among others.

Illustratively, the first voice segment among the plurality of voice segments to be recognized is obtained, and voice feature extraction is performed on it to obtain the corresponding feature information. Based on the feature information, all the voice recognition models in the voice recognition model library can be used to determine recognition results of the first voice segment in the different languages, where each recognition result comprises the text content of the voice and the corresponding recognition probability. The target recognition result with the highest recognition probability among these recognition results is determined; the language corresponding to the target recognition result is taken as the language of the first voice segment, and the voice recognition model corresponding to the target recognition result is taken as the first voice recognition model. Further, a probability threshold may be preset: if the recognition probability in the target recognition result is lower than the probability threshold, the language used by the user may have no corresponding voice recognition model in the voice recognition model library, and the voice recognition is determined to have failed.
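As a non-limiting illustration, the selection of the first voice recognition model by highest recognition probability might be sketched as follows. The recognizer interface (a callable returning a text and a probability), the function name `select_model` and the threshold value are hypothetical and serve only to make the example self-contained.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical interface: a recognizer maps segment features to (text, probability).
Recognizer = Callable[[list], Tuple[str, float]]


def select_model(
    features: list,
    library: Dict[str, Recognizer],
    min_prob: float = 0.6,  # assumed probability threshold, for illustration only
) -> Optional[Tuple[str, str, float]]:
    """Run every model on the segment and return (language, text, probability)
    of the most probable result, or None if it falls below the threshold."""
    best = None
    for language, recognize in library.items():
        text, prob = recognize(features)
        if best is None or prob > best[2]:
            best = (language, text, prob)
    if best is None or best[2] < min_prob:
        return None  # no model in the library matches the language well enough
    return best
```

A return value of None corresponds to the case in which the voice recognition is determined to have failed because no model in the library suits the language used.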

Step S203, sequentially performing voice recognition on the plurality of voice segments by using the first voice recognition model.

Illustratively, the plurality of voice segments to be recognized obtained by segmentation are recognized one by one, in time order, using the first voice recognition model.

In step S204, when the first speech recognition model fails to recognize a speech segment, a second speech recognition model matching the language of the speech segment in the speech recognition model library is determined.

In an exemplary embodiment, when a plurality of speech segments to be recognized are sequentially speech-recognized, if the first speech recognition model fails to recognize a certain speech segment to be recognized, a second speech recognition model matching the language of the speech segment to be recognized is determined from the speech recognition model library.

In step S205, if the second speech recognition model is the same as the first speech recognition model, the first speech recognition model is used to continue speech recognition on the unrecognized speech segment.

Illustratively, when the second speech recognition model is the same as the first speech recognition model, continuing to use the first speech recognition model to sequentially perform speech recognition on the unrecognized speech segments; if the recognition failure occurs again, step S204 is executed again.

In step S206, if the second speech recognition model is different from the first speech recognition model, the unrecognized speech segment is subjected to speech recognition by using all the speech recognition models in the speech recognition model library.

When the second speech recognition model is different from the first speech recognition model, the user is considered to perform biological verification by using multiple languages, and the unrecognized speech segments are simultaneously subjected to speech recognition by using all the speech recognition models in the speech recognition model library, so as to obtain recognition results of each speech segment in multiple languages, and the speech recognition results of each speech segment are determined according to the recognition results in multiple languages.

In the voice recognition method for biometric authentication described above, the voice to be recognized is divided into a plurality of voice segments, and the recognition result of each voice segment provides a reference for recognizing the subsequent voice segments. A first voice recognition model matching the language of the first voice segment is determined, and the plurality of voice segments are recognized sequentially with the first voice recognition model, which reduces the language matching cost for voice segments in the same language. When the first voice recognition model fails to recognize a voice segment, a second voice recognition model matching the language of that voice segment is determined. If the second voice recognition model is the same as the first voice recognition model, the first voice recognition model continues to be used to complete the voice recognition. If the second voice recognition model is different from the first voice recognition model, the voice recognition models of all languages are used to recognize the unrecognized voice segments. This method not only enables voice recognition when a user performs biometric authentication in multiple languages, but also improves voice recognition efficiency when a user performs biometric authentication in a single language, thereby supporting voice recognition for biometric authentication in multiple scenarios.
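As a non-limiting illustration, the control flow of steps S202 to S206 might be sketched as follows. Everything named here is an assumption made for the sake of a runnable example: the recognizer interface, the separate language identification callable `identify_language`, the probability threshold used to decide that recognition has failed, and in particular the handling of the branch in which the second model equals the first (the failed segment is recorded as unrecognized and processing moves on), which the present application does not spell out.

```python
from typing import Callable, Dict, List, Optional, Tuple

Recognizer = Callable[[object], Tuple[str, float]]  # segment -> (text, probability)
LanguageId = Callable[[object], Optional[str]]      # segment -> language, or None


def recognize_with_all(segment, library: Dict[str, Recognizer]) -> str:
    """Run every model on the segment and keep the most probable text."""
    return max((rec(segment) for rec in library.values()), key=lambda r: r[1])[0]


def recognize_utterance(
    segments: list,
    library: Dict[str, Recognizer],
    identify_language: LanguageId,
    min_prob: float = 0.6,  # assumed failure threshold
) -> Optional[List[Optional[str]]]:
    """Return one text per segment, or None if recognition is ended early."""
    if not segments:
        return None
    first_lang = identify_language(segments[0])       # match the first model
    if first_lang is None or first_lang not in library:
        return None
    first_model = library[first_lang]
    results: List[Optional[str]] = []
    for i, segment in enumerate(segments):            # recognize in time order
        text, prob = first_model(segment)
        if prob >= min_prob:
            results.append(text)                      # success: keep the first model
            continue
        second_lang = identify_language(segment)      # failure: match a second model
        if second_lang is None or second_lang not in library:
            return None                               # no matching model: end recognition
        if second_lang == first_lang:
            results.append(None)                      # assumption: mark as unrecognized
            continue
        # Different language: use all models for every remaining segment.
        results.extend(recognize_with_all(s, library) for s in segments[i:])
        return results
    return results
```

In the single-language case only one model runs per segment after the first match, which is where the efficiency gain described above comes from; the full library is consulted only once a language switch has been detected.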

In one embodiment, the voice recognition method for biometric authentication further includes the steps of:

and when the first voice recognition model is successful in recognizing the current voice segment, continuing to use the first voice recognition model to perform voice recognition on the next voice segment.

The current voice segment is a voice segment which is being recognized when a plurality of voice segments to be recognized are sequentially subjected to voice recognition by using the first voice recognition model.

Illustratively, when the first speech recognition model sequentially performs speech recognition on a plurality of speech segments to be recognized, and the current speech segment can be successfully recognized, the first speech recognition model is still used when the next speech segment is recognized, without the need for reselecting the speech recognition model.

In this embodiment, when a user uses a language to provide a voice to be recognized, after the language of the first voice segment is matched, other voice segments can directly use the same voice recognition model, so that the voice recognition efficiency is effectively improved.

In one embodiment, the voice recognition method for biometric authentication further includes the following step:

when the first speech recognition model fails to recognize a speech segment, if no speech recognition model matching the language of that speech segment can be obtained from the speech recognition model library, ending the speech recognition of the speech to be recognized.

Illustratively, while the first speech recognition model is sequentially recognizing the plurality of speech segments to be recognized, if recognition of the current speech segment fails and no speech recognition model matching the language of the current speech segment can be obtained from the speech recognition model library, the speech recognition of the speech to be recognized provided by the user is ended, and both the speech recognition and the biometric authentication are determined to have failed.

In one embodiment, the step S201 of segmenting the speech to be recognized provided by the user during biometric authentication into a plurality of speech segments to be recognized may be implemented by the following steps:

acquiring a biological verification text corresponding to biological verification; determining the segmentation number of the voice segments acting on the voice to be recognized according to the biological verification text; and dividing the voice to be recognized into a plurality of voice segments to be recognized according to the voice segment dividing number.

Illustratively, when the user performs biometric authentication, the computer displays a randomly generated text, namely the biometric authentication text. The number of voice segments into which the voice to be recognized is to be divided is determined according to the number of characters in the biometric authentication text. For example, if the biometric authentication text is "0154", the segmentation number is 4; if the biometric authentication text is a six-character Chinese phrase (rendered in English as "voice recognition method"), the segmentation number is 6. After the segmentation number is determined, the voice to be recognized is divided into the corresponding number of voice segments to be recognized.
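As a non-limiting illustration, deriving the segmentation number from the biometric authentication text might look like the following. The function name is hypothetical, and ignoring whitespace is an assumption added here; the present application only states that the number of characters is used.

```python
def segment_count(verification_text: str) -> int:
    """Return one expected voice segment per non-whitespace character."""
    return sum(1 for ch in verification_text if not ch.isspace())
```

For the digit string "0154" this yields 4, matching the example above.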

Based on the above embodiment, further, the above splitting the speech to be recognized into a plurality of speech segments to be recognized according to the number of speech segments may be further implemented by the following steps:

converting the voice to be recognized into a corresponding voice digital signal; and segmenting the voice digital signal according to the segmentation number of the voice segments to obtain a plurality of voice segments to be recognized.

For example, the voice to be recognized provided by the user may be an analog signal in a time domain, and the analog signal of the voice to be recognized needs to be converted to obtain a digital signal of the voice to be recognized. Based on the digital signal of the voice to be recognized, extracting the characteristic information of the voice to be recognized, and segmenting the voice to be recognized according to the characteristic information and the segmentation number to obtain a plurality of voice segments to be recognized.
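As a non-limiting illustration, one way of combining feature-based candidate boundaries with the segmentation number is sketched below. The strategy shown, merging the two candidate runs separated by the smallest gap until the expected count is reached, is an assumption chosen for simplicity and is not stated in the present application; the function name and the representation of a run as a pair of sample indices are likewise hypothetical.

```python
from typing import List, Optional, Tuple

Run = Tuple[int, int]  # (start, end) sample indices of one candidate voiced run


def fit_to_count(runs: List[Run], expected: int) -> Optional[List[Run]]:
    """Merge candidate runs until exactly `expected` segments remain.

    Returns None when fewer runs than expected were detected, in which case
    the caller may treat the segmentation as failed.
    """
    if expected < 1:
        raise ValueError("expected must be at least 1")
    merged = sorted(runs)
    if len(merged) < expected:
        return None
    while len(merged) > expected:
        # Gap between the end of each run and the start of the next one.
        gaps = [merged[i + 1][0] - merged[i][1] for i in range(len(merged) - 1)]
        i = gaps.index(min(gaps))
        # Join the two closest neighbours into a single segment.
        merged[i:i + 2] = [(merged[i][0], merged[i + 1][1])]
    return merged
```

The candidate runs would come from a feature-based detector applied to the voice digital signal, and the expected count from the biometric authentication text.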

In this embodiment, the voice to be recognized is segmented according to the biometric authentication text and the voice digital signal to obtain a plurality of voice segments to be recognized. Recognition of the voice provided by the user is thereby converted into isolated-word recognition, so the contextual relationships within the voice to be recognized need not be considered, which improves voice recognition efficiency. Meanwhile, the language recognition result of a preceding voice segment can provide a language reference for the subsequent voice segments, which improves recognition efficiency when the voice to be recognized is provided in a single language.

In one embodiment, as shown in fig. 3, the above-mentioned voice recognition method for biometric authentication further includes the following steps for training a model:

Step S301, a voice training set corresponding to a target language is obtained from a pre-established voice library; the target language is the language matched with the speech recognition model to be trained; the voice library comprises voice training sets respectively corresponding to different languages;

Step S302, extracting each text required for biological verification from the voice training set to obtain a voice training set corresponding to each text;

Step S303, training the speech recognition model to be trained according to the speech training set corresponding to each text.

Illustratively, the language of the speech recognition model to be trained is determined, and the speech training set of that target language is selected from the pre-established voice library. The speech training set includes speech data and text data corresponding to the words expressed by the speech data; for example, the text corresponding to the digit one spoken in English is "1", and the text corresponding to the digit one spoken in Chinese is also "1". The speech training set is then screened according to its text data and the requirements that the biometric authentication places on the biometric authentication text. The screened speech training set is used as the training set of the speech recognition model to be trained, and the model is trained on it.

For example, speech training sets for languages such as Mandarin, English and Cantonese are stored in the voice library. The current biometric authentication is speech recognition of digits, that is, the random text displayed by the computer during biometric authentication is a digit string such as "0146", "456413" or "554546". For the Mandarin speech recognition model currently to be trained, the speech training set of digits read in Mandarin is screened from the voice library and used as the training set of that model; the speech recognition models of the other languages are trained by the same steps.
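As a non-limiting illustration, screening the voice library for one target language and grouping the utterances by required text (steps S301 and S302) might be sketched as follows. The data layout, in which the voice library maps each language to a list of (audio, transcript) pairs, and the function name are assumptions made for this example.

```python
from typing import Dict, List, Set, Tuple

Utterance = Tuple[bytes, str]  # (audio data, transcript), hypothetical layout


def training_sets_by_text(
    voice_library: Dict[str, List[Utterance]],
    target_language: str,
    required_texts: Set[str],
) -> Dict[str, List[bytes]]:
    """Return, for each required text, the audio of the target language that reads it."""
    grouped: Dict[str, List[bytes]] = {text: [] for text in sorted(required_texts)}
    for audio, text in voice_library.get(target_language, []):
        if text in grouped:          # keep only texts needed for verification
            grouped[text].append(audio)
    return grouped
```

For digit-based verification, `required_texts` would be the ten digits, for example `set("0123456789")`, and the resulting per-digit sets would feed the training of the model for that language.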

In one embodiment, the voice recognition method for biometric authentication further includes the following steps:

collecting voices generated when different speakers read the texts required for biometric authentication in different languages; and constructing the voice library according to the voices generated when the different speakers read the texts required for biometric authentication in the different languages.

In an exemplary embodiment, all possible biometric texts during biometric verification are determined, voices generated when different speakers read out the biometric texts respectively in different languages are collected, mapping relations are established between the voices and corresponding languages and between the voices and text data, and finally a voice library is constructed according to the voices and the mapping relations.
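As a non-limiting illustration, constructing the voice library from collected recordings might be sketched as follows. The record layout (speaker, language, text, audio) and the function name are hypothetical; the speaker identifier is deliberately not stored in the library in this sketch, which is a design choice made here rather than a requirement of the present application.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Recording = Tuple[str, str, str, bytes]  # (speaker id, language, text, audio)


def build_voice_library(recordings: Iterable[Recording]) -> Dict[str, List[Tuple[bytes, str]]]:
    """Map each language to its (audio, text) pairs across all speakers."""
    library = defaultdict(list)
    for _speaker_id, language, text, audio in recordings:
        # Each voice is linked to its language (the key) and its text (the pair).
        library[language].append((audio, text))
    return dict(library)
```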

In this embodiment, a voice library is constructed by collecting voices of different speakers, and the voice library is used as a training set source of a subsequent voice recognition model, so that the trained voice recognition model ignores voiceprint information about the speakers in the voices, only focuses on characteristics of the voices about texts, and realizes text-related voice recognition.

In another embodiment, as shown in fig. 4, there is provided a voice recognition method of biometric authentication, comprising the steps of:

Step S401, acquiring a biological verification text corresponding to biological verification, and determining the number of voice segment cuts acting on the voice to be recognized according to the biological verification text.

Step S402, the voice to be recognized provided by the user during the biological verification is converted into a corresponding voice digital signal.

Step S403, the voice digital signal is segmented according to the segmentation number of the voice segments, and a plurality of voice segments to be recognized are obtained.

In step S404, a first speech recognition model in the speech recognition model library, which matches the language of the first speech segment of the plurality of speech segments to be recognized, is determined.

Step S405, performing speech recognition on the plurality of speech segments to be recognized sequentially by using the first speech recognition model.

In step S406, when the first speech recognition model is successful in recognizing the current speech segment, the first speech recognition model is continuously used for recognizing the next speech segment.

In step S407, when the first speech recognition model fails to recognize a speech segment, a second speech recognition model matching the language of the speech segment in the speech recognition model library is determined.

In step S408, if the second speech recognition model is the same as the first speech recognition model, the first speech recognition model is used to continue speech recognition on the unrecognized speech segment.

In step S409, if the second speech recognition model is different from the first speech recognition model, the unrecognized speech segment is subjected to speech recognition using all the speech recognition models in the speech recognition model library.

Step S410, if the speech recognition model matching the language of the speech segment is not successfully obtained from the speech recognition model library, ending the speech recognition of the speech to be recognized.

In the embodiment, the voice recognition of the voice to be recognized is converted into the voice recognition of the isolated words by dividing the voice to be recognized provided by the user into a plurality of voice segments to be recognized, so that the recognition efficiency is improved. And then determining a first voice recognition model according to the language matching result of the first voice section to be recognized, if the languages of the other voice sections to be recognized are the languages matched with the first voice section, completing voice recognition of all the voice sections to be recognized according to the first voice recognition model, reducing the cost of language matching and improving the efficiency of voice recognition. If the number of languages of the voice section to be recognized is two or more, voice recognition can be realized according to the method, so that the biological verification voice recognition requirement under a multilingual scene is met; meanwhile, after the first voice recognition model fails to recognize, all the voice recognition models are directly adopted to recognize the voice of the subsequent voice to be recognized, so that the language matching cost is reduced to a certain extent.

To help those skilled in the art understand the embodiments of the present application, a specific example of the voice recognition method for biometric authentication is described below, covering both its training and its application:

In the training process of a speech recognition model, namely a hybrid model of an HMM (Hidden Markov Model) and a GMM (Gaussian Mixture Model), the means and variances in the GMM are first initialized based on the K-means algorithm, and the parameters of the whole speech recognition model are then adjusted according to the EM algorithm (Expectation-Maximization algorithm) to obtain the trained speech recognition model. When a user logs in or initiates a transaction in a mobile banking application, the mobile phone terminal displays a randomly generated biometric authentication text, such as 514623, and the user provides the speech to be recognized based on that text. After the mobile phone terminal acquires the continuous time-domain analog signal of the speech to be recognized at a sampling rate of 8 kHz, the background server determines a discrete analog signal based on the Shannon sampling theorem and quantizes the discrete analog signal at 16 bits to convert it into a digital signal in the frequency domain. Because the energy at the start point and end point of each spoken digit is concentrated in a small range, the speech to be recognized is segmented using an endpoint detection method based on short-time energy, yielding 6 speech segments to be recognized. Each segment is divided into frames by exploiting the short-time stationarity of the speech signal, and MFCCs (Mel-Frequency Cepstral Coefficients), a typical speech feature, are extracted as the speech feature information. The speech feature information is input into the speech recognition model, the probability that the speech segment to be recognized corresponds to each digit is obtained according to the Viterbi algorithm, and the digit with the highest probability is taken as the recognition result of the speech segment to be recognized.
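The final decoding step, scoring a segment against each digit model with the Viterbi algorithm and keeping the best digit, can be illustrated with a deliberately simplified sketch. It assumes one small HMM per digit with discrete observation symbols, whereas the example above uses GMM emissions over MFCC features. The function names, the two toy digit models, and their probabilities are hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

# An HMM is described here as (start probabilities, transition matrix, emission matrix).
Hmm = Tuple[List[float], List[List[float]], List[List[float]]]


def _log(p: float) -> float:
    """Logarithm that maps zero probability to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi_log_score(
    obs: Sequence[int],
    start: List[float],
    trans: List[List[float]],
    emit: List[List[float]],
) -> float:
    """Log-probability of the single best state path for the observations."""
    n_states = len(start)
    scores = [_log(start[s]) + _log(emit[s][obs[0]]) for s in range(n_states)]
    for symbol in obs[1:]:
        scores = [
            max(scores[p] + _log(trans[p][s]) for p in range(n_states))
            + _log(emit[s][symbol])
            for s in range(n_states)
        ]
    return max(scores)


def best_digit(obs: Sequence[int], models: Dict[str, Hmm]) -> str:
    """Return the digit whose model gives the highest Viterbi score."""
    return max(models, key=lambda digit: viterbi_log_score(obs, *models[digit]))


# Two toy left-to-right models over three observation symbols (0, 1, 2).
toy_models: Dict[str, Hmm] = {
    "1": ([1.0, 0.0], [[0.6, 0.4], [0.0, 1.0]], [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]),
    "5": ([1.0, 0.0], [[0.6, 0.4], [0.0, 1.0]], [[0.1, 0.1, 0.8], [0.8, 0.1, 0.1]]),
}
```

Working in log space avoids numerical underflow on long observation sequences. In a full system the emission lookup `emit[s][symbol]` would be replaced by a GMM likelihood evaluated on the MFCC vector of each frame.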

It should be understood that, although the steps in the flowcharts of the above embodiments are shown in sequence as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the execution order of the steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

Based on the same inventive concept, the embodiments of the present application also provide a voice recognition apparatus for biometric authentication, which implements the voice recognition method for biometric authentication described above. The solution provided by the apparatus is implemented in a manner similar to that described for the method, so for specific limitations in the apparatus embodiments provided below, reference may be made to the limitations of the voice recognition method for biometric authentication above, which are not repeated here.

In one embodiment, as shown in Fig. 5, there is provided a voice recognition apparatus for biometric authentication, comprising: a speech segmentation module 501, a model matching module 502, a speech recognition module 503, a first recognition processing module 504, a second recognition processing module 505, and a third recognition processing module 506, wherein:

The voice segmentation module 501 is configured to segment a voice to be recognized provided by a user during biometric authentication into a plurality of voice segments to be recognized.

The model matching module 502 is configured to determine a first speech recognition model in the speech recognition model library, where the first speech recognition model matches a language of a first speech segment of the plurality of speech segments to be recognized; the speech recognition model library includes a plurality of speech recognition models each adapted for a different language.

The voice recognition module 503 is configured to sequentially perform voice recognition on the plurality of voice segments to be recognized by using the first voice recognition model.

The first recognition processing module 504 is configured to determine, in a case where the first speech recognition model fails to recognize a speech segment, a second speech recognition model in the speech recognition model library that matches the language of the speech segment.

The second recognition processing module 505 is configured to, if the second speech recognition model is the same as the first speech recognition model, continue speech recognition on the unrecognized speech segment using the first speech recognition model.

The third recognition processing module 506 is configured to perform speech recognition on the unrecognized speech segment using all the speech recognition models in the speech recognition model library if the second speech recognition model is different from the first speech recognition model.

In one embodiment, the above-mentioned voice recognition module 503 is further configured to, in case that the first voice recognition model is successful in recognizing the current voice segment, continue to use the first voice recognition model to perform voice recognition on the next voice segment.

In one embodiment, the first recognition processing module 504 is further configured to, when the first speech recognition model fails to recognize a speech segment, end the speech recognition of the speech to be recognized if the speech recognition model matching the language of the speech segment is not successfully obtained from the speech recognition model library.

In one embodiment, the voice segmentation module 501 is further configured to: obtain the biometric authentication text corresponding to the biometric authentication; determine, according to the biometric authentication text, the number of speech segments into which the speech to be recognized is to be divided; and divide the speech to be recognized into a plurality of speech segments to be recognized according to that number of speech segments.

In one embodiment, the voice segmentation module 501 is further configured to: convert the speech to be recognized into a corresponding speech digital signal; and segment the speech digital signal according to the number of speech segments to obtain the plurality of speech segments to be recognized.
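As a rough illustration of this segmentation, the sketch below derives the expected number of segments from the authentication text, quantizes normalized samples to 16 bits, and splits the signal wherever the short-time energy falls below a threshold. It is a minimal sketch under assumptions: the function names, the frame length, and the threshold are hypothetical, and the expected segment count is used here only to check the result, since the text above does not spell out how the count steers the split.

```python
from typing import List, Tuple


def quantize_16bit(samples: List[float]) -> List[int]:
    """Map samples in [-1.0, 1.0] to signed 16-bit integers, clipping if needed."""
    return [max(-32768, min(32767, round(x * 32767))) for x in samples]


def short_time_energy(signal: List[int], frame_len: int) -> List[int]:
    """Sum of squared samples for each non-overlapping frame."""
    return [
        sum(v * v for v in signal[i:i + frame_len])
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def split_by_energy(signal: List[int], frame_len: int, threshold: int) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of runs of frames above the threshold."""
    energies = short_time_energy(signal, frame_len)
    segments: List[Tuple[int, int]] = []
    start = None
    for idx, energy in enumerate(energies):
        if energy > threshold and start is None:
            start = idx * frame_len  # a speech run begins
        elif energy <= threshold and start is not None:
            segments.append((start, idx * frame_len))  # the run ends
            start = None
    if start is not None:
        segments.append((start, len(energies) * frame_len))
    return segments


def segment_speech(
    samples: List[float], text: str, frame_len: int = 4, threshold: int = 1_000_000
) -> List[List[int]]:
    """Split speech into one segment per character of the authentication text."""
    expected = len(text)  # e.g. "514623" gives 6 segments
    signal = quantize_16bit(samples)
    bounds = split_by_energy(signal, frame_len, threshold)
    if len(bounds) != expected:
        raise ValueError(f"expected {expected} segments, found {len(bounds)}")
    return [signal[a:b] for a, b in bounds]


# Toy input: silence, a loud burst, silence, a shorter burst, silence.
toy_samples = [0.0] * 4 + [0.5] * 8 + [0.0] * 4 + [0.5] * 4 + [0.0] * 4
toy_segments = segment_speech(toy_samples, "51")
```

Real recordings would need a much longer frame (for example tens of milliseconds at 8 kHz) and a threshold tuned to the noise floor; the tiny values here only keep the toy input readable.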

In one embodiment, the voice recognition apparatus for biometric authentication further includes a model training module configured to: obtain, from a pre-established voice library, a voice training set corresponding to a target language, where the target language is the language matched by the speech recognition model to be trained and the voice library includes voice training sets respectively corresponding to different languages; extract, for each text required for biometric authentication, the speech for that text from the voice training set to obtain a voice training set corresponding to each text; and train the speech recognition model to be trained according to the voice training set corresponding to each text.

In one embodiment, the voice recognition apparatus for biometric authentication further includes a voice library building module configured to: collect the speech produced when different speakers read the texts required for biometric authentication in different languages; and construct the voice library from the collected speech.
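One possible way to organize the voice library and derive per-text training sets is sketched below. The `Utterance` record, its field names, and both functions are hypothetical and serve only to illustrate the grouping by language and then by text; the actual training of each model is out of scope for the sketch.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Utterance:
    speaker: str
    language: str
    text: str     # the text that was read aloud, e.g. a single digit
    audio: bytes  # placeholder for the recorded speech


def build_voice_library(utterances: Iterable[Utterance]) -> Dict[str, List[Utterance]]:
    """Group collected utterances into one training set per language."""
    library: Dict[str, List[Utterance]] = defaultdict(list)
    for utterance in utterances:
        library[utterance.language].append(utterance)
    return dict(library)


def training_sets_by_text(
    library: Dict[str, List[Utterance]],
    target_language: str,
    required_texts: Iterable[str],
) -> Dict[str, List[Utterance]]:
    """Split the target language's training set into one set per required text."""
    per_text: Dict[str, List[Utterance]] = {text: [] for text in required_texts}
    for utterance in library.get(target_language, []):
        if utterance.text in per_text:
            per_text[utterance.text].append(utterance)
    return per_text


toy_utterances = [
    Utterance("speaker_a", "zh", "5", b""),
    Utterance("speaker_b", "zh", "5", b""),
    Utterance("speaker_a", "zh", "1", b""),
    Utterance("speaker_c", "en", "5", b""),
]
toy_voice_library = build_voice_library(toy_utterances)
toy_zh_sets = training_sets_by_text(toy_voice_library, "zh", ["1", "5"])
```

Each entry of `toy_zh_sets` would then feed the training of the model for the corresponding text in the target language.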

Each module in the above voice recognition apparatus for biometric authentication may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, a processor of the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to each module.

In one embodiment, a computer device is provided, which may be a server, and whose internal structure may be as shown in Fig. 6. The computer device includes a processor, a memory, an input/output (I/O) interface, and a communication interface. The processor, the memory, and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used to store speech recognition model data for a plurality of languages. The input/output interface of the computer device is used to exchange information between the processor and external devices. The communication interface of the computer device is used to communicate with an external terminal through a network connection. The computer program, when executed by the processor, implements a voice recognition method for biometric authentication.

It will be appreciated by those skilled in the art that the structure shown in Fig. 6 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a particular computer device may include more or fewer components than shown, may combine certain components, or may have a different arrangement of components.

In one embodiment, a computer device is provided comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program performing the steps of:

determining a first voice recognition model matched with the language of the first voice segment in the plurality of voice segments to be recognized in the voice recognition model library; the voice recognition model library comprises a plurality of voice recognition models which are respectively applicable to different languages;

sequentially carrying out voice recognition on a plurality of voice segments to be recognized by using a first voice recognition model;

if the second speech recognition model is different from the first speech recognition model, the unrecognized speech segment is subjected to speech recognition by using all the speech recognition models in the speech recognition model library.

In one embodiment, the processor, when executing the computer program, further performs the following step: when the first speech recognition model succeeds in recognizing the current speech segment, continuing to use the first speech recognition model to perform speech recognition on the next speech segment.

In one embodiment, the processor, when executing the computer program, further performs the following step: when the first speech recognition model fails to recognize a speech segment, if no speech recognition model matching the language of the speech segment can be obtained from the speech recognition model library, ending the speech recognition of the speech to be recognized.

In one embodiment, the processor, when executing the computer program, further performs the following steps: obtaining the biometric authentication text corresponding to the biometric authentication; determining, according to the biometric authentication text, the number of speech segments into which the speech to be recognized is to be divided; and dividing the speech to be recognized into a plurality of speech segments to be recognized according to that number of speech segments.

In one embodiment, the processor, when executing the computer program, further performs the following steps: converting the speech to be recognized into a corresponding speech digital signal; and segmenting the speech digital signal according to the number of speech segments to obtain the plurality of speech segments to be recognized.

In one embodiment, the processor, when executing the computer program, further performs the following steps: obtaining, from a pre-established voice library, a voice training set corresponding to a target language, where the target language is the language matched by the speech recognition model to be trained and the voice library includes voice training sets respectively corresponding to different languages; extracting, for each text required for biometric authentication, the speech for that text from the voice training set to obtain a voice training set corresponding to each text; and training the speech recognition model to be trained according to the voice training set corresponding to each text.

In one embodiment, the processor, when executing the computer program, further performs the following steps: collecting the speech produced when different speakers read the texts required for biometric authentication in different languages; and constructing the voice library from the collected speech.

In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of:

In one embodiment, the computer program, when executed by the processor, further performs the following step: when the first speech recognition model succeeds in recognizing the current speech segment, continuing to use the first speech recognition model to perform speech recognition on the next speech segment.

In one embodiment, the computer program, when executed by the processor, further performs the following step: when the first speech recognition model fails to recognize a speech segment, if no speech recognition model matching the language of the speech segment can be obtained from the speech recognition model library, ending the speech recognition of the speech to be recognized.

In one embodiment, the computer program, when executed by the processor, further performs the following steps: obtaining the biometric authentication text corresponding to the biometric authentication; determining, according to the biometric authentication text, the number of speech segments into which the speech to be recognized is to be divided; and dividing the speech to be recognized into a plurality of speech segments to be recognized according to that number of speech segments.

In one embodiment, the computer program, when executed by the processor, further performs the following steps: converting the speech to be recognized into a corresponding speech digital signal; and segmenting the speech digital signal according to the number of speech segments to obtain the plurality of speech segments to be recognized.

In one embodiment, the computer program, when executed by the processor, further performs the following steps: obtaining, from a pre-established voice library, a voice training set corresponding to a target language, where the target language is the language matched by the speech recognition model to be trained and the voice library includes voice training sets respectively corresponding to different languages; extracting, for each text required for biometric authentication, the speech for that text from the voice training set to obtain a voice training set corresponding to each text; and training the speech recognition model to be trained according to the voice training set corresponding to each text.

In one embodiment, the computer program, when executed by the processor, further performs the following steps: collecting the speech produced when different speakers read the texts required for biometric authentication in different languages; and constructing the voice library from the collected speech.

In one embodiment, a computer program product is provided comprising a computer program which, when executed by a processor, performs the steps of:

It should be noted that the user information (including, but not limited to, user equipment information and user personal information) and the data (including, but not limited to, data for analysis, stored data, and presented data) referred to in the present application are all information and data authorized by the user or fully authorized by all parties, and the collection, use, and processing of the relevant data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.

Those skilled in the art will appreciate that all or part of the flows in the above method embodiments may be implemented by a computer program instructing the relevant hardware. The computer program may be stored on a non-volatile computer-readable storage medium and, when executed, may include the flows of the method embodiments described above. Any reference to memory, database, or other medium used in the various embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, Resistive Random Access Memory (ReRAM), Magnetoresistive Random Access Memory (MRAM), Ferroelectric Random Access Memory (FRAM), Phase Change Memory (PCM), graphene memory, and the like. The volatile memory may include Random Access Memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM is available in a variety of forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases referred to in the various embodiments provided herein may include at least one of relational databases and non-relational databases. The non-relational databases may include, but are not limited to, blockchain-based distributed databases. The processors referred to in the embodiments provided herein may be, without limitation, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, data processing logic units based on quantum computing, and the like.

The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered to fall within the scope of this description.

The above embodiments represent only a few implementations of the present application. Although they are described in relative detail, they are not to be construed as limiting the scope of the present application. It should be noted that those skilled in the art may make various modifications and improvements without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims

1. A method of biometric voice recognition, the method comprising:

2. The method according to claim 1, wherein the method further comprises:

3. The method according to claim 1, wherein the method further comprises:

4. The method according to claim 1, wherein the segmenting the speech to be recognized provided by the user at the time of biometric authentication into a plurality of speech segments to be recognized comprises:

5. The method of claim 4, wherein the slicing the speech to be recognized into a plurality of speech segments to be recognized according to the number of speech segment slices, comprises:

6. The method according to claim 1, wherein the method further comprises:

7. The method of claim 6, wherein the method further comprises:

8. A biometric voice recognition device, the device comprising:

9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 7 when the computer program is executed.

10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.

11. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.